VMS, (very) remotely: a question.
For a while I had been searching my grey cells for
an algorithm capable of distinguishing homonyms
(e.g. "plant" as in "trees and grasses" versus
"plant" as in "industry and factories").
Inspiration came two days ago, and it looks like
there will be very little perspiration. However,
before I start developing it, I wonder whether I am
reinventing the wheel (to _my_ knowledge, I am not,
but note the _my_). The algorithm is just a
clustering algorithm which allows one point to
belong to more than one cluster at once.
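
To make "one point in more than one cluster" concrete,
here is a minimal sketch. It is not the algorithm
mentioned in this message (that one is not written
down yet); it only shows the overlapping-membership
step, by plain thresholding on cosine similarity.
The cluster centres are given by hand, and the word
counts, the names and the threshold of 0.3 are all
made-up assumptions for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def overlapping_assign(points, centres, threshold=0.3):
    """For each point, list ALL centres it is similar enough to.

    Unlike ordinary (hard) clustering, a point is not forced
    into its single nearest cluster: it may get several labels,
    or none at all.
    """
    return {
        name: [c for c, centre in centres.items()
               if cosine(vec, centre) >= threshold]
        for name, vec in points.items()
    }


# Hypothetical toy data: counts of neighbouring words.
points = {
    "plant":   {"tree": 2, "grass": 1, "factory": 2, "industry": 1},
    "oak":     {"tree": 3, "grass": 1},
    "foundry": {"factory": 2, "industry": 2},
}
# Hand-picked centres (an assumption; a real algorithm would learn them).
centres = {
    "botany":   {"tree": 1, "grass": 1},
    "industry": {"factory": 1, "industry": 1},
}

memberships = overlapping_assign(points, centres)
for word, clusters in memberships.items():
    print(word, clusters)
```

With this toy data the homonym "plant" lands in both
clusters, while "oak" and "foundry" land in one each.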
Now for my question:
Does anyone here know of such an algorithm?
What I have in mind (I have not even written it
down yet) will be very fast. I used to be worried
about overfitting the data, but that is taken care
of now.
And no, it is not applicable to the VMS. It would be
if we knew where the words begin and end. That is the
old problem of segmenting continuous text. I had
a few ideas too, but I want the solution to that to
be applicable to bitmaps of texts. A nice problem is:
given a picture of the Phaistos disk, how do you
identify its 45 different signs? (The problem is nice
because the writing is in a spiral)
Back to my thinking cap.