RE: VMs: VMS Word context similarities
My context algorithm is single-sided; i.e. it compares
the set of words which follow wordA to those which
follow wordB and produces a score representing the
degree of intersection.
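As a rough illustration of the comparison described above,
here is a minimal Python sketch. The function names and the
toy sentence are made up for the example, and the score used
(Jaccard overlap: intersection size over union size) is only
an assumed stand-in for whatever scoring the actual program
uses.

```python
from collections import defaultdict

def follower_sets(tokens):
    """Map each word to the set of words that immediately follow it."""
    followers = defaultdict(set)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].add(nxt)
    return followers

def context_score(followers, word_a, word_b):
    """Jaccard overlap of two follower sets (an assumed scoring choice)."""
    a = followers.get(word_a, set())
    b = followers.get(word_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy input, not Voynich data.
tokens = "the cat sat the dog sat the cat ran a dog ran".split()
followers = follower_sets(tokens)
print(context_score(followers, "cat", "dog"))  # 1.0: both are followed by {sat, ran}
print(context_score(followers, "cat", "the"))  # 0.0: no followers in common
```

Only the words to the right are looked at, which is what
makes the comparison single-sided.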
It is "soft" in that I use an arbitrary score threshold
to decide that a significant link exists between two
words. Then groups are formed using the links that
were considered significant, and this does allow a
word to be attached to more than one group.
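One way the thresholding and grouping step could look is
sketched below. This is a guess at the general shape, not
the actual grouping rule: here each group is a word plus
everything linked to it, with groups that are contained in
a larger group dropped. The scores and the threshold value
are invented for the example.

```python
from collections import defaultdict

def significant_links(scores, threshold):
    """Keep only the word pairs whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}

def overlapping_groups(links):
    """One candidate group per word (the word plus its linked neighbours),
    keeping only groups that are not strict subsets of another group."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    candidates = {frozenset({w} | ns) for w, ns in neighbours.items()}
    return {g for g in candidates if not any(g < other for other in candidates)}

# Invented pair scores for four placeholder words.
scores = {("a", "b"): 0.6, ("b", "c"): 0.5, ("c", "d"): 0.7, ("a", "d"): 0.1}
links = significant_links(scores, threshold=0.4)
groups = overlapping_groups(links)
print(sorted(sorted(g) for g in groups))  # [['a', 'b', 'c'], ['b', 'c', 'd']]
```

Words "b" and "c" land in both groups, which is the kind of
multiple membership meant above. Lowering the threshold adds
links and makes the groups grow into each other.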
It does require a relatively large input text to produce
focused results, so it may not be feasible to run it on
just the herbal-A section, although I will give it a go.
It is interesting that "daiin" doesn't figure in the
output from the VMS run. I tried reducing the threshold
of significance until it did appear, and it looks most
closely allied to the group:
(ol,chol,chedy,shedy,qokeey,qokeedy,qokedy)
...although at this threshold value there are >2000
significant links, and when combined they tend to form
one big group, so this association may not be meaningful.
The '!' char is in the EVA transcription I'm using;
the explanation is:
# The "!" filler is used for all other purposes. It denotes a
# character or word break that was either skipped or lost, or that
# (according to the transcriber) does not exist. In particular "!" is
# generally used where other versions have {}-comments.
It is a valid question to ask what happens if you run
this program on a file in which we know there is little
or no contextual relationship between tokens: to what
extent is it possible to find links that are not there?
I will try this too.
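A simple version of that control, sketched in Python under
some assumptions: the "no context" file is made by shuffling
the token order (word frequencies stay the same, sequence
information is destroyed), and links are counted with the
same assumed Jaccard score on follower sets. All names and
the toy input are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

def follower_sets(tokens):
    """Map each word to the set of words that immediately follow it."""
    followers = defaultdict(set)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].add(nxt)
    return followers

def count_links(tokens, threshold):
    """Count word pairs whose follower-set Jaccard overlap reaches the threshold."""
    followers = follower_sets(tokens)
    count = 0
    for a, b in combinations(sorted(followers), 2):
        shared = followers[a] & followers[b]
        union = followers[a] | followers[b]
        if len(shared) / len(union) >= threshold:
            count += 1
    return count

def shuffled_baseline(tokens, threshold, trials=20, seed=0):
    """Link counts for randomly reordered copies of the same tokens."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        shuffled = list(tokens)
        rng.shuffle(shuffled)
        counts.append(count_links(shuffled, threshold))
    return counts

# Toy input, not Voynich data.
tokens = "the cat sat the dog sat the cat ran a dog ran".split()
real = count_links(tokens, threshold=0.5)
baseline = shuffled_baseline(tokens, threshold=0.5)
print(real, sum(baseline) / len(baseline))
```

If the real text gives no more links than the shuffled
copies at a given threshold, the links at that threshold
are probably not telling us anything.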
I have to admit though, the thing that bugs me about this
approach is that I'm not sure I really believe that
VMS-words are truly words. The vocabulary size seems too
small; the common words are too common and the others
too rare, especially when you start contemplating the
existence of nulls, or noise, or multiple languages,
and all that jazz.
Cheers,
Marke