
RE: VMs: VMS Word context similarities



My context algorithm is single-sided; i.e. it compares
the set of words which follow wordA to those which
follow wordB and produces a score representing the
degree of intersection.   
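
As a rough illustration of the single-sided comparison, here is
a minimal Python sketch. The post does not say how the
intersection score is computed, so the Jaccard overlap used here
is an assumption, and the names follower_sets and context_score
are hypothetical.

```python
from collections import defaultdict

def follower_sets(tokens):
    """Map each word to the set of words seen immediately after it."""
    followers = defaultdict(set)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].add(nxt)
    return followers

def context_score(followers, word_a, word_b):
    """Jaccard overlap of two follower sets (an assumed scoring choice)."""
    a = followers.get(word_a, set())
    b = followers.get(word_b, set())
    if not a or not b:
        return 0.0  # a word with no followers cannot be compared
    return len(a & b) / len(a | b)

# Toy token stream, for illustration only (not real transcription data).
tokens = "daiin chol daiin shol qokedy chol qokedy shol ol chedy".split()
followers = follower_sets(tokens)
print(context_score(followers, "daiin", "qokedy"))
```

Only the right-hand neighbour is used, which is what makes the
comparison single-sided; a two-sided variant would also collect
the words that precede each word.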

It is "soft" in that I use an arbitrary score threshold 
to decide that a significant link exists between two 
words.   Then groups are formed using the links that 
were considered significant, and this does allow a 
word to be attached to more than one group.
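
One way the thresholding and grouping could look is sketched
below. The grouping rule (each word together with everything it
links to) is an assumption chosen because it lets a word sit in
more than one group; the actual program may combine links
differently. The scores and function names are hypothetical.

```python
from collections import defaultdict

def significant_links(scores, threshold):
    """Keep only the word pairs whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}

def neighbourhood_groups(links):
    """One group per word: the word plus everything it links to.

    A word can therefore appear in several groups. This grouping
    rule is an assumption, not the original program's method.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    groups = {frozenset({word} | linked) for word, linked in neighbours.items()}
    return sorted(groups, key=lambda g: sorted(g))

# Hypothetical pair scores, for illustration only.
scores = {
    ("chedy", "shedy"): 0.62,
    ("shedy", "qokedy"): 0.55,
    ("chedy", "qokedy"): 0.10,
    ("ol", "chol"): 0.48,
}
links = significant_links(scores, threshold=0.4)
groups = neighbourhood_groups(links)
for group in groups:
    print(sorted(group))
```

Raising or lowering the threshold argument changes which pairs
survive, and therefore how many groups come out and how much
they overlap.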

It does require a relatively large input text to produce 
focused results, so it may not be feasible to run it on
just the herbal-A section, although I will give it a go. 

It is interesting that "daiin" doesn't figure in the 
output from the VMS run.   I tried reducing the threshold
of significance until it did appear, and it looks most
closely allied to the group:
 (ol,chol,chedy,shedy,qokeey,qokeedy,qokedy)

...although with this threshold value there are >2000 
significant links, and when combined they tend to form
one big group, so the association with "daiin" may not
be meaningful.
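
A simple check for this collapse is to measure what share of the
linked words falls into the largest connected cluster. The
sketch below uses a small union-find for that; the function name
and the example links are hypothetical, and treating links as an
undirected graph is an assumption.

```python
def largest_component_share(links):
    """Fraction of linked words that sit in the largest connected cluster."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two clusters

    sizes = {}
    for word in list(parent):
        root = find(word)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(parent) if parent else 0.0

# Hypothetical links, for illustration only.
loose = [("ol", "chol"), ("chol", "chedy"), ("shedy", "qokedy")]
tight = loose + [("chedy", "shedy")]
print(largest_component_share(loose), largest_component_share(tight))
```

A share close to 1 at a given threshold would suggest the links
have merged into one big group and that individual associations
at that threshold deserve little weight.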

The '!' char is in the EVA transcription I'm using;
the explanation is:
# The "!" filler is used for all other purposes. It denotes a
# character or word break that was either skipped or lost, or that
# (according to the transcriber) does not exist. In particular "!" is
# generally used where other versions have {}-comments.

It is a fair question to ask what happens if you run
this program on a file in which we know there is little
or no contextual relationship between tokens: to what 
extent does it find links that are not there?
I will try this too.
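
One possible form of that control is to shuffle the token order,
which keeps word frequencies but destroys any real context, and
then count how many links still pass the threshold. This is a
sketch under assumptions: the Jaccard score, the function names,
and the toy input are all hypothetical, not the original program.

```python
import random
from collections import defaultdict
from itertools import combinations

def count_links(tokens, threshold):
    """Count word pairs whose follower-set overlap reaches the threshold."""
    followers = defaultdict(set)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].add(nxt)
    count = 0
    for a, b in combinations(sorted(followers), 2):
        overlap = len(followers[a] & followers[b]) / len(followers[a] | followers[b])
        if overlap >= threshold:
            count += 1
    return count

def shuffled_copy(tokens, seed=0):
    """Same tokens in random order; the seed makes the run repeatable."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled

# Toy token stream, for illustration only.
tokens = "a b a b c d c d a b c d".split()
real = count_links(tokens, threshold=0.5)
null = count_links(shuffled_copy(tokens), threshold=0.5)
print("links in original order:", real)
print("links after shuffling:  ", null)
```

If the shuffled text yields about as many links as the original
at the same threshold, those links probably reflect word
frequency rather than context. Repeating the shuffle with
several seeds would give a rough spread for the baseline.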

I have to admit though, the thing that bugs me about this 
approach is that I'm not sure I really believe that 
VMS-words are truly words.   The vocabulary size seems too 
small; the common words are too common and the others 
too rare, especially when you start contemplating the 
existence of nulls, or noise, or multiple languages, 
and all that jazz.

Cheers,
Marke
 


 