Re: VMs: Detecting "hands" automatically
Hi Bruce,
At 14:41 18/12/2003 -0500, Bruce Grant wrote:
> I have been reading about algorithms for grouping points into clusters
> with similar characteristics. I was curious whether an algorithm like this
> could detect the difference between A and B hands in the VMS based on the
> relative letter frequencies. After a test, it appears that it can do so
> pretty well.
Excellent! BTW, which were the (possibly anomalous) 7 "Hand B" pages which
your algorithm thought were in Cluster 1? Any commonalities between these
might point to a deeper pattern... :-)
I'd also be interested to know what would happen if you recursively passed
each set it emits back through the algorithm, to form a binary tree of
clusters. Even just the topmost level of that tree (ie, what are the top two
sub-clusters for each of your first-pass Cluster 1 and Cluster 2?) would be
interesting. :-)
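To make the recursive idea concrete, here is a minimal sketch in plain Python. It is not a description of your actual program: the function names, the `alphabet` argument, the seeding rule for the two centres and the `depth` / `min_size` stopping rules are all assumptions of mine, chosen only to keep the example short and deterministic.

```python
from collections import Counter

def letter_freqs(text, alphabet):
    """Relative frequency of each alphabet symbol in one page of text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty pages
    return [counts[ch] / total for ch in alphabet]

def dist2(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(vectors, iters=50):
    """Split vectors into two groups with plain 2-means.

    Seeding is an assumption: the first vector, and the vector farthest from it.
    """
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: dist2(v, c0))
    labels = None
    for _ in range(iters):
        new = [0 if dist2(v, c0) <= dist2(v, c1) else 1 for v in vectors]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        g0 = [v for v, lab in zip(vectors, labels) if lab == 0]
        g1 = [v for v, lab in zip(vectors, labels) if lab == 1]
        if not g0 or not g1:
            break  # degenerate split, nothing more to refine
        c0, c1 = mean(g0), mean(g1)
    return labels

def cluster_tree(pages, alphabet, depth=2, min_size=2):
    """pages maps page name -> transcribed text.

    Returns either a sorted list of page names (a leaf) or a
    (left, right) tuple of sub-trees.
    """
    names = sorted(pages)
    if depth == 0 or len(names) < 2 * min_size:
        return names
    vecs = [letter_freqs(pages[n], alphabet) for n in names]
    labels = two_means(vecs)
    left = {n: pages[n] for n, lab in zip(names, labels) if lab == 0}
    right = {n: pages[n] for n, lab in zip(names, labels) if lab == 1}
    if not left or not right:
        return names  # could not split this set any further
    return (cluster_tree(left, alphabet, depth - 1, min_size),
            cluster_tree(right, alphabet, depth - 1, min_size))
```

Calling `cluster_tree(pages, alphabet, depth=2)` on the whole set of pages would give the first-pass split at the top, with each of the two clusters split once more underneath it.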
Finally (on my ever-expanding wish-list), as you've got the K-means process
up and running it might also be revealing to apply it to a de-pairified
transcription, where [for example] "qo", "dy", "ol" and "or" (and possibly
"eo" as well?) are each converted into new tokens. My strong suspicion is
that, because of the ubiquity of these pairs in the text, these comprised a
"back-end coder", applied as a final stage - and that therefore many
statistical tests might give more reliable results if applied to
de-pairified text-streams (ie to a real alphabet and not to a fake alphabet).
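By way of illustration, de-pairifying could be sketched as below. The pair list is just the four pairs named above ("eo" is left out, but could be added), and the greedy left-to-right scan is an assumption on my part: a sequence such as "qol" is read as "qo" + "l" rather than "q" + "ol", and a different rule might well suit the text better. The helper names are hypothetical.

```python
from collections import Counter

# Pairs named in this message; add "eo" here to test that variant.
PAIRS = ("qo", "dy", "ol", "or")

def depairify(word, pairs=PAIRS):
    """Greedy left-to-right scan of one word.

    Emits a listed pair as a single token where one starts at the
    current position, otherwise emits the single glyph.
    """
    tokens = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in pairs:
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def token_freqs(words, pairs=PAIRS):
    """Relative frequency of each token over a list of words."""
    counts = Counter(t for w in words for t in depairify(w, pairs))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

The per-page output of `token_freqs` could then be fed to the clustering in place of the raw letter frequencies.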
Cheers, .....Nick Pelling.....