
Re: VMs: Detecting "hands" automatically

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Detecting "hands" automatically
From: Nick Pelling <incoming@xxxxxxxxxxxxxxxxx>
Date: Thu, 18 Dec 2003 20:04:59 +0000
In-reply-to: <3FE202D1.2060304@mail.msen.com>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

Hi Bruce,

At 14:41 18/12/2003 -0500, Bruce Grant wrote:

> I have been reading about algorithms for grouping points into clusters with similar characteristics. I was curious whether an algorithm like this could detect the difference between A and B hands in the VMS based on the relative letter frequencies. After a test, it appears that it can do so pretty well.

Excellent! BTW, which were the (possibly anomalous) 7 "Hand B" pages which your algorithm thought were in Cluster 1? Any commonalities between these might point to a deeper pattern... :-)

I'd also be interested to know what would happen if you recursively passed each set it emits back through the algorithm, to form a binary tree of clusters. Even just the topmost level of that tree (ie, what are the top sub-clusters within each of your first-pass Cluster 1 and Cluster 2?) would be interesting. :-)
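To make the recursive idea concrete, here is a rough sketch only, not your actual program: it assumes each page is just a string of transcribed letters, uses a bare-bones 2-means with a deterministic seeding choice I picked for convenience, and all the names (letter_freqs, two_means, cluster_tree) are my own inventions.

```python
from collections import Counter

def letter_freqs(text, alphabet):
    """Relative frequency of each alphabet letter in one page of text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty pages
    return [counts[ch] / total for ch in alphabet]

def dist2(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(vectors, iters=50):
    """Plain 2-means. Assumed seeding: the first vector and the one farthest from it."""
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: dist2(v, c0))
    labels = None
    for _ in range(iters):
        new = [0 if dist2(v, c0) <= dist2(v, c1) else 1 for v in vectors]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        g0 = [v for v, l in zip(vectors, labels) if l == 0]
        g1 = [v for v, l in zip(vectors, labels) if l == 1]
        if not g0 or not g1:
            break  # everything fell into one cluster
        c0, c1 = mean(g0), mean(g1)
    return labels

def cluster_tree(pages, alphabet, depth=2):
    """pages: dict of page name -> text. Returns a nested (left, right) tuple;
    each leaf is a sorted list of page names."""
    names = sorted(pages)
    if depth == 0 or len(names) < 2:
        return names
    labels = two_means([letter_freqs(pages[n], alphabet) for n in names])
    left = {n: pages[n] for n, l in zip(names, labels) if l == 0}
    right = {n: pages[n] for n, l in zip(names, labels) if l == 1}
    if not left or not right:
        return names  # no further split possible
    return (cluster_tree(left, alphabet, depth - 1),
            cluster_tree(right, alphabet, depth - 1))
```

Calling cluster_tree(pages, alphabet, depth=2) would give the two first-pass clusters, each split once more, which is the "topmost sub-clusters" view I was asking about.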

Finally (on my ever-expanding wish-list), as you've got the K-means process up and running it might also be revealing to apply it to a de-pairified transcription, where [for example] "qo", "dy", "ol" and "or" (and possibly "eo" as well?) are each converted into a single new token. My strong suspicion is that, because of the ubiquity of these pairs in the text, they formed a "back-end coder", applied as a final stage - and that therefore many statistical tests might give more reliable results if applied to de-pairified text-streams (ie to a real alphabet and not to a fake alphabet).
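By "de-pairified" I mean something like the following sketch. The replacement tokens (upper-case Q, D, L, R) are purely hypothetical placeholders, chosen on the assumption that the transcription is lower-case so they cannot collide with existing letters; and where two pairs overlap (as in "qol") this simply takes whichever starts first, which is only one of several possible policies.

```python
import re

# Hypothetical pair -> token table; the token letters are arbitrary placeholders.
PAIRS = {"qo": "Q", "dy": "D", "ol": "L", "or": "R"}

def depairify(text, pairs):
    """Replace each listed letter pair with its single token, scanning left to right."""
    pattern = re.compile("|".join(re.escape(p) for p in pairs))
    return pattern.sub(lambda m: pairs[m.group(0)], text)

print(depairify("qokeedy.chol.dor", PAIRS))  # prints QkeeD.chL.dR
```

The output stream could then be fed to the same letter-frequency clustering, with the new tokens counted as letters in their own right.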

Cheers, .....Nick Pelling.....



