
Re: VMs: Detecting "hands" automatically



Hello Bruce,

you wrote:

> I have been reading about algorithms for grouping
> points into clusters with similar characteristics.
> I was curious whether an algorithm like 
> this could detect the difference between A and B
> hands in the VMS based on the relative letter 
> frequencies. 

We tended to call A and B 'languages' and '1'
and '2' the 'hands', but that's just nitpicking ...

> Using a version of the interlinear VMS 
> transcription, 

Can you tell me which file you used? This is
very important.

> the algorithm I used (called "K-means") classified
> 145 pages identified as hand A or B as follows:

Did you have only 145 pages, or did you select 
only 145 pages?

> For the K-means algorithm, you start by choosing the number of
> clusters you are looking for (2 in this test) and choosing that
> many points as first guesses for the centers of the clusters.
> (Typically you just use the first N points in the list, as I did.)
>
> Then, you repeatedly do the following steps, until cluster
> assignments don't change anymore:
> 1. Assign each point (page) to the cluster whose center it is
>    nearest to.
> 2. For each cluster, re-estimate its center point as the average
>    of all the points in the cluster.
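
(For anyone following along, the two steps quoted above can be sketched
roughly as below. This is only a minimal illustration in Python, not the
program Bruce actually ran: the function name k_means, the max_iter cap,
the choice of Euclidean distance, and the toy "letter frequency" numbers
are all assumptions made for the example.)

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k=2, max_iter=100):
    """Cluster equal-length tuples of numbers into k groups (hypothetical helper)."""
    # First guesses for the centers: the first k points in the list,
    # as described in the quoted text.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):  # safety cap; an assumption, not part of the description
        # Step 1: assign each point (page) to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so stop
        labels = new_labels
        # Step 2: re-estimate each center as the average of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Toy data: each tuple stands for one page, each number for the relative
# frequency of one letter on that page (made-up values).
pages = [(0.10, 0.02), (0.02, 0.11), (0.12, 0.03), (0.01, 0.09), (0.11, 0.01)]
labels, centers = k_means(pages, k=2)
print(labels)
```

(Note that the cluster numbers 0 and 1 are arbitrary labels; which one
corresponds to A and which to B has to be decided by comparing against
pages whose classification is already known.)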

Does the algorithm tell you in the end whether the
number (2) was a good choice or not?

Cheers, Rene
