Re: VMs: VMS Word context similarities



On Thursday 08 September 2005 09:58, Marke Fincher wrote:
> It is a valid question to ask, what happens if you run
> this program on a file in which we know there is little
> or no contextual relationship between tokens;

Yes, one could try scrambling the words in a known text and see what comes out.
Also, one can scramble the VMS words and see whether the thresholds needed to
find any groupings change (which is what one would expect if there is any
structure).
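
A rough sketch of such a scramble control, in Python. The statistic here
(the fraction of adjacent word pairs that occur more than once) is only a
stand-in I picked for illustration, not the measure the program under
discussion uses, and the function names are made up; any context score could
be plugged in instead.

```python
import random
from collections import Counter


def repeated_bigram_fraction(tokens):
    """Fraction of adjacent word pairs that occur more than once in the text."""
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeated = sum(1 for pair in bigrams if counts[pair] > 1)
    return repeated / len(bigrams)


def scramble_control(tokens, trials=100, seed=0):
    """Compare the score of the original word order with shuffled copies.

    Returns (observed, mean_shuffled, fraction_at_least_observed).
    Shuffling keeps the word frequencies but destroys the word order, so a
    large gap between observed and mean_shuffled points to order-dependent
    structure in the original.
    """
    rng = random.Random(seed)
    observed = repeated_bigram_fraction(tokens)
    work = list(tokens)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(work)
        shuffled_scores.append(repeated_bigram_fraction(work))
    mean_shuffled = sum(shuffled_scores) / trials
    at_least = sum(1 for s in shuffled_scores if s >= observed) / trials
    return observed, mean_shuffled, at_least
```

Run it once on a known text and once on the VMS word list; if the VMS gap
between the original and the shuffled scores looks like the gap for a real
language, that is at least weak evidence of word-order structure.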

> I have to admit though, the thing that bugs me about this
> approach is that I'm not sure I really believe that
> VMS-words are truly words.

Yes, that is a problem.

> The vocabulary size seems too  small; 
> the common words are too common and the others  
> too rare, 

Hm... going by Zipf's law, the vocabulary is not too small for a text the size
of the VMS, and the proportions of words are about what one would expect from
a natural language.
What size were you expecting?
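
For anyone who wants to check this on their own transcription, here is a
minimal Python sketch. It ranks the words by frequency and fits a straight
line to log(frequency) against log(rank) by ordinary least squares; the
classic Zipf shape gives a slope near -1. The function name is mine, and a
least-squares fit on a log-log plot is only a crude estimate, so treat the
number as indicative.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) versus log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Comparing the slope for the VMS with the slope for a known text of similar
length is more telling than either number on its own.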

Cheers,

G.