
Re: VMs: provable?



On Wednesday 29 September 2004 15:49, Marke Fincher wrote:
> RE: whopping great chains (=63!!!)
> Consider two documents. They are both created by piecing together 'chunks'
> selected from a small underlying text. In one case it is an English document
> and the chunks are English words taken from an English dictionary. In the
> other case the chunks are selected randomly with a window moving over some
> master page. In the first case the choice of selection is dictated by an
> intended meaning, and in the second it is random.
> But how do you tell the difference?

Like Elmar, I have the feeling that 1 or 2 occurrences of a substring in a
corpus the size of the vms may not mean much in particular. This corresponds
to about 3-4 words in a row.
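One way to see how often a chain of 3-4 words recurs is to count every run of
n consecutive words in the transcription. A minimal Python sketch, assuming the
text has already been split into a list of words; the function name
repeated_chains and the min_count threshold are hypothetical, for illustration
only:

```python
from collections import Counter

def repeated_chains(words, n, min_count=2):
    """Count every run of n consecutive words (hypothetical helper).

    Returns only the chains seen at least min_count times, mapped to
    their number of occurrences.
    """
    chains = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {chain: c for chain, c in chains.items() if c >= min_count}

if __name__ == "__main__":
    # Toy word list, not VMs data.
    words = "a b c a b c d".split()
    print(repeated_chains(words, 3))
```

Running the same count on a meaningful text and on a text built from randomly
chosen chunks is one way to compare how many chains repeat once or twice by
chance.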

But this still does not answer my question: when you started describing the
presence of these substrings, what were you expecting to find? That they
were too common or too rare?

> P.P.S   I think it is vital to bear in mind at all times that over 6000 of
> the 8700 VMs words occur only once in the whole manuscript.

No mystery there... This is what one expects for the vast majority of other 
languages as a consequence of Zipf's law (in my reading of the vms there are 
5691 words that appear once).
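To illustrate why a Zipf-like word distribution produces so many words that
occur only once (hapax legomena), here is a minimal Python sketch that draws a
synthetic sample where the word of rank r has probability proportional to
1/r**s and then counts the hapaxes. The vocabulary size, token count and
exponent s=1.0 are illustrative assumptions, not measurements of the VMs, and
the helper names are hypothetical:

```python
import random
from collections import Counter

def hapax_count(tokens):
    """Return (word types seen exactly once, total distinct word types)."""
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes, len(counts)

def zipf_sample(n_tokens, vocab_size, s=1.0, seed=0):
    """Draw n_tokens word ranks; rank r has probability proportional to 1 / r**s."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / r ** s for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)

if __name__ == "__main__":
    # Sizes chosen for illustration only (assumed, not taken from the VMs).
    sample = zipf_sample(n_tokens=37000, vocab_size=50000)
    h, v = hapax_count(sample)
    print(f"{h} of {v} distinct words occur only once ({h / v:.0%})")
```

Applying hapax_count to a real transcription instead of the synthetic sample
gives the kind of figure quoted above.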

Cheers,

Gabriel