Re: VMs: provable?
On Wednesday 29 September 2004 15:49, Marke Fincher wrote:
> RE: whopping great chains (=63!!!)
> Consider two documents. They are both
> created by piecing together 'chunks' selected from a small underlying text.
> In one case it is an English document and the chunks are English words
> taken from an English dictionary. In the other case the chunks are
> selected randomly with a window moving over some
> master page.
> In the first case the choice of selection is dictated by an
> intended meaning, and in the second it is random.
> But how do you tell the difference?
Like Elmar, I have the feeling that 1 or 2 occurrences of a substring in a
corpus the size of the vms may not mean very much. This applies to runs of
about 3-4 words in a row.
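To make the point about rare substrings concrete, here is a minimal sketch (Python, not part of the original thread) that counts every run of n consecutive words and keeps only those seen at least twice. The helper names `ngram_counts` and `repeated_ngrams` are hypothetical, and the word sequence is an invented toy example, not a real passage of the manuscript.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive words (an n-gram) in a word list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def repeated_ngrams(words, n, min_count=2):
    """Keep only the n-grams that occur at least min_count times."""
    return {g: c for g, c in ngram_counts(words, n).items() if c >= min_count}


# Invented toy sequence of EVA-style words, for illustration only.
words = "daiin shedy qokeedy daiin shedy qokain daiin shedy qokeedy".split()
print(repeated_ngrams(words, 3))
```

On a real transcription one would compare such counts against what a shuffled or randomly generated text of the same size produces, since a run that shows up only once or twice can easily arise by chance.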
But this still does not answer my question: when you started describing the
presence of these substrings, what were you expecting to find? That they
were too common, or too rare?
> P.P.S I think it is vital to bear in mind at all times that over 6000 of
> the 8700 VMs words occur only once in the whole manuscript.
No mystery there... This is what one expects for the vast majority of other
languages as a consequence of Zipf's law (in my reading of the vms there are
5691 words that appear once).
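A rough illustration of why so many once-only words are expected: the sketch below (Python, not from the original thread) draws tokens from an idealised Zipf distribution, with probability proportional to 1/rank, and counts the word types that occur exactly once (hapax legomena). The vocabulary size, token count, seed, and exponent of 1 are illustrative assumptions, not measurements of the VMS.

```python
import random
from collections import Counter
from itertools import accumulate


def hapax_legomena(tokens):
    """Return the word types that occur exactly once in a token list."""
    counts = Counter(tokens)
    return [w for w, c in counts.items() if c == 1]


def zipf_sample(vocab_size, n_tokens, seed=0):
    """Draw n_tokens word ranks with probability proportional to 1/rank."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    # Cumulative weights let random.choices sample by bisection.
    cum = list(accumulate(1.0 / r for r in ranks))
    return rng.choices(ranks, cum_weights=cum, k=n_tokens)


# Assumed, illustrative sizes: 50,000 possible words, 40,000 running tokens.
tokens = zipf_sample(vocab_size=50_000, n_tokens=40_000)
types = set(tokens)
once = hapax_legomena(tokens)
print(f"{len(types)} types, {len(once)} occur once "
      f"({len(once) / len(types):.0%})")
```

Under these assumptions well over half of the distinct words appear only once, which is the same qualitative picture as the vms counts quoted above.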
Cheers,
Gabriel