
Re: VMs: word database and binomial distribution



That took a while to get through my skull, but I hope it has now. Losing a few tokens, or even a single token, might mean losing a word. Maybe losing part of the book could be one cause of the long tail of big words (?), in addition to seldom-used names of plants, places, demons, etc.

Regards,

Knox



Jorge Stolfi wrote:

> [Knox:] Oops -- you said mutate to another word of the same length.
> I was thinking of losing a few words. If I understand it, we are only
> looking at word lengths, so it still would have no effect, I think.


Well, consider a codebook that uses Roman numerals.

A single-letter mutation XLVII->XLLII would create a new word that did
not exist in the original text. On the other hand, XLVII->XLIII could
destroy the only instance of XLVII in the text (and many words do
occur only once, per Zipf). So a single-letter mutation may increment,
decrement, or preserve the WLD count for that word length.
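
A minimal sketch of the three cases, assuming the WLD counts distinct
words (not tokens) of each length. The two toy texts and the helper
names are made up for illustration only:

```python
from collections import Counter

def distinct_by_length(tokens):
    """Map word length -> number of distinct words of that length."""
    return Counter(len(word) for word in set(tokens))

def mutate(tokens, index, new_word):
    """Return a copy of tokens with one token replaced by a same-length word."""
    assert len(tokens[index]) == len(new_word)
    out = list(tokens)
    out[index] = new_word
    return out

# Hypothetical toy texts: XLVII occurs twice in one, once in the other.
twice = ["XLVII", "XLVII", "XLIII", "XII"]
once = ["XLVII", "XLIII", "XII"]

before_twice = distinct_by_length(twice)[5]  # XLVII, XLIII -> 2
before_once = distinct_by_length(once)[5]    # XLVII, XLIII -> 2

# XLVII -> XLLII adds a new word while another XLVII survives: 2 -> 3
incremented = distinct_by_length(mutate(twice, 0, "XLLII"))[5]

# XLVII -> XLIII destroys the only XLVII and lands on an existing word: 2 -> 1
decremented = distinct_by_length(mutate(once, 0, "XLIII"))[5]

# XLVII -> XLLII replaces the only XLVII with a new word: 2 -> 2
preserved = distinct_by_length(mutate(once, 0, "XLLII"))[5]

print(before_twice, incremented)             # 2 3
print(before_once, decremented, preserved)   # 2 1 2
```

The count for length 5 moves even though no token changes length,
because the mutation changes which distinct words are present.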

The same argument applies to the "binomial" code proposed in my page.
Namely, the codewords consist of a marker "#" followed by some subset
of the digits 1-9, in increasing order. Then the mutation #1278 ->
#1378 may decrement the WLD (#1378 is itself a valid codeword that
may already occur in the text), while #1278 -> #1298 may increment it
(#1298 is not a valid codeword, so it is a new word).
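
As a rough sketch of why the two mutations differ, the code below
enumerates the codewords as described (a "#" plus an increasing subset
of the digits 1-9) and checks whether a mutated word is still a valid
codeword. Allowing the empty subset (the bare "#") is an assumption
here, and the function names are illustrative only:

```python
from collections import Counter
from itertools import combinations
from math import comb

DIGITS = "123456789"

def codewords():
    """Yield every codeword: '#' plus a subset of 1-9 in increasing order."""
    for k in range(len(DIGITS) + 1):  # k = 0 (bare '#') is an assumption
        for subset in combinations(DIGITS, k):
            yield "#" + "".join(subset)

def is_codeword(word):
    """True if word is '#' followed by strictly increasing digits 1-9."""
    body = word[1:]
    return (word.startswith("#")
            and all(c in DIGITS for c in body)
            and all(a < b for a, b in zip(body, body[1:])))

# Number of possible codewords of each length (marker included).
by_length = Counter(len(w) for w in codewords())

# A codeword with k digits has length k + 1, and there are C(9, k) of them.
assert all(by_length[k + 1] == comb(9, k) for k in range(10))

print(is_codeword("#1378"))  # True: may collide with an existing word
print(is_codeword("#1298"))  # False: always a word outside the code
```

A mutation that lands on a valid codeword can merge two words into
one, while a mutation that lands outside the code can only add a word
the original text could not have contained.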

All the best,

--stolfi
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list
