
RE: On the word length distribution



Stolfi also wrote:

> Well, on second thoughts, the binomial distribution of word
> lengths is a bit less remarkable than what I thought. It will
> be observed in any code or spelling system that has the
> following properties:

> (1) each word has nine distinguished slots;

> (2) each slot can be either empty, or filled with one
> different symbol;

> (3) all possible choices in (2) result in distinct words.

> (4) all possible choices in (2) do occur in the text.

> Note that we need no assumptions on probabilities,
> only on possibilities.

I haven't been able to give this much thought yet, but
shouldn't there be some constraint on the probabilities
in order to have the peak in the middle? Also, I think that
(2) should allow several alternative symbols in each slot
(e.g. <empty>, Eva-ch or Eva-sh).
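
To make the difference between possibilities and probabilities
concrete, here is a minimal Python sketch of the nine-slot model
quoted above. The function names and the fill probability p are
my own assumptions for illustration, not anything taken from the
VMs data. The first function counts distinct words per length
using conditions (1)-(4) only; the second gives the length
distribution of running-text tokens if, hypothetically, each
slot were filled independently with probability p.

```python
from collections import Counter
from itertools import product
from math import comb

SLOTS = 9  # number of slots, from condition (1)


def type_length_counts(slots=SLOTS):
    """Count distinct words per length when every slot is either
    empty (0) or filled (1) and every combination occurs."""
    counts = Counter(sum(pattern) for pattern in product((0, 1), repeat=slots))
    return [counts[k] for k in range(slots + 1)]


def token_length_probs(p, slots=SLOTS):
    """Length distribution of tokens, ASSUMING each slot is filled
    independently with probability p (hypothetical)."""
    return [comb(slots, k) * p**k * (1 - p) ** (slots - k)
            for k in range(slots + 1)]


# Distinct words per length 0..9: the binomial coefficients C(9, k)
print(type_length_counts())
# -> [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]

# Token lengths: the peak moves with p instead of staying in the middle
probs = token_length_probs(0.25)
print(max(range(SLOTS + 1), key=probs.__getitem__))
```

So the count of distinct words peaks in the middle with no
assumption on probabilities, while the token counts peak in the
middle only if p is close to 1/2.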

Also, having fewer slots, some of which can contain 0, 1 or
2 letters, could result in a binomial distribution, so my doubt
about the dependence of initial 4- on following -O- is
unfounded.
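
As a rough check of the fewer-slots idea, the number of distinct
words of each length can be obtained by multiplying one small
polynomial per slot. The sketch below is only an illustration:
the layout of three slots holding 0 or 1 letter plus three slots
holding 0, 1 or 2 letters is invented, and the helper names are
my own. A slot with alternative symbols (e.g. <empty>, Eva-ch or
Eva-sh) would simply be written as [1, 2].

```python
def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def length_counts(slot_options):
    """slot_options[s][k] = number of ways slot s can be filled
    using exactly k letters. Returns the number of distinct words
    per total length, assuming all combinations occur."""
    total = [1]
    for options in slot_options:
        total = convolve(total, options)
    return total


# Hypothetical layout: three 0/1-letter slots, three 0/1/2-letter slots
layout = [[1, 1]] * 3 + [[1, 1, 1]] * 3
print(length_counts(layout))
# -> [1, 6, 18, 35, 48, 48, 35, 18, 6, 1]
```

With this made-up layout the counts are symmetric and peaked in
the middle, much like the nine-slot case, although they are not
exactly the binomial coefficients C(9, k).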

Indeed, I would not at all be surprised if the VMs contained
nothing but numbers. Numbers would make a lot of sense
for the labels near the zodiac nymphs, and these do fit in
the standard word paradigms.

Furthermore, having a binomial word length distribution
but not a binomial token length distribution is completely
logical if the text is a word for word encoding of some
plaintext.

Cheers, Rene