RE: On the word length distribution
Stolfi also wrote:
> Well, on second thoughts, the binomial distribution of word
> lengths is a bit less remarkable than what I thought. It will
> be observed in any code or spelling system that has the
> following properties:
> (1) each word has nine distinguished slots;
> (2) each slot can be either empty, or filled with one
> different symbol;
> (3) all possible choices in (2) result in distinct words.
> (4) all possible choices in (2) do occur in the text.
> Note that we need no assumptions on probabilities,
> only on possibilities.
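A quick way to check the counting argument in the quote is the Python sketch below. It assumes each of the nine slots is either empty or holds its own single symbol, enumerates every fill pattern, and tallies the distinct words (types) by length. Because only possibilities are counted, the tally is exactly the row of binomial coefficients C(9, k). The function name is my own, not anything from Stolfi's analysis.

```python
from collections import Counter
from itertools import product
from math import comb

SLOTS = 9  # number of distinguished slots, per property (1)

def type_length_counts(n_slots: int) -> list[int]:
    """Count distinct words by length when every slot is either empty or
    holds its own single symbol, and every combination occurs."""
    tally = Counter()
    # Each pattern is a tuple of booleans: True means the slot is filled.
    for pattern in product((False, True), repeat=n_slots):
        tally[sum(pattern)] += 1
    return [tally[k] for k in range(n_slots + 1)]

counts = type_length_counts(SLOTS)
print(counts)  # [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
assert counts == [comb(SLOTS, k) for k in range(SLOTS + 1)]
```

The peak at lengths 4 and 5 comes purely from counting, with no frequencies involved.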
I haven't been able to give this much thought yet, but
shouldn't there be some constraint on the probabilities
in order to have the peak in the middle? Also, I think that
(2) should allow several alternative symbols in each slot
(e.g. <empty>, Eva-ch or Eva-sh).
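To explore both points, here is a rough Python sketch (all names hypothetical, my own). type_length_counts multiplies one polynomial per slot, with each allowed option contributing x^length, so slots with several alternative symbols are handled and every combination is assumed to give a distinct word. token_length_dist assumes, purely for illustration, that each slot is filled independently with a fixed probability. The type counts need no probabilities at all, but the token distribution peaks near its mean, which is the sum of the fill probabilities, so a middle peak for tokens would indeed need some constraint.

```python
from math import comb

def poly_mul(a: list[float], b: list[float]) -> list[float]:
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def type_length_counts(slots: list[list[int]]) -> list[int]:
    """slots[i] lists the lengths of the options allowed in slot i
    (0 means empty). Assumes every combination is a distinct word."""
    result = [1]
    for options in slots:
        poly = [0] * (max(options) + 1)
        for length in options:
            poly[length] += 1  # each option adds x**length
        result = poly_mul(result, poly)
    return result

def token_length_dist(fill_probs: list[float]) -> list[float]:
    """Token length distribution if slot i is filled independently
    with probability fill_probs[i] (an assumed model)."""
    dist = [1.0]
    for p in fill_probs:
        dist = poly_mul(dist, [1 - p, p])
    return dist

# Hypothetical: nine slots, each <empty> or one of two one-glyph symbols.
two_way = type_length_counts([[0, 1, 1]] * 9)
assert two_way == [comb(9, k) * 2**k for k in range(10)]
print(two_way.index(max(two_way)))  # 6: the peak moves toward longer words

# A slot holding 0, 1 or 2 letters, with two one-letter variants,
# counts like two plain slots: 1 + 2x + x^2 = (1 + x)^2.
assert type_length_counts([[0, 1, 1, 2]]) == type_length_counts([[0, 1]] * 2)

# Tokens: with rarely filled slots the peak sits well below the middle.
tokens = token_length_dist([0.25] * 9)
print(tokens.index(max(tokens)))  # 2
```

So the shape of the type distribution depends only on how many alternatives each slot offers, while the token distribution also depends on how often each slot is used.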
Also, having fewer slots, some of which can contain 0, 1 or
2 letters, could result in a binomial distribution, so my doubt
about the dependence of initial 4- on following -O- is
Indeed, I would not at all be surprised if the VMs contained
nothing but numbers. Numbers would make a lot of sense
for the labels near the zodiac nymphs, and these do fit into
the standard word paradigms.
Furthermore, having a binomial word length distribution
but not a binomial token length distribution is completely
logical if the text is a word-for-word encoding of some