Re: VMS words and Roman numerals
On 29 Dec 2000, at 1:24, Jorge Stolfi wrote:
> So the only plausible explanation for the symmetry of the WLD, I
> think, is that the number H_k of *possible* words of length k is
> indeed close to C*binom(9,k-1), for some constant C.
Even though I have not contributed anything at all to this thread, I
have been enjoying it very much.
I agree with John G: this seems a complicated way of generating a
dictionary (which is, of course, no reason for rejecting it).
The discussion seems to be aimed at finding a way to fit a model
to the data, but what is desperately missing is some evidence
that this sort of distribution does not occur naturally in other
languages. If one can come up with this sort of evidence, then the
code/nomenclator/etc. theories would gain some ground.
Note that the statistics are very much constrained, as Jorge pointed
out, by the alphabet used. In EVA, the word length distribution (and
the token length distribution as well) has a tail extending to the
longer words. This also seems to be the case in English and Latin,
although I have no idea whether these fit a binomial model too.
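One way to run the comparison that Jorge's formula invites, for any language: normalise H_k ~ C*binom(9,k-1) into predicted fractions (the constant C cancels, and the binomial coefficients for k = 1..10 sum to 2^9 = 512), then measure how far an observed word list is from that curve. A minimal Python sketch, assuming the text is already split into words; the function names and the use of total variation distance as the yardstick are illustrative choices, not anything proposed in the thread.

```python
from collections import Counter
from math import comb


def binomial_wld(n=9):
    """Predicted fraction of words of length k, assuming H_k ~ C*binom(n, k-1).

    Lengths run from 1 to n+1; C cancels because sum_j binom(n, j) = 2**n.
    """
    total = 2 ** n
    return {k: comb(n, k - 1) / total for k in range(1, n + 2)}


def observed_wld(words, by_type=True):
    """Observed word length distribution.

    by_type=True counts each distinct word once (the dictionary view);
    by_type=False counts every occurrence (the token view).
    """
    items = set(words) if by_type else list(words)
    if not items:
        raise ValueError("need at least one word")
    counts = Counter(len(w) for w in items)
    return {k: c / len(items) for k, c in sorted(counts.items())}


def total_variation(p, q):
    """Half the summed absolute difference between two length distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Calling `total_variation(observed_wld(words), binomial_wld())` on English or Latin word lists would give one number per language, which could then be set beside the same number for the EVA transcription.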
Another point. If this were a numerical nomenclator (words are
numbers), perhaps the easiest way of "naming" (numbering) the
words would be to increase the count each time you create a new word.
But then one has to sort the entries by the plain text (!) to keep
writing, and sort them by their nomenclator number to read the text.
So unless this is done sequentially, I guess it would take a fair bit
of time...
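The numbering scheme described above can be sketched as a small codebook that hands out the next number to each new plaintext word and keeps two views of the same entries: one ordered by plaintext (needed to encipher) and one ordered by number (needed to decipher). This is a hypothetical Python illustration; `SequentialNomenclator` and its methods are invented names, not a reconstruction of any historical nomenclator.

```python
class SequentialNomenclator:
    """Hypothetical codebook: plaintext words numbered in order of first use."""

    def __init__(self):
        self._number_of = {}  # plaintext word -> code number
        self._word_of = []    # code number - 1 -> plaintext word

    def encode(self, word):
        """Return the code number for word, assigning the next free number if it is new."""
        if word not in self._number_of:
            self._word_of.append(word)
            self._number_of[word] = len(self._word_of)
        return self._number_of[word]

    def decode(self, number):
        """Return the plaintext word for a code number (numbers start at 1)."""
        return self._word_of[number - 1]

    def writing_table(self):
        """Entries sorted by plaintext: the list a scribe consults while enciphering."""
        return sorted(self._number_of.items())

    def reading_table(self):
        """Entries sorted by code number: the list a reader consults while deciphering."""
        return sorted(self._number_of.items(), key=lambda item: item[1])


book = SequentialNomenclator()
codes = [book.encode(w) for w in "in principio erat verbum et verbum erat".split()]
# codes == [1, 2, 3, 4, 5, 4, 3]
```

The dictionary hides the cost the paragraph worries about: on paper, every new word means inserting a line into the alphabetical writing table, while the numbered reading table only grows at the end.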
Moreover, if it is done sequentially, then one would expect to find
all (?) the numbers from 1 to 6000, and we know that, at least in
EVA, we run out of Roman numerals very quickly.
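The sequential-numbering argument can be turned into a direct check: write out the Roman numeral for every number from 1 to N and list the ones that never appear in the text. A minimal Python sketch under two stated assumptions: the words have already been mapped from the transcription into Roman-numeral letters (that mapping is not shown and is hypothetical), and numerals use the modern subtractive form with thousands written as repeated M above 3999. Medieval scribes often wrote additive forms such as iiij or viiij instead, so a real check would need to accept those variants too.

```python
def to_roman(n):
    """Modern subtractive Roman numeral for n >= 1; thousands are repeated M (assumed beyond 3999)."""
    if n < 1:
        raise ValueError("Roman numerals start at 1")
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in symbols:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def missing_numerals(words, n):
    """Numbers in 1..n whose Roman numeral never occurs among the given words."""
    present = set(words)
    return [k for k in range(1, n + 1) if to_roman(k) not in present]


longest = max(len(to_roman(k)) for k in range(1, 6001))  # 17, reached at 5888 (MMMMMDCCCLXXXVIII)
```

A sequential nomenclator with about 6000 entries would need codes as long as 17 letters under this convention, and `missing_numerals` would have to come back nearly empty for the scheme to fit.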