
Re: VMS words and Roman numerals



On 29 Dec 2000, at 1:24, Jorge Stolfi wrote:
> So the only plausible explanation for the symmetry of the WLD, I
> think, is that the number H_k of *possible* words of length k is
> indeed close to C*binom(9,k-1), for some constant C.

Hi all,

Even though I have not contributed anything to this thread, I have 
been enjoying it very much.

I agree with John G: this seems a complicated way of generating a 
dictionary (which is of course no reason for rejecting it).
The discussion seems to be directed at finding a model that fits 
the data, but what is desperately missing is some evidence that 
this sort of distribution does not arise naturally in other 
languages. If one can come up with this sort of evidence, then the 
code/nomenclator/etc. theories would gain some ground.

Note that the stats are very much constrained, as Jorge pointed 
out, by the alphabet used. In EVA, the word length distribution (and 
the token length distribution as well) has a tail extending toward 
the longer words. This seems to be the case in English and Latin 
too, although I have no idea whether these fit a binomial model.
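
One way to check this would be to compare the word length 
distribution of any word list against the shape Jorge proposes, 
binom(9,k-1) normalised over k = 1..10. The following is only a 
rough sketch: the function names are made up for illustration, the 
input is assumed to be a plain list of word strings in whatever 
transcription alphabet is being tested, and total variation distance 
is just one arbitrary choice of fit measure.

```python
from collections import Counter
from math import comb


def binomial_model(n=9):
    """Fractions binom(n, k-1) / 2**n for word lengths k = 1..n+1."""
    total = 2 ** n
    return {k: comb(n, k - 1) / total for k in range(1, n + 2)}


def length_distribution(words):
    """Fraction of distinct words (types, not tokens) at each length."""
    types = set(words)
    counts = Counter(len(w) for w in types)
    return {k: c / len(types) for k, c in sorted(counts.items())}


def total_variation(p, q):
    """Half the summed absolute difference between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Running length_distribution() on an English or Latin word list and 
passing the result to total_variation() together with 
binomial_model() would give a crude number to set beside the same 
figure for the VMS. Since the lengths depend on the alphabet, the 
result for the VMS would change with the transcription used.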

Another item. If this were a numerical nomenclator (words are 
numbers), perhaps the easiest way of "naming" (numbering) the 
words would be to increase the count each time you create a new 
word.
But then one has to sort them by the plain text (!) to keep writing, 
and sort them by their nomenclator entry to read the text back. So 
unless this is done sequentially, I guess that it would take a fair 
bit of time...
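
To make the bookkeeping concrete, here is a minimal sketch of such a 
sequential scheme. It is purely hypothetical (the class and its 
methods are invented for illustration, and nothing here is claimed 
about how the VMS was actually produced): each new plain text word 
gets the next number, and two separate indexes are kept, one for 
writing and one for reading.

```python
class SequentialNomenclator:
    """Hypothetical scheme: each new word is assigned the next number."""

    def __init__(self):
        self.code_of = {}  # plain text word -> number (used when writing)
        self.word_of = []  # position number-1 -> word (used when reading)

    def encode(self, word):
        # A word not seen before takes the next unused number.
        if word not in self.code_of:
            self.word_of.append(word)
            self.code_of[word] = len(self.word_of)
        return self.code_of[word]

    def decode(self, number):
        return self.word_of[number - 1]

    def writing_index(self):
        # The list a scribe would need sorted by plain text.
        return sorted(self.code_of.items())
```

The reading index comes for free in number order, but 
writing_index() has to be re-sorted every time a word is added, 
which is the part that would be slow on paper.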

Moreover, if it is done sequentially, then one would expect to see 
all (?) the numbers from 1 to 6000, and we know that, at least in 
EVA, we run out of Roman numerals very quickly.

Regards,
Gabriel