
VMs: Re: Word Length Distribution



Consonant & vowel separation. I have applied it to a paragraph from Jorge
Stolfi's site (below). Why sort it alphabetically? It looks like it could be
repetitive enough already. A few token pair substitutions for common vowel
end groups or consonant start groups would dramatically shorten word length.

Ntoe thta fi the dcmleia cdsoe wree ssgndaie to the wrdso ta rndmao
ro ni lphbtclaaeia rdroe, the tknoe lngthe dstrbtniiuio wldou be
frlyai symmtrcleia, nda smlriia to the wrdo lngthe dstrbtniiuio.
nO the throe hnda, fi a nwe cdoe si ssgndaie ni sqnceuee to chea
nwe wrdo thta pprsaea ni smoe plntxtaie, thne the msto cmmnoo wrdso
wlli tnde to hvae shrtroe cdsoe, nda the tknoe lngthe dstrbtniiuio
wlli be bsdiae twrdsoa the lfte --- sa ni fgriue 1 bvaoe
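
The rearrangement above can be reproduced with a short script. This is only
a minimal sketch, assuming that 'y' counts as a consonant, that each word's
consonants are moved to the front and its vowels to the back (each group
keeping its original order), and that punctuation stays where it was. The
function names are made up for illustration.

```python
import re

# Assumption: only a, e, i, o, u are vowels; 'y' is treated as a consonant.
VOWELS = set("aeiouAEIOU")


def separate(word: str) -> str:
    """Return the word with its consonants first, then its vowels."""
    consonants = [c for c in word if c not in VOWELS]
    vowels = [c for c in word if c in VOWELS]
    return "".join(consonants + vowels)


def separate_text(text: str) -> str:
    """Apply separate() to every run of letters, leaving the rest alone."""
    return re.sub(r"[A-Za-z]+", lambda m: separate(m.group(0)), text)


if __name__ == "__main__":
    print(separate_text("Note that if the decimal codes"))
```

Words such as "the", "to" and "be" come out unchanged because their
consonants already precede their vowels.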

__________________________________________________

Note that if the decimal codes were assigned to the words at random, or in
alphabetical order, the token length distribution would be fairly
symmetrical, and similar to the word length distribution. On the other hand,
if a new code is assigned in sequence to each new word that appears in some
plaintext, then the most common words will tend to have shorter codes, and
the token length distribution will be biased towards the left --- as in
figure 1 above.
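
The two code assignments contrasted in that paragraph can be tried on any
tokenized text. A rough sketch follows, assuming the codes are simply the
integers 1, 2, 3, ... written in decimal and that a token's length is the
number of digits in its code. The function names are hypothetical and not
taken from Stolfi's page.

```python
from collections import Counter


def codes_by_first_appearance(tokens):
    """Give each new word the next unused integer, in order of appearance."""
    codes = {}
    for tok in tokens:
        if tok not in codes:
            codes[tok] = len(codes) + 1
    return codes


def codes_alphabetical(tokens):
    """Number the distinct words 1..N in alphabetical order."""
    return {w: i for i, w in enumerate(sorted(set(tokens)), start=1)}


def token_length_histogram(tokens, codes):
    """Count tokens by the number of decimal digits in their code."""
    return Counter(len(str(codes[t])) for t in tokens)
```

Running token_length_histogram() on the same token list with each of the
two code tables gives the two distributions to compare.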

----- Original Message -----
From: <knoxmix@xxxxxxxxxxxxx>
To: <vms-list@xxxxxxxxxxx>
Sent: 13 April 2004 00:07
Subject: VMs: Word Length Distribution


> Maybe this has been exhausted on the list in the past but I will
> bring it up again anyway. What could account for the binomial
> distribution of vocabulary words as shown by Jorge Stolfi?
>
> http://www.dcc.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/
>
> I think this is an acid test for any scheme that someone might
> devise. Does it show that the Currier transcription is pretty close
> to the mark or is it a function of the transcription? I ran a check
> on a long section, almost 8000 tokens, of an unmodified EVA
> transcription that I have been working with and it showed almost
> identical results. It will be interesting to see whether this holds
> with shorter sections and if not, where and how it breaks. Does it
> vary from one section to another?  Is it consistent with a real
> vocabulary in the writing about certain subjects or with any known
> specific cipher? Who would have such a vocabulary? Could (the)
> elimination of certain (sometimes) unessential parts of speech or
> letters explain it? (Shorthand, Nick?) Pidgin? Maybe there was a
> European Pidgin that became obsolete.
>
> Ciao ......... Knox
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
