Re: VMs: Word Length Distribution

Hi Knox,

At 18:07 12/04/2004 -0500, Knox wrote:
Maybe this has been exhausted on the list in the past but I will bring it up again anyway. What could account for the binomial distribution of vocabulary words as shown by Jorge Stolfi?


Could (the) elimination of certain (sometimes) unessential parts of speech or letters explain it? (Shorthand, Nick?)

As mentioned before, I strongly suspect that both (word-end) truncation and (word-middle) abbreviation are mechanisms involved in the VMs' "plaintext", which might serve to explain much of the "binomial-like" distribution observed. Essentially, the explanation would be that words (somehow) get reduced to (roughly speaking) the shortest string which uniquely identifies them (or, rather, the string that takes the least effort to write) - i.e., short[ened]-hand - and that this involves building up (probably unconsciously) some kind of tree-like choice structure.
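
To make the "shortest string which uniquely identifies them" idea concrete, here is a rough Python sketch. It models only the word-end truncation half (not word-middle abbreviation), and the function name and the little Latin word list are made up for illustration, not taken from the VMs:

```python
def shortest_unique_prefixes(words):
    """Map each word to the shortest prefix that no other word in the list shares."""
    vocab = sorted(set(words))
    result = {}
    for i, word in enumerate(vocab):
        need = 1
        # In a sorted list, the longest shared prefix is always with a neighbour.
        for j in (i - 1, i + 1):
            if 0 <= j < len(vocab):
                other = vocab[j]
                common = 0
                while (common < min(len(word), len(other))
                       and word[common] == other[common]):
                    common += 1
                need = max(need, common + 1)
        # A word that is itself a prefix of another word is kept whole.
        result[word] = word[:min(need, len(word))]
    return result


# Hypothetical herbal-style vocabulary, purely for illustration.
sample = ["quercus", "quinta", "rosa", "radix", "ruta"]
for full, short in shortest_unique_prefixes(sample).items():
    print(full, "->", short)
```

Comparing the length histogram of the shortened forms against that of the full words, for a real word list, would be one way to see whether this kind of reduction pushes lengths towards a narrow, binomial-like hump.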

However, this is still only part of the story: one thing I've meant to do (but never got around to) is to test whether verbose ciphers reduce or enhance this binomial-like distribution effect. Could I suggest you try this out? IMO, the most likely verbose cipher groups are:-
ee, eee
ii, iii, iiii
qo, dy, or, ol, al
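
A minimal Python sketch of how the test might be run: count each of the groups above as a single token and compare the word-length histogram with and without the collapse. The greedy left-to-right, longest-match-first parsing is my assumption (a word like "qol" could equally be read as q+ol rather than qo+l), and the function names and sample words are only illustrative:

```python
from collections import Counter

# The candidate verbose groups listed above, longest first so that
# "iiii" is tried before "iii" and "ii".
GROUPS = sorted(
    ["ee", "eee", "ii", "iii", "iiii", "qo", "dy", "or", "ol", "al"],
    key=len,
    reverse=True,
)


def token_length(word, groups=GROUPS):
    """Length of word in tokens, counting each listed group as one token."""
    i = 0
    count = 0
    while i < len(word):
        for g in groups:
            if word.startswith(g, i):
                i += len(g)
                break
        else:
            i += 1  # no group matches here, so consume a single letter
        count += 1
    return count


def length_distribution(words, collapse):
    """Histogram of lengths over distinct words (vocabulary, not running text)."""
    vocab = set(words)
    return Counter(token_length(w) if collapse else len(w) for w in vocab)


# Illustrative EVA-style words only; a real test needs a full transcription.
sample = ["qokeedy", "daiin", "ol", "ol"]
print(sorted(length_distribution(sample, collapse=False).items()))
print(sorted(length_distribution(sample, collapse=True).items()))
```

Whether the collapsed histogram looks more or less binomial than the raw one is exactly the open question; the sketch just produces the two sets of counts to compare.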

Oh, and don't forget to convert EVA to a more obviously glyph-based representation:-
ch, sh, cfh, ckh, cph, cth
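
For the glyph conversion, something along these lines would do as a first pass. It is a sketch that treats only the six sequences listed above as single glyphs and every other EVA letter as a glyph on its own; the function name and test words are hypothetical:

```python
# EVA letter sequences from the list above that stand for one glyph each.
# None of them is a prefix of another, so the order here is not significant.
GLYPHS = ["cfh", "ckh", "cph", "cth", "ch", "sh"]


def eva_to_glyphs(word):
    """Split an EVA-transcribed word into a list of glyph tokens."""
    glyphs = []
    i = 0
    while i < len(word):
        for g in GLYPHS:
            if word.startswith(g, i):
                glyphs.append(g)
                i += len(g)
                break
        else:
            glyphs.append(word[i])  # any other letter is its own glyph
            i += 1
    return glyphs


# Illustrative words only.
for w in ["chckhy", "shedy", "qokchy"]:
    print(w, len(w), "EVA letters ->", len(eva_to_glyphs(w)), "glyphs")
```

Word lengths should then be measured on the glyph lists rather than on the raw EVA strings, otherwise "ch", "sh" and the gallows combinations inflate the counts.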

Pidgin? Maybe there was a European Pidgin that became obsolete.

Lingua Franca is probably the most famous European pidgin - for example, "Ferengi" is a Lingua Franca loan word, still in use several centuries hence, apparently :-) - though assiduous trawling of the mailing list archive should reveal 8 to 10 others, pulled in from the footnotes and marginalia of European history.

AIUI, there are actually quite a few (though admittedly typically short) Lingua Franca documents out there, many dating from the early Middle Ages.

Cheers, .....Nick Pelling.....

