Curious coincidence
Hi,
Over the past few weeks I have been counting VMS beans of various
shapes and colors, extracted from the almost complete, not-so-bad,
majority-vote transcription in EVA.
I just noticed a curious coincidence:
total *occurrences* of words (tokens) with
    0 gallows .... 17363 (49.4%)
    1 gallows .... 17443 (49.6%)
    2 gallows ....   323 ( 0.9%)
    3 gallows ....     3
These numbers look more suspicious than the elections in Peru. 8-)
Many (if not all) of the 2- and 3-gallows words are probably due to
omission of word spaces by the transcribers. Other data errors may
have injected a few percent of noise in these figures.
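For anyone who wants to repeat the tally, here is a minimal sketch of the kind of count involved. It assumes the gallows are the EVA letters t, k, p and f, and that the transcription has already been split into tokens; the function names and the sample tokens are made up for illustration and are not the actual bean-counting scripts.

```python
from collections import Counter

# Assumption: in EVA the gallows are written t, k, p and f.
GALLOWS = frozenset("tkpf")


def gallows_count(token):
    """Number of gallows letters in one EVA token."""
    return sum(1 for ch in token if ch in GALLOWS)


def tally_by_gallows(tokens):
    """Map 'number of gallows' -> number of tokens with that many."""
    return Counter(gallows_count(tok) for tok in tokens)


def percentages(tally):
    """Convert a tally into percentages of the total token count."""
    total = sum(tally.values())
    return {k: 100.0 * n / total for k, n in sorted(tally.items())}


# Toy input (hypothetical tokens, only to exercise the functions).
toy = tally_by_gallows(["daiin", "okaiin", "chol", "qotedy", "otkedy"])

# The figures quoted above, re-expressed as percentages.
posted = Counter({0: 17363, 1: 17443, 2: 323, 3: 3})
shares = percentages(posted)
for k, pct in shares.items():
    print(f"{k} gallows: {posted[k]:6d} ({pct:.1f}%)")
```

The quoted counts add up to 35132 tokens, which is the denominator behind the percentages in the table.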
Still, the coincidence is intriguing. It seems safe to assume that a
"correct" Voynichese word can have at most one gallows; so we have
an almost exact 50-50 split between 0-gallows and 1-gallows words.
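To put a number on "almost exact", one can compare the 0-gallows and 1-gallows counts with what a fair coin would give. This is only a rough check: it assumes each token independently gets a gallows with probability 1/2, which is my simplification and surely not true of real text.

```python
import math

# Counts quoted above for tokens with zero and one gallows.
zero_g, one_g = 17363, 17443

n = zero_g + one_g            # tokens with at most one gallows
expected = n / 2              # fair-coin expectation for either class
sigma = math.sqrt(n) / 2      # standard deviation of a Binomial(n, 1/2) count
z = (one_g - expected) / sigma

print(f"n = {n}, expected = {expected:.0f}, sigma = {sigma:.1f}, z = {z:.2f}")
```

Under that assumption the observed split is about 0.4 standard deviations from a perfect 50-50, so the counts are consistent with an even split but do not prove one.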
Maybe this is merely an amazing linguistic coincidence. Perhaps the
presence of gallows indicates an independent binary phonetic
attribute (say, voiced vs. unvoiced, high/low register, front/back); and
Voynichese happens to be an extremely efficient language that makes
full use of that available bit.
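"Full use of that available bit" can be made concrete with the binary entropy of the split. The sketch below treats gallows presence as a single yes/no attribute of each token and ignores the 2- and 3-gallows tokens; both choices are simplifying assumptions.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no attribute that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Counts quoted above for tokens with zero and one gallows.
zero_g, one_g = 17363, 17443
p_gallows = one_g / (zero_g + one_g)
h = binary_entropy(p_gallows)

print(f"p(gallows) = {p_gallows:.4f}, entropy = {h:.6f} bits")
```

An attribute split exactly 50-50 carries 1 bit per token; the observed split falls short of that by a negligible amount.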
Or could this be something else? Three possibilities that I can
think of:
* Voynichese "words" are actually keys into a codebook-style cipher,
encoded in a notation resembling Roman numerals (only more complicated);
* Voynichese is a complex "randomizing" code à la Vigenère,
where the encrypted numeric text is further scrambled
with a second, complicated encoding responsible for the peculiar
word structure;
* Voynichese "words" are generated, at least in part, by throwing
dice; and the gallows belong to the random part.
In all these scenarios, the presence/absence of gallows would be a
low-order bit in the encoding. That would explain the precise 50-50
split, in spite of the fact that the VMS word frequencies are as
irregular as those of any natural language.
Comments, anyone?
All the best,
--stolfi
PS. I hope to post a summary of my bean-counting over the
weekend.