
Re: Curious coincidence



Maybe the gallows letters indicate the real word breaks. Maybe that's why lots
of paragraphs start with a gallows letter - everything up to the next gallows
letter could be a "real" word. (I'm not sure what a double gallows letter would
mean, or gallows letters inside others - maybe elided letters.)
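To make the idea concrete, here is a minimal sketch of the re-segmentation, not a tested decoding method. It assumes the EVA gallows letters are k, t, p and f, ignores the transcribed spaces, and starts a new chunk at every gallows letter. The function name and the sample line are made up for illustration. Gallows inside other letters (e.g. "cth") are split like any other, which may well be wrong.

```python
GALLOWS = set("ktpf")  # assumed EVA gallows letters


def resegment(line):
    """Ignore transcribed spaces; start a new chunk at each gallows letter."""
    chunks = []
    current = ""
    for ch in line.lower():
        if not ch.isalpha():
            continue  # skip spaces and any other separators
        if ch in GALLOWS and current:
            chunks.append(current)  # close the chunk before the gallows
            current = ""
        current += ch
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Made-up sample line in EVA-like spelling
    print(resegment("daiin okeedy qokeey chol"))
```

Any text before the first gallows of a line ends up as a gallows-free chunk of its own, and a double gallows yields a one-letter chunk.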

To speculate further, the fact that lots of words end with the same few
characters may be because, as in Arabic, certain letters are not allowed to join
the following letter. What appear to be word gaps would then just be due to the
random choice of letters in the text.

This could also account for the high frequency (or so it seems) of very short
words, which would arise when two of these "final" letters come together.
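A rough sketch of that mechanism, under the assumption that there is a fixed set of non-joining letters and that a gap appears after every one of them. The set "ynl" and the sample stream below are hypothetical, chosen only to show the effect.

```python
def apparent_words(stream, finals):
    """Break a continuous letter stream after every non-joining letter."""
    words = []
    current = ""
    for ch in stream:
        current += ch
        if ch in finals:
            words.append(current)  # a gap appears after a non-joining letter
            current = ""
    if current:
        words.append(current)  # trailing letters with no final
    return words


if __name__ == "__main__":
    # Hypothetical finals; two adjacent finals produce a one-letter "word"
    print(apparent_words("qokeedyychol", "ynl"))
```

With these assumptions, every apparent word except possibly the last ends in one of the "final" letters, and adjacent finals give very short words.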

Bruce Grant

Jorge Stolfi wrote:

> Hi,
>
> Over the past few weeks I have been counting VMS beans of various
> shapes and colors, extracted from the almost complete, not-so-bad,
> majority-vote transcription in EVA.
>
> I just noticed a curious coincidence:
>
>   total *occurrences* of words (tokens) with
>
>      0 gallows .... 17363  (49.4%)
>      1 gallows .... 17443  (49.6%)
>      2 gallows ....   323   (0.9%)
>      3 gallows ....     3
>
> These numbers look more suspicious than the elections in Peru. 8-)
>
> Many (if not all) of the 2- and 3-gallows words are probably due to
> omission of word spaces by the transcribers. Other data errors may
> have injected a few percent of noise in these figures.
>
> Still, the coincidence is intriguing. It seems safe to assume that a
> "correct" Voynichese word can have at most one gallows; so we have
> almost exact 50-50 split between 0-g and 1-g words.
>
> Maybe this is merely an amazing linguistic coincidence. Perhaps the
> presence of gallows indicates an independent binary phonetic
> attribute (say, voiced vs. unvoiced, high/low register, front/back); and
> Voynichese happens to be an extremely efficient language, that makes
> full use of that available bit.
>
> Or could this be something else?  Three possibilities that I can
> think of:
>
>   * Voynichese "words" are actually keys into a codebook-style cipher,
>     encoded in a notation resembling Roman numerals (only more complicated);
>
>   * Voynichese is a complex "randomizing" code à la Vigenère,
>     where the encrypted numeric text is further scrambled
>     with a second, complicated encoding responsible for the peculiar
>     word structure;
>
>   * Voynichese "words" are generated, at least in part, by throwing
>     dice; and the gallows belong to the random part.
>
> In all these scenarios, the presence/absence of gallows would be a
> low-order bit in the encoding. That would explain the precise 50-50
> split, in spite of the fact that the VMS word frequencies are as
> irregular as those of any natural language.
>
> Comments, anyone?
>
> All the best,
>
> --stolfi
>
> PS. I hope to post a summary of my bean-counting over the
> weekend.
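
The tally in the quoted message can be reproduced along these lines. This is only a sketch: it assumes the gallows are the EVA letters k, t, p and f, that tokens are already split on the transcribed spaces, and the function names are made up. The percentages are recomputed from the four totals given in the quoted message.

```python
from collections import Counter

GALLOWS = set("ktpf")  # assumed EVA gallows letters


def gallows_histogram(tokens):
    """Count tokens by the number of gallows letters each contains."""
    return Counter(sum(ch in GALLOWS for ch in tok) for tok in tokens)


def shares(hist):
    """Convert counts into percentages of all tokens, to one decimal."""
    total = sum(hist.values())
    return {n: round(100 * c / total, 1) for n, c in hist.items()}


if __name__ == "__main__":
    # Totals as posted in the quoted message
    posted = {0: 17363, 1: 17443, 2: 323, 3: 3}
    print(sum(posted.values()), shares(posted))
```

Running the same histogram on the chunks produced by a different word segmentation would show how much the 50-50 split depends on where the spaces are put.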