Re: Curious coincidence

Stolfi wrote:

>   total *occurrences* of words (tokens) with
>      0 gallows .... 17363  (49.4%)
>      1 gallows .... 17443  (49.6%)
> [...] the coincidence is intriguing. It seems safe to assume that a
> "correct" Voynichese word can have at most one gallows; so we have
> almost exact 50-50 split between 0-g and 1-g words.
> Maybe this is merely an amazing linguistic coincidence.

If there really is a 50% chance of having a gallows or not,
how close are the numbers allowed to be? A difference of 
80 seems almost too small. 
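
As a rough check on how close the two counts "should" be, here is a
minimal sketch using the counts quoted above. It rests on an assumption
that is not in the original message: every token independently has a
gallows with probability exactly 0.5 (tokens in real text are unlikely
to be independent), and tokens with two or more gallows are ignored.
The variable names are only illustrative.

```python
import math

# Token counts quoted from Stolfi's message above.
zero_gallows = 17363
one_gallows = 17443

n = zero_gallows + one_gallows          # tokens with 0 or 1 gallows
diff = abs(one_gallows - zero_gallows)  # observed difference

# Assumption: each token independently has a gallows with probability 0.5.
# Then D = (1-g count) - (0-g count) has mean 0 and standard deviation sqrt(n).
sd_diff = math.sqrt(n)
z = diff / sd_diff

# Normal approximation: P(|D| <= diff) = erf(z / sqrt(2)).
p_at_least_this_close = math.erf(z / math.sqrt(2))

print(f"n = {n}, difference = {diff}")
print(f"std. dev. of difference = {sd_diff:.1f}")
print(f"z = {z:.2f}")
print(f"P(|difference| <= {diff}) = {p_at_least_this_close:.2f}")
```

Under this simple model the difference of 80 is a bit less than half a
standard deviation, and roughly one trial in three would come out at
least this close. So the closeness by itself would not be suspicious if
the underlying chance really is 50%; the open question is why it should
be 50% in the first place.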

By the way, I presume that 'gallows' also includes the pedestalled ones.
Does your count include the labels (and other non-flowing text)?

I am reminded of an equally striking coincidence related to the
gallows. I once produced some character statistics of 25 Herbal-A
and 25 Herbal-B pages which are collocated in the MS. The number
of EVA-k and EVA-t was the same three times, and for the fourth
count it was either twice or half that number. If only I could
find those counts again... I'll post them if I do.

> Or could this be something else?  Three possibilities that I can
> think of:
>   * Voynichese "words" are actually keys into a codebook-style cipher,
>     encoded in a notation resembling Roman numerals (only more complicated);
>   * Voynichese is a complex "randomizing" code à la Vigenère,
>     where the encrypted numeric text is further scrambled
>     with a second, complicated encoding responsible for the peculiar
>     word structure;
>   * Voynichese "words" are generated, at least in part, by throwing
>     dice; and the gallows belong to the random part.
> In all these scenarios, the presence/absence of gallows would be a
> low-order bit in the encoding. That would explain the precise 50-50
> split ---- in spite of the fact that the VMS word frequencies are as
> irregular as those of any natural language

That is very difficult to imagine. The three options above don't really
explain why it should be 50/50 and not, say, 40/60, unless you go to
some kind of binary encoding, as you also suggest.
It would also have to mean that each word is 'constructed'. Assuming
for the moment a word-by-word (or by syllable) translation of some 
source text, then whether or not a gallows appears depends on some
property of the original word.
A 50% chance could appear in many circumstances, e.g. depending on the
number of characters in the original word (odd/even), stress on an odd
or even syllable, etc. (This will not always lead to a 50% chance.)

> Comments, anyone?

There could be more fundamental ways that cause this, e.g. if every word
with (or without) a gallows is a dummy word and the writer made
sure that there was one dummy word for each real word.
The gallows could be dummy letters themselves and the writer made sure

In any case, this is a truly surprising statistic which requires a
closer look. It feels like another crack that is just waiting for a wedge.

Well spotted,