
Re: Voynich research needs



    > [Seth Morabito:]

    > [1] Either that we're all missing something very important which
    > is right in front of our noses,
    >
    > [2] or that we lack some vital piece of the puzzle without which
    > we are lost,

My feeling is that the answer is somewhere between [1] and [2].

I have long been convinced that the manuscript is written in an
"exotic" (meaning non-European) language, with a peculiar spelling
system that was devised by the author --- probably a "phonetic" spelling
(i.e. based on the spoken language, not on its native script).

Until recently I was quite convinced that the language was some Asian
"syllabic" language like Chinese, Vietnamese, Tibetan, Thai, etc. (an
idea that I stole from Jacques, although he himself did not seem to
take it too seriously). I got that conviction chiefly because of the
peculiar structure of Voynichese "words", which is quite unlike that
of Indo-European or Semitic words, but fairly similar to that of
syllables in a phonetically rich language.

There were other arguments too, such as the shape of the word length
distribution, the apparent lack of syntactic patterns, the apparent
absence of numerals (or letters used as numerals), and the general
"weirdness" of the book. And that theory could explain why crypto
heavyweights like Friedman did not get anywhere.

In my recollection, the main arguments that were raised against the
"Chinese theory" were (1) the supposed historical impossibility of an
obviously European book being written in Chinese, at the required
time; (2) the lack of any oriental-looking features in the
illustrations; and (3) an apparent mismatch between the statistics of
Chinese syllables and Voynichese words.

I believe that the first argument is easily dismissed: there were
enough channels and contacts between Europe and East Asia to make such
a book possible, in many scenarios. The second argument could also be
explained in several ways --- a European author, a European copyist,
or an Asian author who was trying to imitate the looks of European
books. And the third argument was almost surely based on inadequate data:
the "pinyin Tao" sample that was used for comparison, besides being a
rather special text, is not a phonetic rendering of spoken
Chinese---of the 1500's, or any other period---but merely a
conventional transcription of the ideograms. 

So I kept believing in some variant of the Chinese theory---until a few
months ago, when I happened to run a simple statistical test (suggested
by Bradley Schaefer). If we map each Voynichese word to the number of
gallows that it contains, we get the following text:

    ?1110110           110000000001       11110001      
    00000110           11111110           1110100       
    1000?1110          00?10011110        ??100100      
    1110100001         01010000011        00000000      
    01010000011        010010?10                        
    111011111          110000101          11110101100   
    111010001          000001000          1110001000100 
    00??1101           10010?10           010101000101  
    11011??0?          000000001          ?11111001010  
    0100110110         10100000           011100110     
    1101011001         00100010?          ...

From previous analysis, I already knew the sequence would contain
almost exclusively 1s and 0s; but I was expecting them to be randomly
interleaved. Instead, the 1s and 0s tend to be clustered in runs of
same value. Said another way, there is a strong correlation between
the presence or absence of gallows in adjacent words.
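
The test described above can be sketched roughly as follows. This is
only an illustration, not the script that was actually used: it assumes
an EVA-style transcription in which the gallows are the letters t, k, p
and f (so "bench" gallows such as cth or ckh are counted too), and the
function names and the sample words are made up for the example. It
compares the observed fraction of adjacent word pairs with the same
"gallows bit" to the fraction expected if the bits were independent.

```python
# Assumption: EVA-style transcription, gallows letters are t, k, p, f.
GALLOWS = set("tkpf")

def gallows_count(word):
    """Number of gallows letters in one transcribed word."""
    return sum(ch in GALLOWS for ch in word)

def gallows_bits(words):
    """1 if the word contains at least one gallows, else 0."""
    return [1 if gallows_count(w) > 0 else 0 for w in words]

def same_neighbor_fraction(bits):
    """Observed fraction of adjacent pairs that share the same bit."""
    pairs = list(zip(bits, bits[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def expected_same_fraction(bits):
    """Fraction expected if each bit were drawn independently."""
    p = sum(bits) / len(bits)
    return p * p + (1 - p) * (1 - p)

# Toy input for illustration only; not a line from the manuscript.
words = "qokedy otedy okaiin daiin chedy shedy chol".split()
bits = gallows_bits(words)
print(bits)                                     # [1, 1, 1, 0, 0, 0, 0]
print(round(same_neighbor_fraction(bits), 2))   # 0.83
print(round(expected_same_fraction(bits), 2))   # 0.51
```

On a real transcription, an observed fraction well above the
independent-bits value would correspond to the clustering in runs
described above. A shuffling test on the word order would be one way to
judge whether the difference is significant.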

This unexpected feature of Voynichese throws a monkey wrench into the
Chinese theory. (My only consolation is that it is bad news for many
other theories as well.) I find it hard to imagine a spelling system
for a Chinese-like language that would produce this effect.

It is true that, in spoken Mandarin, the tone of a syllable affects
the pronunciation of the next syllable -- an effect called "tone
sandhi" by linguists. However, as far as I know, this effect happens
with only a fraction of the syllables, and the resulting correlation
does not seem to extend very far.  Perhaps the effect is more
pronounced in some other Asian language (Tibetan? Thai?), but I am 
not optimistic.

So now I am not so keen on the Chinese theory any more. I still
believe in some of its premises, though. First, to me, the
"natural-looking" word statistics say that Voynichese is not a
cryptographic system, but merely an exotic language rendered in an
original alphabet. Second, I take the rigid structure of the
"words" as proof that they are actually single syllables.
But the strong correlation of the "gallows bit" seems to 
rule out East Asian languages.

That feature does not seem to fit any Indo-European language, either.
Bradley himself suggested that the gallows could be a marker for
gender or number; the correlation would then be a shadow of typical IE
case agreement rules. However, this explanation would require the text
to consist mainly of strings of 3-4 adjectives modifying the same noun
--- which seems unlikely, even for a "madman's rant". Moreover, the
gallows letters usually occur near the beginning of the word; why
would an IE-speaking author do that? And, finally, the Voynichese
words seem too short and too rigidly structured to be IE words.

These same arguments seem to rule out Semitic languages, and Finnish
as well.  So what is left?

There is one language family that may fit the bill.  In Turkish and
other related languages (e.g. Uzbek), many concepts that are expressed
by separate words in English are realized as suffixes (usually one syllable
long) attached to some "head" word.  Moreover, the vowels are divided into 
two sets, "front" and "back"; and the suffixes must always use vowels
of the same class as those of the head word. 

Thus, it looks like we can explain the observed features of Voynichese
by assuming that it is actually Turkish, with each "word" being either
a head word or a suffix. The gallows letters would then denote one
class of vowels; the other class would be represented by some
non-gallows letters or combinations.
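
To see how vowel harmony would produce runs of this kind, here is a
small sketch. It assumes modern Turkish spelling (not any medieval
form), uses the standard front/back split of the eight vowels, and the
function name is invented for the example. Each vowel stands for one
syllable, so the output string is the class of each successive syllable
of the word.

```python
# Modern Turkish vowels, split by front/back harmony class.
FRONT = set("eiöü")
BACK = set("aıou")

def vowel_classes(word):
    """Return one letter per vowel: F for front, B for back."""
    out = []
    for ch in word:
        if ch in FRONT:
            out.append("F")
        elif ch in BACK:
            out.append("B")
    return "".join(out)

# ev-ler-imiz-de "in our houses", okul-lar-ımız-da "in our schools"
for w in ["evlerimizde", "okullarımızda"]:
    print(w, vowel_classes(w))
# evlerimizde FFFFF
# okullarımızda BBBBBB
```

If each syllable were written as a separate "word", these two words
alone would give a run of five syllables of one class followed by six
of the other, which is the sort of clustering seen in the gallows bit.
Loanwords and a few invariant suffixes break the harmony, so a real
text would not be this clean.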

This "Turkish theory" has a few more arguments in its favor. For one
thing, Turkish generally avoids vowel-vowel pairs and isolated vowels,
which fits with the absence of gallows-gallows pairs and isolated
gallows in Voynichese. Moreover, while suffixes are usually one
syllable long, and thus contain only one vowel, head words may have
two or more syllables; and, as John Grove once pointed out,
two-gallows words are often found at the beginning of lines, and in
labels --- where head words should be found.

It is curious that the correlation appears to be confined to the
"gallows bit"; other binary attributes that I have tried do not show
this effect --- and, as far as I know, there is no obvious
multi-syllable correlation in Turkish other than the front/back
vowel harmony.

Turkish is historically plausible as well (far more so than Chinese or
Tibetan, I must admit). In the Middle Ages, Turkey was a major player
in the Muslim world, and had a flourishing alchemical and medical
literature of its own. Indeed, there are many Turkish manuscripts in
the Prague National Library.

Moreover, until the 1920's, Turkish was written in the Arabic script,
which is quite unsuited to the language. While Arabic has only three
vowels, Turkish has 8; and while Arabic vowels usually denote
inflections, in Turkish (as in IE languages) they are part of the
lexical entry. Thus, a European (or a transplanted Turk) who had to
write a text in Turkish would have had a good motivation for inventing
a new script for the language.

Finally, plump nymphs playfully splashing in ponds and tubs is a
scene that does not seem exactly out of line with a Turkish origin.
Public baths (with separate days for men and women) were a big thing in
Turkish society, as among the Romans. And I recall seeing
images of a famous hot spring place in Turkey, where the pools
are surrounded by travertine formations that resemble some
of the VMs "tubs".

So, where do we go from here? To follow up on this lead, one would need 
knowledge of medieval Turkish, which I obviously do not have.
Perhaps we can find lists of Turkish star names and match them to the
labels in the astro/zodiac section...

    > [3] or that there is no answer to be had and the manuscript is a
    > hoax.

This may seem an easy way out, but it is not. The VMs text has
remarkable structure and consistency, with roughly the expected amount
of variation across and within sections. Its word statistics are
remarkably similar to those of natural languages, while the
distribution of letters within words is quite peculiar but not
obviously mechanical.

So anyone who proposes solution [3] has the burden of explaining how
the author could have produced a text with those features. As far as I
can see, the word statistics and the peculiar word structure cannot be
reproduced by a simple mechanical method, with or without
dice-throwing steps --- even if we allow for context-dependent
word-breaking rules. A sufficiently contrived scheme could perhaps
generate the small-scale structure, but not the long-range one (such
as page- and section-specific words).

So, if the manuscript is a hoax, it must be the result of feeding some
natural language text through a (partly?) mechanical scrambling
procedure. But this kind of "hoax" would not be much different from a
bona-fide undeciphered text (except perhaps that under the "hoax"
theory the text doesn't have to be related to the pictures, and the
percentage of "coding mistakes" may be quite high).

Moreover, there would remain the question of why the author went to so
much trouble, when he could have created a much more exciting hoax
with much less effort.

All the best,

--stolfi

PS