
Re: Back to basics - or musings of an old bore




    > [Adam McLean:] It seems to me that many of the skilled
    > cryptographers on this group have puzzled and worked over the
    > Voynich now for many years and yet seem no nearer to cracking
    > the code.

I would take some exception to that.

It is true that much work has been spent on the manuscript, but 
a large fraction of it was wasted because it was based on 
certain assumptions about the nature of the "code" that 
are now seen to be dubious at best.

The "new" evidence against those assumptions includes mainly the
structure of the words, as hinted by the various "paradigms"
(Tiltman's, Firth's, Roe's) and expanded in the crust-mantle-core
model. It also includes the fact that gallows and non-gallows words
occur with essentially the same frequency, and ditto for bench and
non-bench words; and the fact that these two traits are essentially
independent. Further evidence is the symmetric distribution of word
lengths.
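
For anyone who wants to re-check these counts on their own transcription,
here is a minimal Python sketch. It assumes the input is a list of
EVA-transcribed words, takes the gallows to be <t>, <k>, <p>, <f> and the
benches to be <ch>, <sh> (an assumption about the classes, not a settled
definition), and measures word length in EVA letters rather than glyphs.
The function names are made up for illustration.

```python
from collections import Counter

GALLOWS = set("tkpf")    # assumption: EVA gallows letters
BENCHES = ("ch", "sh")   # assumption: EVA bench digraphs


def has_gallows(word):
    """True if the word contains at least one gallows letter."""
    return any(c in GALLOWS for c in word)


def has_bench(word):
    """True if the word contains at least one bench digraph."""
    return any(b in word for b in BENCHES)


def trait_table(words):
    """Count word tokens by the pair (has gallows?, has bench?)."""
    return Counter((has_gallows(w), has_bench(w)) for w in words)


def independence_ratio(counts):
    """Observed gallows-and-bench count divided by the count expected
    if the two traits were independent (about 1.0 means independent)."""
    n = sum(counts.values())
    g = counts[(True, True)] + counts[(True, False)]
    b = counts[(True, True)] + counts[(False, True)]
    expected = g * b / n if n else 0.0
    return counts[(True, True)] / expected if expected else float("nan")


def length_histogram(words):
    """Count word tokens by length, in EVA letters."""
    return Counter(len(w) for w in words)


# Toy sample only; far too small to say anything about the manuscript.
sample = "okal okain am opcholdy ofar oeoldan ytoaiin yfain otol shedy otolor or".split()
table = trait_table(sample)
ratio = independence_ratio(table)
lengths = length_histogram(sample)
```

On a full transcription, a ratio close to 1 would agree with the claim that
the two traits are independent, and the length histogram can be checked for
symmetry about its peak.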

This evidence, in particular, seems to rule out the assumption of an
alphabetic substitution scheme applied to a "major" European or
Near-Eastern language -- even allowing for multiple alphabets or
Vigenere-like schemes, redundant substitution, null characters, etc.
--- which was the starting point of many past cryptographic analyses.
To explain the peculiar features of Voynichese listed above, we must
now assume either an underlying language that exhibits those same
features, or an encoding scheme that generates them as an artifact
(or, of course, some combination of the two).

As many have remarked, the rigid word structure could mean that the
Voynichese "words" are in fact syllables. Thus either we have a
"major" language written syllable-by-syllable, or a language with
monosyllabic words. (As some of you unfortunately still remember, I
once bet some pizzas on this last horse.) One problem with these
theories is that there are about 6000 distinct Voynichese words (if we
exclude tokens with dubious readings) --- which is far too many for
the syllable inventory of any "major" language, and uncomfortably many
even for the most syllable-rich East Asian languages. This particular
problem could be
solved by assuming a less than perfect encoding, e.g. with pitch marks
instead of tone marks, or a fair amount of spelling variation. But it
seems hard to believe that a natural language would exhibit the
observed peculiarities in the gallows and bench frequencies, or the
symmetric word-length distribution.

So it seems more likely that all those peculiar features of Voynichese
words are side effects of the encoding. Substitution schemes which
insert spaces inside words, such as Gabriel's "daiin dain Latin", may
explain the absence of long words, but do not seem able to explain the
other features --- unless the substitution strings were specifically
chosen to produce those features, which seems most unlikely (they are
rather hard to "see", even with computers).

At this point, the only encoding I can think of that seems to meet all
the constraints is some codebook scheme with "word codes" assigned
systematically in a manner resembling the Roman numeral system. (The
assumption that each Voynichese "word" is indeed a word of the
language is supported by Zipf's law analysis, by the "lumpy" and
section-dependent distribution of words along the manuscript, and by
the observation that labels have the same structure as single words
and tend to occur in the text, at roughly the expected places.)
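
A rough way to redo the Zipf's law check is sketched below in Python. It
ranks the distinct words by frequency and fits a straight line to log
frequency against log rank by ordinary least squares. This is only one of
several ways of doing such a fit, and the function names are hypothetical.
A slope near -1 is the classic Zipf behaviour.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return [(rank, frequency), ...] with rank 1 for the commonest word."""
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank)."""
    pts = [(math.log(r), math.log(f)) for r, f in rank_frequency(words)]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two distinct words")
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Toy input whose frequencies are exactly 12/rank, so the fitted slope is -1.
toy = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
slope = zipf_slope(toy)
```

Running the same function separately on each section of the manuscript
would also give a crude view of how section-dependent the vocabulary is.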

The encoding need not be a pure codebook scheme, where each word is
independently assigned to a different code. The scheme may, for
instance, assign to each lexical *stem* an arbitrary code, which is
then modified in some systematic way to indicate gender, number, case,
tense, etc. (Note that such a scheme could make the code much easier
to read and write.) There are hints, for example, that the EVA <q>
glyph may be such a modifier --- since it hardly occurs in labels, and
often a label <XXX> occurs as <qXXX> in the text nearby.
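
The <q> observation is easy to test mechanically. The Python sketch below
assumes two lists of EVA words, one for the labels and one for the running
text (how you split them depends on your transcription file, which is not
covered here). It reports which labels reappear in the text with a <q> in
front, and what fraction of a word list starts with <q>. The function names
and the toy data are invented for illustration; they are not real counts.

```python
def q_prefixed_labels(labels, text_words):
    """Labels XXX for which the word qXXX occurs somewhere in the text."""
    vocab = set(text_words)
    return sorted(
        lab for lab in set(labels)
        if not lab.startswith("q") and "q" + lab in vocab
    )


def q_initial_fraction(words):
    """Fraction of word tokens that begin with EVA <q>."""
    words = list(words)
    if not words:
        return 0.0
    return sum(w.startswith("q") for w in words) / len(words)


# Toy data, for illustration only.
toy_labels = ["otol", "okal"]
toy_text = ["qotol", "okal", "qokain", "or"]
matches = q_prefixed_labels(toy_labels, toy_text)
label_q = q_initial_fraction(toy_labels)
text_q = q_initial_fraction(toy_text)
```

If <q> is indeed a grammatical modifier, one would expect the fraction for
labels to be much lower than for running text, and a fair number of labels
to show up in the first list.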

    > It also seems unlikely to me that someone in the 16th century
    > could devise a code that could defeat 21st century methods.

If the codebook hypothesis is correct, then it is no wonder that all
methods that were designed to crack alphabetic substitution schemes
have failed so far.  

    > But how else can we proceed ? I know I must sound like an old
    > bore, always coming back to the same theme, but it seems to me
    > that we have not yet exhausted an approach based on seeing the
    > context of the manuscript - and relating it to other similar
    > material.
    
Indeed. I suppose that to crack a codebook scheme (or to decipher a
logographic script) one needs to identify the meaning of a few key
words, and proceed from there. That does not seem to be an easy task,
even though we happen to have a whole illustrated book whose general
subject we vaguely know. (Just try "decoding" an illustrated Chinese
herbal, to see how hard that can be.)

For instance, there is page f67r2 which, as Robert Firth once pointed out,
seems to list the names of the seven planets:

  <okal>  <okain am>  <opcholdy>  <ofar oeoldan>  <ytoaiin>  <yfain> 
 
So why can't we go on from there? Well, for starters, we do not know
which planet is which. Moreover, of those seven words, <okal> alone is
very common throughout the text (~140 occurrences), while the others
hardly occur at all (although <okain> alone has ~110 occurrences, and
<ofar> ~4). So what does <okal> mean: Moon? Earth? Sun? Venus? Or do
we have a case of homonymy here, like between "mercury" the planet and
"mercury" the metal? (Other examples abound in other languages, e.g.
"mars" is planet and month in French, and "water star" is some planet's
name in Chinese.) Or perhaps the seven words are not the planets'
names, but some other attributes?

There is also the "fallopian tubes" illustration on f77v, where the 
left and right tubes are labeled <otol shedy> and <otolor> --- both 
fairly common through the text.   Note that <or> by itself is a very
common word (~360 occurrences), and <otolor> also occurs as a label in
the seven planets' page, f67r2.   

So we have no lack of clues; all we need now is a really smart brain...

    > http://www.alchemy.dial.pipex.com/tetrabiblos.jpg
    > The women figures in the Vatican manuscript are 
    > coloured. Are they coloured on the similar
    > Voynich drawing ?

Indeed! Actually, the figurines in both inner circles
(clothed and naked) resemble those in the VMs.

    > 3. The Voynich script itself. No other example of 
    > this has yet been found, though some characters
    > seem very familiar.
    
The familiarity, I believe, is mostly a coincidence.

In fact, most of the VMs alphabet seems to have been generated by
systematic combination of a "base stroke" (either <e> or <i>) with one
of five or six "plume strokes". Similarly the gallows are combinations
of a "left leg" and a "right leg", each chosen among 2 (or 3?)
possibilities. Therefore, the occasional resemblances with Roman
characters are probably meaningless coincidences: after all, there
aren't many different glyph shapes that can be drawn with two simple
pen strokes.
    
Speaking of which: a while ago I posted a message calling attention to
the above, and to the apparent correlation between the right stroke of
each glyph and the left stroke of the following glyph. Has anyone else
checked that claim? (I still haven't had time to follow it up
properly; perhaps over the weekend...).

All the best,

--stolfi