VMs: The Mystery of the Voynich Manuscript (Scientific American)
June 21, 2004
The Mystery of the Voynich Manuscript
New analysis of a famously cryptic medieval document suggests that it
contains nothing but gibberish
By Gordon Rugg
In 1912 Wilfrid Voynich, an American rare-book dealer, made the find of a
lifetime in the library of a Jesuit college near Rome: a manuscript some 230
pages long, written in an unusual script and richly illustrated with bizarre
images of plants, heavenly spheres and bathing women. Voynich immediately
recognized the importance of his new acquisition. Although it superficially resembled
the handbook of a medieval alchemist or herbalist, the manuscript appeared to
be written entirely in code. Features in the illustrations, such as hairstyles,
suggested that the book was produced sometime between 1470 and 1500, and a
17th-century letter accompanying the manuscript stated that it had been
purchased by Rudolph II, the Holy Roman Emperor, in 1586. During the 1600s, at least
two scholars apparently tried to decipher the manuscript, and then it
disappeared for nearly 250 years until Voynich unearthed it.
Voynich asked the leading cryptographers of his day to decode the odd script,
which did not match that of any known language. But despite 90 years of
effort by some of the world's best code breakers, no one has been able to decipher
Voynichese, as the script has become known. The nature and origin of the
manuscript remain a mystery. The failure of the code-breaking attempts has raised
the suspicion that there may not be any cipher to crack. Voynichese may contain
no message at all, and the manuscript may simply be an elaborate hoax.
Critics of this hypothesis have argued that Voynichese is too complex to be
nonsense. How could a medieval hoaxer produce 230 pages of script with so many
subtle regularities in the structure and distribution of the words? But I have
recently discovered that one can replicate many of the remarkable features of
Voynichese using a simple coding tool that was available in the 16th century.
The text generated by this technique looks much like Voynichese, but it is
merely gibberish, with no hidden message. This finding does not prove that the
Voynich manuscript is a hoax, but it does bolster the long-held theory that an
English adventurer named Edward Kelley may have concocted the document to
defraud Rudolph II. (The emperor reportedly paid a sum of 600 ducats--equivalent
to about $50,000 today--for the manuscript.)
Perhaps more important, I believe that the methods used in this analysis of
the Voynich mystery can be applied to difficult questions in other areas.
Tackling this hoary puzzle requires expertise in several fields, including
cryptography, linguistics and medieval history. As a researcher into expert
reasoning--the study of the processes used to solve complex problems--I saw my work on
the Voynich manuscript as an informal test of an approach that could be used to
identify new ways of tackling long-standing scientific questions. The key
step is determining the strengths and weaknesses of the expertise in the relevant
Baby God's Eye?
The first purported decryption of the Voynich manuscript came in 1921.
William R. Newbold, a professor of philosophy at the University of Pennsylvania,
claimed that each character in the Voynich script contained tiny pen strokes that
could be seen only under magnification and that these strokes formed an
ancient Greek shorthand. Based on his reading of the code, Newbold declared that
the Voynich manuscript had been written by 13th-century philosopher-scientist
Roger Bacon and described discoveries such as the invention of the microscope.
Within a decade, however, critics debunked Newbold's solution by showing that
the alleged microscopic features of the letters were actually natural cracks in
Newbold's attempt was just the start of a string of failures. In the 1940s
amateur code breakers Joseph M. Feely and Leonell C. Strong used substitution
ciphers that assigned Roman letters to the characters in Voynichese, but the
purported translations made little sense. At the end of World War II the U.S.
military cryptographers who cracked the Japanese Imperial Navy's codes passed
some spare time tackling ciphertexts--encrypted texts--from antiquity. The team
deciphered every one except the Voynich manuscript.
In 1978 amateur philologist John Stojko claimed that the text was written in
Ukrainian with the vowels removed, but his translation--which included
sentences such as "Emptiness is that what Baby God's Eye is fighting for"--jibed
neither with the manuscript's illustrations nor with Ukrainian history. In 1987 a
physician named Leo Levitov asserted that the document had been produced by the
Cathars, a heretical sect that flourished in medieval France, and was written
in a pidgin composed of words from various languages. Levitov's translation,
though, was at odds with the Cathars' well-documented theology.
Furthermore, all these schemes used mechanisms that allowed the same
Voynichese word to be translated one way in one part of the manuscript and a different
way in another part. For example, one step in Newbold's solution involved the
deciphering of anagrams, which is notoriously imprecise: the anagram ADER,
for instance, can be interpreted as READ, DARE or DEAR. Most scholars agree that
all the attempted decodings of the Voynich manuscript are tainted by an
unacceptable degree of ambiguity. Moreover, none of these methods could encode
plaintext--that is, a readable message--into a ciphertext with the striking
properties of Voynichese.
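The ambiguity of anagram-based decoding can be shown in a few lines of Python. This is only a sketch: the word list is a tiny hypothetical sample rather than a real dictionary, and the function names are invented for the illustration. It groups words by their sorted letters and lists every reading that a given set of letters allows.

```python
from collections import defaultdict

# Tiny illustrative word list (an assumption); a real test would use a full dictionary.
WORDS = ["read", "dare", "dear", "code", "coed", "hoax"]

def anagram_groups(words):
    """Group words that are spelled with exactly the same letters."""
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))
        groups[key].append(word)
    return groups

def readings(letters, words=WORDS):
    """Return every listed word that the given letters can be rearranged into."""
    key = "".join(sorted(letters.lower()))
    return sorted(anagram_groups(words)[key])

print(readings("ADER"))
```

With this word list, the letters ADER yield three equally valid readings, so a decoder is free to pick whichever one suits the translation being proposed.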
If the manuscript is not a code, could it be an unidentified language? Even
though we cannot decipher the text, we know that it shows an extraordinary
amount of regularity. For instance, the most common words often occur two or more
times in a row. To represent the words, I will use the European Voynich
Alphabet (EVA), a convention for transliterating the characters of Voynichese into
Roman letters. An example from folio 78R of the manuscript reads: qokedy qokedy
dal qokedy qokedy. This degree of repetition is not found in any known
language. Conversely, Voynichese contains very few phrases where two or three
different words regularly occur together. These characteristics make it unlikely
that Voynichese is a human language--it is simply too different from all other
The third possibility is that the manuscript was a hoax devised for monetary
gain or that it is some mad alchemist's meaningless ramblings. The linguistic
complexity of the manuscript seems to argue against this theory. In addition
to the repetition of words, there are numerous regularities in the internal
structure of the words. The common syllable qo, for instance, occurs only at the
start of words. The syllable chek may appear at the start of a word, but if it
occurs in the same word as qo, then qo always comes before chek. The common
syllable dy usually appears at the end of a word and occasionally at the start
but never in the middle.
A simple "pick and mix" hoax that combines the syllables at random could not
produce a text with so many regularities. Voynichese is also much more complex
than anything found in pathological speech caused by brain damage or
psychological disorders. Even if a mad alchemist did construct a grammar for an
invented language and then spent years writing a script that employed this grammar,
the resulting text would not share the various statistical features of the
Voynich manuscript. For example, the word lengths of Voynichese form a binomial
distribution--that is, the most common words have five or six characters, and
the occurrence of words with greater or fewer characters falls off steeply from
that peak in a symmetric bell curve. This kind of distribution is extremely
unusual in a human language. In almost all human languages, the distribution of
word lengths is broader and asymmetric, with a higher occurrence of
relatively long words. It is very unlikely that the binomial distribution of Voynichese
could have been a deliberate part of a hoax, because this statistical concept
was not invented until centuries after the manuscript was written.
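For readers who want to see what such a curve looks like, the following Python sketch prints a binomial distribution. The parameters n = 11 and p = 0.5 are assumptions chosen only because they give a symmetric curve whose peak falls on five and six; they are not fitted to the manuscript.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed, illustrative parameters: not estimated from the Voynich text.
N, P = 11, 0.5
pmf = [binomial_pmf(k, N, P) for k in range(N + 1)]

# Crude text histogram: one '#' per percentage point of probability.
for k, prob in enumerate(pmf):
    print(f"{k:2d} {'#' * round(prob * 100)}")
```

The bars rise to a shared peak at 5 and 6 and fall away identically on both sides, which is the symmetric shape described above and unlike the long right-hand tail typical of word lengths in natural languages.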
In summary, the Voynich manuscript appeared to be either an extremely unusual
code, a strange unknown language or a sophisticated hoax, and there was no
obvious way to resolve the impasse. It so happened that my colleague Joanne Hyde
and I were looking for just such a puzzle a few years ago. We had been
developing a method for critically reevaluating the expertise and reasoning used in
the investigation of difficult research problems. As a preliminary test, I
applied this method to the research on the Voynich manuscript. I started by
determining the types of expertise that had previously been applied to the problem.
The assessment that the features of Voynichese are inconsistent with any
human language was based on substantial relevant expertise from linguistics. This
conclusion appeared sound, so I proceeded to the hoax hypothesis. Most people
who have studied the Voynich manuscript agreed that Voynichese was too complex
to be a hoax. I found, however, that this assessment was based on opinion
rather than firm evidence. There is no body of expertise on how to mimic a long
medieval ciphertext, because there are hardly any examples of such texts, let
alone hoaxes of this genre.
Several researchers, such as Jorge Stolfi of the University of Campinas in
Brazil, had wondered whether the Voynich manuscript was produced using random
text-generation tables. These tables have cells that contain characters or
syllables; the user selects a sequence of cells--perhaps by throwing dice--and
combines them to form a word. This technique could generate some of the
regularities within Voynichese words. Under Stolfi's method, the table's first column
could contain prefix syllables, such as qo, that occur only at the start of
words; the second column could contain midfixes (syllables appearing in the middle
of words) such as chek, and the third column could contain suffix syllables
such as y. Choosing a syllable from each column in sequence would produce words
with the characteristic structure of Voynichese. Some of the cells might be
empty, so that one could create words lacking a prefix, midfix or suffix.
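A three-column table of this kind is easy to sketch in Python. The syllables qo, chek, y and dy come from the examples above; the other entries, the function names and the use of a seeded random generator in place of dice are assumptions made for the illustration.

```python
import random

# Column contents are illustrative; "" stands for an empty cell.
PREFIXES = ["qo", "o", ""]
MIDFIXES = ["chek", "ke", "te", ""]
SUFFIXES = ["y", "dy", "al", ""]

def random_word(rng):
    """Pick one cell from each column, in order, and join the syllables."""
    return rng.choice(PREFIXES) + rng.choice(MIDFIXES) + rng.choice(SUFFIXES)

def random_line(n_words, seed=0):
    """Build a line of n_words, skipping draws where all three cells are empty."""
    rng = random.Random(seed)
    words = []
    while len(words) < n_words:
        word = random_word(rng)
        if word:
            words.append(word)
    return " ".join(words)

print(random_line(10))
```

Because qo exists only in the first column, it can appear only at the start of a word, which reproduces one of the positional regularities described earlier. Purely random selection of this sort does not, however, control which characters end up next to each other.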
Other features of Voynichese, however, are not so easily reproduced. For
instance, some characters are individually common but rarely occur next to each
other. The characters transcribed as a, e and l are common, as is the
combination al, but the combination el is very rare. This effect cannot be produced by
randomly mixing characters from a table, so Stolfi and others rejected this
approach. The key term here, though, is "randomly." To modern researchers,
randomness is an invaluable concept. Yet it is a concept developed long after the
manuscript was created. A medieval hoaxer probably would have used a different
way of combining syllables that might not have been random in the strict
statistical sense. I began to wonder whether some of the features of Voynichese
might be side effects of a long-obsolete device.
The Cardan Grille
It looked as if the hoax hypothesis deserved further investigation. My next
step was to attempt to produce a hoax document to see what side effects
emerged. The first question was, Which techniques to use? The answer depended on the
date when the manuscript was produced. Having worked in archaeology, a field
in which dating artifacts is an important concern, I was wary of the general
consensus among Voynich researchers that the manuscript was created before 1500.
It was illustrated in the style of the late 1400s, but this attribute did not
conclusively pin down the date of its origin; artistic works are often
produced in the style of an earlier period, either innocently or to make the
document look older. I therefore searched for a coding technique that was available
during the widest possible range of origin dates--between 1470 and 1608.
A promising possibility was the Cardan grille, which was introduced by
Italian mathematician Girolamo Cardano in 1550. It consists of a card with slots cut
in it. When the grille is laid over an apparently innocuous text produced
with another copy of the same card, the slots reveal the words of the hidden
message. I realized that a Cardan grille with three slots could be used to select
permutations of prefixes, midfixes and suffixes from a table to generate
A typical page of the Voynich manuscript contains about 10 to 40 lines, each
consisting of about eight to 12 words. Using the three-syllable model of
Voynichese, a single table of 36 columns and 40 rows would contain enough syllables
to produce an entire manuscript page with a single grille. The first column
would list prefixes, the second midfixes and the third suffixes; the following
columns would repeat that pattern. You can align the grille to the upper left
corner of the table to create the first word of Voynichese and then move it
three columns to the right to make the next word. Or you can move the grille to
a column farther to the right or to a lower row. By successively positioning
the grille over different parts of the table, you can create hundreds of
Voynichese words. And the same table could then be used with a different grille to
make the words of the next page.
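The procedure can be sketched in Python as follows. This is a simplified model under stated assumptions, not a reconstruction of the actual tables: the syllable lists are illustrative, the table is filled at random rather than by hand, a grille is represented as three row offsets (one per slot), and rows wrap around at the bottom of the table.

```python
import random

ROWS, COLS = 40, 36

# Illustrative syllables; "" stands for an empty cell.
PREFIXES = ["qo", "o", ""]
MIDFIXES = ["chek", "ke", "te", ""]
SUFFIXES = ["y", "dy", "al", ""]
COLUMN_KINDS = (PREFIXES, MIDFIXES, SUFFIXES)

def build_table(seed=0):
    """Fill a 40 x 36 table whose columns cycle prefix, midfix, suffix."""
    rng = random.Random(seed)
    return [[rng.choice(COLUMN_KINDS[c % 3]) for c in range(COLS)]
            for _ in range(ROWS)]

def read_word(table, grille, row, col):
    """Read the three cells exposed by the grille placed at (row, col).

    grille is a tuple of three row offsets, one per slot; slot i sits
    over column col + i. Rows wrap around (an assumption of this sketch).
    """
    return "".join(table[(row + dr) % ROWS][col + i]
                   for i, dr in enumerate(grille))

def generate_line(table, grille, row):
    """Slide the grille three columns at a time across one row."""
    words = [read_word(table, grille, row, col) for col in range(0, COLS, 3)]
    return " ".join(w for w in words if w)

table = build_table()
grille = (0, 1, 2)  # slots step down one row each: a hypothetical layout
for row in range(3):
    print(generate_line(table, grille, row))
```

Thirty-six columns read three at a time give at most 12 words per line, and 40 rows give up to 40 lines, which matches the page dimensions quoted above. Swapping in a different tuple of offsets plays the role of cutting a new grille.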
I drew up three tables by hand, which took two or three hours per table. Each
grille took two or three minutes to cut out. (I made about 10.) After that, I
could generate text as fast as I could transcribe it. In all, I produced
between 1,000 and 2,000 words this way.
I found that this method could easily reproduce most of the features of
Voynichese. For example, you can ensure that some characters never occur together
by carefully designing the tables and grilles. If successive grille slots are
always on different rows, then the syllables in horizontally adjacent cells in
the table will never occur together, even though they may be very common
individually. The binomial distribution of word lengths can be generated by mixing
short, medium-length and long syllables in the table. Another characteristic
of Voynichese--that the first words in a line tend to be longer than later
ones--can be reproduced simply by putting most of the longer syllables on the left
side of the table.
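The first of these effects, keeping two common characters apart, can be demonstrated with a toy table. In the Python sketch below, the four-row table and the offset notation for grilles are assumptions made for the illustration. The characters e and l sit in horizontally adjacent cells, so a grille whose slots are on different rows never brings them together.

```python
# Toy table: each row is (prefix, midfix, suffix). Note that "ke" and "l"
# always share a row, as do "da" and "dy".
TABLE = [
    ["qo", "ke", "l"],
    ["o",  "da", "dy"],
    ["qo", "ke", "l"],
    ["",   "da", "dy"],
]

def words_for(grille, table=TABLE):
    """List the word read at every row position for one grille.

    grille is a tuple of three row offsets, one per slot; rows wrap around.
    """
    n = len(table)
    return ["".join(table[(row + dr) % n][i] for i, dr in enumerate(grille))
            for row in range(n)]

staggered = words_for((0, 1, 0))  # midfix slot sits one row below the others
flat = words_for((0, 0, 0))       # all three slots on the same row

print(staggered, any("el" in w for w in staggered))
print(flat, any("el" in w for w in flat))
```

The staggered grille produces qodal, okedy, qodal and kedy: the combination al appears, e and l are both present, yet el never occurs. The flat grille reads straight across each row and produces qokel, in which el does appear.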
The Cardan grille method therefore appears to be a mechanism by which the
Voynich manuscript could have been created. My reconstructions suggest that one
person could have produced the manuscript, including the illustrations, in just
three or four months. But a crucial question remains: Does the manuscript
contain only meaningless gibberish or a coded message?
I found two ways to employ the grilles and tables to encode and decode
plaintext. The first was a substitution cipher that converted plaintext characters
to midfix syllables that are then embedded within meaningless prefixes and
suffixes using the method described above. The second encoding technique assigned
a number to each plaintext character and then used these numbers to specify
the placement of the Cardan grille on the table. Both techniques, however,
produce scripts with much less repetition of words than Voynichese. This finding
indicates that if the Cardan grille was indeed used to make the Voynich
manuscript, the author was probably creating cleverly designed nonsense rather than a
ciphertext. I found no evidence that the manuscript contains a coded message.
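The first of the two encoding techniques can be sketched in Python. Everything specific here is an assumption made for the illustration: the syllable inventory, the letter-to-midfix assignment and the use of a seeded random generator to choose the padding. The midfixes are built so that none begins with q or o or ends in d or y, which lets the meaningless prefix and suffix be stripped off unambiguously.

```python
import random
import string

PREFIXES = ["qo", "o", ""]   # meaningless padding
SUFFIXES = ["dy", "y", ""]   # meaningless padding
ONSETS = ["k", "t", "ch", "sh", "ckh", "cth"]
NUCLEI = ["e", "ee", "a", "ai", "ar"]
MIDFIXES = [o + n for o in ONSETS for n in NUCLEI]  # 30 distinct syllables

# Hypothetical substitution table: one midfix per plaintext letter.
ENCODE = dict(zip(string.ascii_lowercase, MIDFIXES))
DECODE = {midfix: letter for letter, midfix in ENCODE.items()}

def encode(plaintext, seed=0):
    """Turn each letter into a word: random prefix + midfix + random suffix."""
    rng = random.Random(seed)
    return " ".join(rng.choice(PREFIXES) + ENCODE[ch] + rng.choice(SUFFIXES)
                    for ch in plaintext.lower() if ch in ENCODE)

def decode_word(word):
    """Strip each possible prefix and suffix until a known midfix remains."""
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if word.startswith(prefix) and word.endswith(suffix):
                core = word[len(prefix):len(word) - len(suffix)]
                if core in DECODE:
                    return DECODE[core]
    raise ValueError(f"cannot decode {word!r}")

def decode(ciphertext):
    return "".join(decode_word(w) for w in ciphertext.split())

secret = encode("voynich")
print(secret)
print(decode(secret))
```

In a scheme like this, each ciphertext word carries exactly one plaintext letter, so how often a word repeats depends on the letters of the underlying message and on the padding chosen for each word.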
This absence of evidence does not prove that the manuscript was a hoax, but
my work shows that the construction of a hoax as complex as the Voynich
manuscript was indeed feasible. This explanation dovetails with several intriguing
historical facts: Elizabethan scholar John Dee and his disreputable associate
Edward Kelley visited the court of Rudolph II during the 1580s. Kelley was a
notorious forger, mystic and alchemist who was familiar with Cardan grilles. Some
experts on the Voynich manuscript have long suspected that Kelley was the
My undergraduate student Laura Aylward is currently investigating whether
more complex statistical features of the manuscript can be reproduced using the
Cardan grille technique. Answering this question will require producing large
amounts of text using different table and grille layouts, so we are writing
software to automate the method.
This study yielded valuable insights into the process of reexamining
difficult problems to determine whether any possible solutions have been overlooked. A
good example of such a problem is the question of what causes Alzheimer's
disease. We plan to examine whether our approach could be used to reevaluate
previous research into this brain disorder. Our questions will include: Have the
investigators neglected any field of relevant expertise? Have the key
assumptions been tested sufficiently? And are there subtle misunderstandings between
the different disciplines that are involved in this work? If we can use this
process to help Alzheimer's researchers find promising new directions, then a
medieval manuscript that looks like an alchemist's handbook may actually prove to
be a boon to modern medicine.
GORDON RUGG became interested in the Voynich manuscript about four years ago.
At first he viewed it as merely an intriguing puzzle, but later he saw it as
a test case for reexamining complex problems. He earned his Ph.D. in
psychology at the University of Reading in 1987. Now a senior lecturer in the School of
Computing and Mathematics at Keele University in England, Rugg is editor in
chief of Expert Systems: The International Journal of Knowledge Engineering and
Neural Networks. His research interests include the nature of expertise and
the modeling of information, knowledge and beliefs.
© 1996-2004 Scientific American, Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.