
Re: VMs: NEW SCIENTIST 11/2001 ZANDONELLA



Hi Dana and Rene and all,
 
Here is Zandonella's article.
Best regards,
 
Jean
 

Book of Riddles

New Scientist vol 172 issue 2317 - 17 November 2001, page 36

 

Are we on the brink of decoding the most mysterious document in the world, asks Catherine Zandonella

 

IT HAS been called the most mysterious manuscript in the world. Each page is filled with strange illustrations of medicinal plants, astrological diagrams and naked women. The whole book is bursting with long-forgotten secrets.

There's just one snag. Its 234 pages are written in an unknown script representing an unknown tongue, and despite nearly a century of study by the world's best cryptanalysts, academics and hobbyists, no one has verifiably deciphered a single word. What has become known as the Voynich manuscript could be the ravings of a madman, a made-up language or an elaborate hoax to dupe a 16th-century monarch out of a small fortune in gold. Nobody knows.

"It's maddening," says Jacques Guy, a retired French-born linguist who studied the manuscript as a hobby for more than 10 years. "You would think by now we'd have figured it out." Guy is one of a band of hobbyists who are still trying to crack the Voynich manuscript, armed only with computers and a bunch of kooky ideas.

Statistical techniques borrowed from the Human Genome Project have thrown up some fascinating clues to the nature of the text. And now that the most accurate ever transcript of the manuscript is nearly complete, these researchers may have the best chance yet of finally solving the puzzle. If they succeed, it won't be before time.

The American collector Wilfrid M. Voynich came across the book at the Villa Mondragone in Frascati, Italy, back in 1912. Tucked inside the tome was a letter dated 1665 or 1666 from the rector at the University of Prague, asking a well-known scholar to have a go at cracking the cipher. According to the letter, the Holy Roman Emperor Rudolf II of Bohemia, who reigned from 1576 to 1612, bought the manuscript for 600 ducats (about three-and-a-half kilograms of gold). The letter also hinted that Roger Bacon, the 13th-century Franciscan monk and mystic who dabbled in science and cryptography, might have penned the manuscript, though this is highly uncertain. The book is now held at Yale University in the Beinecke Rare Book and Manuscript Library.

But the intriguing history is only half the appeal. The Voynich itself is a decidedly strange object. It's a small, thick book about the size of a hardback novel, illustrated with bizarre pictures. Forget the sumptuously illuminated manuscripts you've seen in museums: the Voynich's art is about as sophisticated as that of a 12-year-old, even if the subject matter is somewhat more advanced.

The botanical section features crudely drawn flowers and plants, unlike anything found in nature. Accuracy does not appear to be the artist's goal, though some of the plants look vaguely like peppers and sunflowers. Then there are the women who appear in the biological section, "nymphs" as Voynich devotees call them. Short and squat with pot bellies and small, pointed breasts, these females wade through ghoulish green baths fed by pipes resembling arteries and Fallopian tubes. The astrological section is perhaps the most carefully drawn, featuring zodiacal circles and drawings of the heavens, including what appears to be the Andromeda Galaxy.

But even more mysterious than the artwork is the text. The characters look teasingly familiar: some resemble Roman characters, Arabic numerals and Latin abbreviations. Elaborate "gallows" characters decorate many beginnings of lines, while an enigmatic swirl like the number nine can be found at the end of many of the words. Yet the document's meaning remains infuriatingly obscure.

When Voynich brought the manuscript to the US, he invited cryptographers to try their luck at decoding it. The first to claim success was William Newbold of the University of Pennsylvania, who announced in 1919 that the manuscript was a copy of Bacon's lab notes. Newbold's translation implied that Bacon had access to telescopes and microscopes, instruments not thought to exist in the 13th century.

Failed Attempt

Eventually Newbold's interpretation was discredited. He died muddled and frustrated, but refusing to admit defeat. Since Newbold, many cryptanalysts and crackpots have claimed victory over the Voynich, but all of them suffered the same problem: their solution did not apply to all the text in the manuscript. Bizarre theories about the manuscript still crop up now and again, such as the idea that it is written in vowel-deficient Ukrainian, or that it is a rare document describing the Cathar movement.

Whatever the truth, it seems the text was encoded using a method so arcane as to render it impossible to decrypt, even with today's computers. Cryptanalysts at the US National Security Agency had a shot at it in the 1960s and 1970s. They transcribed it into a machine-readable form and ran a few statistical tests but even the NSA couldn't solve it.

Then, in the late 1970s, Yale scholar Robert Brumbaugh said he'd worked it out: the manuscript was a hoax designed to fleece Emperor Rudolf II of his riches. Only a few sections of the text were decipherable, said Brumbaugh, presumably to convince the discerning buyer that the item was genuine. Since the Voynich bears little likeness to the sumptuous illustrated manuscripts of its time, and because it was written on the kind of cheap paper used to wrap fish, the hoax theory is a popular one, except with Voynich devotees like British-based pathologist Gabriel Landini of the University of Birmingham. "That sort of attitude is not very productive," he says.

After all, there are perfectly good reasons aside from hoaxing why the Voynich should be hard to crack. Finding a true solution means either working out a cipher or finding a code book, a sort of dictionary in which each Voynich word has an equivalent word in Latin, English or some other language. If the manuscript was coded with a Voynich dictionary, then unpicking the text may prove almost impossible unless a tattered copy of the code book turns up somewhere.

Prospects are better if the book was written using a cipher, an algorithm for replacing the ordinary letters in a text with Voynich characters. The ciphers normally used by 15th or 16th-century cryptanalysts were not all that high-tech. Most consisted of simple substitutions, in which a different character replaced each letter of the alphabet in the message. This kind of cipher is easy to solve. You tabulate the frequencies of the characters in the enciphered text, and then match the distribution with that of the base language to find out which letter each character represents.
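
The frequency-matching step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are invented here, the reference text stands in for "the base language", and pairing symbols purely by frequency rank gives a first guess that a real solver would then refine by hand.

```python
from collections import Counter

def rank_by_frequency(text):
    """Return the letters of text ordered from most to least frequent."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Sort by descending count; ties are broken alphabetically so the
    # result is deterministic.
    return [ch for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def guess_substitution(ciphertext, reference_text):
    """Pair each cipher symbol with the reference letter of the same rank."""
    cipher_rank = rank_by_frequency(ciphertext)
    plain_rank = rank_by_frequency(reference_text)
    # zip stops at the shorter list, so unmatched symbols are left out.
    return dict(zip(cipher_rank, plain_rank))
```

On a short message the guess will be wrong for many letters; the method only becomes reliable when the enciphered text is long enough for its frequencies to settle.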

Much harder to crack are polyalphabetic ciphers, which use any of several characters to replace a given letter. Leon Battista Alberti published this concept in the 1460s, so the Voynich could be written in one of these if it dates from a later period. It's also possible to crack a polyalphabetic cipher by comparing letter frequencies, but it is far harder than for a simple cipher.

The Voynich presents its own special difficulties as well. How do you compare letter frequencies when you don't know what language it uses? Worse, researchers know little about the nature of the Voynich alphabet itself. Even handwriting in English can be hard to read: two cursive e's in a row can look confusingly like a cursive u. And in the flowery and variable Voynich handwriting, many of the letters flow together. Some characters, such as the gallows letters, may simply be paragraph markers. Or perhaps each Voynich word is a letter, or maybe the spaces between words are placed at random to confuse the reader.

It may seem hopeless, but researchers have still managed to glean some basic facts. The manuscript is about 250,000 words long, containing about 40,000 different words. There are between 23 and 30 characters, and curiously, none behaves like a number. The manuscript reads from left to right, and most Voynichese words are about six characters long. They show less variation in length than those of English, Latin and most other Indo-European languages, and there are a lot of repeats: up to four consecutive repetitions of a word are common, as are strings of words that vary only by one character.
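
Counts of this kind are easy to reproduce once a transcription has been split into words. The sketch below is an assumption-laden illustration, not the researchers' own tooling: the function name and the dictionary keys are made up for this example, and it expects a plain list of already-tokenised words.

```python
from statistics import mean, pstdev

def word_stats(words):
    """Summarise a list of words: size, vocabulary, lengths, repeats."""
    if not words:
        raise ValueError("need at least one word")
    lengths = [len(w) for w in words]
    # Track the longest run of the same word appearing consecutively.
    longest_run = run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        longest_run = max(longest_run, run)
    return {
        "tokens": len(words),            # total words
        "types": len(set(words)),        # different words
        "mean_length": mean(lengths),
        "length_spread": pstdev(lengths),  # population standard deviation
        "longest_run": longest_run,
    }
```

A small `length_spread` relative to a comparison text would correspond to the observation that word lengths vary less than in European languages.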

In the past 10 years, since the inception of a Voynich e-mail circle to share ideas and findings, researchers have been using statistical methods to search the manuscript for clues about what it could be hiding. For all its strangeness, the Voynich does indeed display certain consistent statistical properties that are hallmarks of natural languages. So the manuscript is unlikely to be the random writings of a madman or a fraud.

One way to analyse the manuscript is by studying its entropy, a measure of how densely information is packed into the characters or words. A high-entropy language contains lots of variation in the character order, word order, or both. There are different kinds of entropy but one of the simplest can be calculated by asking, given any character, how much uncertainty there is about what the next character might be. For example, in English we know that a q is almost always followed by a u, so there is low uncertainty and thus low entropy for that particular letter.
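
One common way to put a number on "uncertainty about the next character" is the conditional entropy of a character given the one before it, measured in bits. The sketch below assumes that definition; the article does not say exactly which formula the researchers used, and the function name is invented for illustration.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Entropy in bits of the next character, given the current one."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent pairs (a, b)
    firsts = Counter(text[:-1])            # counts of a as a pair's first member
    total = sum(pairs.values())
    h = 0.0
    for (a, b), n in pairs.items():
        p_pair = n / total                 # P(a, b)
        p_next_given_a = n / firsts[a]     # P(b | a)
        h -= p_pair * math.log2(p_next_given_a)
    return h
```

A text in which every character fully determines its successor scores zero. Renaming the characters one-for-one, as a simple substitution cipher does, leaves the value unchanged because only the pattern of pairs matters.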

In 1976, Yale physicist William Bennett found that this kind of entropy for Voynichese was low compared with Latin, English and other European languages. There are low-entropy languages from Polynesia, but it seems unlikely that Hawaiian islanders were wandering around Renaissance Italy.

This low entropy rules out a simple or polyalphabetic cipher, since the former leaves entropy unchanged and the latter increases it. Although there are some kinds of cipher that can decrease entropy, the entropy data seems to suggest that Voynich is a code-book text, or possibly a made-up language.

A much stranger theory was put forward a few years ago by Jorge Stolfi, a Brazilian computer-graphics researcher at the University of Campinas in Brazil. By running a computer program that analysed the placement of each character within words, he found that certain letters seem to appear only in certain parts of the words, indicating a prefix, middle and suffix to each word.
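
A positional tally of this sort might look like the following Python sketch. It is not Stolfi's program, whose details the article does not give; the function name and the three-slot split into start, middle and end are assumptions made for illustration.

```python
from collections import defaultdict

def position_profile(words):
    """Count how often each character falls at the start, middle or end of a word."""
    profile = defaultdict(lambda: {"start": 0, "middle": 0, "end": 0})
    for word in words:
        for i, ch in enumerate(word):
            if i == 0:
                slot = "start"      # a one-character word counts as a start
            elif i == len(word) - 1:
                slot = "end"
            else:
                slot = "middle"
            profile[ch][slot] += 1
    return dict(profile)
```

A character whose counts pile up in a single slot across a large word list is the kind of evidence that would point to a prefix, middle and suffix structure.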

He conjectured that Voynichese might be a version of Chinese, written down phonetically by Chinese visitors who accompanied Italian explorers home from their travels. Chinese words are monosyllabic, and each one has three phonetic components. Thus the prefix might signify the tonal quality of the word, while the middle determines the consonant and the suffix the vowel.

Stolfi made a pizza bet with other Voynich aficionados that his Chinese theory was correct, but has since backed down. The others aren't ready to accept Stolfi's defeat just yet, however. "I tried to buy Landini a pizza but he said the idea is not dead until someone actually deciphers the manuscript," Stolfi says.

Meanwhile, Stolfi has refined his computer model, and last year published a new analysis that finds a "crust, mantle and core" to the words. This consistent structure makes it unlikely that Voynichese is random gibberish or a polyalphabetic cipher. His findings do seem compatible with a dictionary-like translation, or a non-European language with largely monosyllabic words.

Stolfi's current theory, inspired by the rigid word structure, is that the Voynich "words" are actually numerals. Many numbering systems place restrictions on the order in which different symbols appear: a six-character Roman numeral, for example, would probably start with M, D or C, and end with I, V or X. So could the Voynich manuscript actually be a list of numbers? The fact that they are mostly of a similar length suggests another level of complexity, and Stolfi thinks you might need a code book to translate the numbers into meanings.

Hidden Information

Several other researchers are continuing to explore the idea that the manuscript is enciphered. New studies suggest that its entropy may not be as low as has been thought. Rene Zandbergen, who works as a systems analyst in Darmstadt, Germany, found last year that whereas the first and second characters in each Voynich word do indeed have low entropy, the third and subsequent letters carry more information. This means it could yet be an enciphered language after all, and suggests Arabic or Oriental origin.

Meanwhile, Landini has studied the text with a technique called spectral analysis. It's a method normally used to search for patterns within fluctuating strings of data such as DNA bases or musical notes. The likelihood of finding a character repeated after a certain period (eight letters later, say) is plotted against the period. For random inputs such as white noise, the resulting "power spectrum" is a flat line. But for signals with some underlying pattern, the power spectrum deviates from a straight line, even if the pattern is very subtle.
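
The idea of plotting repeat likelihood against period can be approximated with a simple lag-match count. The sketch below is a stand-in chosen for brevity, not Landini's published method and not a true Fourier power spectrum; both function names are invented for this example.

```python
def repeat_rate(text, lag):
    """Fraction of positions whose character recurs exactly `lag` places later."""
    n = len(text) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be positive and shorter than the text")
    matches = sum(1 for i in range(n) if text[i] == text[i + lag])
    return matches / n

def repeat_profile(text, max_lag):
    """Repeat rate for every lag from 1 to max_lag, keyed by lag."""
    return {lag: repeat_rate(text, lag) for lag in range(1, max_lag + 1)}
```

For a text with no structure the profile should stay roughly level across lags, while a bump at some lag hints at a recurring period of that length.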

Landini published his results in October (Cryptologia, vol 25, p 275). He found that the patterns in the Voynich manuscript matched those of natural languages. For example, a natural period of about 5.9 characters was evident, which coincides with the average word length in the manuscript. This seems to confirm that the spaces do indeed separate real words, rather than being placed arbitrarily to confuse the reader.

So what's the next line of attack? Perhaps most important for now is getting clean data for the statisticians to work on. The microfilm versions available from Yale for $40 can be difficult to read, increasing the confusion over what is and what isn't a letter, so Voynich enthusiasts have started creating a high-fidelity digital transcription. Zandbergen and Landini have collected the transcriptions made by the US National Security Agency researchers and others over the years. They've been poring over each copy to resolve discrepancies and come up with a definitive version. They hope eventually to persuade the Beinecke Library to issue a full-colour version on CD-ROM. "It is like the Human Genome Project for the Voynich," says Landini. "Once we have a complete copy, we'll be able to repeat all the things that have been done before on better-quality data."

Luck and inspiration will still play an important part, however, says Jim Reeds, a professional cryptanalyst at AT&T Labs in Florham Park, New Jersey. Reeds is known for deciphering another 15th-century work, a book by Johannes Trithemius, a German abbot and avid cryptographer. "Supercomputers don't crack codes by themselves," says Reeds. "They need people to come up with some sort of plan."

But Jacques Guy says he's tired of the chase, although he still regularly posts updates on the Voynich e-mail circle. "The Voynich manuscript is a litmus test of our knowledge of language," he says. "And the result of the test is pretty dreadful." Guy reckons it's time to pursue a hobby that might actually bring a real pay-off, perhaps decrypting the Easter Island tablets. "At least I know they're not a hoax."


Catherine Zandonella
 


Rene Zandbergen <r_zandbergen@xxxxxxxxx> wrote:

--- Dana Scott wrote:
> "Voynich Manuscript Analysis Program", by Joel
> Youngblom
>
>
http://www.d.umn.edu/~tpederse/Courses/CS5761-SPR04/Projects/youn0512.pdf

Thanks Dana,

I wonder if we know his two main references:

Whitfield (Dec 2003) [could be about Rugg's
hypothesis]
Zadonelia (New Scientist 172 (2001), pp.36-39)
[This one rings a vague bell].

Cheers, Rene




