
Re: VMs: qo-words MORE

Hi Nick,

On Wed, 12 Mar 2003 10:34:57 +0000
Nick Pelling <incoming@xxxxxxxxxxxxxxxxx> wrote:

> My own central hypothesis is that, whereas almost all codes/ciphers of the 
> 15th Century were designed by *cryptographers*, the VMS was designed by a 
> *cryptologist*, whose key design decision was to make direct statistical 
> analysis unrevealing.

I'm not an expert in cryptography, so can you tell me whether 15th-century
cryptologists knew about attacks based on statistical analysis?
For several months (since I found information about the VMS four months ago) I have
wondered why the VMS could be statistically abnormal in such a way. At a glance
it looks very natural and seems to obey Zipf's Law, but its contextual
properties look like nonsense.
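Whether a word list "obeys Zipf's Law" can be checked with a rank-frequency fit on a log-log scale. A minimal sketch, assuming the transcription has already been split into a Python list of word tokens; `zipf_slope` is a hypothetical helper, not part of any existing VMS tool:

```python
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    Under Zipf's law frequency is roughly proportional to 1/rank, so the
    slope should come out near -1. Needs at least two distinct words.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic check: word k occurs 2520/k times, an exact Zipf distribution.
words = [f"w{k}" for k in range(1, 11) for _ in range(2520 // k)]
print(round(zipf_slope(words), 6))  # -1.0
```

A slope near -1 only says the frequency profile looks natural; it says nothing about how words combine in context, which is exactly where the VMS looks odd.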

> I therefore think that blind (ie, non-theory-driven) statistical assault 
> will most likely not be helpful - and it would seem the last 90 years of 
> effort supports this idea. :-(

I agree. My analysis was theory-driven; the theory was that VMS `words' are
really words. (No theory is a kind of theory :-)
The observations rejected this assumption. Now there seem to be at least
the following possibilities about VMS `words':

(1) The VMS is just a bunch of nonsense. (I don't want to believe it.)
(2) Word order is shuffled in some way, as someone pointed out on this list.
    (I tested this by shuffling English text. The contextual properties of the
     randomly shuffled text were very similar to those of the VMS.)
(3) Some meaningless garbage characters are mixed into words.
    (For example, i/ii/iii might all stand for the same thing.)
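The shuffling test in (2) can be reproduced with any measure of context. A minimal sketch, assuming the "contextual property" is measured as the conditional entropy of the next word given the current word, estimated from bigram counts (the measure actually used in my test may differ; `conditional_entropy` and `shuffled` are illustrative helpers):

```python
import math
import random
from collections import Counter

def conditional_entropy(words):
    """H(next word | current word) in bits, estimated from bigram counts."""
    bigrams = Counter(zip(words, words[1:]))
    firsts = Counter(words[:-1])  # counts of words that have a successor
    total = sum(bigrams.values())
    h = 0.0
    for (a, _b), n_ab in bigrams.items():
        p_ab = n_ab / total
        p_b_given_a = n_ab / firsts[a]
        h -= p_ab * math.log2(p_b_given_a)
    return h

def shuffled(words, seed=0):
    """Return a copy of the token list with the word order randomly permuted."""
    out = list(words)
    random.Random(seed).shuffle(out)
    return out

english = "the cat sat on the mat and the dog sat on the rug".split()
print(conditional_entropy(english), conditional_entropy(shuffled(english)))
```

On a real corpus one would compare three numbers: the English text, its shuffled version, and the VMS transcription. Shuffling keeps the word frequencies but destroys context, so the shuffled text's conditional entropy rises toward the plain word entropy. Very short texts give biased estimates.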

> (1) Where are the numbers in the text? Has there been a proper statistical 
> assault looking for candidate numbering systems in the VMS?
> The internal properties of the VMS' numbering systems might be 
> statistically distinctive:-
> *	It would probably have a "generative structure" - ie, its layout
> 	would be rule-based, giving a large number of closely-related
> 	[yet very slightly different] "words".

In fact, almost all Voynichese words look generative. But their production
rules (as given by Jorge Stolfi) look very different from those of, say, Roman
numerals. The rules might still contain Roman-numeral-like subrules. Worth checking.
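One way to start checking is to write the Roman-numeral rules as a regular grammar with the glyphs left as parameters, then measure how many tokens fit it under a given glyph assignment. A minimal sketch covering only the values 1 to 39; the glyphs passed to the hypothetical `roman_like` helper are assumptions to be varied (for instance, trying a glyph that appears in runs like i/ii/iii as the "one"), not a known mapping:

```python
import re

def roman_like(one, five, ten):
    """Regex for the values 1..39 written Roman-style with arbitrary glyphs.

    `one`, `five` and `ten` are hypothetical glyph assignments to test;
    nothing says the VMS actually uses any glyph this way.
    """
    o, f, t = map(re.escape, (one, five, ten))
    units = f"(?:{o}{t}|{o}{f}|(?:{f})?(?:{o}){{0,3}})"  # 9, 4, or 0-3 / 5-8
    return re.compile(f"(?:{t}){{0,3}}{units}")           # up to three tens

def fraction_matching(words, pattern):
    """Share of non-empty tokens fully matched by the pattern."""
    words = [w for w in words if w]
    hits = sum(1 for w in words if pattern.fullmatch(w))
    return hits / len(words) if words else 0.0

# Sanity check with the ordinary Roman letters.
p = roman_like("i", "v", "x")
print(fraction_matching(["xiv", "daiin", "iii", "qokeedy"], p))  # 0.5
```

A high match rate for some assignment would not prove anything by itself, but it would point to a subset of words worth examining as possible numbers.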

> (2) One theory is that the VMS is an "embellished code" - ie, that many of 
> its "words" are actually number indices into a code-book (which is perhaps 
> hidden in the star paragraphs at the back, or simply lost), but written in 
> a way that makes them not obvious.
> Here, most of the VMS would be comprised of indices: but it would require a 
> special "shift" code to indicate when the number following the code is 
> really a number (and not an index)... and I suspect that "q" is the shift-code.
> One way of testing this might be to look for differences in (Zipf-style) 
> distribution between "q"- words and their related "non-q" words.
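The test suggested above can be sketched by pairing each q-word's count with the count of the same word without its leading "q" (qokeedy with okeedy, for example) and computing a rank correlation between the two columns. A minimal sketch, assuming "related non-q word" means "the same word with the q stripped", which is only one possible pairing; all helper names are illustrative:

```python
from collections import Counter

def q_pairs(words, prefix="q"):
    """Map each q-word to (its count, count of the word minus the prefix)."""
    counts = Counter(words)
    pairs = {}
    for w, n in counts.items():
        if w.startswith(prefix) and len(w) > len(prefix):
            base = w[len(prefix):]
            if base in counts:
                pairs[w] = (n, counts[base])
    return pairs

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Usage: `pairs = q_pairs(tokens)`, then `spearman([a for a, _ in pairs.values()], [b for _, b in pairs.values()])`. A correlation near +1 means q-words are distributed much like their q-less counterparts; a weak or negative one would be the kind of difference the shift-code idea predicts.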

This idea (q as the shift character) doesn't seem to hold, judging from the
recent discussion about qo- words. But in any case, a code-based explanation
needs some shift mechanism; otherwise word-based statistical analysis would
reveal the underlying text's statistics, because word-based coding without
shifts doesn't change the text's statistical properties. Gallows as shift
characters ((6) in your mail) seem to be a possible hypothesis.
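The point that a word-for-word code without shifts leaves word statistics untouched can be illustrated with a toy codebook. Because the mapping is one-to-one, the rank-frequency profile (and likewise bigram counts and every other word-level statistic) carries over unchanged. A minimal sketch; `word_code` and its numeric codebook are purely illustrative:

```python
import random
from collections import Counter

def word_code(words, seed=0):
    """Replace every distinct word with an arbitrary code number (a toy codebook)."""
    vocab = sorted(set(words))
    codes = list(range(1000, 1000 + len(vocab)))
    random.Random(seed).shuffle(codes)
    book = dict(zip(vocab, codes))
    return [book[w] for w in words]

def frequency_profile(tokens):
    """Token counts sorted from most to least frequent, ignoring token identity."""
    return sorted(Counter(tokens).values(), reverse=True)

text = "the cat sat on the mat and the dog sat on the rug".split()
coded = word_code(text)
print(frequency_profile(text) == frequency_profile(coded))  # True
```

The same comparison on `list(zip(text, text[1:]))` versus `list(zip(coded, coded[1:]))` also gives identical profiles, which is why such a code alone could not produce the VMS's odd contextual behaviour.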

> (4) Another idea is that labels might have a different internal structure, 
> to prevent direct cryptological assaults on them - ie, they may be 
> anagrams etc. What are the statistical differences between labels and other 
> similar-length words in the corpus?

IMO, since the number of label words is small, statistical analysis of
them will be difficult.
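The small-sample problem can be made concrete with a permutation test: pool the label words with running-text words, repeatedly draw random groups the size of the label set, and see how often a random group differs from the rest as much as the labels do. A minimal sketch, assuming labels and other words are available as lists of transcribed tokens; the per-word `feature` (word length by default) is only an example, and for a comparison against length-matched words some other feature, such as a hypothetical function counting gallows glyphs, would have to be passed in:

```python
import random

def permutation_p_value(labels, others, feature=len, trials=10_000, seed=0):
    """Two-sided permutation test on the difference in mean `feature` value.

    Pools labels and other words, repeatedly draws a random group the size
    of the label set, and counts how often the difference in group means
    is at least as large as the observed one.
    """
    lab = [feature(w) for w in labels]
    pooled = lab + [feature(w) for w in others]
    n, k = len(pooled), len(lab)
    total = sum(pooled)

    def mean_gap(group_sum):
        # |mean of the drawn group - mean of everything else|
        return abs(group_sum / k - (total - group_sum) / (n - k))

    observed = mean_gap(sum(lab))
    rng = random.Random(seed)
    extreme = sum(
        1 for _ in range(trials) if mean_gap(sum(rng.sample(pooled, k))) >= observed
    )
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (extreme + 1) / (trials + 1)
```

With only a few dozen labels, only a large difference will yield a small p-value, so a non-significant result says little about whether labels really are built differently.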

Akinori Ito