
Re: VMs: qo-words MORE



Dear Akinori,

I'm not an expert in cryptography, so can you tell me if 15th Century
cryptologists knew about attack through statistical analysis?

Yes - in the very early 15th Century, Italian code-makers started to use multiple characters for vowels (and for frequently used letters in general). This is an indication that code-breakers were using vowels as a "lever" to break open the code, and that code-makers were responding - it was a cryptographic "arms race" back then. This corresponds to "attack through tacit statistical analysis".


In 1474, Cicco Simonetta wrote the first known (though quite short) specifically cryptologic paper: "Regule ad extrahendum litteras ziferatas, sine exemplo" - which corresponds to "attack through explicit statistical analysis". You can see it here (in the Ciphers section):-
http://www.library.yale.edu/Ilardi/il-toc.htm


Cicco Simonetta was quite an extraordinary man - like Vladimir Putin (and George Bush Sr), he was a statesman who'd reached the top having originally run a state's secret service (Milan's, in Simonetta's case, which necessarily involved a lot of exposure to codes and ciphers). He also became extraordinarily rich - but was executed in 1480 after falling foul of a power struggle within the Sforza family.

My analysis was theory-driven; the theory was: VMS `words' are really
words. (No theory is a kind of theory :-)
This assumption was rejected by the observation. Now, there seem to be
the following (or more) possibilities about VMS `words':

(1) VMS is just a bunch of nonsense. (I don't want to believe it)
(2) Word order is shuffled in some way, as someone pointed out in this list
    (I tested it by shuffling English text. The contextual property of the
     randomly shuffled text was very similar to that of VMS)
(3) Some meaningless garbage characters are mixed into words.
    (For example, i/ii/iii are identical)

Perhaps just as important is the observation that the apparent word-length has an artificial-looking distribution that you probably wouldn't get from real languages - this has been discussed quite extensively on-list in the past.


For me, when you combine (a) the extremely small alphabet, (b) the tendency for certain letters to appear at the beginning and end of "words" and (c) the artificial word-length stats, it seems to imply one hypothesis quite strongly: that spaces are probably inserted in a stream of characters by following some kind of *encoding rule*.

I'm thinking of a superficial (ie, non-semantic) rule like: "insert a space after <in> or <ir> (etc), or before <q>, <of> or <f> (etc)... if it looks nice."
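As a rough illustration of how mechanical such a rule could be, here is a minimal Python sketch. The rule lists are a hypothetical subset of the examples above (the real rules, if any exist, are unknown), the "if it looks nice" part is left out entirely, and the input strings are toy examples rather than transcription data.

```python
# Hypothetical rule set, loosely based on the examples in the text.
SPACE_AFTER = ("in", "ir")   # insert a space after these groups
SPACE_BEFORE = ("q", "of")   # insert a space before these groups


def break_positions(stream: str) -> list[int]:
    """Return the indices in `stream` at which a rule asks for a space."""
    return [
        i
        for i in range(1, len(stream))
        # A break at i sits between stream[i-1] and stream[i].
        if stream.endswith(SPACE_AFTER, 0, i) or stream.startswith(SPACE_BEFORE, i)
    ]


def insert_spaces(stream: str) -> str:
    """Split an unspaced character stream at every rule-driven break."""
    cuts = [0] + break_positions(stream) + [len(stream)]
    return " ".join(stream[a:b] for a, b in zip(cuts, cuts[1:]))


if __name__ == "__main__":
    print(insert_spaces("daiinqokeedy"))  # daiin qokeedy
    print(insert_spaces("qokainofar"))    # qokain ofar
```

Note that the rules look only at neighbouring letters, never at meaning, which is the whole point of the hypothesis.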

This kind of thing would give the text the apparent (but misleading) structure that we see - and would mean that VMS "words" are merely superficial coding artefacts, with no intrinsic meaning.

That's not to say that Voynichese has no deep structure (IMO it probably does) - rather, that spaces are designed both to beautify the text and to misdirect code-breakers, and that the deep structure lies elsewhere. :-)

Here are some possible ideas to test this general hypothesis:-
(1) Given a table with Currier-style entries on both axes, representing pairs of letters [A,B], what is the ratio of (# of <AB> instances) to (# of <A B> instances) in the text? ie, given a left context and a right context, how likely is it that an artificial space would be inserted between them?
(2) Given a corpus of VMS text with spaces removed, what proportion of the spacing as observed can be generated by a simple set of (purely letter-based) rules? ie, how much *generative* information is in the spaces?
(3) Are there statistical differences in the role of "space" between sections, between languages, or between individual pages?
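Tests (1) and (2) could be sketched in Python along the following lines. This is only a sketch under assumptions: the input is taken to be a transcription with one character per letter and whitespace between "words", line breaks are treated as ordinary spaces, and all function names are made up for illustration. Instead of the raw ratio it reports the fraction of a pair's occurrences that have a space in the middle, which avoids dividing by zero. For test (3), the same functions can simply be run per section, per language or per page and the results compared.

```python
from collections import Counter


def pair_counts(text: str) -> tuple[Counter, Counter]:
    """Count letter pairs seen adjacent (<AB>) and across a space (<A B>)."""
    joined, split = Counter(), Counter()
    words = text.split()
    for w in words:
        for a, b in zip(w, w[1:]):
            joined[a, b] += 1
    for left, right in zip(words, words[1:]):
        split[left[-1], right[0]] += 1
    return joined, split


def space_probability(joined: Counter, split: Counter) -> dict:
    """Test (1): for each pair, the fraction of occurrences with a space inside."""
    pairs = set(joined) | set(split)
    return {p: split[p] / (split[p] + joined[p]) for p in pairs}


def observed_breaks(text: str) -> set[int]:
    """Positions of the spaces, indexed in the text with spaces removed."""
    pos, breaks = 0, set()
    for w in text.split():
        if pos:
            breaks.add(pos)
        pos += len(w)
    return breaks


def score(predicted: set[int], observed: set[int]) -> tuple[float, float]:
    """Test (2): precision and recall of rule-predicted breaks."""
    hits = len(predicted & observed)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(observed) if observed else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = "daiin qokeedy daiin ol"  # toy string, not real corpus data
    joined, split = pair_counts(sample)
    print(space_probability(joined, split)[("n", "q")])
    print(score({5, 9}, observed_breaks(sample)))
```

A rule set that reaches high precision and recall from letter context alone would suggest that the spaces carry little independent information.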


Plenty to think about! :-)

Best regards, .....Nick Pelling.....
