
VMs: Reports of some dead ends




Hi, list,


Over the last few weeks, and with the new scans from Beinecke, I've tried a few approaches to the VM. They turned out to be dead ends; nevertheless, I thought I might share my experiences with you -- perhaps you can benefit from them somehow. Maybe it's entertaining at least.

My first starting point was that most of the text paragraphs begin with a gallows character. I hypothesized that the gallows might have to do with capital letters. (Pretty much the only thing we can say about paragraph- or sentence-initial letters is that they are capital.)

I assumed that the VM might be a substitution cipher with two glyphs per plaintext letter. If you arrange the alphabet in a table with the capital letters filling the top half of the table, and lowercase letters filling the bottom, you need somewhere around 50 cells in the table -- e.g. seven rows and seven columns. Now, if every column is denoted by one ciphertext glyph, and every row by another, you can address any cleartext letter by a pair of "coordinates". If the gallows denote the four topmost rows of the table, they'd cover all the capital letters, plus a few lowercase ones: Voila, paragraph beginnings (with capital letters) are coded with a pair of glyphs, one of which is a gallows. Let's assume that the order of the glyphs -- rows first or columns first -- doesn't matter.

So far so good, but I immediately ran into trouble. First of all, pretty obviously there are a fair number of ciphertext groups composed of an odd number of glyphs. This can still be remedied by assuming that the spaces serve no purpose but to confuse the wannabe codebreaker. The next blow -- more severe -- was the fact that there are triple-character sequences such as <eee> and <iii>. As long as the glyphs denoting columns are different from the row designations, every pair consists of two different glyphs, so you would expect a doublet at most (where two pairs meet) -- but no triplets, as are found fairly regularly.
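The "doublet at most" claim can be checked by brute force on a toy version of the scheme. The sketch below assumes two row glyphs and two column glyphs (arbitrary placeholder symbols), allows either order within a pair, and looks for the longest run of identical glyphs over all sequences of four pairs.

```python
from itertools import groupby, product

def longest_run(text):
    """Length of the longest run of one repeated character in text."""
    return max((len(list(group)) for _, group in groupby(text)), default=0)

ROWS, COLS = "AB", "xy"   # placeholder glyph sets, disjoint by construction
# Each plaintext letter becomes one row glyph and one column glyph, in either order.
pairs = [r + c for r in ROWS for c in COLS] + [c + r for r in ROWS for c in COLS]

# Exhaustively try every sequence of four pairs (8**4 = 4096 ciphertexts).
worst = max(longest_run("".join(seq)) for seq in product(pairs, repeat=4))
print(worst)
```

Any three consecutive positions always contain one complete pair, and a complete pair holds two different glyphs, so a run of three cannot occur no matter how the pairs are chosen.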

I soldiered on regardless and tried to come to an understanding of the row and column glyphs. If the gallows denoted rows, for example, the immediately following glyph must be a column (at the paragraph beginning). At least one of the neighbors of a column designator must be a row designator, and vice versa. Unfortunately, I very quickly ran into severe inconsistencies. Even when I was very tolerant in my assumptions about what actually constituted a glyph, it just didn't work out.

I then went back to my old transition theory, namely assuming that the ciphertext glyphs don't directly encode letters, but give you instructions on how to reach one plaintext letter from the previous one. (I.e., EVA <8> might say "jump 5 letters forward in the alphabet.") Let's assume once more that our letters are arranged in a table, and the gallows serve to "synchronize", i.e. they give you a fixed point to pick up decoding again. (If you don't have such a synchronisation point, you'll be in big trouble once you lose a single letter on the way...)
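A small Python sketch of such a transition code shows why the synchronisation point matters. It is only an illustration under assumptions of my own choosing: a plain 26-letter alphabet instead of a table, numeric jumps instead of glyphs, a "|" marker standing in for a gallows, and a resync at the start of every word.

```python
import string

ALPHA = string.ascii_lowercase
SYNC = "|"   # stands in for a gallows glyph (assumption)

def encode(words):
    """Each symbol is the forward jump from the previous letter; SYNC resets to 'a'."""
    out = []
    for word in words:
        out.append(SYNC)
        prev = 0
        for ch in word:
            cur = ALPHA.index(ch)
            out.append((cur - prev) % 26)
            prev = cur
    return out

def decode(symbols):
    """Inverse of encode; expects the stream to start with SYNC."""
    words, prev = [], 0
    for s in symbols:
        if s == SYNC:
            words.append("")
            prev = 0
        else:
            prev = (prev + s) % 26
            words[-1] += ALPHA[prev]
    return words

msg = ["voynich", "cipher"]
code = encode(msg)
damaged = code[:2] + code[3:]   # lose the second jump of the first word
print(decode(code))
print(decode(damaged))
```

Losing a single jump garbles the rest of the first word, but decoding recovers at the next sync marker, so the second word comes out intact.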

In a previous simulation, I had already found that the behaviour of such a code would not be what we observe in the VM. Most notably, you get a very even distribution of transitions, exactly the opposite of the high repetitiveness we see. Also, you'd expect jump sequences to be fairly random, i.e. a jump sequence "AB" should be about as probable as "BA". Unfortunately, EVA <ch> has 11,000 occurrences, while <hc> has 650. <he> occurs 8100 times, while <eh> occurs... four times. It simply didn't work.
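The forward/reverse comparison is easy to reproduce. The sketch below counts bigrams within words and computes the ratio of "AB" to "BA"; the four counts fed in at the end are the ones quoted above, while the function names and the tiny sample string are just for illustration.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent character pairs within each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts

def asymmetry(counts, pair):
    """Ratio of occurrences of 'AB' to 'BA'; infinite if 'BA' never occurs."""
    forward, reverse = counts[pair], counts[pair[::-1]]
    return forward / reverse if reverse else float("inf")

# Counts as reported in the text above.
reported = Counter({"ch": 11000, "hc": 650, "he": 8100, "eh": 4})
for pair in ("ch", "he"):
    print(pair, round(asymmetry(reported, pair), 1))
```

For a code made of independent random jumps both ratios should sit near 1; here they come out at roughly 17 and roughly 2000.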

Today I gave it another shot when I found somewhere on the web that in some encoding schemes a vowel and the consonant following it in the alphabet would be coded with the same ciphertext letter -- in our case, both "a" and "b" might be substituted with a <9>. That didn't really help with the paragraph-initial gallows, but it might have been an explanation for the triplet sequences of <i> and <e>.

I did a frequency check for German, English, French and Italian texts of the period, counting the relative frequencies of single letters from a pair such as "ab", of runs of two such letters, of runs of three, and so on. I compared these to the frequencies of <i>, <ii>, <iii> sequences in the VM and so on.
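A count of this kind might look like the following sketch. How exactly the percentages were normalised is not stated, so the choices here are assumptions: runs are maximal, they do not cross word boundaries, and each run count is expressed as a percentage of all letters in the text. The sample phrase is made up.

```python
import re
from collections import Counter

def run_percentages(text, letters):
    """Percentage (of all letters) of maximal runs drawn from `letters`, by run length."""
    text = text.lower()
    total = sum(c.isalpha() for c in text)
    if total == 0:
        return {}
    pattern = "[" + re.escape(letters) + "]+"
    runs = Counter(len(m.group()) for m in re.finditer(pattern, text))
    return {n: 100 * k / total for n, k in sorted(runs.items())}

# Treat "i" and "l" as one merged cipher symbol and count its runs.
print(run_percentages("il libro illustre", "il"))
```

The same function applied to a transliteration with letters="e" or letters="i" would give the VM side of the comparison, again under the normalisation assumed here.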

The best match I got was to compare Italian "il" with EVA <e>. (I used "il" instead of "ij" or "ik", because in my text version -- Dante's Divina Commedia -- neither "j" nor "k" was used.) Here are some of the relative frequencies I got, all in percent:

Run length:    1       2       3       4
VM <e>:        10.4    4.548   0.258   0.0043
Ital "il":     12.5    3.167   0.238   0.0015

That looked neither too bad nor too convincing. (After all, it wasn't too surprising to find _some_ match in a large sample.)
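Dividing the VM percentages by the Italian ones makes the quality of the match easier to judge. The figures in this sketch are the ones quoted above; only the variable names are new.

```python
# Relative frequencies in percent, keyed by run length, as quoted above.
vm_e = {1: 10.4, 2: 4.548, 3: 0.258, 4: 0.0043}
ital_il = {1: 12.5, 2: 3.167, 3: 0.238, 4: 0.0015}

ratios = {n: vm_e[n] / ital_il[n] for n in vm_e}
for n, ratio in ratios.items():
    print(f"run length {n}: VM/Ital = {ratio:.2f}")
```

A consistent match would give ratios close to 1 throughout; here the single-letter ratio falls below 1 while the longer runs lie above it, by different amounts.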

Unfortunately, if it had been that simple, the VM should contain a significant number of two-letter words with <e>. E.g., "la" and "le" should form frequent <e_> groups, "di" would be <_e>, and "il" should form frequent <ee>s. This is not observed. Along with the fact that my statistical match wasn't overwhelmingly good (especially the fact that the VM values are scattered both above and below the Italian ones), I now tend to think this was another dead end.

That's it for tonight, back to the European soccer championships.

Tallyho,

Elmar

--
Elmar Vogt / Königswarterstr. 18 / 90762 Fürth / GERMANY
elvogt@xxxxxxxxxxx / Tel.: (++49/0)911 - 31 52 58
Agilmar von Sevelingen: VIS VISCERIS NON FERRE FERTUR (T.Doom)

"'Schmetterling'... sounds like a German WW II fighter plane" (Monkey boy)
