VMs: Reports of some dead ends
Hi, list,
Over the last few weeks, and with the new scans from the Beinecke, I've
tackled a few approaches to the VM. They turned out to be dead ends, but I
thought I might share my experiences with you anyway -- perhaps you'll
benefit from them somehow. At the very least they may be entertaining.
My first starting point was that most of the text paragraphs begin with a
gallows character. I hypothesized that the gallows might have to do with
capital letters. (Pretty much the only thing we can say about paragraph- or
sentence-initial letters is that they are capital.)
I assumed that the VM might be a substitution cipher with two glyphs per
plaintext letter. If you arrange the alphabet in a table with the capital
letters filling the top half of the table and the lowercase letters filling
the bottom, you need somewhere around 50 cells in the table -- e.g. seven
rows and seven columns. Now, if every column is denoted by one ciphertext
glyph and every row by another, you can address any cleartext letter by a
pair of "coordinates". If the gallows denote the four topmost rows of the
table, they'd cover all the capital letters, plus a few lowercase ones:
Voilà, paragraph beginnings (with capital letters) are coded with a pair of
glyphs, one of which is a gallows. Let's assume that the order of the glyphs
-- rows first or columns first -- doesn't matter.
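A minimal sketch of such a coordinate table, to make the idea concrete. The 23-letter alphabet (no J, U, W) and the glyph names in ROWS and COLS are assumptions chosen for illustration only; the first four row glyphs merely stand in for the gallows. For simplicity the sketch always writes the row glyph first.

```python
# Assumed 23-letter alphabet: capitals first, then lowercase (46 cells in a 7x7 table).
UPPER = "ABCDEFGHIKLMNOPQRSTVXYZ"
CELLS = UPPER + UPPER.lower()

# Hypothetical glyph assignment; t, k, p, f stand in for the four gallows.
ROWS = "tkpfxyz"
COLS = "aoeidsl"
WIDTH = len(COLS)

def encode(plaintext):
    """Replace every letter by a (row glyph, column glyph) pair."""
    out = []
    for ch in plaintext:
        idx = CELLS.find(ch)
        if idx < 0:
            continue  # skip spaces, punctuation and letters outside the table
        row, col = divmod(idx, WIDTH)
        out.append(ROWS[row] + COLS[col])
    return "".join(out)

def decode(ciphertext):
    """Read the ciphertext two glyphs at a time and look up the cell."""
    out = []
    for i in range(0, len(ciphertext), 2):
        row = ROWS.index(ciphertext[i])
        col = COLS.index(ciphertext[i + 1])
        out.append(CELLS[row * WIDTH + col])
    return "".join(out)
```

With this layout the 23 capitals occupy cells 0 to 22, i.e. rows 0 to 3, so every capital letter is coded with a gallows stand-in as its row glyph, and so are the first five lowercase letters.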
So far so good, but I immediately ran into trouble. First of all, pretty
obviously there is a fair number of ciphertext groups composed of an odd
number of glyphs. This can still be remedied by assuming that the spaces
serve no purpose but to confuse the wannabe codebreaker. The next blow --
more severe -- was the fact that there are triple-glyph sequences such as
<eee> and <iii>. As long as the glyphs denoting columns are different from
those denoting rows, you would expect a doublet at most -- but no triplets,
and yet triplets are found fairly regularly.
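The "doublet at most" argument can be checked with a small simulation. This is only a sketch: the row and column glyph sets are hypothetical, and the one real assumption is that they are disjoint and that each pair may be written in either order.

```python
import random
from itertools import groupby

ROWS = "tkpfxyz"   # hypothetical row glyphs
COLS = "aoeidsl"   # hypothetical column glyphs, disjoint from ROWS

def longest_run(text):
    """Length of the longest run of one repeated glyph in text."""
    return max((sum(1 for _ in g) for _, g in groupby(text)), default=0)

def random_pair_stream(n_pairs, rng):
    """n_pairs coordinate pairs, each written row-first or column-first at random."""
    out = []
    for _ in range(n_pairs):
        pair = [rng.choice(ROWS), rng.choice(COLS)]
        rng.shuffle(pair)
        out.extend(pair)
    return "".join(out)

rng = random.Random(0)
stream = random_pair_stream(10000, rng)
```

Any three consecutive glyphs contain one complete pair, and a pair always holds one row glyph and one column glyph, so `longest_run(stream)` can never exceed 2, whereas a group containing <eee> gives 3.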
I soldiered on regardless and tried to come to an understanding of the row
and column glyphs. If the gallows denoted rows, for example, the immediately
following glyph must be a column (at the paragraph beginning). At least one
of the neighbors of a column designator must be a row designator, and vice
versa. Unfortunately, I very quickly ran into severe inconsistencies. Even
when I was very tolerant in my assumptions about what actually constituted
a glyph, it just didn't work out.
I then went back to my old transition theory, namely the assumption that the
ciphertext glyphs don't encode letters directly, but give you instructions
for how to reach one plaintext letter from the previous one. (I.e., EVA <8>
might say "jump 5 letters forward in the alphabet.") Let's assume once more
that our letters are arranged in a table, and that the gallows serve to
"synchronize", i.e. they give you a fixed point from which to pick up
decoding again. (If you don't have such a synchronisation point, you'll be
in big trouble once you lose a single letter on the way...)
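A minimal sketch of such a transition code with synchronisation points, assuming a plain 26-letter alphabet laid out in a line rather than a table, and a sync token every eight letters. The token names "sync" and "jump" and the parameter sync_every are made up for illustration.

```python
import string

ALPHABET = string.ascii_lowercase

def encode(plain, sync_every=8):
    """Turn letters into jump instructions, with an absolute 'sync' token at intervals."""
    tokens = []
    prev = None
    for i, ch in enumerate(plain):
        pos = ALPHABET.index(ch)
        if i % sync_every == 0:
            tokens.append(("sync", pos))                 # absolute position
        else:
            tokens.append(("jump", (pos - prev) % 26))   # relative step forward
        prev = pos
    return tokens

def decode(tokens):
    """Follow the jumps; a sync token resets the current position."""
    out = []
    pos = 0
    for kind, value in tokens:
        pos = value if kind == "sync" else (pos + value) % 26
        out.append(ALPHABET[pos])
    return "".join(out)

plain = "nelmezzodelcammin"
tokens = encode(plain)
damaged = tokens[:2] + tokens[3:]   # lose a single token on the way
```

Decoding `damaged` gives garbage from the lost token up to the next sync token, after which the text is correct again; without sync tokens everything after the loss would be shifted.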
In a previous simulation, I had already found that the behaviour of such a
code would not be what we observe in the VM. Most notably, you get a very
even distribution of transitions, exactly the opposite of the VM's high
repetitiveness. Also, you'd expect jump sequences to be fairly random, i.e. a
jump sequence "AB" should be about as probable as "BA". Unfortunately, EVA
<ch> has 11,000 occurrences, while <hc> has 650. <he> occurs 8100 times,
while <eh> occurs... four times. It simply didn't work.
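The forward-versus-reverse comparison is easy to reproduce on any transcription. Here is a small sketch; the sample string is a toy example, not a real stretch of VM text, and the function names are made up.

```python
from collections import Counter

def bigram_counts(text):
    """Count all overlapping two-character sequences in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def asymmetry(text, pair):
    """Return (count of pair, count of the reversed pair)."""
    counts = bigram_counts(text)
    return counts[pair], counts[pair[::-1]]

sample = "chedy shedy chol"   # toy input, for illustration only
```

For a code made of random jumps, the two numbers returned by `asymmetry` should be of similar size for every pair; a strong imbalance in one direction speaks against it.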
Today I gave it another shot after reading somewhere on the web that in some
encoding schemes a vowel and the consonant following it in the alphabet are
coded with the same ciphertext letter -- in our case, both "a" and "b"
might be substituted with a <9>. That didn't really help with the
paragraph-initial gallows, but it might have explained the triplet
sequences of <i> and <e>.
I did a frequency check on German, English, French and Italian texts of the
period, counting the relative frequencies of single letters from a pair such
as "ab", of runs of two such letters, of three, and so on. I compared these
with the frequencies of sequences such as <i><i><i> in the VM.
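A sketch of how such a run-length count could be done. How the percentages were normalised is not stated, so this makes two assumptions: runs do not cross word boundaries, and each count of maximal runs is divided by the total number of letters in the text. The function name is made up.

```python
import re
from collections import Counter

def run_length_percentages(text, letters):
    """Percentage of maximal runs of length n built only from `letters`.

    Assumption: counts are normalised by the total number of alphabetic
    characters; the original normalisation is not known.
    """
    text = text.lower()
    total = sum(1 for c in text if c.isalpha())
    pattern = "[" + re.escape(letters) + "]+"
    runs = Counter(len(m.group()) for m in re.finditer(pattern, text))
    return {n: 100.0 * count / total for n, count in sorted(runs.items())}

result = run_length_percentages("il mille", "il")
```

In the toy input, "il" is a run of two and the "ill" in "mille" is a run of three, so `result` has one entry for length 2 and one for length 3. The same function applied to a VM transcription with `letters="e"` would give the numbers to compare against.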
The best match I got was when comparing Italian "il" with EVA <e>. (I used
"il" instead of "ij" or "ik", because in my text version -- Dante's Divina
Commedia -- neither "j" nor "k" was used.) Here are some of the relative
frequencies I got, all in percent:
VM:    <e>:  10.4    <ee>:  4.548    <eee>: 0.258    <eeee>: 0.0043
Ital:  "il": 12.5    "il"2: 3.167    "il"3: 0.238    "il"4:  0.0015
That looked neither too bad nor too convincing. (After all, it wasn't too
surprising to find _some_ match in a large sample.)
Unfortunately, if it were that simple, the VM should contain a significant
number of two-letter words with <e>. E.g., "la" and "le" should form
frequent <e_> groups, "di" would be <_e>, and "il" should form frequent
<ee>s. None of this is observed. Along with the fact that my statistical
match wasn't overwhelmingly good (especially since the VM values fall
sometimes above and sometimes below the Italian ones), I now tend to think
this was another dead end.
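The predicted word shapes follow mechanically from the assumed substitution. A tiny sketch, assuming that "i" and "l" both become <e> and that every other letter is shown as an unknown glyph "_"; the function name is made up.

```python
def predicted_shape(word, merged="il"):
    """Show which positions of a word would become <e> if `merged` letters map to <e>."""
    return "".join("e" if c in merged else "_" for c in word.lower())

shapes = {w: predicted_shape(w) for w in ["la", "le", "di", "il"]}
```

Here `shapes` maps "la" and "le" to "e_", "di" to "_e" and "il" to "ee", which are the short groups one would expect to see frequently in the VM under this hypothesis.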
That's it for tonight, back to the European soccer championships.
Tallyho,
Elmar
--
Elmar Vogt / Königswarterstr. 18 / 90762 Fürth / GERMANY
elvogt@xxxxxxxxxxx / Tel.: (++49/0)911 - 31 52 58
Agilmar von Sevelingen: VIS VISCERIS NON FERRE FERTUR (T.Doom)
"'Schmetterling'... sounds like a German WW II fighter plane" (Monkey boy)