
VMs: Paired gallows + entropy...



Hi Steve,

IF every gallows character was a "paired character" and the lower case
'pointers' were just _'single' coded characters_ how would that
affect its overall entropy for European languages?

I don't have the means to run this Ratio/Percentages question...

Entropy of the text as a whole isn't likely to be a very helpful concept: given that we have no consensus as to what the right "grouping" is (of glyphs or strokes into composite characters, whether paired or not), what's probably more important is the relative frequencies (ie the statistical distribution) of the characters in whatever "underlying alphabet" you're interested in.
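
To make the grouping point concrete, here is a rough Python sketch (not something from the original exchange) that computes first-order entropy for the same string under two different groupings. The sample string, the choice of "qo" as a composite character, and the names tokenize and entropy are all just assumptions for illustration.

```python
from collections import Counter
from math import log2

def tokenize(text, groups=()):
    """Split text into tokens, treating each string in `groups` as one token.

    Longer groups are tried first; anything else becomes a
    single-character token. Empty group strings are ignored.
    """
    ordered = sorted((g for g in groups if g), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for g in ordered:
            if text.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

def entropy(tokens):
    """First-order Shannon entropy in bits per token."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Invented, EVA-like toy string: NOT real transcription data.
sample = "qokeedyqotedyokedyqokedy"

print(entropy(tokenize(sample)))                 # every letter is a token
print(entropy(tokenize(sample, groups=("qo",)))) # "qo" counted as one token
```

The two figures differ even though the text is identical, which is why a single "entropy of the VMS" number means little until the grouping is agreed.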


Certainly, EVA seems unlikely to correspond to the underlying alphabet: and even if you take a more obviously glyph-based view (such as FSG or Strong), there are many possible renderings (is "4o" a glyph or not? is "cc" a glyph or not? etc), and so opinions differ.

Even so, the statistical distribution at that level doesn't really match up with what I'd expect of a natural language (or even of any unnatural language, though others' expectations differ), and so it's difficult to see how it could be a monoalphabetic cipher, even locally. With a monoalphabetic cipher, even if you don't know precisely how the cipher and plaintext alphabets map onto each other, if their frequency curves match then you can make some pretty shrewd guesses, and hence quite possibly crack the cipher (without trying too hard).

The problem with the VMS is that - compared to European languages - its statistical distribution looks too steep. That is to say, its common letters appear much more common than European languages' common letters, and so forth.
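
One simple way to put a number on "too steep" is to sort the relative frequencies from most to least common and see how much of the text the top few characters account for. The Python sketch below does that; the two token lists are invented toy data (not VMS or Italian counts), and rank_frequency and top_share are hypothetical helper names.

```python
from collections import Counter

def rank_frequency(tokens):
    """Relative frequencies sorted from most to least common."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [c / total for _, c in counts.most_common()]

def top_share(tokens, k=3):
    """Fraction of the text covered by the k most common tokens."""
    return sum(rank_frequency(tokens)[:k])

# Invented toy data, for illustration only.
steep = list("oooooooeeeeaab")  # a few tokens dominate
flat = list("oooeeeaaattnns")   # frequencies are spread more evenly

print(top_share(steep, k=2))
print(top_share(flat, k=2))
```

A steeper curve shows up as a larger top-k share; comparing curves for a real transcription and a real plaintext corpus would need actual counts in place of the toy strings.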

The more you pairify frequent letters (like "o" + "t" --> "[ot]"), the "flatter" the statistical distribution becomes (ie the more the peakiness reduces) - but note that the more choices (and pairs) in your proposed underlying alphabet, the longer the tail on the stats curve becomes, which is also wrong.
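
The trade-off can be seen on a toy example. The Python sketch below merges every adjacent "o" + "t" into a single "[ot]" token and then reports the alphabet size and the share of the most common token before and after. The input string is invented, and pairify and summary are hypothetical names for this sketch only.

```python
from collections import Counter

def pairify(tokens, first, second):
    """Merge each adjacent (first, second) pair into one "[first+second]" token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == first and tokens[i + 1] == second:
            out.append(f"[{first}{second}]")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def summary(tokens):
    """Return (alphabet size, share of the most common token); tokens must be non-empty."""
    counts = Counter(tokens)
    top = counts.most_common(1)[0][1] / len(tokens)
    return len(counts), top

# Invented toy string, for illustration only.
before = list("otolotorotet")
after = pairify(before, "o", "t")

print(summary(before))
print(summary(after))
```

On this toy input the most common token's share drops after merging (the peak flattens) while the alphabet gains a member (the tail lengthens), which are the two effects pulling in opposite directions.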

Perhaps there's a "sweet spot" between FSG and a largely pairified alphabet which matches a European language - if so, I haven't found it. However, the lack of structure above the pair level points (to my eyes, at least) to some higher level variation in the coding system, so I must admit I'm not looking very hard there.

If the plaintext language is mostly Italian (as I suspect), then I'd expect (for example) such items as "ch" and "pr" to be encoded as additional tokens, and hence the distribution curve of the remapped plaintext to be different again. Even so, comparing the curve for a pairified VMS text (which you don't know) with the curve for such a remapped plaintext (which you also don't know) would necessarily be a bit of a black art. Perhaps that's the point of the VMS' cipher - uncertainty about both the source alphabet and the target cipherbet (but in different ways) would make it difficult for cryptologists to home in on either with any certainty.

So: the answer you're looking for is - pairifying the gallows (into separate characters) does flatten out the stats, making them look a bit more like a European language... but this doesn't appear to be the whole story, not by a long way. Perhaps it's a step in the right direction, though. :-)

Cheers, .....Nick Pelling.....

