
Re: VMs: VMs IS NOT Welsh



Hi everyone,

At 13:59 24/01/03 +0100, Bernd Neuner wrote:
The results of the paper recently mentioned by Rene (Akinori Ito, Observation of left and
right entropy in Voynich MS) clearly exclude that possibility (IMHO). Though my knowledge
of Welsh is rather shallow, I would expect that every Welsh word would - as in other
European languages - influence its immediate surroundings, that is, increase/decrease
the probability of being preceded/followed by certain other words.


Example: Verbs tend to be followed by non-verbs rather than by other verbs.

The Ito paper points out that this is not the case in the VMS. In other words: the
left/right word entropies suggest that the VMS symbol groups are virtually
context-independent.


Am I missing something?

Some quick thoughts on the interface between cryptography and statistics:-


Based on the common obfuscation trick used by the (very few) known cipherbets from 1440-1460 which have the same "4" character as the VMS, I'd say this: if the author of the VMS' cipherbet was the same person, there is a high (I'd say more than 90%) chance that the VMS also uses this exact same trick.

That is, just as "4" and "4o" code for separate symbols in the other cipherbets where they appear, so ["o" + gallows] could well code for a separate symbol (whether a letter, index, or coding action) from plain [gallows]. This trick has two effects:-
(1) the size of the apparent alphabet is smaller than the underlying alphabet
(2) statistical analysis results are sharply less useful than they would be on "pure" cipherbets
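
To make effect (1) concrete, here is a minimal Python sketch of re-tokenising a transcription so that multi-glyph units count as single symbols. Everything in it is an assumption for illustration: the toy strings are made up (with "k" standing in for a gallows glyph), the unit list ["4o", "ok"] is hypothetical, and the greedy longest-match rule is only one possible way to split a word.

```python
def tokenize(word, units):
    """Split a word into symbols, preferring the longest matching multi-glyph unit."""
    ordered = sorted(units, key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(word):
        for unit in ordered:
            if word.startswith(unit, i):
                symbols.append(unit)
                i += len(unit)
                break
        else:
            # No multi-glyph unit starts here, so the single glyph is the symbol.
            symbols.append(word[i])
            i += 1
    return symbols


# Made-up toy strings, NOT real VMS transcription; "k" stands in for a gallows glyph.
toy_words = ["4oda", "4da", "okda", "kda"]
# Hypothetical multi-glyph units: "4o" and ["o" + gallows].
toy_units = ["4o", "ok"]

apparent_alphabet = {glyph for word in toy_words for glyph in word}
underlying_alphabet = {sym for word in toy_words for sym in tokenize(word, toy_units)}

print(len(apparent_alphabet), sorted(apparent_alphabet))
print(len(underlying_alphabet), sorted(underlying_alphabet))
```

On these toy strings the glyph count is 5 but the symbol count is 6, because "o" never stands alone: any per-glyph frequency count would be measuring the wrong units.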


IMO, the problem with most letter-based statistical analyses done to date on the VMS is that they're merely superficially "rattling the bars of the cage" - that is, they're based on the (reasonable, but probably wrong) inference that, because of the clarity of the VMS's apparent alphabet, there's a one-to-one relation between the characters and the underlying alphabet.

I'm also highly suspicious of inferences based on "words" in the VMS: I strongly suspect that the apparent word "shapes" look a lot like a kind of embellished/obfuscated numbering system, and the apparent length of VMS words also (as has been discussed many times before) has a non-"word"-like distribution.

Akinori Ito's paper would seem to support the (widely-held and long-held) view here that the code-stream's statistics don't look much like any existing language - but to the degree that it's still operating inside the cage built for us by the code's creator, it's not moving us forward.
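
For anyone who wants to experiment with this kind of measurement, here is a minimal Python sketch of left and right word entropy, taken as the ordinary conditional entropy of the previous (or next) word given the current one. That definition is my assumption; it may well differ in detail from the one the paper uses, and the function name is hypothetical.

```python
import math
from collections import Counter


def left_right_entropy(words):
    """Return (left, right) conditional entropies in bits for a word sequence.

    right = H(next word | current word), left = H(previous word | current word).
    Values near zero mean a word's neighbours are highly predictable.
    """
    pairs = Counter(zip(words, words[1:]))
    total = sum(pairs.values())
    as_first = Counter()
    as_second = Counter()
    for (a, b), n in pairs.items():
        as_first[a] += n
        as_second[b] += n
    right = -sum(n / total * math.log2(n / as_first[a]) for (a, b), n in pairs.items())
    left = -sum(n / total * math.log2(n / as_second[b]) for (a, b), n in pairs.items())
    return left, right


# Made-up toy sequence, not VMS data.
left, right = left_right_entropy("a b a c a b a c".split())
print(left, right)
```

Whatever units are fed in as "words" determine the result, so the same function could be re-run on re-tokenised symbol streams rather than on the apparent words.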

In short: I think multiple letters frequently code for a single symbol, and that words are very frequently embellished numbers - so statistical analyses that don't try to break out of these twin traps *simultaneously* are very likely to be unhelpful.

But the question then arises: what kind of analysis *would* help?

Obviously, if the text has any structure at all, then viewing it as a (probably complex) Markov chain might well be a decent starting point. Even so, this will require a great deal of insight to even get started: to date, I've had no great luck trying to model the numbering system used (which I believe is some kind of embellished Roman numerals), but perhaps some kind of state machine analysis might help move this forward.

Here's a suggestion: perhaps using Markov model analysis to find transitions in numbering patterns that *never* occur, and to then build up some kind of constraint analysis from that?
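
A minimal Python sketch of that idea: count first-order transitions and list the ones that never occur even though chance alone would predict at least a few. The function name, the min_expected threshold and the independence baseline are all my assumptions, and plain Roman numerals 1 to 39 are used purely as a stand-in numbering system, not as a claim about how the VMS works.

```python
from collections import Counter
from itertools import product


def missing_transitions(sequences, min_expected=1.0):
    """List symbol pairs (a, b) that never occur adjacently.

    Only pairs whose expected count under independence is at least
    min_expected are reported, so that rare symbols do not flood the list.
    """
    firsts, seconds, pairs = Counter(), Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            firsts[a] += 1
            seconds[b] += 1
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    missing = []
    for a, b in product(firsts, seconds):
        if pairs[(a, b)] == 0:
            expected = firsts[a] * seconds[b] / total
            if expected >= min_expected:
                missing.append(((a, b), expected))
    # Most surprising absences first.
    return sorted(missing, key=lambda item: item[1], reverse=True)


def to_roman(n):
    """Lower-case Roman numeral for 1..39 (stand-in numbering system only)."""
    units = ["", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix"]
    return "x" * (n // 10) + units[n % 10]


numerals = [to_roman(n) for n in range(1, 40)]
for (a, b), expected in missing_transitions(numerals):
    print(f"{a} -> {b} never occurs (expected about {expected:.2f} by chance)")
```

On this stand-in the sketch flags v -> x and v -> v, which are exactly the constraints built into the numeral system. Each absent transition found this way is a candidate rule for a state-machine model, though on a small sample an absence can still be an accident.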

Conventional statistical tools wouldn't be much good at that, though. :-(

Cheers, .....Nick Pelling.....
