Re: VMs: VMs IS NOT Welsh
At 13:59 24/01/03 +0100, Bernd Neuner wrote:
The results of the paper recently mentioned by Rene (Akinori Ito,
"Observation of left and right entropy in Voynich MS") clearly exclude
that possibility (IMHO).
Though my knowledge of Welsh is rather shallow, I would expect that every
Welsh word would - as in other European languages - influence its immediate
surroundings, that is, the probability of being preceded/followed by
certain other words.
Example: verbs tend to be followed by non-verbs rather than by other verbs.
The Ito paper points out that this is not the case in the VMS. In other
words, left/right word entropies suggest that the VMS symbol groups are
virtually [...]
Am I missing something?
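Ito's paper is not reproduced here, so the exact definition it uses is an assumption: one plausible reading of "left" and "right" entropy is the conditional entropy of the preceding (or following) word given the current word. A minimal Python sketch of that reading, not a reconstruction of Ito's own method:

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(table):
    """H(Y | X) in bits, where table maps each context X to a Counter over Y."""
    total = sum(sum(c.values()) for c in table.values())
    h = 0.0
    for counts in table.values():
        n = sum(counts.values())
        for k in counts.values():
            p = k / n
            # Weight each context's entropy by how often that context occurs.
            h -= (n / total) * p * math.log2(p)
    return h


def neighbour_entropies(words):
    """Return (left, right) entropies for a list of words, in bits.

    Assumed definitions (not necessarily Ito's):
      right = H(following word | current word)
      left  = H(preceding word | current word)
    """
    following = defaultdict(Counter)  # word -> counts of the word after it
    preceding = defaultdict(Counter)  # word -> counts of the word before it
    for a, b in zip(words, words[1:]):
        following[a][b] += 1
        preceding[b][a] += 1
    return conditional_entropy(preceding), conditional_entropy(following)
```

Comparing these figures against the same word list randomly shuffled gives one rough gauge of how strongly neighbouring words depend on each other. With small corpora both estimates are biased low, so the comparison is indicative only.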
Some quick thoughts on the interface between cryptography and statistics:-
Based on the common obfuscation trick used by the (very few) known
cipherbets from 1440-1460 which have the same "4" character as the VMS, I'd
say this: if the VMS's cipherbet was devised by the same person, there is
a high (I'd say more than 90%) chance that the VMS also uses this exact
trick.
That is, just as "4" and "4o" code for separate symbols in the other
cipherbets where they appear, so too may they in the VMS; similarly,
["o" + gallows] could well code for a separate symbol (whether a letter,
index, or coding action) from plain [gallows]. This trick has two effects:-
(1) the size of the apparent alphabet is smaller than the underlying alphabet;
(2) statistical analysis results become sharply less useful than on "pure" [...]
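To make the "4"/"4o" point concrete, here is a minimal sketch of a greedy longest-match tokenizer that treats a hypothetical list of multigraphs as single underlying symbols. The multigraph table, the use of the letters k, t, p and f as stand-ins for the four gallows glyphs, and the left-to-right greedy resolution of overlaps are all assumptions for illustration only.

```python
# Hypothetical multigraph table: every entry is an assumption for illustration.
GALLOWS = ["k", "t", "p", "f"]  # assumed stand-in letters for the gallows glyphs
MULTIGRAPHS = ["4o"] + ["o" + g for g in GALLOWS]


def tokenize(text, multigraphs=MULTIGRAPHS):
    """Split text into hypothetical underlying symbols, longest match first."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for m in ordered:
            if text.startswith(m, i):
                tokens.append(m)  # a multigraph read as one symbol
                i += len(m)
                break
        else:
            tokens.append(text[i])  # an ordinary single character
            i += 1
    return tokens


def alphabet_sizes(text):
    """Return (apparent, underlying) alphabet sizes, ignoring spaces."""
    apparent = set(text) - {" "}
    underlying = set(tokenize(text)) - {" "}
    return len(apparent), len(underlying)
```

Here "4okaiin okar or" shows 7 distinct written characters but 8 distinct symbols once "4o" and "ok" are read as units, which is effect (1) in miniature; letter-frequency statistics computed over the raw characters would then be counting the wrong units.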
IMO, the problem with most letter-based statistical analyses done to date
on the VMS is that they're merely superficially "rattling the bars of the
cage" - that is, they're based on the (reasonable, but probably wrong)
inference that, because the VMS's apparent alphabet is so clear, there's a
one-to-one relation between the characters and the underlying alphabet.
I'm also highly suspicious of inferences based on "words" in the VMS, as I
strongly suspect not only that the apparent word "shapes" look a lot like a
kind of embellished/obfuscated numbering system, but also that the apparent
length of VMS words (as has been discussed many times before) has a
non-"word"-like distribution.
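The word-length point can be checked mechanically. A minimal sketch, assuming a transcription in which spaces mark the VMS's apparent word breaks: compute the relative frequency of each word length, then compare it with the same measure over a reference text in a known language. Total variation distance is used here as one simple comparison, a choice made for illustration rather than one taken from the discussion.

```python
from collections import Counter


def word_length_distribution(text):
    """Relative frequency of each word length in a space-separated text."""
    lengths = Counter(len(w) for w in text.split())
    total = sum(lengths.values())
    return {n: lengths[n] / total for n in sorted(lengths)}


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

A large distance from every candidate language would be consistent with the "non-word-like" reading, though it cannot by itself say what the groups are instead.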
Akinori Ito's paper would seem to support the (widely-held and long-held)
view here that the code-stream's statistics don't look much like any
existing language - but to the degree that it's still operating inside the
cage built for us by the code's creator, it's not moving us forward.
In short: I think multiple letters frequently code for a single symbol, and
that words are very frequently embellished numbers - so statistical
analyses that don't try to break out of these twin traps *simultaneously*
are very likely to be unhelpful.
But the question then arises: what kind of analysis *would* help?
Obviously, if it has any structure at all, then viewing it as a (probably
complex) Markov chain might well be a decent starting point: but even so,
this will require a great deal of insight to even get started - to date,
I've had no great luck trying to model the numbering system used (which I
believe is some kind of embellished Roman numerals), but perhaps some kind
of state machine analysis might help move this forward.
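As a starting point of the kind suggested above, a first-order Markov chain over symbols can be estimated by counting transitions within each word. A minimal sketch, assuming words are already split into symbols (raw characters, or multigraph tokens) and using hypothetical start/end markers for word boundaries. A "probably complex" model would need higher orders or hidden states, which this does not attempt.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical word-boundary markers


def transition_probabilities(words):
    """Estimate first-order Markov probabilities P(next symbol | current symbol).

    `words` is a list of symbol sequences (strings or lists of tokens).
    """
    counts = defaultdict(Counter)
    for word in words:
        seq = [START, *word, END]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: n / total for b, n in nxt.items()}
    return probs
```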
Here's a suggestion: perhaps using Markov model analysis to find
transitions in numbering patterns that *never* occur, and to then build up
some kind of constraint analysis from that?
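One way to sketch the "never occurs" idea: collect every symbol transition observed in the corpus, list the ones that never appear, and treat the observed set as a simple finite-state constraint that candidate words must satisfy. This is a hypothetical illustration with assumed boundary markers; an unseen pair in a small sample may reflect sparse data rather than a genuine rule of the numbering system.

```python
from itertools import product

START, END = "<s>", "</s>"  # hypothetical word-boundary markers


def observed_transitions(words):
    """Set of (a, b) symbol pairs seen anywhere, including word boundaries."""
    seen = set()
    for word in words:
        seq = [START, *word, END]
        seen.update(zip(seq, seq[1:]))
    return seen


def never_seen(words):
    """All possible (a, b) transitions that never occur in the corpus."""
    symbols = sorted({s for w in words for s in w})
    seen = observed_transitions(words)
    sources = [START] + symbols  # anything that can be followed by a symbol
    targets = symbols + [END]    # anything that can follow a symbol
    return sorted((a, b) for a, b in product(sources, targets) if (a, b) not in seen)


def accepts(word, seen):
    """Finite-state check: accept a word only if every transition was observed."""
    seq = [START, *word, END]
    return all(pair in seen for pair in zip(seq, seq[1:]))
```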
Conventional statistical tools wouldn't be much good at that, though. :-(
Cheers, .....Nick Pelling.....