Cribs. Let me qualify that. If you have dug even shallowly into
the archives, you will have noticed several long discussions on the
entropy of texts. Entropy is a measure of the unpredictability of "what
follows". Hawaiian for instance has a very low entropy because whenever
you hit a consonant you know that the next letter/sound is going to
be a vowel, and there are only five of them: a, e, i, o, and u.
Very predictable.

English on the other hand... if you see a 'g' almost anything can
follow: an 'h', an 'l', an 'r', another 'g', and so on. Much less
predictable: higher entropy.

So computing the entropy helps narrow down the range of
possible languages (assuming the cipher is a simple substitution,
which only relabels the letters and so leaves the entropy untouched).
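
For those who want to try this at home, here is a minimal sketch in Python of one way to compute these figures. It assumes that "second-order entropy" means the conditional entropy of a symbol given the one before it, estimated as the difference between the bigram and single-symbol block entropies. The function names are mine, for illustration only, and the estimate is only as good as the sample is long.

```python
import math
from collections import Counter

def block_entropy(symbols, n):
    """Shannon entropy, in bits, of the n-gram distribution of a sequence."""
    grams = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(symbols, n):
    """Entropy of the n-th symbol given the previous n-1 (assumed definition
    of 'n-th order entropy'): H(n-grams) minus H((n-1)-grams)."""
    if n == 1:
        return block_entropy(symbols, 1)
    return block_entropy(symbols, n) - block_entropy(symbols, n - 1)

# Toy check: in a strictly alternating text, each symbol is fully
# determined by the one before it, so the second-order figure is near zero
# even though the two symbols are equally frequent.
alternating = "ab" * 50
h1 = conditional_entropy(alternating, 1)
h2 = conditional_entropy(alternating, 2)
```

Run on a real transcription, the interesting thing is how fast the figure falls from one order to the next: a Hawaiian-like text should fall much faster than an English-like one.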

What if the cipher is not a simple substitution?  The effect of
good encipherment schemes is to raise the entropy. A text
enciphered with a very secure algorithm will look completely
random: no way of predicting what the next letter/symbol is.

And now to pairs. Imagine a very secure cipher. The cipher
text is random, therefore its entropy is maximal. Now replace
each letter with a pair, 'a' with (say) 'ba', 'b' with 'to', etc.

Suddenly, the second-order entropy drops drastically (the cipher
text has, superficially, become much like Hawaiian: consonant, vowel,
consonant, vowel...). _But_ its third and fourth-order entropies
remain unchanged, those of a completely random text.
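
The pair trick can be tried on simulated data. The sketch below is only an illustration under stated assumptions: the "very secure cipher" is stood in for by uniformly random letters, the consonant-vowel pair table is made up, and "second-order entropy" is again taken to be the conditional entropy of a symbol given its predecessor. It checks two things only: that the second-order figure drops once letters are replaced by pairs, and that reading the text two characters at a time gives back the figure of the random original.

```python
import math
import random
from collections import Counter

def block_entropy(symbols, n):
    """Shannon entropy, in bits, of the n-gram distribution of a sequence."""
    grams = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(symbols, n):
    """H(n-grams) minus H((n-1)-grams), for n >= 2."""
    return block_entropy(symbols, n) - block_entropy(symbols, n - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
letters = "abcdefghijklmnopqrstuvwxyz"
consonants = "bcdfghjklmnpqrstvwxyz"
vowels = "aeiou"

# Hypothetical table: each letter gets its own consonant+vowel pair.
pairs = [c + v for v in vowels for c in consonants][:len(letters)]
table = dict(zip(letters, pairs))

# Stand-in for the output of a very secure cipher: uniformly random letters.
random_text = "".join(rng.choice(letters) for _ in range(20000))

# Replace each letter with its pair, then also re-read the result pair by pair.
paired_text = "".join(table[ch] for ch in random_text)
units = [paired_text[i:i + 2] for i in range(0, len(paired_text), 2)]

h2_random = conditional_entropy(random_text, 2)  # close to log2(26)
h2_paired = conditional_entropy(paired_text, 2)  # markedly lower
h2_units = conditional_entropy(units, 2)         # same as h2_random

print(f"random: {h2_random:.2f}  paired: {h2_paired:.2f}  by pairs: {h2_units:.2f}")
```

The last figure is the point: the randomness has not gone anywhere, it has only moved out of reach of a count that looks at single characters.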

That was one example of the purpose and use of the statistical
analyses we have been indulging in for the past 13 years (yes,
thirteen years).

Has it occurred to you that you have just described shorthand writing?

>In any case enciphering phonemes rather than letters would radically
>alter a language's visual appearance, and its unique statistics would no
>longer be valid (EG "e" may be the most common letter in english but the
>most common phoneme is /t/).

About a month ago I posted here, once again, the English
adaptations of two or three articles by Boris Viktorovich Sukhotin,
which address all these questions in general terms.

No. The "Chinese hypothesis" alone would require producing frequency
tables of several hundred Chinese dialects, most mutually unintelligible.
I have a comparative dictionary of (modern) Chinese dialects, but no texts.
Even if there were texts, those should be in the dialects as they
were 500 years ago, when the VMs was likely written. We have nothing.
Chinese was always written in wen2yan2 (Classical Chinese), never in the
dialectal forms, and we do not know with any degree of certitude how these
were pronounced, nor how many there were (many must have become extinct,
many arisen since).

The situation is the same for most other possible candidates. At any
rate, it does not take any statistical analysis to realise that the
VMs, if in a simple substitution cipher, cannot possibly be in Gaelic,
nor in Nahuatl, but just might, just might, be in Malay (or some other
Austronesian language), and very possibly in a Chinese-type sort of
language.

Nothing is ignored in our current transcription system.

Frogguy (do a google search), a.k.a. Jacques Guy

