Re: Chinese (Doubled words)
> [Philip Neal:] The scenarios you give are clearly possible, and
> I do not dismiss the Chinese theory entirely, but I think it is
> much more probable that a group of Europeans enciphered secret
> knowledge about herbs, times of conception etc for their own
> private use.
For sure, the Human Rights declaration gives every man the right to
pick his own a priori probabilities; and then Bayes gives him the
right to ignore an arbitrary amount of opposing evidence.
Short of finding an "obviously correct" solution, I don't know
how I could change people's "gut feelings", nor why I should try to.
Still, it is strange that people find the Crypto Theory more likely
than the Chinese Theory. Scores of professional and amateur
cryptographers have tried to crack the "code" for almost 90 years, and
have made absolutely *zero* progress. Worse, the crypto camp cannot
even explain away the many arguments that point to the VMS *not* being
a code.
I can understand that people are reluctant to look at East Asia when
there is nothing obviously Chinese in the pictures or texts (at least,
not if you look at it in the wrong way... More on that later). But
what should we make of the natural-looking Zipf plots, the statistics
on figure labels, and the binomial word-length distribution? These
features are strong arguments against any character-level,
Vigenère-style code. If we exclude the Chinese Theory by axiom, the
only other alternative that I can think of is a word-level,
codebook-based system. Why hasn't *that* been discussed on the list?
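
For anyone who wants to check these statistics on a transcription of their own, here is a minimal Python sketch of the two simplest ones: a rank-frequency (Zipf) table, and the length distribution of distinct words set against a binomial with the same mean. The function names, the mean-matching fit, and the six-token sample are all assumptions made for illustration; the sample is a toy built from words mentioned later in this message, not real VMS statistics.

```python
from collections import Counter
from math import comb


def rank_frequency(words):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(words)
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, w, c) for rank, (w, c) in enumerate(ranked, start=1)]


def length_distribution(words):
    """Fraction of distinct words (types, not tokens) having each length."""
    types = set(words)
    hist = Counter(len(w) for w in types)
    return {n: hist[n] / len(types) for n in sorted(hist)}


def fitted_binomial(length_dist):
    """Binomial over lengths 1..max_len whose mean matches length_dist.

    Assumption: length L is mapped to k = L - 1 successes out of
    n = max_len - 1 trials. This is one simple way to compare, not
    necessarily the fit used in earlier postings.
    """
    max_len = max(length_dist)
    n = max_len - 1
    if n == 0:
        return {1: 1.0}
    mean_k = sum((length - 1) * frac for length, frac in length_dist.items())
    p = mean_k / n
    return {k + 1: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Toy sample only; replace with the tokens of a real transcription.
sample = ["daiin", "dain", "dy", "daiin", "dy", "daiin"]

for rank, word, count in rank_frequency(sample):
    print(rank, word, count)

observed = length_distribution(sample)
expected = fitted_binomial(observed)
for length in sorted(expected):
    print(length, round(observed.get(length, 0.0), 3), round(expected[length], 3))
```

On a real text one would plot log(count) against log(rank) for the first table, and compare the two columns of the second by eye or with a goodness-of-fit test.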
There is also the matter of the peculiar word structure. I understand
that my description of it is not as clear and succinct as it could be,
and people may be put off by the comparison to East Asian word
(syllable) structure. But the word structure is there, and demands
*some* explanation. A Roman-like number system could be another
possibility; shouldn't that be looked into?
> I have taken a look at some Chinese character text (Sun Tzu and
> the first chapters of the Red Chamber). The ten most common
> characters in Sun Tzu are not reduplicated in the entire text,
> and of the ten most common characters in the Red Chamber only
> one is reduplicated once in the sample (that character is yi1,
> 'one').
>
> It remains to be seen how common reduplication is in a phonetic
> rendering of a text in Chinese or a similar language. I am not
> certain that it will be sufficient to rewrite a literary
> character text in pinyin and make a count.
The most common *characters* in the Red Mansion, apart from punctuation,
seem to be grammatical particles. Here are my illiterate, dictionary-based
guesses at their meanings:
"了" le    past / completion indicator
"的" de    possessive postposition (like English "'s")
"不" bu4   negation
"一" yi1   "one", "a"; also part of "all", "together", "once", etc.
"来" lai2  "come", "arrive"
"道" dao4  "way", "path"; also "say", "speak"
(The characters were originally given in the GB encoding, not Big5.) The first two
particles are toneless, meaning that they can assume several tones
depending on the preceding syllable.
Indeed those particles should not repeat. However, they all have
several homophones:
le past / completion indicator
le4 "ribs"
le4 "bridle", "force", "tighten"
le4 "music", "happy", "joyful"
de possessive postposition
de adverbial subordinative conjunction
de2 "get", "obtain", "gain", "need", "must"
de2 "virtue", "righteousness"
Each homophone is denoted by a different character in the Chinese
(Big5 or GB) file, but they would all be rendered in the same way in a
phonetic transcription. Thus the phrase "virtue's", for example, would
not be a repeat in the former, but may be in the latter. (That's a
guess too; I don't know how the possessive "de" is pronounced here.)
That may explain your observations: the most common characters in the
Chinese file don't form repeats (because they are grammatical
particles), while the most common words in the VMS often do (whenever
a common particle appears next to a homophone, as in "virtue's").
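
The mechanism can be sketched in a few lines of Python. Each token below is a (character label, pinyin) pair; the labels are hypothetical stand-ins for the distinct Chinese characters, and dropping the tone digit is only an assumption about how a phonetic script might treat tones. The four-syllable phrase is a made-up toy, not a quotation from either text.

```python
import re


def adjacent_repeats(tokens):
    """Count positions where a token is immediately followed by itself."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)


def strip_tone(syllable):
    """Drop a trailing tone digit, e.g. 'de2' -> 'de' (assumed convention)."""
    return re.sub(r"[1-5]$", "", syllable)


# Toy phrase: "virtue" + possessive particle ("virtue's"), then "one", "come".
text = [("VIRTUE", "de2"), ("POSS", "de"), ("ONE", "yi1"), ("COME", "lai2")]

chars = [label for label, _ in text]         # character-level text
toned = [pinyin for _, pinyin in text]       # phonetic, tones marked
toneless = [strip_tone(p) for p in toned]    # phonetic, tones dropped

print(adjacent_repeats(chars))     # 0: the two "de" are different characters
print(adjacent_repeats(toned))     # 0: "de2" and "de" still differ
print(adjacent_repeats(toneless))  # 1: "de de" becomes a doubled word
```

Whether the repeat shows up thus depends on how the transcription handles tone, which is exactly the part that is guesswork here.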
On the other hand, the same toneless particle will map to two or three
different words in the phonetic text, even if there are no
transcription mistakes or multiple equivalent spellings. Thus perhaps
"daiin", "dain", "dy" are all forms of the same Chinese particle like
"de" or "le"...
All the best,
--stolfi