Re: More on label anomalies
> [Gabriel:] I got the following surprise(in %) So almost half the
> labels (if <q> and <o> are not strictly part of the word) start
> with <k> or <t>. Those % are so close...
Indeed.
This phenomenon is not confined to labels. In the text too,
almost exactly half of the words have gallows, and half haven't.
Ditto for "chairs" (EVA <ch>, <sh>, <ee>). Moreover
these two traits are independent (25% have gallows and chairs,
25% have gallows and no chairs, etc.)
> Is <k>=<t>?
Probably not...
> If so, why half the labels start with <k>|<t>?
In the language of the layer model, when the core (gallows) is
non-empty, the crust and mantle prefixes are usually empty. So if you
delete the <q>s and <o>s, the gallows will be found mostly at the
beginning of the words. Since half of the labels have gallows, it is
not surprising that half of the trimmed labels begin with gallows.
OK, but why do half of the labels have gallows? See my "word length
distribution" page for a possible answer.
I now suspect that the "words" are numbers, in some original system
resembling Roman numerals. As said in that page, if you take the Roman
numerals from 1 to a few hundred (or, in fact, any random sample, deleting
repetitions), you will find that almost exactly half of them will have
a "V" digit, and half won't; half will have an "L" digit, and
half won't; and the two traits are independent.
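The Roman-numeral claim is easy to check for one specific range. The following is a minimal sketch in Python, assuming standard subtractive notation (IV, IX, XL, XC, ...) and the range 1 to 100; the helper names to_roman and trait_counts are made up for this illustration. It tabulates how many numerals contain "V" and/or "L".

```python
from collections import Counter

# Standard subtractive Roman notation (an assumption; the original
# system hypothesized in the message may have differed).
PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    out = []
    for value, symbol in PAIRS:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def trait_counts(lo, hi):
    """Count numerals in lo..hi by (contains 'V', contains 'L')."""
    cells = Counter()
    for n in range(lo, hi + 1):
        r = to_roman(n)
        cells[("V" in r, "L" in r)] += 1
    return cells

counts = trait_counts(1, 100)
for cell in sorted(counts):
    print(cell, counts[cell])
```

For 1 to 100 each of the four cells comes out at 25, because "V" appears exactly when the units digit is 4 through 8 and "L" exactly when the tens digit is 4 through 8. Other ranges or random samples will only be approximately balanced.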
The equal-frequency phenomenon should be much clearer for words
(ignoring repetitions) than for tokens, because statistics for tokens
are usually skewed by a handful of extremely frequent words. And
indeed that seems to be the case for the main text.
You just found out that it holds for labels too. Well, one thing that
distinguishes labels from the main text is that there are very few
repeated labels. Hence, for them, the token-based statistics
should indeed be similar to the word-based ones.
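The token-versus-word distinction can be illustrated with a small sketch in Python. Everything here is an assumption for illustration only: the gallows set is taken to be EVA k, t, p, f, the chairs are the three groups named earlier in this message, the helper names are hypothetical, and the token list is a tiny toy sample, not a transcription count.

```python
from collections import Counter

GALLOWS = ("k", "t", "p", "f")  # assumed EVA gallows letters
CHAIRS = ("ch", "sh", "ee")     # chair groups as named in the message

def has_any(word, pieces):
    """True if any of the given substrings occurs in the word."""
    return any(p in word for p in pieces)

def trait_fractions(items):
    """Fraction of items in each (has gallows, has chairs) cell."""
    items = list(items)
    cells = Counter(
        (has_any(w, GALLOWS), has_any(w, CHAIRS)) for w in items
    )
    return {cell: n / len(items) for cell, n in cells.items()}

# Toy token stream: one very frequent word plus three rarer ones.
tokens = ["daiin"] * 6 + ["okal", "chedy", "qokeedy"]

by_token = trait_fractions(tokens)      # skewed by the frequent word
by_word = trait_fractions(set(tokens))  # repetitions ignored

print("tokens:", by_token)
print("words: ", by_word)
```

In this toy sample the distinct words split evenly over the four cells, while the token counts are dominated by the single repeated word. With few repeated items, as with the labels, the two tallies would nearly coincide.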
The "Chinese hypothesis" is in serious trouble now. To rescue it, I
would have to show that there are certain binary phonetic traits that
do occur independently and equitably in the lexicon of Chinese (or
some other plausible monosyllabic language), just like gallows and
chairs do in the Voynichese lexicon. In other words, I must show that
the candidate language makes near-optimal use of its available sounds.
Perhaps...
> The fraction of <q|o|qo> starting labels
> only(458) has the following distribution of the
> 2nd or 3rd character.
>
> t 34.72
> k 32.97
...which add up to about 68%, or roughly 2/3. Not surprisingly, another
surprise... 8-)
I suspect that there is a big and obvious truth jumping up and down
and screaming right in front of us, but somehow we cannot see it.
I predict that we will all feel rather ashamed of ourselves in
the not too distant future. 8-)
All the best,
--stolfi