Great Stuff!
If 'o' is a definite article, could 'q' preceding
'o' be a plural form? Compare La/Le and Les before nouns in French. This could
explain why there are more occurrences of 'qo' in tokens than in labels: you are
more likely to list 'single' items, but will refer to them in the plural when
talking about them in a general sense.
Initial 'c' or 'e' is
possibly overridden by a 'ch' in word- or perhaps syllable-initial
position.
John.
----- Original Message -----
Sent: Sunday, March 04, 2001 10:00 AM
Subject: Labels o and q.
Hi all,
As you may remember, somebody (?) noted that there
is a tendency of the labels to start with <o>. I just had a look at
the 676 labels which had unambiguous first characters (the total number of
labels according to my count, which may not be correct and has not been
double checked, is 684). I also counted the distribution of first
characters in tokens (again those with unambiguous characters) and subtracted
the label counts to obtain a label-less token count.
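The count described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the word lists are made-up toy data, the function names are hypothetical, `?` is assumed to mark an ambiguous character, and the labels are assumed to be included in the full token list (which is why their counts are subtracted).

```python
from collections import Counter

def first_char_counts(words, ambiguous="?"):
    """Count first characters, skipping words whose first character is ambiguous."""
    return Counter(w[0] for w in words if w and w[0] != ambiguous)

def to_percent(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100.0 * n / total for ch, n in counts.items()}

# Hypothetical toy data, NOT taken from any real transcription.
all_tokens = ["okedy", "qokeedy", "chedy", "otol", "daiin", "shedy", "okal", "?dy"]
labels = ["otol", "okal"]

token_counts = first_char_counts(all_tokens)
label_counts = first_char_counts(labels)

# Counter subtraction keeps only positive counts: tokens that are not labels.
labelless_counts = token_counts - label_counts

# Print the label-less distribution, sorted descending by frequency.
for ch, pct in sorted(to_percent(labelless_counts).items(), key=lambda kv: -kv[1]):
    print(f"{ch} {pct:6.2f}")
```
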
The distribution
(in %) of 1st characters is as follows: (sorted descending by the frequency
of tokens).
Char  Tokens  Labels
o     21.92   67.01
c     18.22    6.66
q     14.13    0.44
s     11.99    5.92
d      9.59    7.25
a      5.78    2.37
y      4.93    6.80
l      3.67    0.44
k      3.40    1.18
t      2.71    0.89
p      1.45    0.15
r      1.27    0.15
e      0.40    0.74
f      0.32    0.00
i      0.07    0.00
x      0.05    0.00
g      0.04    0.00
v      0.03    0.00
m      0.03    0.00
n      0.01    0.00
j      0.01    0.00
u      0.00    0.00
z      0.00    0.00
The two largest
differences seem to be the excess of <o> and the lack of <q> as
initial characters.
I remember that it has been suggested before that
<o> may be an article preceding a noun. This could well be the
case.
I also remember that <q> has been suggested to act as "and"
or "&" joining from the previous line. The lack of <q> in the labels
(only 3 have them) seems to fit nicely with that idea too.
Also <c>
appears less than expected. I had a look at the labels: they are all
<ch>+something, except 2 labels which start with
<cph>.
I am not sure whether the following means anything, but if
we leave out those labels starting with <o> and <q>, then the
distribution seems to get a bit closer to that of the tokens. Note that by
doing this I ignored 67% of the labels...
Char  Tokens  Labels
c     28.48   20.45
s     18.75   18.18
d     14.99   22.27
a      9.03    7.27
y      7.71   20.91
l      5.73    1.36
k      5.32    3.64
t      4.23    2.73
p      2.27    0.45
r      1.98    0.45
e      0.63    2.27
...
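The reduced label column can be approximately reproduced from the percentages in the first table by dropping <o> and <q> and rescaling the rest to 100%. A minimal sketch, assuming only the rounded percentages are available (the function name is hypothetical, and results may differ from the table in the last digit because the table was presumably computed from raw counts):

```python
def renormalize(percentages, exclude):
    """Drop the excluded initials and rescale the remaining ones to sum to 100."""
    kept = {ch: p for ch, p in percentages.items() if ch not in exclude}
    total = sum(kept.values())
    return {ch: 100.0 * p / total for ch, p in kept.items()}

# Label percentages copied from the first table (non-zero rows only).
label_pct = {
    "o": 67.01, "c": 6.66, "q": 0.44, "s": 5.92, "d": 7.25,
    "a": 2.37, "y": 6.80, "l": 0.44, "k": 1.18, "t": 0.89,
    "p": 0.15, "r": 0.15, "e": 0.74,
}

reduced = renormalize(label_pct, exclude={"o", "q"})
for ch, p in reduced.items():
    print(f"{ch} {p:5.2f}")
```
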
Still there are too many
<y>, too few <l>, etc. A few cells are empty or have fewer than 5
items in them, so a chi-squared test cannot be performed with confidence...
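One way to see which cells are the problem is to compute the label count each initial would be expected to have if labels followed the token distribution. Note the swap: the text above speaks of observed cell sizes, whereas this sketch checks expected counts against the common rule of thumb of at least 5 per cell. The reduced label total is not given directly; here it is estimated from the 676 labels and the <o> and <q> percentages, and only the rows actually listed in the reduced table are used. The function name is hypothetical.

```python
def low_expected_cells(token_pct, n_labels, threshold=5.0):
    """Return initials whose expected label count falls below the threshold."""
    expected = {ch: n_labels * p / 100.0 for ch, p in token_pct.items()}
    return {ch: e for ch, e in expected.items() if e < threshold}

# Token percentages from the reduced table (rows after 'e' are elided there).
reduced_token_pct = {
    "c": 28.48, "s": 18.75, "d": 14.99, "a": 9.03, "y": 7.71, "l": 5.73,
    "k": 5.32, "t": 4.23, "p": 2.27, "r": 1.98, "e": 0.63,
}

# Estimated number of labels left after removing <o> (67.01%) and <q> (0.44%).
n_reduced = round(676 * (100 - 67.01 - 0.44) / 100)

low = low_expected_cells(reduced_token_pct, n_reduced)
for ch, e in low.items():
    print(f"{ch}: expected {e:.2f} labels")
```

Cells flagged this way would normally be pooled into an "other" category before running the test.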
What I should have done (and will do
later) is to count the labels by the 2nd letter if they start with <o>
or <q>, and look at the result (i.e. do they produce valid Voynich
words?), but this brings another problem, as some labels have an ambiguous 2nd
character. Anyway, I thought that what I found so far would be of
interest. All comments are
welcome.
Regards, Gabriel