Re: More on label anomalies
On 10 Mar 2001, at 20:25, Jorge Stolfi wrote:
> In the language of the layer model, when the core (gallows) is
> non-empty, the crust and mantle prefixes are usually empty. So if you
> delete the <q>s and <o>s, the gallows will be found mostly at the
> beginning of the words. Since half of the labels have gallows, it is
> not surprising that half of the trimmed labels begin with gallows.
I still like the idea of <q> being a conjunction (perhaps "and"), so I
counted all the strings (lines, labels, blocks, circles and scattered
text): 4592 in total. Of those, 539 start with <q> (11.7%). I also
counted 769 "end of paragraph" marks, and (here is the interesting bit)
it is unusual for a <q> word to be first in a paragraph (only 31
instances, or 4%), and also unusual for it to be the first word of the
first paragraph of a folio (only 4 times, or 0.52% of the 769
paragraphs).
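As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the shares from the counts quoted above. The variable names are mine, and the "expected" figure assumes that <q>-initial words are equally likely in every position, which is only a rough baseline and not a claim about the text.

```python
# Counts quoted in this message; nothing here is re-derived from a transcription.
total_strings = 4592              # all lines, labels, blocks, circles, scattered text
q_initial_strings = 539           # strings starting with <q>
paragraphs = 769                  # "end of paragraph" marks
q_initial_paragraphs = 31         # paragraphs whose first word starts with <q>
q_initial_first_paragraphs = 4    # same, restricted to first paragraphs


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


share_all = pct(q_initial_strings, total_strings)           # 11.74
share_par = pct(q_initial_paragraphs, paragraphs)           # 4.03
share_first = pct(q_initial_first_paragraphs, paragraphs)   # 0.52

# Baseline (assumption): if position did not matter, how many of the 769
# paragraphs would be expected to open with a <q> word?
expected_q_paragraphs = round(paragraphs * q_initial_strings / total_strings, 1)  # 90.3

print(share_all, share_par, share_first, expected_q_paragraphs)
```

Under that naive baseline one would expect about 90 <q>-initial paragraphs, against the 31 actually counted.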
We know that paragraphs commonly start with <p> or <f>, so it is
no big surprise that other characters are rare in paragraph-initial
position, but this low??
In some languages (like Spanish, and I guess the same holds for
Portuguese/Italian/French) it is not good practice to start a sentence
with a conjunction (in Japanese, though, it is quite common), so
perhaps the distributions observed give some support for <q> being
a single word with a special function...
Here is the list of paragraph-initial <q> words:
<f10v.4> qotchytor.shoiin.daiin...
<f23r.4> qokoldy.okaiir.ykaiil,g...
<f34r.5> qoteedy.shedy.shedy....
<f37v.8> qotor.choiin.chetchy...
<f45v.5> qotol.choiin.okchar...
<f49v.18> qotcho.cheol.chol,s...
<f76v.37> qoeedy.lchedy.cheeb...
<f77v.1> qetedy.shedy.qotol...
<f78v.20> qofcheol.opchedy.qokain...
<f82r.1> qosheedy.qokeol.daiin...
<f82v.29> qody.shar.a(ith)y...
<f83r.25> qokeedy.qolchey.qokeey...
<f83v.19> qokeed.qokaiin.sheolkain...
<f84r.10> qotchsdy.ykeedy.qokal...
<f89r1.1> qoar.shar.qopcholy...
<f89r2.1> qokcheody.cheodal.dair...
<f99v.9> qokeeoy.chokal.qokeeo...
<f103r.27> qokechy.okeey.qokeey...
<f103r.33> qokeey.chechy.qokey...
<f103r.35> qokeear.chain.olain...
<f103r.40> qokeey.sheeol.shckhy...
<f103r.45> qokeedy.qokeedy.shol...
<f103v.17> qokeedy.chedy.qoteey...
<f108r.21> qolshy.qoeedy.lkeal,shedy...
<f108v.23> qokeeor.okeey.qoeey...
<f111r.13> qosheo.lchdy.lshedy...
<f111r.20> qokeey.qokeey.lchedy...
<f111v.16> qokain.sheol.qokain...
<f111v.23> qokaiin.sheckhy.qokar...
<f112r.23> qoain.qoiin.olcheedy...
<f116r.24> qokedy.okain.chcthy...
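For anyone who wants to repeat the count, here is a minimal Python sketch of how the first word can be pulled out of lines in the format listed above. The regular expression and the function name `first_word` are my own, and I am assuming that "." separates words and "," marks an uncertain word break; the three sample lines are copied from the list.

```python
import re

# "<locus> text" as in the list above, e.g. "<f10v.4> qotchytor.shoiin.daiin..."
LINE = re.compile(r"^<(?P<locus>[^>]+)>\s+(?P<text>\S+)")


def first_word(line):
    """Return (locus, first word) for one listed line, or None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    # Assumption: "." and "," both end a word.
    word = re.split(r"[.,]", m.group("text"), maxsplit=1)[0]
    return m.group("locus"), word


sample = [
    "<f10v.4> qotchytor.shoiin.daiin...",
    "<f77v.1> qetedy.shedy.qotol...",
    "<f108r.21> qolshy.qoeedy.lkeal,shedy...",
]

parsed = [first_word(s) for s in sample]
q_initial = [(locus, word) for locus, word in parsed if word.startswith("q")]
print(q_initial)
```

Run over every paragraph-initial line, the length of `q_initial` would give the count of <q>-initial paragraphs.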
And here is the distribution of 1st characters in the first paragraph
of each folio.
char        freq      %
p             93  45.81
t             41  20.20
k             36  17.73
f             12   5.91
c+gallow       5   2.46
o              5   2.46
q              4   1.97
s              3   1.48
d              2   0.99
l              1   0.49
*              1   0.49
Again, <k> and <t> are suspiciously similar, and <q> is quite low. To
tell the truth, I was expecting <o> (perhaps an article) to be more
frequent.
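The percentages in the table can be reproduced from the raw frequencies with a few lines of Python. This is only a sketch: the dictionary is typed in from the table, and grouping <p>, <t>, <k> and <f> as the plain gallows is my own shorthand for the summary figure.

```python
# First-character frequencies, copied from the table above.
counts = {
    "p": 93, "t": 41, "k": 36, "f": 12, "c+gallow": 5, "o": 5,
    "q": 4, "s": 3, "d": 2, "l": 1, "*": 1,
}

total = sum(counts.values())  # 203 first paragraphs in all

# Share of each first character, as a percentage with two decimals.
percent = {ch: round(100.0 * n / total, 2) for ch, n in counts.items()}

# Assumption: <p>, <t>, <k>, <f> taken together as the plain gallows.
gallows_share = round(100.0 * sum(counts[g] for g in "ptkf") / total, 2)  # 89.66

for ch, n in counts.items():
    print(f"{ch:<9}{n:>4}{percent[ch]:>7.2f}")
print("plain gallows share:", gallows_share)
```

So nearly nine out of ten first paragraphs open with a plain gallows, which leaves very little room for any other initial character, <q> included.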
> I now suspect that the "words" are numbers, in some original system
> resembling Roman numerals.
Yes, that may be, but perhaps they are numbers with added
characters (like <o> and <qo>); otherwise there would be no
reason to see line-dependent distributions like those above.
> > The fraction of <q|o|qo> starting labels
> > only(458) has the following distribution of the
> > 2nd or 3rd character.
> >
> > t 34.72
> > k 32.97
>
> ...which add to about 67%, or 2/3. Not surprisingly, another
> surprise... 8-)
Yup.
Regards,
Gabriel