
Re: More on label anomalies



On 10 Mar 2001, at 20:25, Jorge Stolfi wrote:
> In the language of the layer model, when the core (gallows) is
> non-empty, the crust and mantle prefixes are usually empty. So if you
> delete the <q>s and <o>s, the gallows will be found mostly at the
> beginning of the words. Since half of the labels have gallows, it is
> not surprising that half of the trimmed labels begin with gallows.

I still like the idea of <q> being a conjunction (perhaps "and"), so I 
counted the strings in the text (all lines, labels, blocks, circles 
and scattered text). In total there are 4592 strings. Of those, 539 
start with <q> (about 12%). I also counted 769 "end of paragraph" 
marks, but (here is the interesting bit) it is unusual for a <q> word 
to be first in a paragraph (only 31 instances, or 4%), and also 
unusual for it to be the first word of the first paragraph of a folio 
(only 4 times, or 0.52% of all paragraphs).
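
The proportions above can be recomputed from the quoted counts. The following is only a minimal sketch: the variable names are mine, and the numbers are the counts given in this message, not re-measured from a transcription.

```python
# Counts quoted in the message; nothing here is re-measured from the text.
total_strings = 4592             # all lines, labels, blocks, circles, scattered text
q_initial_strings = 539          # strings that start with <q>
paragraphs = 769                 # "end of paragraph" marks
q_initial_paragraphs = 31        # paragraphs whose first word starts with <q>
q_initial_first_paragraphs = 4   # same, restricted to the first paragraph of a folio


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


shares = {
    "q-initial strings": percent(q_initial_strings, total_strings),
    "q-initial paragraphs": percent(q_initial_paragraphs, paragraphs),
    "q-initial first paragraphs": percent(q_initial_first_paragraphs, paragraphs),
}

for label, value in shares.items():
    print(f"{label}: {value}%")
```

If <q> words were spread evenly, one would expect them to open paragraphs at roughly the same rate as they open strings in general, which is the comparison the sketch makes explicit.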

We know that paragraphs commonly start with <p> or <f>, so it is 
no big surprise that other characters are paragraph-initial in a 
low proportion, but this low?
In some languages (like Spanish, and I guess the same holds for 
Portuguese, Italian and French) it is not good practice to start a 
sentence with a conjunction (in Japanese, though, it is quite 
common), so perhaps the observed distributions give some support 
for <q> being a single word with a special function...

Here is the list of paragraph-initial <q> words:

<f10v.4>       qotchytor.shoiin.daiin...
<f23r.4>       qokoldy.okaiir.ykaiil,g...
<f34r.5>       qoteedy.shedy.shedy....
<f37v.8>       qotor.choiin.chetchy...
<f45v.5>       qotol.choiin.okchar...
<f49v.18>      qotcho.cheol.chol,s...
<f76v.37>      qoeedy.lchedy.cheeb...
<f77v.1>       qetedy.shedy.qotol...
<f78v.20>      qofcheol.opchedy.qokain...
<f82r.1>       qosheedy.qokeol.daiin...
<f82v.29>      qody.shar.a(ith)y...
<f83r.25>      qokeedy.qolchey.qokeey...
<f83v.19>      qokeed.qokaiin.sheolkain...
<f84r.10>      qotchsdy.ykeedy.qokal...
<f89r1.1>      qoar.shar.qopcholy...
<f89r2.1>      qokcheody.cheodal.dair...
<f99v.9>       qokeeoy.chokal.qokeeo...
<f103r.27>     qokechy.okeey.qokeey...
<f103r.33>     qokeey.chechy.qokey...
<f103r.35>     qokeear.chain.olain...
<f103r.40>     qokeey.sheeol.shckhy...
<f103r.45>     qokeedy.qokeedy.shol...
<f103v.17>     qokeedy.chedy.qoteey...
<f108r.21>     qolshy.qoeedy.lkeal,shedy...
<f108v.23>     qokeeor.okeey.qoeey...
<f111r.13>     qosheo.lchdy.lshedy...
<f111r.20>     qokeey.qokeey.lchedy...
<f111v.16>     qokain.sheol.qokain...
<f111v.23>     qokaiin.sheckhy.qokar...
<f112r.23>     qoain.qoiin.olcheedy...
<f116r.24>     qokedy.okain.chcthy...
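
For anyone who wants to tally a listing like the one above, here is a rough sketch. It assumes each line has the form `<locus>  word.word...` with `.` or `,` as word separators; the function names are hypothetical, and only a few of the lines are copied in as sample data. It treats each transcription character as one glyph, which is a simplification.

```python
import re
from collections import Counter

# A few lines copied from the listing above; the full list would be used in practice.
SAMPLE = """\
<f10v.4>       qotchytor.shoiin.daiin...
<f23r.4>       qokoldy.okaiir.ykaiil,g...
<f34r.5>       qoteedy.shedy.shedy....
<f77v.1>       qetedy.shedy.qotol...
<f82v.29>      qody.shar.a(ith)y...
<f112r.23>     qoain.qoiin.olcheedy...
"""

# Assumed line layout: a locus tag in angle brackets, whitespace, then the text.
LINE_RE = re.compile(r"^<(?P<locus>[^>]+)>\s+(?P<text>\S+)")


def first_word(line):
    """Return (locus, first word) for a listing line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    word = re.split(r"[.,]", match.group("text"), maxsplit=1)[0]
    return match.group("locus"), word


def glyph_after_prefix(word):
    """Return the character following a leading 'qo' or 'q', or None if neither applies."""
    for prefix in ("qo", "q"):
        if word.startswith(prefix):
            return word[len(prefix):len(prefix) + 1]
    return None


tally = Counter()
for line in SAMPLE.splitlines():
    parsed = first_word(line)
    if parsed is not None:
        tally[glyph_after_prefix(parsed[1])] += 1

for glyph, count in tally.most_common():
    print(glyph, count)
```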

And here is the distribution of 1st characters in the first paragraph 
of each folio.

char        freq       %
p             93   45.81
t             41   20.20
k             36   17.73
f             12    5.91
c+gallow       5    2.46
o              5    2.46
q              4    1.97
s              3    1.48
d              2    0.99
l              1    0.49
*              1    0.49


Again, <k> and <t> are suspiciously similar and <q> is quite low. To 
tell the truth, I was expecting <o> (perhaps an article) to be more 
frequent.
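
The percentages in the table can be checked from the frequency column alone. This is a small sketch under the assumption that the frequencies are exactly as listed; the dictionary and variable names are mine.

```python
# Frequencies copied from the table of first characters (first paragraph of each folio).
first_char_freq = {
    "p": 93, "t": 41, "k": 36, "f": 12, "c+gallow": 5,
    "o": 5, "q": 4, "s": 3, "d": 2, "l": 1, "*": 1,
}

total = sum(first_char_freq.values())

# Percentage of first paragraphs opening with each character.
percentages = {
    char: round(100.0 * count / total, 2)
    for char, count in first_char_freq.items()
}

# Combined share of the four plain gallows <p>, <t>, <k>, <f>.
gallows_share = round(
    100.0 * sum(first_char_freq[g] for g in ("p", "t", "k", "f")) / total, 2
)

print(f"total first paragraphs: {total}")
for char, pct in percentages.items():
    print(f"{char:<10}{first_char_freq[char]:>4}{pct:>8.2f}")
print(f"plain gallows combined: {gallows_share}%")
```

Note that the 4 <q> cases are 1.97% of the first paragraphs counted here, while the same 4 cases are 0.52% of all 769 paragraphs.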

> I now suspect that the "words" are numbers, in some original system
> resembling Roman numerals.

Yes, it may be, but perhaps they are numbers with added 
characters (like <o> and <qo>), otherwise there should be no 
reason to see these line-dependent distributions as above.

>     > The fraction of <q|o|qo> starting labels 
>     > only(458) has the following distribution of the 
>     > 2nd or 3rd character.		
>     > 
>     > t  34.72
>     > k  32.97
> 
> ...which add to about 67%, or 2/3.  Not surprisingly, another 
> surprise... 8-)

Yup.

Regards,

Gabriel