Count of words with gallows, mantle, and "ed"
For the record, here are some additional word statistics per section.
In all cases, the input data is the majority-vote text derived from
the EVA interlinear transcription, after discarding all lines that
contain unreadable characters or stalemates (characters without
absolute majority agreement).
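The filtering step described above can be sketched roughly as follows. This is only an illustration, not the actual script: it assumes the readings of each line are already aligned character by character, that "?" and "*" mark unreadable characters, and that a line is judged on its majority result. The names majority_char and majority_line are hypothetical.

```python
from collections import Counter

# Assumed markers for unreadable characters (an assumption, not from the post).
UNREADABLE = {"?", "*"}

def majority_char(chars):
    """Return the character with an absolute majority, or None on a stalemate."""
    char, votes = Counter(chars).most_common(1)[0]
    return char if 2 * votes > len(chars) else None

def majority_line(versions):
    """Merge aligned readings of one line; return None if the line is discarded."""
    if len({len(v) for v in versions}) != 1:
        raise ValueError("readings must be aligned to the same length")
    merged = []
    for column in zip(*versions):
        winner = majority_char(column)
        # Discard the whole line on a stalemate or an unreadable character.
        if winner is None or winner in UNREADABLE:
            return None
        merged.append(winner)
    return "".join(merged)
```

With three readings "daiin", "daiin" and "dairn" the merged line is "daiin"; with only two readings "ol" and "or" the last position is a stalemate and the line is dropped.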
Labels were collected in the pseudo-section "lab.n", and excluded
from all other sections. The pseudo-section "txt.n" is the union of all
the others, labels excluded.
Some traditional sections were split in two or more subsections,
to highlight their dispersion within the manuscript. The herbal
sections were also split according to Currier's A/B subsets.
Sections "unk.1" to "unk.8" are isolated pages or folios
with uncertain classification.
The "count" columns give the number of tokens (word instances)
without and with the feature in question. The "freq." columns give
the corresponding frequencies, relative to the section's total.
In each table, the sections are separated into "large" (> 500 tokens)
and "small" (< 500 tokens), and each group is sorted by the
with:without ratio.
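As a rough sketch of how such a table can be produced (not the script actually used), the function below counts tokens without and with a feature, computes the frequencies relative to the section total, and sorts each size group by increasing with:without ratio. The name tabulate, the dictionary layout, and the handling of a section with exactly 500 tokens (treated here as "small") are assumptions.

```python
def tabulate(tokens_by_section, has_feature, threshold=500):
    """Return (large, small) lists of rows, each sorted by with:without ratio."""
    def row(name, tokens):
        total = len(tokens)
        with_n = sum(1 for t in tokens if has_feature(t))
        without_n = total - with_n
        return {
            "sect": name,
            "without": without_n,
            "without_freq": without_n / total if total else 0.0,
            "with": with_n,
            "with_freq": with_n / total if total else 0.0,
        }

    def ratio(r):
        # Rows with no "without" tokens (including empty sections) sort last.
        return r["with"] / r["without"] if r["without"] else float("inf")

    rows = [row(name, tokens) for name, tokens in tokens_by_section.items()]
    large = sorted((r for r in rows if r["with"] + r["without"] > threshold), key=ratio)
    small = sorted((r for r in rows if r["with"] + r["without"] <= threshold), key=ratio)
    return large, small
```

The has_feature argument is any predicate on a token, so the same function serves all three tables.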
First, the token counts for PRESENCE OF CORE LETTERS
(a.k.a. "gallows letters" - EVA {k,t,p,f},
possibly with <c-h> or <i-h> pedestal)
         WITHOUT         WITH
       -----------    -----------
sect.  count freq.    count freq.   pages
-----  ----- -----    ----- -----   ---------------------------
txt.n  11850 0.490    12330 0.510   text(all)
lab.n    372 0.384      596 0.616   labels(all)

pha.1    304 0.588      213 0.412   f88r-f89v1
hea.2    322 0.529      287 0.471   f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
hea.1   3112 0.525     2819 0.475   f1v-f49r(A),f51r-f56v(A)
pha.2    376 0.523      343 0.477   f99r-f102v1
bio.1   2450 0.510     2350 0.490   f75r-f84v
cos.2    318 0.502      316 0.498   f67r1-f70r2
heb.1   1047 0.459     1234 0.541   f26r-f48v(A),f50r-f57r(B),f66v
str.2   3043 0.444     3803 0.556   f103r-f108v,f111r-f116r

cos.1     21 0.656       11 0.344   f57v
unk.2     66 0.550       54 0.450   f49v
unk.1     76 0.535       66 0.465   f1r
cos.3    118 0.529      105 0.471   f85r2-f86v4,f85v2,f86v3
str.1    159 0.485      169 0.515   f58r-f58v
unk.5     62 0.470       70 0.530   f85r1
unk.4    115 0.469      130 0.531   f66r
unk.6     49 0.430       65 0.570   f86v6
unk.7     62 0.425       84 0.575   f86v5
zod.1     19 0.422       26 0.578   f70v2-f73v
heb.2    121 0.420      167 0.580   f94r-f95v1
unk.3     10 0.357       18 0.643   f65r-f65v
unk.8      0 0.000        0 0.000   f116v
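The core-letter test behind the table above might look something like this. It is a minimal sketch that assumes tokens are plain lowercase EVA strings, in which a gallows on a pedestal (for example "cth") still contains one of k, t, p, f. The names has_core and split_counts, and the sample tokens, are illustrative only.

```python
import re

# Plain gallows letters; pedestal forms such as "cth" also contain one of them.
CORE = re.compile(r"[ktpf]")

def has_core(token):
    """True if the token contains at least one gallows letter."""
    return CORE.search(token) is not None

def split_counts(tokens):
    """Return (without, with) token counts for the core-letter feature."""
    with_n = sum(1 for t in tokens if has_core(t))
    return len(tokens) - with_n, with_n
```

For the illustrative token list ["daiin", "qokeedy", "chcthy", "ol"], split_counts gives two tokens without and two with a core letter.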
Here are the statistics for PRESENCE OF MANTLE LETTERS. These are the
EVA letters <sh> and <ch> (a.k.a. "tables" or "chairs"), plus some
relatively rare groups that seem to be variants or misreadings of the
same (chiefly <ee>, and a few <se>, <es>, and single <e> not following
another core or mantle letter).
The <c-h> and <c-he> gallows pedestals were NOT counted as mantle letters.
(With this convention, there seems to be practically no correlation
between the number of core letters and the number of mantle letters
in the word. If gallows pedestals are counted as mantle letters,
there seems to be a small positive correlation between the two counts.
Obviously I do not know which one (if any) is the "correct" choice.)
         WITHOUT         WITH
       -----------    -----------
sect.  count freq.    count freq.   pages
-----  ----- -----    ----- -----   ---------------------------
txt.n  12200 0.505    11980 0.495   text(all)
lab.n    652 0.674      316 0.326   labels(all)

heb.1   1309 0.574      972 0.426   f26r-f48v(A),f50r-f57r(B),f66v
pha.1    290 0.561      227 0.439   f88r-f89v1
pha.2    395 0.549      324 0.451   f99r-f102v1
bio.1   2540 0.529     2260 0.471   f75r-f84v
cos.2    329 0.519      305 0.481   f67r1-f70r2
hea.2    304 0.499      305 0.501   f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
hea.1   2803 0.473     3128 0.527   f1v-f49r(A),f51r-f56v(A)
str.2   3218 0.470     3628 0.530   f103r-f108v,f111r-f116r

cos.1     23 0.719        9 0.281   f57v
unk.7     94 0.644       52 0.356   f86v5
unk.3     18 0.643       10 0.357   f65r-f65v
str.1    204 0.622      124 0.378   f58r-f58v
unk.1     85 0.599       57 0.401   f1r
unk.6     67 0.588       47 0.412   f86v6
heb.2    165 0.573      123 0.427   f94r-f95v1
cos.3    115 0.516      108 0.484   f85r2-f86v4,f85v2,f86v3
unk.5     65 0.492       67 0.508   f85r1
unk.4    120 0.490      125 0.510   f66r
zod.1     22 0.489       23 0.511   f70v2-f73v
unk.2     34 0.283       86 0.717   f49v
unk.8      0 0.000        0 0.000   f116v
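The correlation remark above can be checked along these lines. This is a simplified sketch, not the procedure actually used: it counts only <sh> and <ch> as mantle letters (the rare variants such as <ee>, <se>, <es> and lone <e> are ignored), recognizes only the <c-h> pedestal written as "c", gallows, "h", and uses a plain Pearson coefficient. All function names are hypothetical.

```python
import re
from math import sqrt

CORE = re.compile(r"[ktpf]")         # gallows letters
MANTLE = re.compile(r"sh|ch")        # simplified: rare mantle variants ignored
PEDESTAL = re.compile(r"c[ktpf]h")   # <c-h> pedestal around a gallows

def letter_counts(token, pedestal_is_mantle=False):
    """Return (core, mantle) letter counts for one token."""
    core = len(CORE.findall(token))
    mantle = len(MANTLE.findall(token))
    if pedestal_is_mantle:
        mantle += len(PEDESTAL.findall(token))
    return core, mantle

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def core_mantle_correlation(tokens, pedestal_is_mantle=False):
    """Correlate per-token core counts with per-token mantle counts."""
    pairs = [letter_counts(t, pedestal_is_mantle) for t in tokens]
    return pearson([c for c, _ in pairs], [m for _, m in pairs])
```

Running core_mantle_correlation twice over the same token list, once with pedestal_is_mantle=False and once with True, compares the two conventions.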
Finally, the statistics for the ED-GROUP (EVA <ed>)
which Rene proposed as an indicator of "language evolution":
         WITHOUT         WITH
       -----------    -----------
sect.  count freq.    count freq.   pages
-----  ----- -----    ----- -----   ---------------------------
txt.n  21235 0.859     3499 0.141   text(all)
lab.n    917 0.939       60 0.061   labels(all)

hea.1   5925 0.999        6 0.001   f1v-f49r(A),f51r-f56v(A)
hea.2    607 0.997        2 0.003   f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
pha.2    717 0.997        2 0.003   f99r-f102v1
pha.1    512 0.990        5 0.010   f88r-f89v1
cos.2    622 0.981       12 0.019   f67r1-f70r2
heb.1   1957 0.835      387 0.165   f26r-f48v(A),f50r-f57r(B),f66v
str.2   5653 0.793     1476 0.207   f103r-f108v,f111r-f116r
bio.1   3505 0.708     1444 0.292   f75r-f84v

unk.1    142 1.000        0 0.000   f1r
unk.2    120 1.000        0 0.000   f49v
str.1    333 0.988        4 0.012   f58r-f58v
zod.1     44 0.978        1 0.022   f70v2-f73v
unk.3     27 0.964        1 0.036   f65r-f65v
unk.7    146 0.954        7 0.046   f86v5
cos.1     29 0.906        3 0.094   f57v
heb.2    263 0.889       33 0.111   f94r-f95v1
unk.6    111 0.874       16 0.126   f86v6
cos.3    199 0.869       30 0.131   f85r2-f86v4,f85v2,f86v3
unk.4    208 0.849       37 0.151   f66r
unk.5    115 0.777       33 0.223   f85r1
unk.8      0 0.000        0 0.000   f116v
CAVE ASINUM - do not trust my statistics blindly!
I have been known to make mistakes occasionally 8-/
All the best,
--stolfi