
Count of words with gallows, mantle, and "ed"




For the record, here are some additional word statistics per section.  

In all cases, the input data is the majority-vote text derived from
the EVA interlinear transcription, after discarding all lines that
contain unreadable characters or stalemates (characters without
absolute majority agreement).
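
The majority-vote step above could be sketched roughly as follows. This is
only an illustration, not the actual script used for these tables. It assumes
that the versions of a line are already aligned character by character, that
`*` and `?` mark unreadable characters, and that a line is dropped when the
winning character is unreadable; the name `majority_line` is hypothetical.

```python
from collections import Counter

# Assumed markers for unreadable characters (not confirmed by the post).
UNREADABLE = {"*", "?"}

def majority_line(versions):
    """Return the majority-vote reading of one line, or None to discard it.

    versions: equal-length strings, one per transcriber, assumed aligned.
    """
    result = []
    for column in zip(*versions):
        char, votes = Counter(column).most_common(1)[0]
        # Stalemate: no character has an absolute majority (more than half).
        if 2 * votes <= len(column):
            return None
        # Discard the whole line if the winning character is unreadable.
        if char in UNREADABLE:
            return None
        result.append(char)
    return "".join(result)

print(majority_line(["daiin", "daiin", "dairn"]))  # three readings of one line
```

With only two versions that disagree at some position, the function returns
None, since one vote out of two is not an absolute majority.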

Labels were collected in the pseudo-section "lab.n", and excluded
from all other sections. The pseudo-section "txt.n" is the union of all 
the others, labels excluded. 

Some traditional sections were split in two or more subsections,
to highlight their dispersion within the manuscript.  The herbal
sections were split also according to Currier's A/B subsets. 
Sections "unk.1" to "unk.8" are isolated pages or folios 
with uncertain classification.

The "count" columns give the number of tokens (word instances)
without and with the feature in question.  The "freq." columns give
the corresponding frequencies, relative to the section's total.
In each table, the sections are separated into "large" (> 500 tokens)
and "small" (< 500 tokens), and each group is sorted by increasing
with:without ratio (that is, by decreasing WITHOUT frequency).
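
As a rough sketch of how the rows of each table can be derived from raw
counts (not the script actually used): the function name `section_rows` and
the handling of empty sections are assumptions made for the illustration, and
a section with exactly 500 tokens is arbitrarily put in the "small" group.
The sample counts are copied from the core-letter table.

```python
def section_rows(counts, threshold=500):
    """Split sections into (large, small) lists of table rows.

    counts maps a section name to a (without, with) pair of token counts.
    Each row is (section, without, freq_without, with, freq_with).
    """
    large, small = [], []
    for sect, (n_without, n_with) in counts.items():
        total = n_without + n_with
        # An empty section (like unk.8) gets zero frequencies, not an error.
        f_without = n_without / total if total else 0.0
        f_with = n_with / total if total else 0.0
        row = (sect, n_without, f_without, n_with, f_with)
        (large if total > threshold else small).append(row)

    def order(row):
        # Increasing share of WITH tokens gives the same order as increasing
        # with:without ratio; empty sections are pushed to the end.
        return (row[1] + row[3] == 0, row[4])

    return sorted(large, key=order), sorted(small, key=order)

sample = {
    "str.2": (3043, 3803),
    "cos.1": (21, 11),
    "pha.1": (304, 213),
    "unk.8": (0, 0),
    "unk.3": (10, 18),
}
large, small = section_rows(sample)
for sect, n0, f0, n1, f1 in large + small:
    print(f"  {sect}  {n0:5d} {f0:.3f}  {n1:5d} {f1:.3f}")
```

Sorting on the WITH frequency instead of the ratio itself avoids a division
by zero when a section has no tokens without the feature.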

First, the token counts for PRESENCE OF CORE LETTERS
(a.k.a. "gallows letters": EVA {k,t,p,f},
possibly with a <c-h> or <i-h> pedestal):

           WITHOUT      WITH
         -----------  -----------
  sect.  count freq.  count freq.  pages
  -----  ----- -----  ----- -----  ---------------------------
  txt.n  11850 0.490  12330 0.510  text(all)
  lab.n    372 0.384    596 0.616  labels(all)

  pha.1    304 0.588    213 0.412  f88r-f89v1
  hea.2    322 0.529    287 0.471  f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
  hea.1   3112 0.525   2819 0.475  f1v-f49r(A),f51r-f56v(A)
  pha.2    376 0.523    343 0.477  f99r-f102v1
  bio.1   2450 0.510   2350 0.490  f75r-f84v
  cos.2    318 0.502    316 0.498  f67r1-f70r2
  heb.1   1047 0.459   1234 0.541  f26r-f48v(A),f50r-f57r(B),f66v
  str.2   3043 0.444   3803 0.556  f103r-f108v,f111r-f116r

  cos.1     21 0.656     11 0.344  f57v
  unk.2     66 0.550     54 0.450  f49v
  unk.1     76 0.535     66 0.465  f1r
  cos.3    118 0.529    105 0.471  f85r2-f86v4,f85v2,f86v3
  str.1    159 0.485    169 0.515  f58r-f58v
  unk.5     62 0.470     70 0.530  f85r1
  unk.4    115 0.469    130 0.531  f66r
  unk.6     49 0.430     65 0.570  f86v6
  unk.7     62 0.425     84 0.575  f86v5
  zod.1     19 0.422     26 0.578  f70v2-f73v
  heb.2    121 0.420    167 0.580  f94r-f95v1
  unk.3     10 0.357     18 0.643  f65r-f65v
  unk.8      0 0.000      0 0.000  f116v
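
For readers who want to repeat the core-letter count, a minimal sketch of the
test is given below. It assumes lowercase basic EVA, where a gallows on a
pedestal (such as <cth>) still contains one of the four gallows letters; the
function names and the short token list are made up for the example.

```python
import re

# EVA gallows letters. A pedestalled gallows such as <cth> still contains
# one of these letters, so a plain search covers both cases.
GALLOWS = re.compile(r"[ktpf]")

def has_gallows(word):
    """True if the word contains at least one gallows letter."""
    return GALLOWS.search(word) is not None

def without_with(tokens):
    """Return the (without, with) token counts for a list of words."""
    n_with = sum(1 for w in tokens if has_gallows(w))
    return len(tokens) - n_with, n_with

tokens = ["daiin", "qokeedy", "chedy", "cthy", "shol", "okaiin"]
print(without_with(tokens))  # prints (3, 3)
```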

Here are the statistics for PRESENCE OF MANTLE LETTERS. These are the
EVA letters <sh> and <ch> (a.k.a. "tables" or "chairs"), plus some
relatively rare groups that seem to be variants or misreadings of the
same (chiefly <ee>, and a few <se>, <es>, and single <e> not following
another core or mantle letter).

The <c-h> and <c-he> gallows pedestals were NOT counted as mantle letters.
(With this convention, there seems to be practically no correlation
between the number of core letters and the number of mantle letters
in a word.  If gallows pedestals are counted as mantle letters,
there seems to be a small positive correlation between the two counts.
Obviously I do not know which one (if any) is the "correct" choice.)
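
The comparison between the two conventions could be checked along these
lines. This is a simplified sketch under stated assumptions, not the method
used for the figures in this message: only <ch>, <sh> and <ee> are treated as
mantle letters (the rarer <se>, <es> and single <e> cases are ignored), a
pedestal is recognized only as <c>, gallows, <h>, and all names are
hypothetical.

```python
import re
from math import sqrt

CORE = re.compile(r"[ktpf]")        # gallows letters
MANTLE = re.compile(r"ch|sh|ee")    # simplified mantle set (assumption)
PEDESTAL = re.compile(r"c[ktpf]h")  # <c-h> pedestal around a gallows

def letter_counts(word, pedestals_as_mantle=False):
    """Return (core, mantle) letter counts for one word."""
    core = len(CORE.findall(word))
    mantle = len(MANTLE.findall(word))
    if pedestals_as_mantle:
        mantle += len(PEDESTAL.findall(word))
    return core, mantle

def pearson(pairs):
    """Pearson correlation of (x, y) pairs; 0.0 if either side is constant."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

words = ["daiin", "qokeedy", "chedy", "ckhy", "shol", "okaiin", "cthol"]
for flag in (False, True):
    pairs = [letter_counts(w, pedestals_as_mantle=flag) for w in words]
    print(flag, round(pearson(pairs), 3))
```

The seven sample words are far too few to say anything about the manuscript;
they only show how the choice of convention changes the mantle count of words
like "ckhy".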

           WITHOUT      WITH
         -----------  -----------
  sect.  count freq.  count freq.  pages
  -----  ----- -----  ----- -----  ---------------------------
  txt.n  12200 0.505  11980 0.495  text(all)
  lab.n    652 0.674    316 0.326  labels(all)

  heb.1   1309 0.574    972 0.426  f26r-f48v(A),f50r-f57r(B),f66v
  pha.1    290 0.561    227 0.439  f88r-f89v1
  pha.2    395 0.549    324 0.451  f99r-f102v1
  bio.1   2540 0.529   2260 0.471  f75r-f84v
  cos.2    329 0.519    305 0.481  f67r1-f70r2
  hea.2    304 0.499    305 0.501  f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
  hea.1   2803 0.473   3128 0.527  f1v-f49r(A),f51r-f56v(A)
  str.2   3218 0.470   3628 0.530  f103r-f108v,f111r-f116r

  cos.1     23 0.719      9 0.281  f57v
  unk.7     94 0.644     52 0.356  f86v5
  unk.3     18 0.643     10 0.357  f65r-f65v
  str.1    204 0.622    124 0.378  f58r-f58v
  unk.1     85 0.599     57 0.401  f1r
  unk.6     67 0.588     47 0.412  f86v6
  heb.2    165 0.573    123 0.427  f94r-f95v1
  cos.3    115 0.516    108 0.484  f85r2-f86v4,f85v2,f86v3
  unk.5     65 0.492     67 0.508  f85r1
  unk.4    120 0.490    125 0.510  f66r
  zod.1     22 0.489     23 0.511  f70v2-f73v
  unk.2     34 0.283     86 0.717  f49v
  unk.8      0 0.000      0 0.000  f116v

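Finally, the statistics for the ED-GROUP (EVA <ed>),
which Rene proposed as an indicator of "language evolution":

           WITHOUT      WITH
         -----------  -----------
  sect.  count freq.  count freq.  pages
  -----  ----- -----  ----- -----  ---------------------------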
  txt.n  21235 0.859   3499 0.141  text(all)
  lab.n    917 0.939     60 0.061  labels(all)
  
  hea.1   5925 0.999      6 0.001  f1v-f49r(A),f51r-f56v(A)
  hea.2    607 0.997      2 0.003  f87r-f87v,f90r1-f90v1,f93r-f93v,f96r-f96v
  pha.2    717 0.997      2 0.003  f99r-f102v1
  pha.1    512 0.990      5 0.010  f88r-f89v1
  cos.2    622 0.981     12 0.019  f67r1-f70r2
  heb.1   1957 0.835    387 0.165  f26r-f48v(A),f50r-f57r(B),f66v
  str.2   5653 0.793   1476 0.207  f103r-f108v,f111r-f116r
  bio.1   3505 0.708   1444 0.292  f75r-f84v
  
  unk.1    142 1.000      0 0.000  f1r
  unk.2    120 1.000      0 0.000  f49v
  str.1    333 0.988      4 0.012  f58r-f58v
  zod.1     44 0.978      1 0.022  f70v2-f73v
  unk.3     27 0.964      1 0.036  f65r-f65v
  unk.7    146 0.954      7 0.046  f86v5
  cos.1     29 0.906      3 0.094  f57v
  heb.2    263 0.889     33 0.111  f94r-f95v1
  unk.6    111 0.874     16 0.126  f86v6
  cos.3    199 0.869     30 0.131  f85r2-f86v4,f85v2,f86v3
  unk.4    208 0.849     37 0.151  f66r
  unk.5    115 0.777     33 0.223  f85r1
  unk.8      0 0.000      0 0.000  f116v

CAVE ASINUM - do not trust my statistics blindly! 
I have been known to make mistakes occasionally 8-/

All the best,

--stolfi