VMs: Do labels fit the word paradigms?
> [Rene:] Do they follow the word paradigms more consistently, or
> less, or equally?
For part of the answer, see
http://www.ic.unicamp.br/~stolfi/voynich/00-06-07-word-grammar/cmp-ct.html
http://www.ic.unicamp.br/~stolfi/voynich/00-06-07-word-grammar/cmp-fr.html
The format of these files is similar to that of the main grammar file.
Each "paragraph" lists a non-terminal symbol (bright red,
left-justified) followed by one line for each of its alternative
expansions (rightmost field, in dark red).
The eight numeric fields (black) next to each alternative give its
usage counts (first file) or its relative usage frequencies (second
file) within the parent non-terminal symbol. The first column (txt.n)
is computed using all tokens in the running (non-label) text of all
sections. The next six columns (pha.2 through heb.1) break those same
counts down by section. The last column (lab.n) applies to the labels
only.
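The relation between the two files can be shown with a minimal Python sketch. It assumes the counts for one non-terminal are held as a dict from alternative name to eight per-column counts. Each column is normalized on its own, which is what "relative usage frequency within the parent non-terminal" amounts to column by column. The function name and the counts in the example are hypothetical, made up for illustration and not taken from the files.

```python
# Column tags, in the order they appear under each table.
COLUMNS = ["txt.n", "pha.2", "cos.2", "str.2", "bio.1", "hea.1", "heb.1", "lab.n"]


def relative_frequencies(counts):
    """Convert usage counts to relative frequencies, column by column.

    `counts` maps each alternative expansion of ONE non-terminal to a
    list of eight counts in COLUMNS order (hypothetical input layout).
    Within each column the resulting fractions sum to 1; a column whose
    counts are all zero yields zeros instead of dividing by zero.
    """
    totals = [sum(column) for column in zip(*counts.values())]
    return {
        alt: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for alt, row in counts.items()
    }


# Made-up counts; the text column is the sum of the six section columns.
example = {
    "CrP":   [90, 45, 10, 8, 7, 6, 14, 80],
    "Q.CrP": [20,  5,  0, 2, 3, 4,  6,  1],
}
for alt, fracs in relative_frequencies(example).items():
    print(f"{alt:6}", " ".join(f"{f:.4f}" for f in fracs))
```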
As you can see, the rule usage frequencies for text tokens are fairly
uniform over all sections. Labels differ from text tokens, but only
slightly.
As we know, EVA "q" is relatively rare in labels:
CrustPrefix:
.8180 .8323 .9387 .7905 .7039 .8910 .8778 .9877 CrP
.1820 .1677 .0613 .2095 .2961 .1090 .1222 .0123 Q.CrP
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
CrustSuffix:
1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 CrS.OptOFinal
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
WholeCrust:
.9480 .9707 1.0000 .9442 .8938 .9546 .9756 .9959 CrW.OptOFinal
.0520 .0293     . .0558 .1062 .0454 .0244 .0041 Q.CrW.OptOFinal
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
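One way to put a number on "different, but only slightly" is to compare the label column of a table with its text column using a distance between distributions. The Python sketch below uses the total variation distance (half the sum of absolute differences); that choice is an assumption of this sketch, not the measure used to build the files. It parses rows in the layout shown above, treating a lone "." in a numeric field as zero and skipping title, separator, and footer lines. The helper names are hypothetical.

```python
COLUMNS = ["txt.n", "pha.2", "cos.2", "str.2", "bio.1", "hea.1", "heb.1", "lab.n"]

# The CrustPrefix block, copied from the message.
CRUST_PREFIX = """\
CrustPrefix:
.8180 .8323 .9387 .7905 .7039 .8910 .8778 .9877 CrP
.1820 .1677 .0613 .2095 .2961 .1090 .1222 .0123 Q.CrP
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
"""


def parse_rows(block):
    """Return {alternative: [8 fractions]} from one table block."""
    rows = {}
    for line in block.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue  # title, separator, and footer lines have other widths
        *numbers, alt = fields
        # A lone "." stands for an empty (zero) entry.
        rows[alt] = [0.0 if f == "." else float(f) for f in numbers]
    return rows


def column(rows, name):
    """Pick one column as {alternative: fraction}."""
    i = COLUMNS.index(name)
    return {alt: values[i] for alt, values in rows.items()}


def total_variation(p, q):
    """Half the L1 distance between two distributions over alternatives."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


rows = parse_rows(CRUST_PREFIX)
text = column(rows, "txt.n")
for name in COLUMNS[1:]:
    print(name, round(total_variation(text, column(rows, name)), 4))
```

On the CrustPrefix rows the label column comes out about .17 from the text column, while the section columns range from about .01 (pha.2) to about .12 (cos.2).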
On the other hand, labels seem to be a little "fatter" in the crust
layer (composed of the "leader" letters [ldrs] with attached "circle"
letters [aoy]). Note, in particular, that those few tokens in the VMS
that have four crust letters in the suffix are almost all labels:
CrP:
.8924 .9340 .9274 .8694 .8366 .9331 .9310 .8951 .
.1018 .0628 .0670 .1260 .1503 .0651 .0659 .0802 OR
.0059 .0032 .0057 .0046 .0131 .0019 .0030 .0247 OR.OR
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
CrS:
.4147 .3874 .4347 .4191 .3900 .4989 .3438 .2253 .
.5379 .5801 .5085 .5290 .5799 .4618 .5999 .5756 OR
.0446 .0314 .0477 .0487 .0284 .0369 .0543 .1667 OR.OR
.0028 .0011 .0091 .0033 .0017 .0025 .0020 .0293 OR.OR.OR
    .     .     .     .     .     .     . .0031 OR.OR.OR.OR
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
CrW:
.1370 .0997 .1709 .2042 .0735 .0773 .1574 .0980 .
.5985 .6129 .5201 .5676 .6006 .6806 .5739 .3469 OR
.2327 .2551 .2764 .1993 .3031 .2128 .2198 .3388 OR.OR
.0293 .0323 .0302 .0260 .0210 .0264 .0475 .1429 OR.OR.OR
.0025     . .0025 .0029 .0019 .0025 .0014 .0694 OR.OR.OR.OR
.0001     .     .     .     . .0006     . .0041 OR.OR.OR.OR.OR
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
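The "fatter" crust of labels can be summarized as the average number of OR units per suffix. This is a minimal Python sketch that assumes each "OR" in an alternative's name stands for one crust letter, as the remark about four crust letters suggests. It uses the txt.n and lab.n columns of the CrS table, copied by hand. The names are hypothetical.

```python
# (txt.n, lab.n) fractions from the CrS table; "." (no crust) is a real alternative.
CRS = {
    ".":           (0.4147, 0.2253),
    "OR":          (0.5379, 0.5756),
    "OR.OR":       (0.0446, 0.1667),
    "OR.OR.OR":    (0.0028, 0.0293),
    "OR.OR.OR.OR": (0.0000, 0.0031),
}


def crust_units(alt):
    """Number of OR units in an alternative name ("." has none)."""
    return alt.split(".").count("OR")


def mean_units(table, col):
    """Expected number of OR units under column `col` (0 = text, 1 = labels)."""
    return sum(fracs[col] * crust_units(alt) for alt, fracs in table.items())


print(f"text:   {mean_units(CRS, 0):.4f}")
print(f"labels: {mean_units(CRS, 1):.4f}")
```

With these numbers that comes to about .64 crust units per text-token suffix versus about 1.01 per label suffix.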
Compared to text tokens, labels are also definitely fonder of "am"/"om"
endings (.0960 vs. .0385) and less fond of "ai*n"/"oi*n" ones (.1676
vs. .2739):
Final:
.6876 .7309 .7453 .6564 .7849 .6334 .7096 .7363 Y
.0385 .0183 .0453 .0397 .0140 .0480 .0513 .0960 A.M
.2739 .2508 .2093 .3039 .2011 .3186 .2391 .1676 A.IN
----- ----- ----- ----- ----- ----- ----- -----
txt.n pha.2 cos.2 str.2 bio.1 hea.1 heb.1 lab.n
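A simple way to read the Final table is to take, for each ending, the ratio of its label frequency to its text frequency. Here is a minimal Python sketch using the txt.n and lab.n values copied from the table above. The ratio is just one convenient summary chosen for this sketch, not a statistic from the original files.

```python
# (txt.n, lab.n) fractions from the Final table.
FINAL = {
    "Y":    (0.6876, 0.7363),
    "A.M":  (0.0385, 0.0960),
    "A.IN": (0.2739, 0.1676),
}


def label_to_text_ratio(table):
    """Ratio > 1: the ending is more common in labels than in text.

    Alternatives with a zero text frequency are skipped.
    """
    return {alt: lab / txt for alt, (txt, lab) in table.items() if txt > 0}


for alt, ratio in label_to_text_ratio(FINAL).items():
    print(f"{alt:5} {ratio:.2f}")
```

This gives roughly 2.5 for A.M, 0.6 for A.IN, and close to 1 for Y.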
And so on. In short, I see no drastic differences. However,
among labels there are no obvious "Grove words" (words with two or
more gallows that start with a gallows).
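The "Grove word" criterion can be written as a predicate on EVA strings. This minimal Python sketch assumes the gallows are the EVA letters t, k, p and f, and it counts every occurrence of them, including those inside benched gallows such as cth. The example words are made-up strings, not transcribed tokens.

```python
GALLOWS = frozenset("tkpf")  # EVA gallows letters; every occurrence counts


def is_grove_word(word):
    """True if `word` starts with a gallows and contains two or more of them."""
    if not word or word[0] not in GALLOWS:
        return False
    return sum(ch in GALLOWS for ch in word) >= 2


for w in ["pchedky", "tolpchey", "kaiin", "qokeedy", "ctheky"]:
    print(w, is_grove_word(w))
```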
Hope it helps,
--stolfi
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list