VMs: zara
Suppose the author of the VMS began with a word-list identical to the set of
all words actually used in the manuscript. A graph of their lengths gives
the symmetrical curve found by Jorge Stolfi.
The individual quires of the herbal section form fairly smooth, bell-shaped
curves for word-length distribution. According to the text I used, they
contain between 464 and 843 words each.
The first 216 throws of dice (see below), simulating draws of words from
the word-list, are very unlikely to approximate that curve. Two trials,
of 500 and of 1000 throws, produced wavy curves. I used the randbetween()
function, which should be "random enough" for this.
Possible scores using three dice with face values set to 0-0-1-1-2-2,
1-1-2-2-3-3, and 0-1-2-3-4-5:
score:  1   2   3   4   5   6   7   8   9  10
count:  4  12  24  32  36  36  32  24  12   4   (total 216)
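The count row can be checked by enumerating all 6 x 6 x 6 = 216 face combinations, and the "wavy curve" trials can be repeated by throwing the same dice a given number of times. A minimal sketch follows. Assumptions not in the post: Python's random.Random stands in for randbetween(), and a "hump" is taken to mean a strict local maximum of the histogram after collapsing flat runs of equal counts (so the 36-36 plateau counts once). The names exact_counts, simulated_counts and count_humps are hypothetical.

```python
import random
from collections import Counter
from itertools import product

# Face values of the three dice described above.
DIE_A = [0, 0, 1, 1, 2, 2]
DIE_B = [1, 1, 2, 2, 3, 3]
DIE_C = [0, 1, 2, 3, 4, 5]
SCORES = range(1, 11)  # lowest score 0+1+0 = 1, highest 2+3+5 = 10


def exact_counts():
    """Tally every one of the 216 equally likely face combinations."""
    totals = Counter(a + b + c for a, b, c in product(DIE_A, DIE_B, DIE_C))
    return [totals[s] for s in SCORES]


def simulated_counts(n_throws, rng):
    """Throw the three dice n_throws times and tally the scores."""
    totals = Counter(
        rng.choice(DIE_A) + rng.choice(DIE_B) + rng.choice(DIE_C)
        for _ in range(n_throws)
    )
    return [totals[s] for s in SCORES]


def count_humps(counts):
    """Count local maxima (assumed definition), treating a flat run as one point."""
    runs = [v for i, v in enumerate(counts) if i == 0 or v != counts[i - 1]]
    humps = 0
    for i, v in enumerate(runs):
        higher_than_left = i == 0 or v > runs[i - 1]
        higher_than_right = i == len(runs) - 1 or v > runs[i + 1]
        if higher_than_left and higher_than_right:
            humps += 1
    return humps


if __name__ == "__main__":
    print("exact:", exact_counts(), "humps:", count_humps(exact_counts()))
    rng = random.Random(1)  # fixed seed so a run can be repeated
    for n in (216, 500, 1000):
        sim = simulated_counts(n, rng)
        print(f"{n} throws:", sim, "humps:", count_humps(sim))
```

The exact enumeration reproduces the count row above and has a single hump; the simulated tallies vary with the seed, which is the point of comparing them.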
The first 219 vocabulary words of the VMS (all of f1r and f1v in the text
I am using) produce a fairly good curve with a slight secondary bulge
(not quite a "hump").
At the same time, the token lengths of the VMS text begin to form their
characteristic curve (which differs from the word-list curve) early in the
manuscript. This would not be true of randomly drawn words (in this
instance, drawn with replacement, i.e. without regard to whether a word
had already been drawn from the word-list).
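The difference between the word-list (vocabulary) curve and the token curve is just whether each distinct word is counted once or at every occurrence. A minimal sketch of both tallies, plus a running token-length histogram to watch how early the curve settles, is below. The function names and the sample tokens are hypothetical; the sample is a made-up string of EVA-like words, not an excerpt from any transcription.

```python
from collections import Counter


def length_distributions(tokens):
    """Return (vocabulary-length counts, token-length counts).

    The vocabulary curve counts each distinct word once; the token curve
    counts every occurrence, so frequent words weigh more heavily.
    """
    vocab_lengths = Counter(len(w) for w in set(tokens))
    token_lengths = Counter(len(w) for w in tokens)
    return vocab_lengths, token_lengths


def running_token_curves(tokens, step):
    """Yield (tokens read so far, length histogram) after every `step` tokens."""
    seen = Counter()
    for i, word in enumerate(tokens, start=1):
        seen[len(word)] += 1
        if i % step == 0:
            yield i, dict(sorted(seen.items()))


if __name__ == "__main__":
    # Hypothetical sample; replace with tokens read from a real transcription.
    sample = "daiin ol chedy daiin qokeedy ol daiin shedy".split()
    vocab, toks = length_distributions(sample)
    print("vocabulary:", dict(sorted(vocab.items())))
    print("tokens:    ", dict(sorted(toks.items())))
    for n, curve in running_token_curves(sample, 4):
        print(n, curve)
```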
I used two dice with face values 0-5 and 1-6 to simulate the (ideal)
token-length curve. Fifty rolls yielded multi-humped curves from the
first throws onward, whereas the first lines of the VMS already begin to
form a single-humped curve.
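The two-dice experiment can be sketched the same way: the ideal curve is the triangular distribution of the 36 face pairs, and the hump count of the running tally can be recorded every few rolls. This is a sketch under assumptions: random.Random.randint (inclusive at both ends) stands in for randbetween(), the "hump" definition is the assumed one of a strict local maximum after collapsing flat runs, and the function names are hypothetical.

```python
import random
from collections import Counter

# One die with faces 0-5 and one with faces 1-6: scores run from 1 to 11.
SCORES = range(1, 12)


def exact_counts():
    """Number of the 36 face pairs giving each score (a triangular shape)."""
    pairs = Counter(x + y for x in range(0, 6) for y in range(1, 7))
    return [pairs[s] for s in SCORES]


def count_humps(counts):
    """Count local maxima (assumed definition), treating a flat run as one point."""
    runs = [v for i, v in enumerate(counts) if i == 0 or v != counts[i - 1]]
    humps = 0
    for i, v in enumerate(runs):
        higher_than_left = i == 0 or v > runs[i - 1]
        higher_than_right = i == len(runs) - 1 or v > runs[i + 1]
        if higher_than_left and higher_than_right:
            humps += 1
    return humps


def humps_while_rolling(n_rolls, rng, every=10):
    """Roll both dice n_rolls times; every `every` rolls, record the hump count."""
    tally = Counter()
    history = []
    for i in range(1, n_rolls + 1):
        # randint(a, b) includes both a and b, like a RANDBETWEEN(bottom, top)
        tally[rng.randint(0, 5) + rng.randint(1, 6)] += 1
        if i % every == 0:
            history.append((i, count_humps([tally[s] for s in SCORES])))
    return history


if __name__ == "__main__":
    print("exact:", exact_counts())
    for rolls, humps in humps_while_rolling(50, random.Random(2)):
        print(f"after {rolls} rolls: {humps} hump(s)")
```

The same hump count could be applied to the running token-length tally of the opening lines of a transcription for a side-by-side comparison.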
To me, this means that a random method applied to a selected set of words
can, in the end, be made to match the VMS in some statistics, but that,
barring a statistical fluke, the same method will not approach those
statistics as early in the text as the VMS does. I think that has already
been shown by other methods.
How well would a meaningful text that substitutes the VMS vocabulary for
its own (using word frequency to determine the substitutions) mimic the
VMS in early characteristic word-length distribution and/or token-length
distribution? The first one-eighth of Gabriel Landini's NewTxt produced
the bell-shaped curve for vocabulary words. So did the second one-eighth.
I have not checked it for just the first 216 words or for token-length
distribution.
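One simple reading of "using word frequency to determine the substitutions" is rank matching: the most frequent word of the meaningful text becomes the most frequent VMS word, the second becomes the second, and so on. A minimal sketch under that assumption follows; it is not necessarily how NewTxt was produced, and the function names and toy inputs are hypothetical. Ties in frequency are broken by first appearance, which is how Counter.most_common orders equal counts.

```python
from collections import Counter


def rank_by_frequency(tokens):
    """Distinct words, most frequent first; ties keep first-seen order."""
    return [word for word, _ in Counter(tokens).most_common()]


def substitute_by_frequency(plain_tokens, target_tokens):
    """Replace the k-th most frequent plain word with the k-th most frequent target word."""
    plain_rank = rank_by_frequency(plain_tokens)
    target_rank = rank_by_frequency(target_tokens)
    if len(target_rank) < len(plain_rank):
        raise ValueError("target vocabulary is smaller than the plaintext vocabulary")
    mapping = dict(zip(plain_rank, target_rank))
    return [mapping[word] for word in plain_tokens]


if __name__ == "__main__":
    # Toy inputs for illustration only.
    plain = "the cat saw the dog and the cat ran".split()
    target = "daiin ol chedy daiin qokeedy ol daiin shedy dar".split()
    print(" ".join(substitute_by_frequency(plain, target)))
```

The substituted text can then be tallied by word length in the same way as the VMS text, both for the vocabulary and for the tokens.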
I mentioned zara, a game played with three dice, in a previous post. I
think I saw a reference to it in a book about a 14th-century merchant.
KM