
VMs: zara

Suppose the author of the VMS began with a word-list identical to all the words actually used in the manuscript. A graph of the lengths of those words forms the symmetrical curve found by Jorge Stolfi.

The individual quires of the herbal section form fairly smooth, roughly bell-shaped curves for word length distribution. In the transcription I used, each quire has between 464 and 843 words.
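The word-list curve and the token curve differ only in whether each distinct word is counted once or every occurrence is counted. Below is a minimal Python sketch of both tallies. The sample tokens are illustrative EVA-style strings chosen for the example, not an excerpt from any real transcription, and `length_histogram` and `text_bars` are hypothetical helper names.

```python
from collections import Counter

def length_histogram(words, distinct=False):
    """Count how many words have each length.

    distinct=True counts each word type once (the word-list curve);
    distinct=False counts every occurrence (the token curve).
    """
    items = set(words) if distinct else words
    return dict(sorted(Counter(len(w) for w in items).items()))

def text_bars(hist):
    """Render a histogram as one row of '#' characters per word length."""
    return "\n".join(f"{length:>2} {'#' * count}" for length, count in hist.items())

# Illustrative EVA-style tokens, NOT an excerpt from a real transcription.
tokens = "daiin chol chor daiin shol qokeedy chol dy daiin otedy".split()
print(length_histogram(tokens, distinct=True))  # {2: 1, 4: 3, 5: 2, 7: 1}
print(length_histogram(tokens))                 # {2: 1, 4: 4, 5: 4, 7: 1}
print(text_bars(length_histogram(tokens)))
```

Feeding it the words of one quire at a time would give one curve per quire.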

The first 216 throws of the dice (see below), simulating draws of words from the word-list, are very unlikely to approximate that curve. Two trials, one of 500 throws and one of 1000 throws, produced wavy curves. I used the randbetween() function, which should be "random enough" for this.
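The dice experiment might be sketched in Python as follows. Python's `random` module stands in here for the spreadsheet `randbetween()` function. The `seed` parameter is an addition so that runs can be repeated, and `throw_scores` is a hypothetical name. Each call returns the tallies for scores 1 to 10, ready to be graphed.

```python
import random
from collections import Counter

# The three dice described in the post; each list holds one die's six faces.
DICE = ([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5])

def throw_scores(n, seed=None):
    """Simulate n throws of the three dice and tally the scores 1..10.

    random.Random stands in for the spreadsheet randbetween() function;
    picking a random face of each die is equivalent to randbetween(1, 6).
    """
    rng = random.Random(seed)
    tally = Counter(sum(rng.choice(die) for die in DICE) for _ in range(n))
    return [tally[score] for score in range(1, 11)]

# One trial each at the sizes mentioned in the post.
for n in (216, 500, 1000):
    print(n, throw_scores(n, seed=1))
```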

Possible scores using three dice with face values 0-0-1-1-2-2, 1-1-2-2-3-3, and 0-1-2-3-4-5:

score:   1   2   3   4   5   6   7   8   9  10
count:   4  12  24  32  36  36  32  24  12   4  = 216
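The counts in the table can be checked by enumerating every combination of faces. This short Python sketch tallies all 6 x 6 x 6 = 216 equally likely outcomes by their sum.

```python
from collections import Counter
from itertools import product

# The three dice from the post; each list holds one die's six faces.
DICE = ([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5])

# product(*DICE) yields every (face1, face2, face3) combination once.
counts = Counter(sum(faces) for faces in product(*DICE))
table = [counts[score] for score in range(1, 11)]
print(table)       # [4, 12, 24, 32, 36, 36, 32, 24, 12, 4]
print(sum(table))  # 216
```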

The first 219 vocabulary words of the VMS (all of f1r and f1v in the transcription I am using) produce a fairly good curve with a slight secondary bulge (not quite a "hump").

At the same time, the token lengths of the VMS text begin to form their characteristic curve (which differs from the word-list curve) early in the text. This would not be true of randomly drawn words (in this instance, drawn with replacement, that is, without regard to whether a word had already been drawn from the word-list).
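Drawing words without regard to whether they were drawn before is sampling with replacement. Below is a minimal sketch, assuming a hypothetical word-list that should be replaced with a real vocabulary. It draws tokens with `random.choices` and reports the token-length histogram of the first k tokens, so the curve can be watched as the simulated text grows. `draw_tokens` and `running_length_counts` are hypothetical names.

```python
import random
from collections import Counter

def draw_tokens(word_list, n, seed=None):
    """Draw n tokens from word_list with replacement (a word may recur)."""
    rng = random.Random(seed)
    return rng.choices(word_list, k=n)

def running_length_counts(tokens, checkpoints):
    """Token-length histogram of the first k tokens, for each k in checkpoints."""
    return {k: dict(sorted(Counter(len(t) for t in tokens[:k]).items()))
            for k in checkpoints}

# Hypothetical word-list for illustration; substitute a real vocabulary.
word_list = ["dy", "chol", "chor", "shol", "daiin", "otedy", "qokeedy", "okaiin"]
tokens = draw_tokens(word_list, 200, seed=3)
for k, hist in running_length_counts(tokens, (25, 50, 100, 200)).items():
    print(k, hist)
```

The same `running_length_counts` applied to the opening tokens of a transcription would give the matching VMS checkpoints.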

I used two dice with face values 0-5 and 1-6 to simulate the (ideal) token length curve. Fifty rolls yielded multi-humped curves from the first throws onward, whereas the token lengths in the first lines of the VMS begin to form a single-humped curve early.
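The two-dice experiment, plus a rough way to count humps, might look like this in Python. `count_humps` is a hypothetical helper that counts local maxima after merging flat runs. It treats any dip, however small, as separating two humps, so judging a "slight bulge" would need a tolerance that is not implemented here.

```python
import random
from collections import Counter
from itertools import product

# The two dice from the post: faces 0-5 and faces 1-6.
DIE_A = range(0, 6)
DIE_B = range(1, 7)
SCORES = range(1, 12)  # possible sums 1..11

def count_humps(values):
    """Count local maxima in a sequence, treating a flat run as one point."""
    runs = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    padded = [float("-inf")] + runs + [float("-inf")]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i] > padded[i - 1] and padded[i] > padded[i + 1])

def exact_counts():
    """All 36 outcomes of the two dice, tallied by sum (the ideal curve)."""
    c = Counter(a + b for a, b in product(DIE_A, DIE_B))
    return [c[s] for s in SCORES]

def simulated_counts(n, seed=None):
    """Tally the sums of n simulated rolls of the two dice."""
    rng = random.Random(seed)
    c = Counter(rng.choice(DIE_A) + rng.choice(DIE_B) for _ in range(n))
    return [c[s] for s in SCORES]

print(exact_counts())               # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(count_humps(exact_counts()))  # 1
for seed in range(5):
    sim = simulated_counts(50, seed)
    print(sim, count_humps(sim))
```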

To me, this means that a random method applied to a selected set of words can, in its final result, be made to match the VMS in some statistics, but, barring a statistical fluke, the same method will not approach those statistics as early in the text as the VMS does. I think this has already been shown by other methods.
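One way to put a number on "how soon" a sequence approaches its final curve is the total variation distance between the distribution of the first k items and the target curve. This measure is swapped in here for illustration and is not the method used above. The sketch applies it to simulated throws of the three dice. The same `distribution` and `total_variation` functions (hypothetical names) could be applied to the token lengths of the first k VMS words, so that the two can be compared at the same k.

```python
import random
from collections import Counter
from itertools import product

DICE = ([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5])
SCORES = range(1, 11)

def distribution(samples):
    """Relative frequency of each score among the given samples."""
    c = Counter(samples)
    return [c[s] / len(samples) for s in SCORES]

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, at most 1."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Target curve: the exact score distribution over all 216 outcomes.
target = distribution([sum(f) for f in product(*DICE)])

rng = random.Random(42)
throws = [sum(rng.choice(die) for die in DICE) for _ in range(1000)]
for k in (50, 216, 500, 1000):
    print(k, round(total_variation(distribution(throws[:k]), target), 3))
```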

How well would a meaningful text that substitutes the VMS vocabulary for its own (using word frequency to determine the substitutions) mimic the VMS in the early emergence of its characteristic word length and/or token length distributions? The first one-eighth of Gabriel Landini's NewTxt produced the bell-shaped curve for vocabulary words, and so did the second one-eighth. I have not checked it for just the first 216 words, or for token length distribution.
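A frequency-based substitution of this kind could be sketched as a rank-for-rank mapping: the most frequent word of the meaningful text becomes the most frequent VMS word, and so on. This is a minimal sketch under that assumption and is not a reconstruction of how NewTxt was produced. The function names and toy inputs are hypothetical. Source words ranked beyond the size of the VMS vocabulary are simply left unchanged.

```python
from collections import Counter

def rank_by_frequency(tokens):
    """Word types from most to least frequent.

    Counter keeps first-occurrence order and sorted() is stable, so ties
    are broken by which word appeared first.
    """
    counts = Counter(tokens)
    return sorted(counts, key=lambda w: -counts[w])

def substitute_by_rank(source_tokens, vms_tokens):
    """Replace the i-th most frequent source word with the i-th most frequent VMS word.

    Assumption: source words ranked beyond the VMS vocabulary size stay unchanged.
    """
    mapping = dict(zip(rank_by_frequency(source_tokens), rank_by_frequency(vms_tokens)))
    return [mapping.get(t, t) for t in source_tokens]

# Toy inputs for illustration only.
source = "the cat saw the dog and the cat ran".split()
vms = "daiin chol daiin chor daiin chol shol dy".split()
print(" ".join(substitute_by_rank(source, vms)))
# daiin chol chor daiin shol dy daiin chol ran
```

The substituted text could then be run through the word-length and token-length tallies to see how early its curves settle.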

I mentioned zara, a game played with three dice, in a previous post. I think I saw a reference to it in a book about a 14th-century merchant.

KM