Re: VMs: Word Length Distribution
First four columns as before.
Fifth column shows word length distribution after changing "or",
"ol", "al" and "qo" to single glyphs in addition to those already
changed.
QUOTE
"A word is an abstract sequence of symbols; a token is an occurrence
of a word in the VMS text (delimited by blanks, line breaks, etc.)
The length of a word or token is the number of symbols it contains.
For this page, we will define symbol as Currier did; i.e. EVA ch and
sh will be counted as single symbols, and so are EVA cth, ckh, etc."
UNQUOTE ---- J. Stolfi
http://www.dcc.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/
... and here are a couple of the words and their factorization into
the "alphabet" he used to define the word length:
chcthdy {Ch}{CTh}{d}{y} 4
cphey {CPh}{e}{y} 3
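The factorization above can be sketched in a few lines of Python. This is only a rough illustration, not Stolfi's actual procedure: the glyph list holds just the groups named in the quote and the two examples (the "etc." suggests the real list is longer), the greedy left-to-right, longest-match-first rule is an assumption, and the names BASE_GLYPHS, EXTRA_GLYPHS and factorize are made up for this sketch.

```python
# Multi-character groups counted as one symbol. Only those named in the
# quote and the two examples are listed; the full list is an assumption.
BASE_GLYPHS = ["cth", "ckh", "cph", "ch", "sh"]

# Additional pairs merged into single glyphs for the fifth column.
EXTRA_GLYPHS = ["or", "ol", "al", "qo"]


def factorize(word, glyphs):
    """Split an EVA word into symbols, greedily, longest group first."""
    ordered = sorted(glyphs, key=len, reverse=True)
    symbols = []
    i = 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                symbols.append(g)
                i += len(g)
                break
        else:
            # No multi-character group starts here: take a single letter.
            symbols.append(word[i])
            i += 1
    return symbols


for w in ("chcthdy", "cphey"):
    parts = factorize(w, BASE_GLYPHS)
    print(w, "".join("{" + p + "}" for p in parts), len(parts))
```

Passing BASE_GLYPHS + EXTRA_GLYPHS as the second argument gives the kind of consolidation used for the fifth column. Note that a greedy rule settles overlaps such as "qol" as {qo}{l} rather than {q}{ol}; a different rule would shift some counts.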
When I previously said "almost identical" to his graph it was by
eyeball. My old version of Quattro-Pro does not work on this machine.
I have not looked at Excel functions yet and really did not
understand a tenth of what QP had to offer. Hello Christoph, are you
there? Someone? I thought this would be definitive by sight but it is
not -- not to me.
I have not been able to change font with the settings for plain text
only in Pegasus. If copied, pasted and changed to Courier New the
columns should align unless the mail scrambles them.
I will push this a little more with consolidations. Not sure how to
handle "eee".
 1     15     16     17     21
 2     70     85    110    193
 3    237    297    440    778
 4    641    736   1138   1569
 5   1276   1388   1798   1906
 6   1701   1739   1847   1628
 7   1645   1565   1267    916
 8   1096    993    670    471
 9    626    554    294    215
10    261    231    142     90
11    122    111     77     56
12     91     84     50     32
13     50     48     22      9
14     27     23     13     10
15     18     12      9      6
16     16     13      9      4
17      4      4      2      4
18      3      2      2      0
19      6      4      1      0
20      2      2      0      0
21      2      2      0      0
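For anyone who would rather compare the columns by number than by eyeball, here is a small Python sketch that needs no spreadsheet. It reads the table exactly as posted (length in the first column, counts in the rest) and reports the total, the mean length and the most frequent length for each count column. The name column_stats is made up for this sketch, and these three summary figures are just one possible choice for a comparison.

```python
# Rows are copied from the table above: length, then four count columns.
TABLE = """\
1 15 16 17 21
2 70 85 110 193
3 237 297 440 778
4 641 736 1138 1569
5 1276 1388 1798 1906
6 1701 1739 1847 1628
7 1645 1565 1267 916
8 1096 993 670 471
9 626 554 294 215
10 261 231 142 90
11 122 111 77 56
12 91 84 50 32
13 50 48 22 9
14 27 23 13 10
15 18 12 9 6
16 16 13 9 4
17 4 4 2 4
18 3 2 2 0
19 6 4 1 0
20 2 2 0 0
21 2 2 0 0
"""


def column_stats(table_text):
    """Return total, mean length and modal length for each count column."""
    rows = [[int(x) for x in line.split()]
            for line in table_text.splitlines() if line.strip()]
    lengths = [r[0] for r in rows]
    stats = []
    for c in range(1, len(rows[0])):
        counts = [r[c] for r in rows]
        total = sum(counts)
        mean = sum(n * k for n, k in zip(lengths, counts)) / total
        mode = lengths[counts.index(max(counts))]
        stats.append({"total": total, "mean": mean, "mode": mode})
    return stats


for col, s in enumerate(column_stats(TABLE), start=2):
    print(f"column {col}: total={s['total']} "
          f"mean={s['mean']:.2f} mode={s['mode']}")
```

Dividing each count by its column total would put the columns on the same scale as a published graph, which is what a by-sight comparison needs.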
Regards to all,
Knox