AW: WG: average word length in VMS



Hi Jorge,
you wrote:
>I don't quite understand what you mean. Are your words the same thing
>as our words (i.e. strings of non-break characters delimited by
>breaks)? Or do you discard the break characters and then take all
>possible substrings of each line (i.e. from column i to column j, for
>all pairs 1 <= i < j <= n)?
By words I meant all character combinations in a line after discarding the
breaks (=> all possible substrings). Yes, I used the frequency of such
substrings, so the result is a weighted word length.
My machine is a Windows-based machine, but - if necessary - I could break
the VMS into several parts and then compute the numbers. The first step was
only out of curiosity - what if the "breaks" are no breaks at all, but
something completely different?
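
For illustration, here is a minimal Python sketch of the weighted word length described above. It is a reconstruction under assumptions, not the program actually used: the function names, the set of break characters and the `min_len` / `max_len` parameters are all hypothetical.

```python
from collections import Counter

# Assumption: which characters count as "breaks" depends on the
# transcription in use; this default set is only a placeholder.
DEFAULT_BREAKS = " .,-="

def substring_counts(lines, breaks=DEFAULT_BREAKS, min_len=1, max_len=None):
    """Count every contiguous substring of each line after removing breaks."""
    counts = Counter()
    for line in lines:
        text = "".join(ch for ch in line if ch not in breaks)
        n = len(text)
        for i in range(n):
            # Optionally cap the substring length to keep the table small.
            end = n if max_len is None else min(n, i + max_len)
            for j in range(i + min_len, end + 1):
                counts[text[i:j]] += 1
    return counts

def weighted_mean_length(counts):
    """Mean substring length, each substring weighted by its frequency."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(len(s) * c for s, c in counts.items()) / total
```

A line of n characters yields n(n+1)/2 substrings, so for a long text a cap such as `max_len` (or splitting the text into parts) may be needed to keep memory use reasonable.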
Today I computed the numbers for possible endings (using the above result as
a wordlist) and got the following numbers:

suffix   norm. number of   result           rejected words
length   poss. suffixes    = unique words   (length < suffix length - 1)
------------------------------------------------------------------------
2         19               7617               237
3        102               4877              1035
4        281               3081              2949
5        289               1901              6212
6        224               1201             10067
7         64                802             13487

The sum of unique words and rejections should be the number of words in the
VMS vocabulary (the complete list had 20605 different words). As an intuitive
result I would conclude that the mean suffix length in Voynichese (if there is
such a thing as a suffix) could be 3 (minimum vocabulary).
To normalize the suffix count I looked at the first length-1 chars of the
suffix and counted only these substrings. The last char always came from a
similar set of single chars, like:
ai c i l m n r
al a c d k l o q r s t y
am a c d o q s y		// i.e. possible endings are ama amc amd amo amq ams and amy
ar a c d e i k l o q s y
ch a c d e i k l o p r s t y
da i k l m n o r s
de d e y
dk a e s
do a c d i k l m r t
dy a c d e f k l o p q r s t y
ea d i l m r s y
ec h k t
ed a c e l o q s y
ee a c d e k n o r s t y
ef a c y
ek a c e o s y
eo a c d e k l m o p q r s t y
ep a c y
es a c d e h o q s y
et a c e s y
.........
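
The grouping shown in the list above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `group_endings` and the sample words are made up, and the original computation may have differed in its details.

```python
from collections import defaultdict

def group_endings(words, suffix_len):
    """Group suffixes of a fixed length by their first suffix_len-1 chars.

    Returns a dict mapping each leading part (e.g. "am") to the sorted
    list of final characters seen after it (e.g. ["a", "c", "y"]).
    """
    groups = defaultdict(set)
    for word in words:
        if len(word) < suffix_len:
            continue  # too short to carry a suffix of this length
        suffix = word[-suffix_len:]
        groups[suffix[:-1]].add(suffix[-1])
    return {lead: sorted(last) for lead, last in sorted(groups.items())}

# Hypothetical sample words, not taken from the real wordlist.
sample = ["qokama", "damc", "amy", "y"]
for lead, last_chars in group_endings(sample, 3).items():
    print(lead, " ".join(last_chars))  # prints "am a c y"
```

Under this reading, the normalized number of possible suffixes for a given length would be the number of keys in the returned dict.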
My method isn't very orthodox, but I'm somehow convinced that there is a kind
of finer/inner structure to Voynichese, that there are words, and that the
breaks have a special, functional meaning.
Cheers
Claus
