Hi Jorge,

you wrote:

>I don't quite understand what you mean. Are your words the same thing
>as our words (i.e. strings of non-break characters delimited by
>breaks)? Or do you discard the break characters and then take all
>possible substrings of each line (i.e. from column i to column j, for
>all pairs 1 <= i < j <= n)?

By words I meant all character combinations in a line, discarding the
breaks (=> all possible substrings). Yes, I used the frequency of such
substrings, so the result is a weighted word length. My machine is a
Windows-based machine, but - if necessary - I could break the VMS into
several parts and then compute the numbers. The first step was only out
of curiosity: what if the "breaks" are no breaks at all, but something
completely different?

Today I computed the numbers for possible endings (using the above
result as a word list) and got the following numbers:

suffix   norm. number of   result           rejected words
length   poss. suff.       = unique words   length < suffix length -1
-------------------------------------------------------------------
  2          19              7617               237
  3         102              4877              1035
  4         281              3081              2949
  5         289              1901              6212
  6         224              1201             10067
  7          64               802             13487

The sum of unique words and rejections should be the number of words in
the VMS vocabulary (the complete list had 20605 different words).

As an intuitive result I would conclude that the mean suffix length in
Voynichese (if there is such a thing as a suffix) could be 3 (minimum
vocabulary).

To normalize the suffix count I looked at the first (length - 1)
characters of each suffix and counted only these substrings. The last
character was always drawn from a similar set of single characters,
like:

ai   c i l m n r
al   a c d k l o q r s t y
am   a c d o q s y      // i.e. possible endings are ama amc amd amo amq ams and amy
ar   a c d e i k l o q s y
ch   a c d e i k l o p r s t y
da   i k l m n o r s
de   d e y
dk   a e s
do   a c d i k l m r t
dy   a c d e f k l o p q r s t y
ea   d i l m r s y
ec   h k t
ed   a c e l o q s y
ee   a c d e k n o r s t y
ef   a c y
ek   a c e o s y
eo   a c d e k l m o p q r s t y
ep   a c y
es   a c d e h o q s y
et   a c e s y
.........

My method isn't very orthodox, but I'm somehow convinced that there is a
kind of finer/inner structure to Voynichese, that there are words, and
that the breaks have a special, functional meaning.

Cheers
Claus
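
P.S. For reference, the two steps described in this message (counting all substrings of a line with the breaks discarded, then grouping endings by their first length-1 characters) can be sketched roughly as follows. This is only a minimal Python sketch, not the code actually used for the numbers above. The function names, the set of break characters, the minimum substring length of 2 (taken from the quoted 1 <= i < j <= n), the toy words, and the choice to skip words shorter than the suffix are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed break characters; adjust to whatever the transcription uses.
BREAKS = ".,- "


def substrings(line, min_len=2):
    """Yield every substring of `line` (breaks removed) with length >= min_len."""
    text = "".join(ch for ch in line if ch not in BREAKS)
    for i in range(len(text)):
        for j in range(i + min_len, len(text) + 1):
            yield text[i:j]


def substring_counts(lines, min_len=2):
    """Count how often each substring occurs over all lines."""
    return Counter(s for line in lines for s in substrings(line, min_len))


def endings_by_stem(words, suffix_len):
    """Map the first suffix_len-1 characters of each ending to its final characters.

    Words shorter than suffix_len are skipped (an assumption of this sketch).
    """
    table = defaultdict(set)
    for word in words:
        if len(word) < suffix_len:
            continue
        ending = word[-suffix_len:]
        table[ending[:-1]].add(ending[-1])
    return {stem: sorted(chars) for stem, chars in table.items()}


if __name__ == "__main__":
    # Toy input only, not real VMS data.
    counts = substring_counts(["ab.c", "ab"])
    print(counts.most_common())
    print(endings_by_stem(["qokama", "okamy", "chedy", "dy"], 3))
```

With the toy words above, the stem "am" collects the final characters a and y, which corresponds to the "am  a c d o q s y" style of line in the list. Counting the distinct stems for a given suffix length gives one possible reading of the "normalized" column.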