
Re: Trithemius progressive cipher



On 3 Sep 2000, at 19:51, Claus Anders wrote:
> So I wonder, is there any coding scheme, which will
> produce such a low h2 and enlarge the average word length?

Yes, there are, although they may not be the methods used in the 
vms:

http://www2.micro-net.com/~ixohoxi/voy/ekt.txt
http://web.bham.ac.uk/G.Landini/evmt/daindaiin.htm

The word and token length problem is as follows. (To clarify, a 
"token" is any string delimited by spaces, while a "word" is a 
distinct token type, counted once regardless of its frequency.)
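
The token/word distinction can be made concrete with a minimal sketch. 
The function name and the sample sentence are only illustrative 
assumptions; this is not the code behind the figures cited in this 
message.

```python
from collections import Counter

def length_distributions(text):
    """Return (token_lengths, word_lengths) for a space-separated text."""
    tokens = text.lower().split()   # tokens: every string between spaces
    words = set(tokens)             # words: each distinct token counted once
    token_lengths = Counter(len(t) for t in tokens)
    word_lengths = Counter(len(w) for w in words)
    return token_lengths, word_lengths

# Illustrative sample only, not manuscript data.
sample = "the cat sat on the mat and the dog sat on the rug"
token_lengths, word_lengths = length_distributions(sample)
print(sorted(token_lengths.items()))  # [(2, 2), (3, 11)]
print(sorted(word_lengths.items()))   # [(2, 1), (3, 7)]
```

The token distribution weights each length by how often the words are 
used, while the word distribution describes the vocabulary alone, so 
the two can peak at different lengths.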

Both long tokens and long words are less frequent in the vms than 
in English or Latin. The word length distribution shows that the vms 
vocabulary is made of shorter words, but in the token length 
distribution (which reflects how the vocabulary is actually used), 
Latin and English show their maxima at slightly shorter token lengths 
than the vms.
This means that there is also a lack of short tokens in the vms text. I 
guess that this discrepancy arises because high-frequency words (in 
English: the, for, a, of, and, etc.) are very short, while the most 
common ones in the vms (daiin, ol, aiin, chedy, shedy) are, except 
for ol, longer.
Despite this observation, Zipf's length-frequency law (more frequent 
words tend to be shorter) still seems to be followed.
Even if one assumes that words may run together (like "thecat 
jumps on thetable"), there is still a shortage of long words. Note 
that in Figs 17 & 18 of:

http://web.bham.ac.uk/G.Landini/evmt/zipf.htm

there are very few words (and tokens, obviously) longer than 10 
characters.
Of course, all this depends on what we call a character in the vms.
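
To illustrate how the choice of character unit changes measured 
lengths, here is a minimal sketch that counts word length when some 
multi-letter transliteration strings are treated as a single character. 
The helper name and the grouping ("ch", "sh", "iin") are purely 
hypothetical assumptions for the example, not a claim about what the 
script's characters actually are.

```python
def word_length(word, units=()):
    """Count characters in `word`, treating each string in `units` as one."""
    # Try longer units first so that "iin" is matched before any shorter unit.
    # Empty strings are dropped because they would never advance the scan.
    ordered = sorted((u for u in units if u), key=len, reverse=True)
    i = 0
    count = 0
    while i < len(word):
        for u in ordered:
            if word.startswith(u, i):
                i += len(u)       # consume the whole unit as one character
                break
        else:
            i += 1                # no unit matched: consume a single letter
        count += 1
    return count

words = ["daiin", "ol", "aiin", "chedy", "shedy"]
grouping = ("ch", "sh", "iin")    # hypothetical grouping, for illustration
plain = [word_length(w) for w in words]
grouped = [word_length(w, grouping) for w in words]
print(plain)    # [5, 2, 4, 5, 5]
print(grouped)  # [3, 2, 2, 4, 4]
```

Under this assumed grouping the common words come out noticeably 
shorter, so any comparison with English or Latin length distributions 
shifts with the transliteration convention.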

Cheers,

Gabriel