Re: Trithemius progressive cipher
On 3 Sep 2000, at 19:51, Claus Anders wrote:
> So I wonder, is there any coding scheme, which will
> produce such a low h2 and enlarge the average word length?
Yes, there are, although they may not be the methods used in the
vms:
http://www2.micro-net.com/~ixohoxi/voy/ekt.txt
http://web.bham.ac.uk/G.Landini/evmt/daindaiin.htm
The word and token length problem is as follows. To clarify the
terms: a "token" is any string delimited by spaces, while a "word" is
a distinct type of token, counted once regardless of its frequency.
Both long tokens and long words are less frequent in the vms than
in English or Latin. The word length distribution shows that the vms
vocabulary is made up of shorter words, but the token length
distribution (which reflects how the vocabulary is actually used)
peaks at slightly shorter token lengths in Latin and English than in
the vms.
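To make the two distributions concrete, here is a minimal Python sketch of how they could be tallied from a space-separated transcription. The function name length_distributions and the English sample string are only illustrative assumptions; this is not the code behind the figures on the web pages.

```python
from collections import Counter

def length_distributions(text):
    """Return (token length counts, word length counts) for a text.

    Tokens are all space-separated strings; words are the distinct
    tokens, each counted once regardless of its frequency.
    """
    tokens = text.split()
    token_lengths = Counter(len(t) for t in tokens)
    word_lengths = Counter(len(w) for w in set(tokens))
    return token_lengths, word_lengths

# Illustrative English sample, not vms data.
sample = "the cat sat on the mat and the dog sat too"
token_lengths, word_lengths = length_distributions(sample)
print("tokens:", dict(token_lengths))
print("words: ", dict(word_lengths))
```

The token histogram weights each length by how often it is used, which is why the two curves can peak at different lengths.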
This means that there is also a lack of short tokens in the vms text. I
guess that this discrepancy arises because the high-frequency words
in English (the, for, a, of, and, etc.) are very short, while the most
common ones in the vms (daiin, ol, aiin, chedy, shedy) are, except
for ol, longer.
Despite this observation, Zipf's length-frequency law (more frequent
words tend to be shorter) also seems to be followed.
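A rough way to check the length-frequency trend is to compare the mean length of the most frequent words with that of the rest of the vocabulary. The sketch below assumes a hypothetical helper mean_length_head_vs_tail and an English toy sample; the head/tail split is a crude stand-in for a proper rank correlation.

```python
from collections import Counter

def mean_length_head_vs_tail(tokens, top_n):
    """Mean length of the top_n most frequent words vs. all other words."""
    counts = Counter(tokens)
    ranked = [word for word, _ in counts.most_common()]
    head, tail = ranked[:top_n], ranked[top_n:]

    def mean_len(words):
        return sum(len(w) for w in words) / len(words) if words else 0.0

    return mean_len(head), mean_len(tail)

# Illustrative English sample, not vms data.
tokens = "the cat and the dog and the elephant".split()
head_mean, tail_mean = mean_length_head_vs_tail(tokens, top_n=2)
print(head_mean, tail_mean)
```

If the law holds, the first value should come out smaller than the second for a reasonably large text.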
Even if one assumes that words may run together (like "thecat
jumps on thetable"), there is still a shortage of long words. Note
in Figs 17 & 18 of:
http://web.bham.ac.uk/G.Landini/evmt/zipf.htm
that there are very few words (and hence tokens) longer than 10
characters.
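The run-together idea can be illustrated with a small Python sketch that glues chosen short words onto the following token and then measures the share of long tokens. The names run_together and share_longer_than, and the choice of "the" as the glued word, are assumptions made for illustration only.

```python
def run_together(tokens, glue_words):
    """Join every token found in glue_words with the token after it."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in glue_words and i + 1 < len(tokens):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def share_longer_than(tokens, limit):
    """Fraction of tokens with more than `limit` characters."""
    if not tokens:
        return 0.0
    return sum(len(t) > limit for t in tokens) / len(tokens)

tokens = "the cat jumps on the table".split()
merged = run_together(tokens, {"the"})
print(" ".join(merged))
print(share_longer_than(tokens, 5), share_longer_than(merged, 5))
```

Running words together in this way removes short tokens and creates longer ones, so a text written like this would be expected to show more long tokens, not fewer.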
Of course, all this depends on what we call a character in the vms.
Cheers,
Gabriel