VMs: Re: Context sensitive encoding
--- "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx>
> As long as you allow the encoded text to be
> longer than the original one, it's no problem
> to lower the entropy. But my restriction is that
> the encoded text should have the same number
> of characters.
> If I just double every character, I certainly get
> a low entropy (h0=4.5, h1=4.06, h2=2.6).
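A minimal Python sketch of how the three figures can be computed. It assumes that h2 means the conditional entropy of a character given the character before it, which is the usual convention in VMs entropy work but is an assumption here; the function names are hypothetical. Doubling every character leaves h0 and h1 exactly unchanged, because every character keeps the same relative frequency, so only h2 can move.

```python
import math
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropies(text: str) -> tuple[float, float, float]:
    """Return (h0, h1, h2) in bits for a string of at least two characters.

    h0: log2 of the number of distinct characters.
    h1: single-character (first-order) entropy.
    h2: conditional entropy of a character given the preceding one,
        computed as H(adjacent pair) - H(first character of the pair).
    """
    h0 = math.log2(len(set(text)))
    h1 = _entropy(Counter(text))
    h_pair = _entropy(Counter(zip(text, text[1:])))
    h_first = _entropy(Counter(text[:-1]))
    return h0, h1, h_pair - h_first


def double_chars(text: str) -> str:
    """Write every character twice: 'abc' -> 'aabbcc'."""
    return "".join(ch * 2 for ch in text)
```

Calling `entropies(text)` and `entropies(double_chars(text))` on the same long text shows the effect. In the doubled text, every character is followed by its twin half of the time and by its original successor the other half, so for a long text its h2 comes out at most about 1 + h2/2 bits of the original.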
I have looked at this very problem before:
1) the character entropy (h2) in the VMs is low
2) on average, the words are no longer than in Latin
3) the vocabulary is as diverse as in Latin.
This is an apparent contradiction, but it turned
out that the VMs language is simply more economical.
Look at http://www.voynich.nu/extra/wordent.html
to see the details.
In fact, if a long text in a given language
has 8000 different words, each word can be
written as a number of only 4 digits, since
10^4 = 10000 is more than 8000.
The alphabet size would then be ten, which
gives an h0 of log2(10) = 3.3. The distribution
would be rather flat, so h1 and h2 would only be
a little lower.
Here, everything has been reduced: the average
word length and all the entropies.
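The arithmetic above as a small Python sketch: a fixed-length decimal code with k digits distinguishes 10^k words, so 8000 words need k = 4, and a ten-symbol alphabet has h0 = log2(10), about 3.32 bits. The helper names, and the choice to hand out numerals by word frequency, are illustrative assumptions and not part of the scheme described in the post.

```python
import math
from collections import Counter


def digits_needed(vocab_size: int, base: int = 10) -> int:
    """Smallest code length k such that base**k numerals cover the vocabulary."""
    k = 1
    while base ** k < vocab_size:
        k += 1
    return k


def encode_words(words: list[str]) -> list[str]:
    """Replace every word by a fixed-length decimal numeral.

    Numerals are assigned by frequency rank (most frequent word -> 0...0);
    this ordering is an arbitrary illustrative choice.
    """
    ranked = [w for w, _ in Counter(words).most_common()]
    k = digits_needed(len(ranked))
    code = {w: f"{i:0{k}d}" for i, w in enumerate(ranked)}
    return [code[w] for w in words]


print(digits_needed(8000))       # 4, since 10**4 = 10000 >= 8000
print(round(math.log2(10), 2))   # 3.32, the h0 of a ten-symbol alphabet
print(encode_words("the cat saw the dog".split()))  # ['0', '1', '2', '0', '3']
```

With a real 8000-word vocabulary, every word becomes exactly four digits long, which is where the reduction in average word length comes from.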
I am sure that, by 'playing' with the character
set (the digits) on a position-dependent basis,
a language with VMs-like statistics could be
constructed.
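The post does not say how the digits would be varied by position. Purely as a hypothetical illustration, one option is to give each of the four digit positions its own table of ten symbols. The letter tables below are arbitrary placeholders, not a proposed VMs alphabet, and the names are invented for this sketch.

```python
# Hypothetical per-position alphabets: digit d in position p is written as
# POSITION_ALPHABETS[p][d]. Letters are arbitrary placeholders; the only
# requirement is that the ten symbols within one position are distinct,
# so that every word can be decoded again.
POSITION_ALPHABETS = [
    "qodcsykltp",  # first digit
    "oaeyhcktpf",  # second digit
    "aoeidlrsny",  # third digit
    "nyrlmsdoai",  # fourth digit
]


def encode_numeral(numeral: str) -> str:
    """Map a 4-digit numeral such as '0427' to a 4-letter word."""
    if len(numeral) != len(POSITION_ALPHABETS) or not all(c in "0123456789" for c in numeral):
        raise ValueError(f"expected {len(POSITION_ALPHABETS)} digits, got {numeral!r}")
    return "".join(POSITION_ALPHABETS[p][int(d)] for p, d in enumerate(numeral))


def decode_word(word: str) -> str:
    """Invert encode_numeral by looking up each letter in its position's table."""
    return "".join(str(POSITION_ALPHABETS[p].index(ch)) for p, ch in enumerate(word))


print(encode_numeral("0000"))               # qoan
print(decode_word(encode_numeral("0427")))  # 0427
```

Word length stays at four characters. These placeholder tables use 18 distinct letters overall, so h0 rises to log2(18), about 4.17 bits, while each letter is still drawn from only ten candidates at its position.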