Re: VMs: Talking about entropy
Hi!
I'm not entirely sure I follow exactly:
--- "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx>
wrote:
> How can I calculate the entropy of a given text
> (let's say the VMS) encoded
> in an arbitrary set of symbols (let's say Hanzi
> ideograms)?
> I'd do a mapping for each VMS token (the characters
> between '.' and such), mapping each to
> one of the most common (let's assume the first 7000
> most frequent)
> Hanzi/Kanji. Thus the size will be reduced,
> preserving the information. IMHO
> this will yield a higher entropy.
The VMs has (IIRC) about 8000 different words (or
was that tokens?). Thus, if you use an arbitrary
character set of this size, you can replace
each word in the MS by one such character. The
new character entropy will be the same as the old
word entropy: somewhere in the range of 10 to 11 bits.
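
To make the point about one-to-one replacement concrete, here is a
minimal sketch in Python. The word list is a made-up toy sample, not
real VMs data, and the choice of code points starting at U+4E00 is just
an arbitrary assumption for the symbol set. It only illustrates that
relabelling each word with one unique symbol leaves the entropy
unchanged:

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Toy word sequence for illustration only (not a real transcription).
words = "daiin ol chedy daiin qokeedy ol daiin shedy".split()

# Give each distinct word one arbitrary symbol (hypothetical mapping).
distinct = list(dict.fromkeys(words))
symbol_for = {w: chr(0x4E00 + i) for i, w in enumerate(distinct)}
encoded = "".join(symbol_for[w] for w in words)

word_entropy = entropy(words)    # entropy over words in the original
char_entropy = entropy(encoded)  # entropy over characters after mapping

print(word_entropy, char_entropy)
```

The two numbers agree because the mapping only renames the outcomes;
the frequency counts are untouched.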
For any other type of transformation, you will
have to define precisely how you do the translation
from one character set to the other. Then you
need to know what type of entropy you want to
check. The one in the VMs that is really abnormal
is the conditional second order character entropy.
That means: if you know one character, the next one is
much too predictable, i.e. carries little additional
information. This is true for representations
of the VMs in the Currier and FSG alphabets, and
even more pronounced for representations in Eva or Frogguy.
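
As a rough illustration of what is being measured, the conditional
second-order character entropy can be estimated as the entropy of
character pairs minus the entropy of the first character of each pair.
The sketch below assumes a plain string as input and a simple
plug-in estimate from counts; the function names and the toy string
are mine, and how spaces or word breaks are handled in a real
transcription is left open:

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(text):
    """Estimate H(next char | current char) = H(pair) - H(first of pair)."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    firsts = [p[0] for p in pairs]
    return entropy(pairs) - entropy(firsts)

# Toy string: perfectly alternating, so the next character is fully
# determined by the current one.
sample = "ababababab"
h1 = entropy(sample)              # first-order character entropy
h2 = conditional_entropy(sample)  # conditional second-order entropy

print(h1, h2)
```

A low value of the second number relative to the first is what
"too predictable" means here: knowing one character leaves little
uncertainty about the next.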
A quick-and-dirty way to do what you are interested
in is simply to 'simulate and do the maths' rather
than try to analyse. But it is not the same as an
analysis, and the simulation should be repeated a
number of times to see that the resulting numbers
are representative. For example, it is not feasible
to compute higher-order entropy values this way.
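
One possible reading of 'simulate and do the maths', sketched in
Python under several assumptions of mine: the text is a synthetic
Zipf-like word sample (not VMs data), the translation is a random
assignment of word types to a smaller symbol set (so some words share
a symbol), and the vocabulary size, symbol count and number of trials
are arbitrary. The run is repeated to see how much the result varies:

```python
import math
import random
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def simulate(words, n_symbols, rng):
    """Map each word type to a random symbol index; return symbol entropy."""
    mapping = {w: rng.randrange(n_symbols) for w in sorted(set(words))}
    return entropy(mapping[w] for w in words)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic text: 50 hypothetical word types with Zipf-like weights.
vocab = [f"w{i}" for i in range(50)]
weights = [1 / (rank + 1) for rank in range(50)]
words = rng.choices(vocab, weights=weights, k=2000)

base = entropy(words)
# Fewer symbols than word types, so collisions are unavoidable.
results = [simulate(words, 40, rng) for _ in range(20)]

print(f"word entropy: {base:.3f} bits")
print(f"mapped: min {min(results):.3f}, max {max(results):.3f}")
```

Because several words can land on the same symbol, the mapped entropy
can only stay equal or drop, never rise; the spread between the minimum
and maximum gives a feel for whether a single run is representative.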
Cheers, Rene