Hi all,
How can I calculate the entropy of a given text (let's say the VMS) encoded in an arbitrary set of symbols (let's say Hanzi ideograms)?
I'd create a mapping for each VMS token (the characters between '.' separators and the like) and map each one to one of the most common symbols (let's assume the 7000 most frequent Hanzi/Kanji). This reduces the size of the text while preserving the information. IMHO this will yield a higher entropy. What I want to do is calculate this new entropy and compare it to the known entropy of other languages.
(You don't really need any Hanzi for that, it's just an example - pure numbers will do.)
If I encode the VMS in that way, I should get rid of all the null/redundancy problems. The mapping of token to number (or Hanzi) will preserve this information.
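
A minimal sketch of the calculation in Python, assuming tokens are separated by '.' and that plain first-order (single-symbol) Shannon entropy is what is wanted. The sample string is made up for illustration and is not a real transcription; the function names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """First-order Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def encode_tokens(text, sep="."):
    """Map each distinct token to an integer ID, in order of first appearance."""
    ids = {}
    return [ids.setdefault(tok, len(ids)) for tok in text.split(sep) if tok]

# Made-up sample in a '.'-separated style (assumption, not real VMS data).
sample = "daiin.chedy.qokeedy.daiin.shedy.daiin"

chars = [c for c in sample if c != "."]   # original per-character alphabet
tokens = encode_tokens(sample)            # one number per token

h_char = shannon_entropy(chars)    # bits per character
h_token = shannon_entropy(tokens)  # bits per token symbol

print(f"per character: {h_char:.3f} bits, per token: {h_token:.3f} bits")
```

Note that the two figures are in different units (bits per character versus bits per token), so the token value is probably best compared against word-level entropies of other languages rather than character-level ones. Conditional (second-order) entropy would need pair counts on top of this.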
Any comments?
Cheers
Claus
PS: this is just a theoretical thought of my left brain. No visual encoding necessary ;-)