
VMs: Talking about entropy




Hi all,
how can I calculate the entropy of a given text (let's say the VMS) encoded in an arbitrary set of symbols (let's say Hanzi ideograms)?

I'd build a mapping for each VMS token (the chars between '.' and such) and map each one to one of the most common symbols (let's assume the 7000 most frequent Hanzi/Kanji). Thus the size will be reduced while preserving the information. IMHO this will yield a higher entropy. What I want to do is calculate this new entropy and compare it to the known entropy of other languages.

(You don't really need any Hanzi for that, it's just an example - pure numbers will do).
If I encode the VMS in that way, I should get rid of all the null/redundancy problems. The mapping of token to number (or Hanzi) will preserve this information.
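
To make the idea concrete, here is a minimal Python sketch of the token-to-number mapping and a first-order (unigram) Shannon entropy, H = -sum(p * log2(p)), in bits per symbol. The function names, the choice to split on '.' and whitespace, and the sample string are all assumptions made for illustration; the sample is a made-up string, not an excerpt from any real transcription. Because the mapping is one-to-one, the entropy of the number sequence equals the entropy of the token sequence, whichever target symbols are used.

```python
import math
import re
from collections import Counter


def shannon_entropy(symbols):
    """First-order Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def tokenize(text):
    """Split on '.' and whitespace (assumed token separators), dropping empties."""
    return [t for t in re.split(r"[.\s]+", text) if t]


def encode_tokens(tokens):
    """Map each distinct token to an integer, most frequent token first."""
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    codebook = {tok: i for i, tok in enumerate(ranked)}
    return [codebook[t] for t in tokens], codebook


# Made-up sample string, for illustration only.
sample = "daiin.chedy.daiin.qokeedy.chedy.daiin.shedy.ol"

tokens = tokenize(sample)
codes, codebook = encode_tokens(tokens)

h_chars = shannon_entropy(sample.replace(".", ""))  # bits per character
h_tokens = shannon_entropy(codes)                   # bits per token

print(f"distinct tokens: {len(codebook)}")
print(f"entropy per character: {h_chars:.3f} bits")
print(f"entropy per token:     {h_tokens:.3f} bits")
```

Note that the two figures are measured per different units (character vs. token), so a comparison with other languages only makes sense if those are tokenized and measured the same way.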

Any comments?

Cheers
Claus

PS: this is just a theoretical thought of my left brain. No visual encoding necessary ;-)