Sorry for taking advantage of the list to learn something
about entropy. I have now loaded the Java calculator
with the full text of Kalevala (a long poetic epic in Finnish)
and with Galileo's Latin text of Sidereus Nuncius
(which includes some numbers), both converted to lower case.
The results for the character entropies of orders 1-3 are these:
                             (1)     (2)     (3)
Kalevala (500 Kb)           4.33    7.66   10.50
  ---> Monkey (32 Kb)       3.85    3.25    3.11
Sidereus Nuncius
  by Galileo (72 Kb)        4.23    7.57   10.22
  ---> Monkey (25 Kb)       4.00    3.21    2.52
It seems to me, therefore, that the Java program (let's call
it JEC) shows the unconditional entropy of pairs, triplets, etc.
The difference in the 1st-order values is probably due to
(a) the size of the sample, (b) JEC recognizing 256 chars,
and (c) both programs filtering off some non-literals,
which I haven't unified yet.
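
A rough check of this reading, assuming Monkey reports conditional
entropies (my assumption, not something I have verified): subtracting
consecutive JEC values should then land near the Monkey figures.
For Kalevala, 7.66 - 4.33 = 3.33 and 10.50 - 7.66 = 2.84, against
Monkey's 3.25 and 3.11; for Sidereus Nuncius, 7.57 - 4.23 = 3.34 and
10.22 - 7.57 = 2.65, against 3.21 and 2.52. Below is a minimal Python
sketch of both quantities. The function names are made up for
illustration and this is not the code of JEC or Monkey.

```python
from collections import Counter
from math import log2


def block_entropy(text, n):
    """Unconditional (joint) entropy, in bits, of overlapping n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    # Shannon entropy: sum over n-grams of p * log2(1/p)
    return sum((c / total) * log2(total / c) for c in counts.values())


def conditional_entropy(text, n):
    """Entropy of the n-th char given the previous n-1 chars,
    estimated as H(n) - H(n-1). For n == 1 it is just H(1)."""
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)


if __name__ == "__main__":
    # Hypothetical sample; replace with the lower-cased full text.
    sample = "vaka vanha vainamoinen elelevi aikojansa"
    for n in (1, 2, 3):
        print(n, round(block_entropy(sample, n), 2),
              round(conditional_entropy(sample, n), 2))
```

On small samples the higher-order values are underestimated, since
most possible triplets never occur, so sample size alone could explain
part of the gap between the two programs.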
Does this make sense?