Hi all,
As an experiment, I encoded part of Genesis in Latin with a very primitive context encoding.
Result (in each pair, the first line is the original text and the second is the encoded text):
in principio creavit deus caelum et terram
sh asbpsbrap mf khp nsnh mnsfan oj ejbtui
terra autem erat inanis et vacua et tenebrae super faciem abyssi et spiritus dei ferebatur super aquas
ejbtu kgafs ohic shiwfy oj gh fg oj ejxcewxc dyotl pqtchu kmlfyh oj dtcudxsl nsb pumrtuo c dyotl kcxyr
dixitque deus fiat lux et facta est lux
nwudxo o nsnh pyzt wrp oj pqtno oic wrp
et vidit deus lucem quod esset bona et divisit lucem ac tenebras
oj gosbv nsnh wruzm bwlp oibga lbpq oj nwsbudx wruzm kn ejxcewxq
appellavitque lucem diem et tenebras noctem factumque est vespere et mane dies unus
kbrwjvwsbvmim wruzm nwbo oj ejxcewxq ynqkpd pqtnjwnjn oic gkeuzrw oj xymr nwbu ftoi
dixit quoque deus fiat firmamentum in medio aquarum et dividat aquas ab aquis
nwudx bwldyd nsnh pyzt pyqefsxlgbo sh xcgoe kcxyqlz oj nwsbfga kcxyr km kcxgz
When I compared the char frequencies of the encoded text against those of the original, I found a rather flat distribution for the encoded chars:
451 e     99 p°    40 b
290 t     93 f°    20 g
267 i     93 d     16 x
241 a     93 c      9 h
228 u     91 i°     1 y
204 s     87 g°
200 o°    87 c°
178 m     82 l°
172 n°    80 z°
159 n     78 q°
157 j°    77 m°
147 r     76 v°
133 w°    74 e°
130 s°    72 r°
119 b°    70 o
114 y°    68 a°
112 d°    63 l
105 x°    62 °
105 h°    58 v
101 u°    55 q
101 t°    52 p
101 k°    41 f
(The chars marked with ° are the encoded chars.)
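For anyone who wants to build a similar merged ranking, here is a minimal sketch in Python. The function name `merged_frequencies` is made up for illustration, and skipping whitespace is an assumption on my part, not necessarily how the counts above were produced.

```python
from collections import Counter

def merged_frequencies(plain, encoded, mark="°"):
    """Count chars of both texts in one ranking; encoded chars get a mark."""
    # Assumption: whitespace is not counted in either text.
    counts = Counter(ch for ch in plain if not ch.isspace())
    counts.update(ch + mark for ch in encoded if not ch.isspace())
    # Sorted by descending count, like the listing above.
    return counts.most_common()

if __name__ == "__main__":
    for char, count in merged_frequencies("et terram", "oj ejbtui"):
        print(count, char)
```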
Next step: I'll encode the whole text and try to compute our infamous 2nd order entropy for it.
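A rough sketch of how that measurement could be done, assuming "2nd order entropy" means the conditional entropy of a char given the char before it, i.e. H(pairs) minus H(single chars). The function names are hypothetical, and the text is treated as one long string, spaces included.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def second_order_entropy(text):
    """Conditional entropy H(next char | previous char) in bits."""
    pairs = Counter(zip(text, text[1:]))   # adjacent char pairs
    firsts = Counter(text[:-1])            # first char of every pair
    return entropy(pairs) - entropy(firsts)

if __name__ == "__main__":
    print(second_order_entropy("in principio creavit deus caelum et terram"))
```

On a short sample the value is not very meaningful, since most pairs occur only once; it needs the whole text to say anything.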
The encoding algorithm works like this:
Take the last encoded char, look it up in the key and note the position p.
Take the next original char and look it up in the key -> p1.
Add p and p1 -> np.
Get the np-th char of the key and write it to the output.
This can even be done on a word basis, as the word structure is kept.
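The steps above can be sketched in Python roughly as follows. Several things here are assumptions, not part of the description: the key (a plain alphabet, so it will not reproduce the sample lines above), 0-based positions, the sum wrapping around at the end of the key, and p starting at 0 at the beginning of each word. The function names are made up, and `decode_word` is just my inverse of the same steps.

```python
# Hypothetical key: the real key is not given, this one is only for illustration.
KEY = "abcdefghijklmnopqrstuvwxyz"

def encode_word(word, key=KEY):
    out = []
    p = 0  # assumption: no previous encoded char at the start of a word
    for ch in word:
        p1 = key.index(ch)           # position of the original char in the key
        np_ = (p + p1) % len(key)    # assumption: the sum wraps around the key
        out.append(key[np_])         # the np-th char of the key is the output
        p = np_                      # position of the char just written
    return "".join(out)

def decode_word(word, key=KEY):
    out = []
    p = 0
    for ch in word:
        np_ = key.index(ch)
        out.append(key[(np_ - p) % len(key)])  # undo the addition
        p = np_
    return "".join(out)

def encode_text(text, key=KEY):
    # Word basis: every word is encoded on its own, spaces are kept.
    return " ".join(encode_word(w, key) for w in text.split())

if __name__ == "__main__":
    print(encode_text("et terram et"))
```

With this sketch a word always encodes to the same output wherever it occurs, which matches what the sample shows ("et" is always "oj"). Chars that are not in the key raise a ValueError.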
Claus