If anyone is interested in some probabilities (character at token position), here are my statistics:
Relative probability <p> of finding a specific char <c> at byte position <b>:
(p is relative to the frequency of <c>)
c b p[%]
-------------
p 1 2.64087
q 1 6.71812
c 2 3.26424
f 2 1.70426
s 2 3.68421
t 5 2.12392
a 5 2.60217
h 5 2.38976
k 5 2.30475
e 6 3.49598
d 8 4.63362
i 8 4.3715
r 10 7.39938
l 10 5.74316
m 10 9.45165
n 10 9.58268
x 10 3.6
y 10 8.55244
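To make the first table's normalization concrete, here is a minimal Python sketch under one literal reading of "relative to frequency of <c>": p is the number of times <c> occurs at position <b>, divided by the total number of occurrences of <c>. The post does not spell out the exact formula, so this is an assumption and may not reproduce the figures above; the function name `relative_probability` and the toy data are hypothetical.

```python
from collections import Counter

def relative_probability(observations):
    """observations: list of (char, bucket) pairs, one per character seen.

    Returns {(char, bucket): percent}, where percent is the share of all
    occurrences of `char` that fall into `bucket` (assumed reading of
    "relative to frequency of <c>")."""
    per_char = Counter(c for c, _ in observations)  # total count of each char
    per_pair = Counter(observations)                # count of each (char, bucket)
    return {(c, b): 100.0 * n / per_char[c] for (c, b), n in per_pair.items()}

# Toy observations, not the data behind the table above.
obs = [("y", 10), ("y", 10), ("y", 1), ("e", 6)]
print(relative_probability(obs))
```

Under this reading the values for one char sum to 100 over all positions.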
Absolute probability <p> of finding a specific char <c> at byte position <b>:
(p is relative to the frequency of all chars)
b c p[%]
-------------
01 o 1.74165
01 q 1.98443
02 c 2.25286
02 o 2.69454
03 o 1.28743
04 h 1.70289
04 k 1.25038
05 a 1.88698
05 h 2.21752
06 e 3.42175
06 i 1.68694
07 e 1.63507
07 i 1.37577
08 d 3.05244
08 i 2.88774
10 l 2.97949
10 n 3.11456
10 r 2.70309
10 y 7.5895
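For comparison, a minimal sketch of the second normalization under the same kind of literal reading: p is the count of <c> at position <b> divided by the total number of characters observed. Again, the exact formula behind the table is not stated, so treat this as an assumption; `absolute_probability` and the toy data are hypothetical.

```python
from collections import Counter

def absolute_probability(observations):
    """observations: list of (char, bucket) pairs, one per character seen.

    Returns {(char, bucket): percent}, the share of *all* observed characters
    that are `char` at `bucket` (assumed reading of "relative to frequency
    of all chars")."""
    total = len(observations)
    return {pair: 100.0 * n / total for pair, n in Counter(observations).items()}

# Toy observations, not the data behind the table above.
obs = [("y", 10), ("y", 10), ("y", 1), ("e", 6)]
print(absolute_probability(obs))
# {('y', 10): 50.0, ('y', 1): 25.0, ('e', 6): 25.0}
```

Here every entry shares the same denominator, so all values together sum to 100.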
Byte positions are relative to token length:
b=1: begin, 2-4: first quarter, 5: middle, 6-9: second quarter, 10: end
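The post does not say exactly how a byte index is turned into one of the ten positions. Here is a minimal sketch of one possible mapping, assuming ASCII tokens (so one character is one byte): the first byte goes to 1, the last byte to 10, and interior bytes are spread linearly in between, which puts the exact middle of an odd-length token at 5. `position_bucket` and `token_buckets` are hypothetical names, and treating a one-byte token as "begin" is also an assumption.

```python
def position_bucket(i, n):
    """Map 0-based byte index i in a token of length n to a position 1..10.

    Hypothetical mapping: 1 is the first byte, 10 the last, interior bytes
    are spread linearly; integer arithmetic avoids float rounding issues."""
    if not 0 <= i < n:
        raise ValueError("index out of range")
    if n == 1:
        return 1  # assumption: a one-byte token counts as "begin"
    return 1 + (9 * i) // (n - 1)

def token_buckets(token):
    """Return (char, position) pairs for every character of an ASCII token."""
    n = len(token)
    return [(ch, position_bucket(i, n)) for i, ch in enumerate(token)]

print(token_buckets("happy"))
# [('h', 1), ('a', 3), ('p', 5), ('p', 7), ('y', 10)]
```

The resulting (char, position) pairs are the kind of observations the two probability tables are built from.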