[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
State machine hypothesis...
Hi everyone,
I've turned Steve Ekwall's "folding key" idea into the "state machine
hypothesis", which goes like this:-
The SMH: each VMS symbol is encoded according to a current state,
with certain symbols also triggering a change of state.
If the SMH is true, then (at the very least):-
(1) each state should have its own unique statistical profile
(2) that profile should be largely independent of whatever sample you take
within it
(Please feel free to suggest better/stronger tests than these!)
Here are some preliminary results from several different state
arrangements: note that, to be utterly conservative, my test-bed starts
each line in an undefined state, and reverts back to an undefined state
whenever it reaches the symbol "c*h". Also: in the following tables, the
extra symbols A-F mean:-
A cfh
B ckh
C cph
D cth
E ch
F c*h
The figure in square brackets is the instance count of the highest ranking
symbol: and the symbols are sorted according to instance count (most
popular first, top 20 only).
(a) Using 8 gallows characters { f, k, p, t, cfh, ckh, cph, cth } as states.
f [381] . o E a y d e i l r s F - n k t p q = * m f , C A
cfh [78] . y o a - d i e s l r n p E F k = q * C m t f , B
k [12411] . e o y a i d l E k r n s q - t F * = m , B D p f
ckh [1201] . y o e d k l a E i s q r - n t F * = B m , D p f
p [1445] . o E a y d e i r l s k F n t p * q - f = , m C D
cph [241] . o y e d a i r l - s E n k t q F * p = C D m , B
t [6771] . o e y a d i E l r t k n s - F q * = m , D p B f
cth [1055] . o y e d - a l i k E r t s q n * F D = m B p , C
Comments:-
States f and p seem fairly similar
States k and t seem fairly similar
States cph and cth seem fairly similar
(b) Using 8 gallows characters as above, but using ch as a shift code, ie
if ch then
f <---> cfh, k <---> ckh, p <---> cph, t <---> cth
f [216] E . a o i r l y s d F n - e = k m p t * f , q C A
cfh [269] . y o e d a i s l r - k F p E n q t * = f m , C A
k [9499] . e o a y i E d l k n r s - F q t * = m , D p B f
ckh [4113] . o y e d k l a i E r q s - t n F * B = D m , p f
p [906] E . o a i r l y s d F n e * k t - p = q , m f C B
cph [1112] . o y e d a r l i k s q t p - E n F * f = , D m C
t [5051] . o a e y i E d l r n t k s - F * q = m , p D B f
cth [2775] . o y d e a l k i t E r - s q n F * D = m p , B C
Comments:-
States f and p seem extremely similar
States k and t seem fairly similar
(c) Using 8 gallows characters as above, but changing to an alternate set
of counts when a ch character is reached (and flipping thereafter):-
up to the next ch:-
f [212] E . a o i r l y s F n - d = k * m p t e , f q A C
cfh [71] . y o a - d i r s e l F E n p k = * q C m t , B D
k [8809] . e o a y i E l d k n r s F - q t * = m , p D B f
ckh [1012] . y o e d k l a E i s q - r F n t * = B , m p D f
p [875] E . o a i r l s F y d n * k - t e = p , m q f C B
cph [207] . o y d a e i r l - n s k E t q F * p = C D m , f
t [4605] . o a e y E i d l r n t s k - F * q = m , p D f B
cth [885] . y o d e - a l i E k r t s q n F * D = m , p B C
beyond the next ch:-
f' [2892] o . s k i a E t d y l F r n q e p * = , D B f - m
cfh' [1361] . o e y d k i l a E r s n q t F = * B - D , p m f
k' [690] . o e y d k a l i r E n - t q s * B D = F , m p C
ckh' [3101] . o e y d k l a i E r q t - s n * B F = D m , p f
p' [105] . o e d y a r i E l t p k s - n q * F f C D , A B
cph' [905] . o y e d a l r i k s q p t E n - F * f = , m D C
t' [446] . o e y d i l a E k t r n s - q D F * B = p m , C
cth' [1890] . o y e d a l k t i r E s - q n F * D = p , m B C
Comments:-
States f and p seem extremely similar
States k and t seem fairly similar
It looks like ch represents a completely separate state from the gallows
characters, apart from when it occurs in a cfh state (which maps to f' here).
Conclusions:
It looks like the SMH is well worth pursuing. At first glance, the
statistics appear to indicate roughly 9 states:-
State Entry symbol
1 f or p
2 k
3 t
4 cfh
5 ckh
6 cph
7 cth
8 ch (where previous state = cfh)
9 ch (where previous state <> cfh)
Once I make my test-code interactive, I'll post it on my site (it's written
in JavaScript, using the perl script I posted to the list to generate
include-able data).
Cheers, .....Nick Pelling.....
PS: this ties in closely with the field of data compression (which is my
main area of expertise). Following Claude Shannon's much-cited 1951 paper,
if the symbols in each state were ranked according to how often they
occurred (and the ranking index used as the output stream), then this would
give a very low entropy (ie highly-compressible) code stream... which would
have a very similar statistical profile to the VMS as a whole. It would be
quite cool if the author(s) of the VMS had predated Shannon by 450 years. :-)