[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

State machine hypothesis...



Hi everyone,

I've turned Steve Ekwall's "folding key" idea into the "state machine hypothesis", which goes like this:-

	The SMH: each VMS symbol is encoded according to a current state,
	with certain symbols also triggering a change of state.

If the SMH is true, then (at the very least):-
(1) each state should have its own unique statistical profile
(2) that profile should be largely independent of whatever sample you take within it


(Please feel free to suggest better/stronger tests than these!)

Here are some preliminary results from several different state arrangements: note that, to be utterly conservative, my test-bed starts each line in an undefined state, and reverts back to an undefined state whenever it reaches the symbol "c*h". Also: in the following tables, the extra symbols A-F mean:-
A cfh
B ckh
C cph
D cth
E ch
F c*h


The figure in square brackets is the instance count of the highest ranking symbol: and the symbols are sorted according to instance count (most popular first, top 20 only).

(a) Using 8 gallows characters { f, k, p, t, cfh, ckh, cph, cth } as states.

f	[381] . o E a y d e i l r s F - n k t p q = * m f , C A
cfh	[78] . y o a - d i e s l r n p E F k = q * C m t f , B
k	[12411] . e o y a i d l E k r n s q - t F * = m , B D p f
ckh	[1201] . y o e d k l a E i s q r - n t F * = B m , D p f
p	[1445] . o E a y d e i r l s k F n t p * q - f = , m C D
cph	[241] . o y e d a i r l - s E n k t q F * p = C D m , B
t	[6771] . o e y a d i E l r t k n s - F q * = m , D p B f
cth	[1055] . o y e d - a l i k E r t s q n * F D = m B p , C

Comments:-
	States f and p seem fairly similar
	States k and t seem fairly similar
	States cph and cth seem fairly similar

(b) Using 8 gallows characters as above, but using ch as a shift code, ie if ch then
f <---> cfh, k <---> ckh, p <---> cph, t <---> cth


f	[216] E . a o i r l y s d F n - e = k m p t * f , q C A
cfh	[269] . y o e d a i s l r - k F p E n q t * = f m , C A
k	[9499] . e o a y i E d l k n r s - F q t * = m , D p B f
ckh	[4113] . o y e d k l a i E r q s - t n F * B = D m , p f
p	[906] E . o a i r l y s d F n e * k t - p = q , m f C B
cph	[1112] . o y e d a r l i k s q t p - E n F * f = , D m C
t	[5051] . o a e y i E d l r n t k s - F * q = m , p D B f
cth	[2775] . o y d e a l k i t E r - s q n F * D = m p , B C

Comments:-
	States f and p seem extremely similar
	States k and t seem fairly similar

(c) Using 8 gallows characters as above, but changing to an alternate set of counts when a ch character is reached (and flipping thereafter):-

up to the next ch:-
f	[212] E . a o i r l y s F n - d = k * m p t e , f q A C
cfh	[71] . y o a - d i r s e l F E n p k = * q C m t , B D
k	[8809] . e o a y i E l d k n r s F - q t * = m , p D B f
ckh	[1012] . y o e d k l a E i s q - r F n t * = B , m p D f
p	[875] E . o a i r l s F y d n * k - t e = p , m q f C B
cph	[207] . o y d a e i r l - n s k E t q F * p = C D m , f
t	[4605] . o a e y E i d l r n t s k - F * q = m , p D f B
cth	[885] . y o d e - a l i E k r t s q n F * D = m , p B C

beyond the next ch:-
f'	[2892] o . s k i a E t d y l F r n q e p * = , D B f - m
cfh'	[1361] . o e y d k i l a E r s n q t F = * B - D , p m f
k'	[690] . o e y d k a l i r E n - t q s * B D = F , m p C
ckh'	[3101] . o e y d k l a i E r q t - s n * B F = D m , p f
p'	[105] . o e d y a r i E l t p k s - n q * F f C D , A B
cph'	[905] . o y e d a l r i k s q p t E n - F * f = , m D C
t'	[446] . o e y d i l a E k t r n s - q D F * B = p m , C
cth'	[1890] . o y e d a l k t i r E s - q n F * D = p , m B C

Comments:-
States f and p seem extremely similar
States k and t seem fairly similar
It looks like ch represents a completely separate state from the gallows characters, apart from when it occurs in a cfh state (which maps to f' here).


Conclusions:
It looks like the SMH is well worth pursuing. At first glance, the statistics appear to indicate roughly 9 states:-


	State	Entry symbol
	1	f or p
	2	k
	3	t
	4	cfh
	5	ckh
	6	cph
	7	cth
	8	ch (where previous state = cfh)
	9	ch (where previous state <> cfh)

Once I make my test-code interactive, I'll post it on my site (it's written in JavaScript, using the perl script I posted to the list to generate include-able data).

Cheers, .....Nick Pelling.....

PS: this ties in closely with the field of data compression (which is my main area of expertise). Following Claude Shannon's much-cited 1951 paper, if the symbols in each state were ranked according to how often they occurred (and the ranking index used as the output stream), then this would give a very low entropy (ie highly-compressible) code stream... which would have a very similar statistical profile to the VMS as a whole. It would be quite cool if the author(s) of the VMS had predated Shannon by 450 years. :-)