[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

VMs: Complete PFSM example



I said I'd give an example of a PFSM with the probabilities included,
but now that I think about it, I think I need to explain something.

The PFSMs I provided were all generated by an evolutionary algorithm to
be optimal for a specific text, to minimize the information content of
that text as calculated by the PFSM.  

Now, if you've already chosen the state transitions for the PFSM, then
it's simple to figure the best probability assignments for that text -
just make the probability assignments match the actual frequency in the
text.  For example, if you run the text through the the PFSM, and a
certain state is visited 1234 times, and 567 of those 1234 times you see
a certain symbol, then give that symbol a probability 567/1234 for that
state.  These are the probability assignmens that will minimize the
information content of the text as calculated by a PFSM with those state
transitions.

So the tool I use doesn't evolve the transition probabilities, but only
the transitions themselves.  After the tool modifies the state
transitions, it runs the text through the PFSM and counts the
frequencies of each symbol in each state, and calculates the information
content from the frequencies.  The probabilities themselves never
actually show up in the app.

My first thought was to convert it all for you, but then I realized that
would lose good information.  It's even more interesting to see how
often a symbol occurs than to see just its probability in a given state.
So I decided instead to show you the information I actually use, which
is frequency counts, not probabilities.  I hope this isn't too
confusing.

Anyways, here's the 3-state PFSM again for English, with both transition
state and frequency count for the sample text.  At the bottom is the
total count of the number of times each state is visited when the sample
text is run through the FSM.  The EOW entry at the end of the table is
end-of-word

          0           1           2
---+-----------+-----------+-----------+
 A    0  3551     0  1768     0  1852 
 B    2  1284     2    25     2    65 
 C    1  1696     1   183     1     6 
 D    1  3929     1   131     1   305 
 E    0  1484     0  3592     0  5555 
 F    1  1492     2   151     1   102 
 G    1  2088     2    67     0     7 
 H    2  1584     2  3870     2     9 
 I    0  2048     0  1889     0  1664 
 J    2   274     -     0     2     3 
 K    2   648     1   241     1    49 
 L    2  2376     2   751     1   749 
 M    2  2025     2   140     2    26 
 N    0  5469     1   322     2    82 
 O    0  1881     0  3125     0  1778 
 P    1  1071     2   293     1   109 
 Q    2    78     2     3     2     1 
 R    1  3974     2   811     2   237 
 S    1  4132     1   901     1   210 
 T    1  5915     1  1217     1   195 
 U    0  1452     0   551     0   568 
 V    2   649     2    36     2    20 
 W    1  2061     2    94     2    23 
 X    0    76     -     0     -     0 
 Y    1   987     1   409     1   609 
 Z    2    37     1     2     0     4 
EOW   0  7314     0 12506     0  1440
---+-----------+-----------+-----------+
TOT     59611       33078       15668

You can see, for example, that 'y' occurs more or less the same number
of times in all states, while 'w' occurs frequently in state 0, and only
occasionally in states 1 or 2.  Some letters, like 'c', 'g', and 'h',
are reasonable common but occur only rarely in state 2.  The letters
'a', 'e', 'i', 'o', and EOW are the most common in state 2.

Looking at both the state transitions and also the transition counts, we
can provide a rough interpretation for each state:

* state 0 - after a vowel or at start of word, expect pretty much
anything

* state 1 - after some consonants, expect EOW, a vowel, or another
consonant 

* state 2 - after some consonants or consonant combinations, expect EOW
or a vowel

-Ben
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list