RE: VMs: RE: RE: Best-fit 2-state PFSMs
Hi everyone,
At 03:42 09/03/2005 +0000, Ben Preece wrote:
Every state in the PFSM assigns a probability to every input token. (If a
token is not allowed in a state, its probability is zero.) If the
probability of a token in a certain state is P, then when that token
appears in that state it generates log2(1/P) bits of information. The
input text can be run through the PFSM, and the information from every
input token totaled up (or averaged per token).
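Here is a minimal sketch of the scoring described above. It assumes the PFSM is given as a table of per-state token probabilities plus a next-state function; the names `information_bits`, `probs` and `next_state` are hypothetical. The sketch runs a token sequence through the machine and totals log2(1/P):

```python
import math


def information_bits(tokens, probs, next_state, start):
    """Return (total bits, bits per token) for a token sequence run through a PFSM.

    probs[state][token] -> probability of that token in that state (missing = 0).
    next_state(state, token) -> the state entered after reading that token.
    """
    state = start
    total = 0.0
    for tok in tokens:
        p = probs[state].get(tok, 0.0)
        if p == 0.0:
            # Token not allowed in this state: infinite information.
            return math.inf, math.inf
        total += math.log2(1.0 / p)
        state = next_state(state, tok)
    n = len(tokens)
    return total, (total / n if n else 0.0)
```

Take a toy machine with probs = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 1.0}}. Here "a" leads to state 0 and "b" to state 1. Running "abab" from state 0 then gives 3 bits in total, or 0.75 bits per token.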
Quick implementation hack for the interested: for a given corpus, build a
square matrix containing the letter transition counts (i.e. a matrix[x,y]
recording how many times letter x is followed by letter y within that
corpus), and work from that instead. Much easier! :-)
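One possible version of the matrix hack follows. It assumes whitespace-separated words and counts transitions only within words; that boundary choice is an assumption, not part of the suggestion above. The name `transition_counts` is hypothetical:

```python
from collections import Counter


def transition_counts(corpus):
    """Return (alphabet, matrix) where matrix[i][j] counts letter i followed by letter j.

    Transitions are only counted inside whitespace-separated words.
    """
    words = corpus.split()
    pairs = Counter()
    for word in words:
        for x, y in zip(word, word[1:]):
            pairs[(x, y)] += 1
    alphabet = sorted(set("".join(words)))
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in alphabet]
    for (x, y), n in pairs.items():
        matrix[index[x]][index[y]] = n
    return alphabet, matrix
```

For example, `transition_counts("abab ba")` returns `(['a', 'b'], [[0, 2], [2, 0]])`.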
The best-fit PFSM is the PFSM with the combination of transitions and
probability assignments that provides the smallest total (or average)
information when the sample text is run through it.
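The matrix is enough under one assumption: the PFSM's state after each letter is fixed by that letter (a letter-to-state mapping). Given that, for any particular mapping the information is minimised by setting each state's probabilities to the relative frequencies of the letters that follow it. The total can therefore be read straight off the pooled rows of the matrix. Below is a brute-force 2-state search over every mapping. It is only practical for small alphabets, and the function names are hypothetical:

```python
import math
from itertools import product


def mapping_information(matrix, assign, n_states):
    """Minimum total bits when the state entered after letter x is assign[x].

    Each state gets the relative frequencies of the letters that follow it
    (the choice that minimises the total for this assignment). First letters
    of words, which have no preceding letter, are not counted.
    """
    k = len(matrix)
    total = 0.0
    for s in range(n_states):
        # Pool the outgoing counts of every letter mapped to state s.
        pooled = [sum(matrix[x][y] for x in range(k) if assign[x] == s) for y in range(k)]
        n = sum(pooled)
        total += sum(c * math.log2(n / c) for c in pooled if c)
    return total


def best_two_state(matrix):
    """Exhaustively try every letter->state assignment; returns (bits, assignment)."""
    k = len(matrix)
    best = None
    for rest in product((0, 1), repeat=k - 1):
        # Fix letter 0 in state 0: swapping the two state labels changes nothing.
        assign = (0,) + rest
        bits = mapping_information(matrix, assign, 2)
        if best is None or bits < best[0]:
            best = (bits, assign)
    return best
```

With 2^(k-1) assignments to try, this is hopeless for a full alphabet. That is where some kind of seeded search has to take over.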
This is presumably the "evolution" stage you mentioned before, right? How
do you seed it?
FWIW, given that linguists already have no difficulty picking out the
VMS' "vowels" in simple transcriptions like Currier, it seems fairly
unlikely to me that PFSMs will shed much light at that level; but the
big cryptographic question for many (myself included) is whether PFSMs can
help us compare transcriptions / letter groupings.
You see, if EVA <qo>, <or> and <ol> are not so much diphthongs as entirely
independent letters, then you can't map <o> onto a single state in the way
you describe: handling this requires the corpus to have been pre-processed
in order to make a 1-to-1 letter-to-state mapping viable.
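Here is a minimal pre-processing sketch. It assumes each candidate group (here EVA <qo>, <or>, <ol>) should simply be read as a single token, using greedy longest-match tokenisation. Greedy matching resolves overlaps such as <qol> one particular way (qo + l), which is itself a transcription decision. `tokenize` is a hypothetical helper:

```python
def tokenize(text, groups):
    """Split text into tokens, treating each string in `groups` as one token.

    Greedy longest match, left to right; letters not covered by any group
    become single-letter tokens. Word breaks (whitespace) are dropped.
    """
    by_length = sorted(groups, key=len, reverse=True)
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            for g in by_length:
                if word.startswith(g, i):
                    tokens.append(g)
                    i += len(g)
                    break
            else:
                # No group starts here: emit a single letter.
                tokens.append(word[i])
                i += 1
    return tokens
```

For example, `tokenize("qokedy chol", ["qo", "or", "ol"])` gives `['qo', 'k', 'e', 'd', 'y', 'c', 'h', 'ol']`.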
Perhaps the best approach would simply be to compare the curves of (best-fit
PFSM's information content) vs (number of states in the PFSM) for
different transcriptions? I'd predict that the best transcription should
show a sharp drop in information content once a critical number of states
is included... just a thought! :-o
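Here is a rough sketch of how such a curve might be computed. It again assumes the PFSM's state is fixed by the previous letter. Randomised hill climbing with random seeds stands in for the "evolution" stage; both the seeding and the search method are assumptions, not Ben's method. All names are hypothetical, and `matrix` is a square transition-count matrix:

```python
import math
import random


def info_bits(matrix, assign, n_states):
    """Minimum total bits when the state entered after letter x is assign[x]."""
    k = len(matrix)
    total = 0.0
    for s in range(n_states):
        # Each state uses the relative frequencies of the letters that follow it.
        pooled = [sum(matrix[x][y] for x in range(k) if assign[x] == s) for y in range(k)]
        n = sum(pooled)
        total += sum(c * math.log2(n / c) for c in pooled if c)
    return total


def best_fit_info(matrix, n_states, restarts=20, seed=0):
    """Approximate best-fit bits for an n-state PFSM via randomised hill climbing."""
    rng = random.Random(seed)
    k = len(matrix)
    best = math.inf
    for _ in range(restarts):
        # Seed: a random letter->state assignment.
        assign = [rng.randrange(n_states) for _ in range(k)]
        current = info_bits(matrix, assign, n_states)
        improved = True
        while improved:
            improved = False
            # Try moving each letter to each other state; keep strict improvements.
            for x in range(k):
                for s in range(n_states):
                    if s == assign[x]:
                        continue
                    old = assign[x]
                    assign[x] = s
                    bits = info_bits(matrix, assign, n_states)
                    if bits < current - 1e-9:
                        current, improved = bits, True
                    else:
                        assign[x] = old  # undo the move
        best = min(best, current)
    return best


def information_curve(matrix, max_states):
    """Approximate best-fit information for 1..max_states states."""
    return [best_fit_info(matrix, n) for n in range(1, max_states + 1)]
```

Hill climbing only finds local optima, hence the restarts. You could run `information_curve` on each transcription's matrix and compare where the curves drop.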
Cheers, .....Nick Pelling.....