
RE: VMs: RE: RE: Best-fit 2-state PFSMs



Hi everyone,

At 03:42 09/03/2005 +0000, Ben Preece wrote:
> Every state in the PFSM assigns a probability to every input token. (If a token is not allowed in a state, its probability is zero.) If the probability of a token in a certain state is P, then when that token appears in that state it generates log2(1/P) bits of information. The input text can be run through the PFSM, and the information from every input token totaled up (or averaged per token).
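A minimal Python sketch of the measurement described above. It assumes a hypothetical layout in which a PFSM is stored as two dicts: `probs[state][token]` gives each token's probability in each state, and `next_state[token]` gives the state entered after that token is emitted (so each letter maps to one state). The names `information`, `probs` and `next_state` are illustrative only, not taken from Ben's code.

```python
import math

def information(text, probs, next_state, start):
    """Return (total bits, bits per token) for `text` run through a PFSM.

    probs[state][token]: probability of `token` in `state` (assumed layout)
    next_state[token]:   state entered after `token` is emitted (assumption)
    """
    if not text:
        return 0.0, 0.0
    state = start
    total = 0.0
    for token in text:
        p = probs[state].get(token, 0.0)
        if p == 0.0:
            # Disallowed token: log2(1/0) is infinite.
            return math.inf, math.inf
        total += math.log2(1.0 / p)  # log2(1/P) bits for this token
        state = next_state[token]
    return total, total / len(text)

# Toy 2-state machine: "C" allows a or b, "V" allows only b.
probs = {"C": {"a": 0.5, "b": 0.5}, "V": {"b": 1.0}}
next_state = {"a": "V", "b": "C"}
print(information("abab", probs, next_state, "C"))  # (2.0, 0.5)
```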

Quick implementation hack for the interested: for a given corpus, build a square matrix containing the letter transition counts (i.e., a matrix[x,y] recording how many times letter x is followed by letter y within that corpus), and work from that instead. Much easier! :-)
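One way to "work from" the transition matrix, as a rough sketch: the counts are stored sparsely as nested `Counter`s rather than a dense square matrix, and each letter is assumed to send the machine into one state, given by a hypothetical `state_of` dict. Pooling the rows of all letters assigned to the same state gives that state's token counts; with maximum-likelihood probabilities (count / row total) the total information is the sum of count × log2(row total / count). The very first token, having no predecessor, is not counted.

```python
import math
from collections import Counter, defaultdict

def transition_counts(corpus):
    """counts[x][y] = how many times letter x is followed by letter y."""
    counts = defaultdict(Counter)
    for x, y in zip(corpus, corpus[1:]):
        counts[x][y] += 1
    return counts

def information_from_counts(counts, state_of):
    """Total and per-transition information (bits) for a letter-to-state map.

    state_of[x] is the (hypothetical) state the machine enters after letter x.
    Probabilities are maximum-likelihood estimates: count / row total.
    """
    pooled = defaultdict(Counter)
    for x, row in counts.items():
        pooled[state_of[x]].update(row)  # merge rows of letters sharing a state
    total, n = 0.0, 0
    for row in pooled.values():
        row_total = sum(row.values())
        for c in row.values():
            total += c * math.log2(row_total / c)
            n += c
    return total, (total / n if n else 0.0)

counts = transition_counts("banana")
print(counts["a"]["n"])  # 2
print(information_from_counts(counts, {"b": "C", "n": "C", "a": "V"}))  # (0.0, 0.0)
```

The matrix only has to be built once per corpus; every candidate letter-to-state map is then scored from it without rereading the text.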


> The best-fit PFSM is the PFSM with the combination of transitions and probability assignments that provides the smallest total (or average) information when the sample text is run through it.
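For the restricted case where each letter determines the next state, the probabilities that minimize total information for a given split of letters into states are simply the observed frequencies (by Gibbs' inequality), so finding the best-fit 2-state PFSM reduces to searching over splits. This sketch swaps in plain exhaustive search, which is only feasible for small alphabets (2^(n-1) splits for n letters); it is not the "evolution" approach mentioned in the thread, and `best_two_state` is a hypothetical name.

```python
import itertools
import math
from collections import Counter, defaultdict

def information(corpus, state_of):
    """Total bits under a letter-to-state PFSM with ML probabilities."""
    pooled = defaultdict(Counter)
    for x, y in zip(corpus, corpus[1:]):
        pooled[state_of[x]][y] += 1
    total = 0.0
    for row in pooled.values():
        n = sum(row.values())
        total += sum(c * math.log2(n / c) for c in row.values())
    return total

def best_two_state(corpus):
    """Exhaustively try every split of the letters into two states."""
    letters = sorted(set(corpus))
    first, rest = letters[0], letters[1:]
    best = (math.inf, None)
    # Pin the first letter to state 0 so each split is tried only once.
    for bits in itertools.product((0, 1), repeat=len(rest)):
        state_of = {first: 0, **dict(zip(rest, bits))}
        info = information(corpus, state_of)
        if info < best[0]:
            best = (info, state_of)
    return best

print(best_two_state("banana"))  # (0.0, {'a': 0, 'b': 1, 'n': 1})
```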

This is presumably the "evolution" stage you mentioned before, right? How do you seed it?


FWIW, given that linguists already have no difficulty in picking out the VMS' "vowels" in simple transcriptions like Currier, it seems fairly unlikely to me that PFSMs will throw a lot of light at that level: but the big cryptographic question for many (myself included) is whether PFSMs can help us compare transcriptions / letter groupings.

You see, if EVA <qo>, <or> and <ol> are not so much diphthongs as entirely independent letters, then you can't map <o> onto a single state in the way you describe: handling this requires the corpus to have been pre-processed in order to make a 1-to-1 letter-to-state mapping viable.
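The pre-processing step might look something like this: a hypothetical `tokenize` helper that treats each listed group (here <qo>, <or> and <ol>, purely as an example) as a single token, so that a 1-to-1 token-to-state mapping becomes possible. Greedy longest-match from left to right is an assumption; overlapping groups such as EVA <qol> could equally be split as <q><ol>, and that choice changes the resulting token stream.

```python
def tokenize(text, groups):
    """Split EVA text into tokens, treating each string in `groups` as one token.

    Greedy, left to right, longest group first (an assumed segmentation rule).
    """
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for g in ordered:
            if text.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            # No group starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("qokeedy.qol", ["qo", "or", "ol"]))
# ['qo', 'k', 'e', 'e', 'd', 'y', '.', 'qo', 'l']
```

The resulting token list can then be treated as the PFSM's input alphabet in place of single letters.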

Perhaps the best approach might simply be to compare the curves of (best fit PFSM's information content) vs (number of states in the PFSM) for different transcriptions? I'd predict that the best transcription should show a sharp drop in information content once a critical number of states is included... just a thought! :-o
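A rough sketch of that comparison, again assuming letter-to-state machines with maximum-likelihood probabilities. Exhaustive search is hopeless beyond a couple of states, so this swaps in a simple hill-climbing heuristic (move one letter at a time to another state while the total information keeps dropping), which may stall in a local minimum. `hill_climb` and `information_curve` are hypothetical names. Totals rather than per-token averages are returned, because different transcriptions split the same text into different numbers of tokens.

```python
import math
from collections import Counter, defaultdict

def information(tokens, state_of):
    """Total bits for `tokens` under a letter-to-state PFSM with ML probabilities."""
    pooled = defaultdict(Counter)
    for x, y in zip(tokens, tokens[1:]):
        pooled[state_of[x]][y] += 1
    total = 0.0
    for row in pooled.values():
        n = sum(row.values())
        total += sum(c * math.log2(n / c) for c in row.values())
    return total

def hill_climb(tokens, k):
    """Heuristic best fit with k states: may stop at a local minimum."""
    letters = sorted(set(tokens))
    # Arbitrary seed: deal the letters round-robin into the k states.
    state_of = {x: i % k for i, x in enumerate(letters)}
    best = information(tokens, state_of)
    improved = True
    while improved:
        improved = False
        for x in letters:
            for s in range(k):
                if s == state_of[x]:
                    continue
                old = state_of[x]
                state_of[x] = s  # try moving letter x to state s
                info = information(tokens, state_of)
                if info < best - 1e-12:
                    best, improved = info, True
                else:
                    state_of[x] = old  # no gain: undo the move
    return best

def information_curve(tokens, max_states):
    """Best-found total information for 1..max_states states."""
    return [hill_climb(tokens, k) for k in range(1, max_states + 1)]

# Compare curves for two (toy) transcriptions of the same text.
print(information_curve("banana", 3))
print(information_curve(["b", "an", "an", "a"], 3))
```

Running this on each candidate transcription's token list and plotting the values against the number of states would show whether the predicted sharp drop appears.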

Cheers, .....Nick Pelling.....

