[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: VMs: RE: RE: Best-fit 2-state PFSMs



Hi everyone,

At 00:10 11/03/2005 -0600, Dennis wrote:
What effect does the size of the text corpus have? The HMM needed a text corpus of ~6 Mbyte for a clear result. Would it tell us something to create an enormous synthetic Voynichese corpus by Gabriel or Jeff's method or Stolfi's Voynichese grammar and then analyze it?

Note that a generative grammar can be viewed as a kind of generative probabilistic state machine, but not necessarily the same kind as the one described by Ben Preece.
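To make the distinction concrete, here is a minimal sketch of the kind of 2-state PFSM under discussion: each state emits a glyph and then transitions, so a big synthetic "corpus" can be sampled from it. The states, glyph sets, and all probabilities below are invented for illustration; they are not Ben's model, Stolfi's grammar, or anything fitted to the VMs.

```python
import random

# Toy 2-state PFSM. Every number here is an illustrative assumption,
# not a fitted parameter. Each state has emission probabilities over
# a few EVA-like glyphs and transition probabilities to the next state.
EMIT = {
    0: [("q", 0.2), ("o", 0.5), ("d", 0.3)],
    1: [("l", 0.4), ("y", 0.3), ("a", 0.3)],
}
TRANS = {
    0: [(0, 0.3), (1, 0.7)],
    1: [(0, 0.6), (1, 0.4)],
}

def weighted_choice(pairs, rng):
    """Pick an item from (item, probability) pairs."""
    r = rng.random()
    acc = 0.0
    for item, p in pairs:
        acc += p
        if r < acc:
            return item
    return pairs[-1][0]  # guard against floating-point rounding

def generate(n, seed=0):
    """Emit n glyphs by walking the toy PFSM from state 0."""
    rng = random.Random(seed)
    state = 0
    out = []
    for _ in range(n):
        out.append(weighted_choice(EMIT[state], rng))
        state = weighted_choice(TRANS[state], rng)
    return "".join(out)
```

A context-free grammar generates by expanding rules rather than by stepping a fixed state set like this, which is why the two machines need not assign the same structure to the same text.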


For example, there is good reason to think that EVA <o> functions differently in "qo" and "ol": while "qol" does occur in the VMs, it normally occurs only on pages where you also find free-standing (i.e. non-"ol"/"al"-paired) "l" characters. So a state machine in which <o> maps to a single state would not be able to capture this behaviour satisfactorily.

In fact, this is basically true of any generative grammar where an individual letter (like <o>) appears in multiple "columns". The only easy way around it would be to pre-tokenise groups of letters (as Ben is already doing, though only for the usual suspects at the moment), and then compare the n-state PFSMs of those implicit transcriptions.
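The pre-tokenisation step could look something like the sketch below: a greedy longest-match pass that rewrites chosen digraphs as single tokens before any state-machine analysis. The digraph list here is just the "usual suspects" named in this thread, purely as an assumption; any real verbose-cipher candidate set would be larger.

```python
# Illustrative digraph inventory -- an assumption, not Ben's actual list.
DIGRAPHS = ["qo", "ol", "al"]

def tokenise(text, digraphs=DIGRAPHS):
    """Greedy left-to-right longest-match tokenisation of an EVA string.

    Each listed digraph becomes one token; everything else passes
    through as single characters.
    """
    ordered = sorted(digraphs, key=len, reverse=True)  # longest match first
    tokens = []
    i = 0
    while i < len(text):
        for d in ordered:
            if text.startswith(d, i):
                tokens.append(d)
                i += len(d)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens
```

Note how "qolal" comes out as ["qo", "l", "al"]: the <o> is consumed by "qo", leaving a free-standing "l", which is exactly the kind of contextual behaviour a single-state <o> cannot express.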

What about defining a transcription alphabet that treats these digraphs/verbose cipher elements as single glyphemes and then analyzing the VMs in that transcription? That seems like a useful exercise.

That's what I'm suggesting. :-)


Cheers, .....Nick Pelling.....


______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list