Re: VMs: RE: RE: Best-fit 2-state PFSMs
Hi everyone,
At 00:10 11/03/2005 -0600, Dennis wrote:
What effect does the size of the text corpus have? The HMM
needed a text corpus of ~6 Mbyte for a clear result. Would it tell us
something to create an enormous synthetic Voynichese corpus using Gabriel's
or Jeff's method, or Stolfi's Voynichese grammar, and then analyze it?
Note that a grammar is a kind of generative probabilistic state machine,
but not necessarily the same kind as described by Ben Preece.
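To make the distinction concrete, here is a minimal sketch of what "generative probabilistic state machine" means: states emit glyphs, and weighted transitions decide what comes next. The states, weights, and glyph sets below are invented purely for illustration, not fitted to the VMs or to Ben's machines.

```python
import random

# Hypothetical 2-state generative PFSM. All numbers and glyph sets
# here are made up for illustration only.
TRANSITIONS = {
    "A": [("A", 0.3), ("B", 0.6), ("END", 0.1)],
    "B": [("A", 0.5), ("B", 0.2), ("END", 0.3)],
}
EMISSIONS = {
    "A": ["q", "o", "d"],  # placeholder glyph sets, not real EVA columns
    "B": ["l", "y", "n"],
}

def generate_word(rng, max_len=10):
    """Walk the machine from state A, emitting one glyph per state visit."""
    state, out = "A", []
    while state != "END" and len(out) < max_len:
        out.append(rng.choice(EMISSIONS[state]))
        states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=weights)[0]
    return "".join(out)

rng = random.Random(1)
print([generate_word(rng) for _ in range(5)])
```

A grammar in Stolfi's column style corresponds to a machine like this, but with the crucial difference that the same letter can be emitted from several different states.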
For example, there's good reason to think that EVA <o> functions
differently in "qo" and "ol": while "qol" does occur in the VMs, it
normally occurs only on pages where you typically find free-standing (i.e.
non-"ol"/"al"-paired) "l" characters. So, a state machine where <o> maps to
a single state would not be able to capture this behaviour satisfactorily.
In fact, this is basically true of any generative grammar where an
individual letter (like <o>) appears in multiple "columns". The only easy
way around it would be to pre-tokenise groups of letters (as Ben is already
doing, though only for the usual suspects at the moment), and compare those
implicit transcriptions' n-state PFSMs.
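The pre-tokenising step might be sketched like this: a greedy left-to-right scan that folds candidate verbose-cipher digraphs into single tokens before any state-machine fitting. The digraph list here is just an assumed stand-in for "the usual suspects"; a real experiment would vary it.

```python
# Hypothetical digraph list for illustration; not a claim about
# which EVA pairs are actually verbose-cipher elements.
DIGRAPHS = ["qo", "ol", "al"]

def tokenise(word):
    """Greedy left-to-right scan: prefer a known digraph, else one glyph."""
    tokens, i = [], 0
    while i < len(word):
        for d in DIGRAPHS:
            if word.startswith(d, i):
                tokens.append(d)
                i += len(d)
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenise("qokedy"))  # ['qo', 'k', 'e', 'd', 'y']
print(tokenise("qol"))     # ['qo', 'l'] -- the free-standing 'l' survives
```

Note how "qol" comes out as "qo" plus a free-standing "l", which is exactly the pairing behaviour a single-state <o> cannot express: in the tokenised stream, the <o> of "qo" and the <o> of "ol" are simply different symbols.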
What about defining a transcription alphabet that treats these
digraphs/verbose cipher elements as single glyphemes and then analyzing
the VMs in that transcription? That seems like a useful exercise.
That's what I'm suggesting. :-)
Cheers, .....Nick Pelling.....
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list