
Re: VMs: Would it be possible to summarize it for the list?



On Thu, 3 Mar 2005, Ben Preece wrote:
> I'll try to find my stuff on the VMs and write something up about it.
> One interesting thing I remember is that some letters, like 'a' and 'o',
> tend to end up with their own states;  no matter what came before the
> 'a', the 'a' will always take you to the same next state.  This means
> that what comes next after the 'a' depends only on the fact that it
> comes after the 'a', and not on anything that came before the 'a'.  The
> letters 'a' and 'o' have different states, and these are not the same
> state that indicates the start of a word.  So how would one interpret
> all that linguistically?

As Rene and others have said, this is probably an artifact of taking the
EVA transcription scheme as an analysis into characters.  If characters
are either larger than the EVA characters, or smaller, or both, then any
attempt to produce a canonical analysis of form will be trying to deal
simultaneously with both the underlying pattern and the imposed
transcriptional pattern, which introduces both false assertions of
independence (two characters for one) and true but concealing assertions
of dependence (one character for two).

As far as what one might want to use as input instead of strict EVA, I
agree more or less with Rene and Nick in thinking that sequences of i and
e are essentially units.  And I think that Stolfi and others' observations
that sets like dlrs or the gallows and benches tend to behave as if they
were notational alternates arise from these being mergers of a preceding
character element, e.g., i or e, with a dependent "flourish" or "loop"
element - a character that has to be attached to a preceding one - the
reverse of the "cannot be dependent" sort of character Guy was describing
recently for Arabic.  EVA characters that arise from combining a flourish
with a preceding e or i behave similarly, not because they are the same
character, but because they represent similar things.  For example, they
may all be the boundaries between a consonant (i or e sequence) and a vowel
(flourish).  That is, they may all be "syllable finals."

To be explicit in addressing Ben's question, I suspect that a or o are
perhaps "start of character only" marks, so that ai, aii, etc., are
different characters, as perhaps are i, ii, etc., without preceding a.
Ditto for e-sequences.  On the other hand, n, l, etc., are a preceding i
element (from a preceding character) and a following n-flourish, a
preceding i and a following l-flourish, and so on.  Similarly, EVA
transcriptional elements like d or y are preceding e-elements with
attached following flourish elements.  One piece of evidence for this is
the cases where multiple flourishes occur on a single e or i.
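
A minimal Python sketch of the decomposition just described, offered only as
an illustration.  The table name DECOMPOSE, the function to_elements, and the
"+x" notation for a dependent flourish are hypothetical; only the pairings
(n and l as i plus a flourish, d and y as e plus a flourish) come from the
paragraph above, and the table would need extending for other glyphs.

```python
# Hypothetical table: EVA glyph -> [base element, dependent flourish].
# "+x" is an invented notation for "the flourish that turns the base into x".
DECOMPOSE = {
    "n": ["i", "+n"],
    "l": ["i", "+l"],
    "d": ["e", "+d"],
    "y": ["e", "+y"],
}


def to_elements(word):
    """Split an EVA-transcribed word into smallest elements.

    Glyphs not listed in DECOMPOSE are passed through unchanged.
    """
    elements = []
    for glyph in word:
        elements.extend(DECOMPOSE.get(glyph, [glyph]))
    return elements


if __name__ == "__main__":
    for word in ("daiin", "ol", "dy"):
        print(word, to_elements(word))
```

Under these assumptions EVA daiin comes out as e, +d, a, i, i, i, +n: the
final n contributes one more i to the preceding run plus a flourish.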

So, looking at each i and e (or c)  as a separate thing is like looking at
each stroke in a p or b as a separate thing, while looking at n and l as
single things is like looking at an ae or fl ligature in print as single
elements.  Both approaches make a certain amount of sense, but they are
not especially useful for deciphering a text in which these entities
occur.

I'm less clear on the interacting gallows and bench sequences, though they
clearly have patterned subanalyses, too.  I tend to think that the
bar-element that connects c and s (which look like e and e with an
attached flourish) with a following e (EVA represents bar + e as h), and
which also appears in other contexts, if I recall, is effectively another
element of the "dependent" flourish class.

So, Ben, if I were running your code or a vowel vs. consonant analyzer or
anything like that I would start with a retranscribed file that either
analyzed things into smallest elements or first did that and then tried
recombining them in various ways as I think Nick suggested.  The first
approach amounts to trying to simultaneously account for the stroke
sequences that make up particular characters and the character sequences
that typically make up words, while the second approach assumes we have
solved the first problem and tackles the second problem on that
assumption.
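
The second approach might be sketched as follows, under one possible reading
of the above: an optional a or o start mark plus the run of identical i or e
strokes after it forms one unit, and every other element (including each
dependent flourish) stands as its own unit.  The names START_MARKS,
BASE_STROKES and recombine, and the "+x" flourish notation in the sample
input, are hypothetical; other recombination rules are equally worth trying.

```python
START_MARKS = {"a", "o"}    # assumed "start of character only" marks
BASE_STROKES = {"i", "e"}   # strokes whose runs are treated as one unit


def recombine(elements):
    """Group a list of smallest elements into candidate character units."""
    units = []
    i, n = 0, len(elements)
    while i < n:
        start = i
        # An optional start mark opens a unit.
        if elements[i] in START_MARKS:
            i += 1
        # A run of identical base strokes joins the same unit.
        if i < n and elements[i] in BASE_STROKES:
            base = elements[i]
            while i < n and elements[i] == base:
                i += 1
        # Anything else (flourishes, gallows, ...) is a unit on its own.
        if i == start:
            i += 1
        units.append("".join(elements[start:i]))
    return units


if __name__ == "__main__":
    # Sample input: EVA "daiin" already split into smallest elements.
    print(recombine(["e", "+d", "a", "i", "i", "i", "+n"]))
```

The output of a pass like this, rather than raw EVA, would then be the input
to the state-machine or vowel vs. consonant analysis.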

As for why English takes more states than Latin, well, it combines words
that operate on different orthographic principles in one script, and it
also uses a lot more digraphs.  So you have both mixed data and mixed
levels.  There are a few other problems, too.


