Re: VMs: Re: Voynichese as an Abugida

On Tue, 27 Jul 2004, Jorge Stolfi wrote:
> No one knows what is a "letter" in the Voynichese script. That is not
> surprising since the question is not trivial even for known languages:
> Should English "th", Spanish "ll", and Italian "tt" be considered one
> letter or two? Should "é" be considered two letters ("e" + "´")? The
> answer is usually based on tradition, and different languages have
> different traditions: Spanish counts "ll" as one letter for sorting
> purposes, for example, while Portuguese counts the equivalent "lh" as
> two separate letters.

Actually, I gather that the Spanish redefined ll, etc., as pairs of
letters not too long ago.  Curse them for eliminating a perfectly good
example!  Perhaps part of the problem here is that we don't have an
established terminology for the components of composite "characters."
Perhaps graphs and glyphs?  I suspect that the Unicode people have a
scheme for this.  Anyway, I agree that we don't know the underlying or
intended structure of the Voynichese script, though we do have an evolving
analysis of the attested patterns within it, a lot of it your work, of course.

> Since we do not know the "language" of the VMS, we can only give a
> "typographical" definition: a letter is a set of strokes that is often
> seen disconnected from other strokes. We are fortunate that the VMS
> was written using "printing letters" rather than "cursive", so most
> words are in fact a disconnected sequence of "glyphs", each consisting
> of a few connected strokes.
> Moreover, except for a few dozen "weirdo" glyphs, each occurring only
> a few times in the whole book, all glyphs seem to be taken from a
> small "alphabet", with the variations expected from hand writing.
> Where larger groups of connected strokes occur, they generally seem to
> be made by two or more of these same glyphs that were "accidentally"
> joined.
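The typographical definition quoted above amounts to finding connected components among pen strokes. A minimal Python sketch follows. It assumes a hypothetical input format of numbered strokes plus a list of stroke pairs that touch; nothing in this message says such data exists for the VMS. It groups strokes into candidate glyphs with a union-find structure:

```python
# Hypothetical sketch: group pen strokes into candidate "glyphs" by
# connectivity. Stroke ids and touching pairs are invented for illustration.

def group_strokes(n_strokes, touching_pairs):
    """Return glyphs as lists of stroke ids (connected components),
    ordered left to right by their first stroke."""
    parent = list(range(n_strokes))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in touching_pairs:
        union(a, b)

    groups = {}
    for s in range(n_strokes):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values(), key=lambda g: g[0])


# Six strokes in one word: 0-1 touch, 2 stands alone, 3-4-5 form one group.
print(group_strokes(6, [(0, 1), (3, 4), (4, 5)]))  # [[0, 1], [2], [3, 4, 5]]
```

This covers only the connectivity half of the definition. Deciding that a group is a letter because it is often seen disconnected would mean tallying such groupings across many words, and glyphs that were "accidentally" joined would still come out as a single group here.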

Does this cover primarily the case of the i*R characters, or also the
characters assumed to be unitary, though recognized as being potential
ligatures, e.g., the gallows-bench continuum?

Anyway, I certainly appreciate eliciting this summary of the situation!

> If we try to make a catalog of glyph types, we get different results
> depending on how strict are our criteria for comparing glyphs. Many of
> us have followed the precedent of Friedman and his colleagues
> who---for instance---assumed that all plumes were equivalent,
> independently of their shape. (The plumes are those reverse-C strokes
> that rise above the "o" height and are unconnected at the top end.)
> This assumption is implicit in the FSG alphabet and most of its
> successors, including EVA. Under that assumption, there are only four
> glyphs with plumes --- "sh", "s", "r", and "n" --- that are not
> "weirdos".  Other people --- Glen may be one of them --- have chosen
> not to make this assumption, and so they recognize more than four
> plumed glyphs. Friedman and EVA have also assumed that the little
> "hook" that sometimes ends the arm of EVA p/f is not significant;
> I suspect it is, so by my count there are four one-legged gallows,
> not two.

Certainly some of the marks used in attested abugidas are remarkably
subtle.  I'm interested to know how this plume debate works out.

> Our perception of the alphabet is also modified by statistical
> analysis. For instance, the EVA letter "e" is special because it often
> occurs three times in a row. After uncounted tabulations, I have
> tentatively concluded that the pair "ee", even though it seems to be
> two disconnected glyphs most of the time, should be considered a
> single letter like "ch" or "sh". It occurs in the same contexts where
> those two occur, with similar relative frequencies, and that pattern
> of occurrences clearly separates the three glyphs --- "ch", "sh", and
> "ee" -- from other glyphs. In fact, it is possible that "ch" is the same
> as "ee", the top ligature being merely a device optionally used by the
> scribe to remove ambiguity.

I think this is laid out fairly well in Note 017, which I am still

> By similar arguments, the sequences "ii", and "iii" have come to be
> considered two single letters.

In effect this is what I am suggesting, too, just from examining tables of
the EVA, combined with your word grammar.  That is, that i, ii, and iii are
separate modifiers of the following R element.  I've suggested something
like three different vowels in syllables with R, with R alone implying
perhaps a fourth, but it would also be possible to think in terms of p vs.
b vs. f vs. v, and so on.
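To make the segmentation concrete, here is a hedged Python sketch. It splits an EVA word greedily into units, treating ch, sh, ee, ii, and iii as single letters. It can then attach each i-run to the following glyph, in the spirit of the i/ii/iii-as-modifiers reading above. The unit list and both function names are illustrative assumptions, not an established EVA tool:

```python
# Hypothetical sketch: segment EVA words using the multi-character units
# discussed in this thread. The unit list is an assumption, not a settled
# Voynichese alphabet.

MULTI_UNITS = ["iii", "ch", "sh", "ee", "ii"]  # iii must precede ii

def segment(word):
    """Greedy longest-match segmentation of an EVA string."""
    units, i = [], 0
    while i < len(word):
        for unit in MULTI_UNITS:
            if word.startswith(unit, i):
                units.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character unit matches here: take one character.
            units.append(word[i])
            i += 1
    return units

def attach_i_runs(units):
    """Merge each i / ii / iii unit into the unit that follows it,
    treating the i-run as a modifier of the next (R-type) glyph."""
    out, pending = [], ""
    for u in units:
        if u in ("i", "ii", "iii"):
            pending += u
        else:
            out.append(pending + u)
            pending = ""
    if pending:
        out.append(pending)  # trailing i-run with nothing after it
    return out
```

For example, segment("daiin") gives d, a, ii, n, and attach_i_runs turns that into d, a, iin. Longest-match-first ordering matters: listing iii before ii keeps a triple i from being read as ii followed by i.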

> One also should take into account handwriting variations from page to
> page. For example, in the Zodiac pages (which, to my eyes, are the
> oldest in the book) one often sees an EVA "a" which is open at the
> bottom. In extreme cases, this "open a" can almost be confused with a
> "ch". But since the "normal a" is under-represented in those pages,
> and common words that are written with "normal a" elsewhere are
> written with "open a" on those pages, most transcribers have assumed
> that they are the same letter.

That seems reasonable to me.

> The point of all this discussion is to say that the size of the
> Voynichese alphabet depends on who is counting; but, according to most
> people, it has only between 20 and 30 distinct glyphs, even counting
> "ee", "ii", and "iii" as separate glyphs. Since a syllabic alphabet,
> or an abugida, would need at least 50 or so distinct glyphs, a strict
> syllabary seems to be ruled out.

It might require about 25 consonants plus n modifiers for alternate
vowels or the lack thereof, and these modifiers might be perceived as
characters in themselves, especially if they were written before or after
the base in line instead of above, below, or inside it, as they are in many
attested abugidas.  In short, 25 + n characters (or bases plus diacritics)
are required, assuming no special coding of complex syllable margins.
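The inventory arithmetic can be made explicit. The Python sketch below assumes, purely for illustration, 25 consonants and 5 vowels. It compares a strict syllabary (one sign per consonant-vowel pair, plus bare vowels) with an abugida of 25 + n signs, where n covers the alternate vowels and a vowel-killer mark:

```python
# Rough glyph-inventory arithmetic. CONSONANTS and VOWELS are
# illustrative assumptions, not claims about the VMS language.
CONSONANTS = 25
VOWELS = 5

def syllabary_size(c, v):
    """One sign per consonant+vowel pair, plus one sign per bare vowel."""
    return c * v + v

def abugida_size(c, v):
    """One base per consonant (carrying an inherent vowel), plus n
    modifiers: one per non-inherent vowel and one "no vowel" mark."""
    n = (v - 1) + 1
    return c + n

print(syllabary_size(CONSONANTS, VOWELS))  # 130
print(abugida_size(CONSONANTS, VOWELS))    # 30
```

Under these assumed figures the abugida lands at the top of the 20 to 30 glyph range cited above, while the strict syllabary is far outside it. With a different consonant count the comparison shifts accordingly.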

As a hypothetical example, if l il iil iiil were, say, pa pi pu p, we
would still have the properties of an abugida, but coded in terms that
would look like four characters, or even two, depending on what we were
calling a character.  What would distinguish this from an alphabetic
approach, in which l il iil iiil represented, say, p pa pi pu, would be
that a sequence like qokeedy would represent not something like
C-V-C-C-C-C, but something like C-V-CV-CV-CV-CV.  If some vowels were
implicit in adjacent consonants, and some vowel-like symbols signified the
absence of a vowel, assuming a strict alphabet might mislead the analyst.
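The C-V contrast for qokeedy can be reproduced mechanically. The following Python sketch assumes, hypothetically, that o and a are the only explicit vowel glyphs and that ee counts as a single glyph. Under the abugida reading, a consonant glyph carries an inherent vowel unless an explicit vowel glyph follows it:

```python
# Hypothetical sketch: read one glyph sequence under two schemes.
# The vowel set and glyph classes are assumptions chosen only to
# illustrate the contrast; they are not known Voynichese values.

VOWEL_GLYPHS = {"o", "a"}  # assumed explicit vowel signs

def cv_alphabet(units):
    """Alphabetic reading: every glyph is a bare C or a V."""
    return "-".join("V" if u in VOWEL_GLYPHS else "C" for u in units)

def cv_abugida(units):
    """Abugida reading: a consonant glyph carries an inherent vowel (CV)
    unless an explicit vowel glyph follows it, in which case it is C."""
    labels = []
    for i, u in enumerate(units):
        if u in VOWEL_GLYPHS:
            labels.append("V")
        elif i + 1 < len(units) and units[i + 1] in VOWEL_GLYPHS:
            labels.append("C")
        else:
            labels.append("CV")
    return "-".join(labels)

qokeedy = ["q", "o", "k", "ee", "d", "y"]  # "ee" taken as one glyph
print(cv_alphabet(qokeedy))  # C-V-C-C-C-C
print(cv_abugida(qokeedy))   # C-V-CV-CV-CV-CV
```

The glyph classification drives everything here. The same six units yield either pattern depending solely on the assumed reading rule, which is why assuming a strict alphabet could mislead.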

> It may be that in Voynichese one of the vowels is left out, while the
> others are written as separate glyphs.  That is, instead of "pa pe pi
> po pu" one would write "p pe pi po pu".  However, at the stage we are,
> it seems hard to distinguish that system from a simple alphabetic one.

Unless I'm missing something, the minute we start dividing the character
set into consonants and vowels we will notice some anomalies if we assume
the wrong approach to vowel encoding.  Of course, with EVA and some of its
predecessors this is perhaps concealed by the choice of representations
for the elements represented as Roman characters.  EVA is designed to be
pronounceable, and, useful as it is, it makes some assumptions about how
the script is organized (and linearized).

> I cannot comment much on your specific proposals that certain groups
> of glyphs represent syllables in some abugida system. That is possible
> and in fact the problem is that there are too many possible
> alternatives. In any case the rigid word structure is a problem,
> unless you assume that different conventions have been used for the
> beginning, middle, and end of the words. (Arabic has something like
> that, but the three variants of each letter can be seen as extreme
> calligraphic variations on the same original glyph---which is hardly
> the case in Voynichese.)

These are issues that I have yet to address, I'm afraid!

In regard to a Semitic language, an ingenious and philologically inclined
person might encode the root consonants and the inflectional/derivational
vowels separately, e.g., kitab as ktbia, muslim as slmmui, or mirabile
dictu as mrbliaie dcti0u, though I suspect the latter would be harder
to write on the fly.
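A hedged Python sketch of such a split encoding follows. It assumes vowels a, e, i, o, u and a 0 placeholder for a consonant with no following vowel, except word-finally as in ktbia. Under those assumptions it reproduces ktbia and mrbliaie dcti0u:

```python
# Hypothetical sketch: write a word's consonants first, then its vowel
# slots. Assumptions (not an attested system): vowels are a, e, i, o, u;
# a consonant not followed by a vowel gets "0", except at the word's end.

VOWELS = set("aeiou")

def split_encode(word):
    consonants, vowel_slots = [], []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            # Vowel not attached to a preceding consonant: keep it as is.
            vowel_slots.append(ch)
            i += 1
            continue
        consonants.append(ch)
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt is not None and nxt in VOWELS:
            vowel_slots.append(nxt)
            i += 2
        else:
            if nxt is not None:
                vowel_slots.append("0")  # consonant with no vowel after it
            i += 1
    return "".join(consonants) + "".join(vowel_slots)

for w in ["kitab", "mirabile", "dictu", "muslim"]:
    print(w, split_encode(w))
```

For muslim this gives mslmu0i rather than slmmui. Putting the root s-l-m ahead of the prefix m- requires morphological knowledge that a purely mechanical split does not have.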

> The only way I can understand that rigid structure in terms of natural
> languages is by assuming that each word is a single syllable. That, as
> you are all bored to know, points towards an East Asian language.

Actually, a lot of languages elsewhere in the world tend toward an
isolating structure, too.  And many languages that are, for various
reasons, written with long words can easily be parsed into small pieces,
even though doing so might involve more conscious analysis than speakers
normally make.

>   > To return to the logic of the system, I'm suggesting that a series
>   > like l, il, iil, iiil represents, e.g., pa pe pi po or p pa pi pu
>   > or pa pi pu p or something like that. The set of grades in a
>   > series R IR IIR IIIR (once IIIIR?) doesn't provide for many vowel
>   > distinctions.
> Another problem is that these sequences, with very few exceptions,
> only occur at the end of the words.

Yes.  I'm not sure what to make of the gallows and bench stuff in this
context, as I think should be obvious.  I have, however, dealt with
languages in which there were strong positional constraints on the
consonants.  Not typical of Europe, needless to say.

> Arabic has only three vowels (a,i,u), although it has a distinction
> between strong (consonantal) and weak ones. Arabic was the language of
> administration in parts of Spain and Portugal for several centuries,
> until 1480 or so (You can't get more 'Western' in Europe than that!).

Point taken, though I meant the indigenous languages, i.e., that such a
pattern would imply a language other than these, such as Arabic.  I just
didn't put it very well.

> Actually Greek had at least seven vowels (alpha, eta, epsilon, iota,
> omicron, upsilon, omega).

Hoist with my own orthographic petard.  In a structural and phonological
sense eta is long epsilon and omega is long omicron, though the phonetics
of the Greek vowel system varies increasingly from this simple analysis
over time, and, of course, a lot of etas are from long a.  Anyway, I think
I can claim Greek as a five-vowel language, albeit with seven vowel
symbols representing a, ee, e, i, o, u, oo, and complicated by near
homophony of ee and ei, oo and ou, etc.
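The structural claim can be tabulated. This Python sketch follows the analysis in the message rather than historical phonetics. Upsilon is treated as u, and alpha, iota, and upsilon are left unmarked for length because the spelling does not distinguish long from short there:

```python
# Sketch of the structural analysis above: seven Greek vowel letters
# mapped onto five vowel qualities plus a length feature. None means the
# spelling does not mark length for that letter.

GREEK_VOWELS = {
    "alpha":   ("a", None),
    "epsilon": ("e", "short"),
    "eta":     ("e", "long"),
    "iota":    ("i", None),
    "omicron": ("o", "short"),
    "upsilon": ("u", None),
    "omega":   ("o", "long"),
}

qualities = sorted({q for q, _ in GREEK_VOWELS.values()})
print(len(GREEK_VOWELS), qualities)  # 7 ['a', 'e', 'i', 'o', 'u']
```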

> I am not sure about Latin; the alphabet had only aeiou, but the language
> may have had more.  (Besides there are those "ae" and "oe" ligatures ---
> are they Classical or Medieval?)

A, aa, e, ee, etc., as I understand it.  I think marking length with a
macron was a nicety of the grammarians, but length itself was a fact of
the language.  Ae and oe, eventually written as ligatures, are ai and oi,
but become e phonetically (or is it ee?), and au becomes o.

> Italian and several other Romance languages have at least two sounds for
> "e" and two for "o", so I would guess that Latin did, too.

I'm not going to look this up right now, but I think these contrasts
develop from the loss of length, the collapse of diphthongs, and interactions
with stress, syllable openness, etc.  The largest number of vowel and
diphthong contrasts is in French, but it doesn't look like the Voynichese
script encodes vowels phonologically.  Roman characters can't do it
without extensive diacritics and/or digraphs, and Voynichese looks to be
short of vowels, rather than flush with them.  (Or nobody has deduced the
basis of its representational scheme, which might well be possible!)

Thanks for the feedback and elucidations!

John Koontz
