Re: VMs: Voynichese as an Abugida

On Wed, 28 Jul 2004, Dennis wrote:
> Hello, Kemosabe - from a chemo-savvy chemical engineer.  ;-)

Does this imply a little web research?

> I find your analysis quite interesting!  (Where do the terms abugida and
> abjad come from, incidentally?)

I first ran into them quite recently as I searched the web for Voynich and

For abugida, see http://en.wikipedia.org/wiki/Abugida

The term abugida is attributed to Peter T. Daniels (see
http://en.wikipedia.org/wiki/Peter_T._Daniels).  It seems to be the name
of one of the Ethiopic scripts, and I assume it is formed from the letters
for the sounds a-bu-gi-da, etc.  Strangely familiar!  As far as I can
deduce, it refers to a script in which each major character indicates a
consonant or cluster with one default vowel assumed, usually the same one
and usually following.  Diacritics of some sort indicate other vowels that
replace the default one, or the absence of a vowel.

The Devanagari and other Brahmi-derived scripts of South Asia are the most
familiar examples, I think.  These scripts also include special characters
for initial vowels, and often have a large number of more or less obscure
ligatures for clusters.
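To make the mechanism concrete, here is a minimal Python sketch of reading
an abugida as described above.  It assumes a single inherent vowel "a",
models a vowel diacritic as a replacement vowel and a Devanagari-style
"vowel killer" as an empty mark, and represents an independent initial
vowel as a character with no consonant.  The class and function names are
purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the inherent (default) vowel is "a", as in Devanagari.
INHERENT_VOWEL = "a"


@dataclass(frozen=True)
class Akshara:
    """One abugida character: a consonant or cluster plus an optional vowel mark.

    vowel_mark is None -> the inherent vowel is read
    vowel_mark == ""   -> a vowel-killing mark: no vowel is read
    vowel_mark == "e"  -> a diacritic replacing the inherent vowel with "e"
    consonant == ""    -> stands in for an independent (initial) vowel character
    """
    consonant: str
    vowel_mark: Optional[str] = None

    def read(self) -> str:
        if self.vowel_mark is None:
            return self.consonant + INHERENT_VOWEL
        return self.consonant + self.vowel_mark


def read_word(chars: List[Akshara]) -> str:
    """Concatenate the readings of a sequence of characters."""
    return "".join(c.read() for c in chars)


# spe-ra-n-ca: "e" replaces the default vowel, "n" has its vowel killed.
speranca = [Akshara("sp", "e"), Akshara("r"), Akshara("n", ""), Akshara("c")]
print(read_word(speranca))

# o-n^i: an initial vowel followed by n^ carrying an "i" diacritic.
print(read_word([Akshara("", "o"), Akshara("n^", "i")]))
```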

Of course, students of scripts have been discussing such systems for a
long time, but apparently this term is of fairly recent origin.


> An abjad is a type of writing system where there is one symbol per
> consonantal phoneme, sometimes also called a consonantary. Abjads differ
> from alphabets in that they lack characters for vowels. The term takes
> its name from the old order of the Arabic alphabet's consonants Alif,
> Bá, Jim, Dál, though the word may have earlier roots in Phoenician or
> Ugaritic.

This description seems to imply that the term abjad has been around for
some time and is perhaps of Arabic origin.  I wonder whether it is also
now used as a technical term as a result of Daniels' work.

What I have along these lines is I. J. Gelb's A Study of Writing (1952,
reprinted 1974).  He discusses both types of script, but without using
these names.

> From time to time I've wondered whether Voynichese word divisions are
> not in fact syllable divisions.  This could explain the short word
> length for a European language.

Pardon my ignorance, but I have the impression that many labels consist of
single words.  Is this correct?  While a label might well take the form a,
aa, etc., or be an abbreviation, I'd consider this at least potential
evidence that the words themselves are not syllables.  Of course, they
might not be words either.

> 	French might be a good underlying language if this were the case,
> because French in its spoken form does not have clear word divisions.

We have to consider whether we mean French as it occurs today and as it
is analyzed by contemporary phonologists, or French as it occurred at some
date in the past, when the phonology was no doubt more conservative
and the orthography was the main, perhaps only, basis for what might pass
for a phonological analysis.

To some extent, what is said of French here, that it lacks clear word
divisions, is probably true of all languages.  Words are orthographic
conventions.  Perhaps all languages have some degree of liaison and/or
enclisis between the units recognized as orthographic words, or some
tendency to write compounds as multiple words, or variably as multiple and
single words.  The Sanskrit grammarians wrestled with this extensively
(sandhi), and it has become a complex set of grammatical facts in the
Insular Celtic languages and various others around the world, e.g., Fulani
and others in West Africa.

Anyway, I suspect that, if the Voynich script is of Medieval age, it is
based on something close to the orthographic practice of whatever
well-known Indo-European or Semitic language it might encode.  If not, or
if the language itself is of some more exotic or invented origin, then the
orthographic practices may not reflect any existing system.  Note that in
the former case we might have to deal with something like soft vs. hard c,
probably hand in hand with velars written differently before front vowels,
e.g., Italian che vs. ce, or French or Spanish que vs. ce.  Or not.
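As an illustration of the hard/soft c point, this Python sketch rewrites
Italian c and ch into the k / c^ notation used further on in this
message.  It is deliberately simplified: it assumes only the rule "c
before e or i is soft, ch is always hard" and ignores ci before back
vowels (ciao), sc, g/gh and doubled consonants.  The function name is
hypothetical.

```python
FRONT_VOWELS = set("ei")


def italian_c(word: str) -> str:
    """Rewrite Italian 'c'/'ch' as k (hard) or c^ (soft).

    Simplified assumption: only the front-vowel rule is applied; 'ci' + back
    vowel, 'sc', 'g'/'gh' and doubled consonants are not handled.
    """
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            out.append("k")  # 'ch' keeps the velar before e/i
            i += 2
        elif ch == "c":
            out.append("c^" if nxt in FRONT_VOWELS else "k")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for w in ["che", "ce", "casa", "cibo", "chiesa"]:
    print(w, italian_c(w))
```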

> I estimate that French might have ~500 fairly common syllables, although
> that is just a guess based on Louis XIV's Royal Cipher, which operated
> on syllables.  How would all this fit into your ideas?

It depends how you recognize syllables.  In an orthographically based
abugida for a typical Romance language, if we factor out the handling of c
and q, we'd expect to need symbols for V, CV, and CCV syllables, provided
we don't treat the codas (syllable-final consonants) separately but handle
them as variants of CV.  CCV syllables include sCV, CrV, and ClV.  The C
set is pretty much p b v f m t d s z n c^ j^ s^ z^ n^ k g r l l^ h, though
not all of these occur in clusters.  By c^ s^ z^ n^ l^ I mean the palatal
and/or alveo-palatal series.  You might need plain c and j (ts dz), too.
The vowels are a e i o u, of course.  How you would handle diphthongs is
interesting.  They multiply the number of vowels if treated as simple
vowels, but some sort of indication for w and y would handle falling
diphthongs.  Note that a true phonological analysis of the vowels in even
Italian produces more than a e i o u, and French gets fairly complex.  I
don't see the right sort of structure in the script to handle this, so I
suspect orthographic vowels are more likely.
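The inventory and syllable shapes above can be written down as data.  The
following Python sketch assumes the consonant list just given (with the
optional plain c and j), treats x^ as a single unit, and classifies an
orthographic syllable as V, CV, sCV, CrV or ClV by looking only at its
onset, so codas and diphthongs are ignored.  The names are illustrative.

```python
from typing import List

# Consonants as listed above; "x^" marks the palatal / alveo-palatal series.
# Plain c and j (ts, dz) are included as the optional extras mentioned.
CONSONANTS = ["p", "b", "v", "f", "m", "t", "d", "s", "z", "n",
              "c^", "j^", "s^", "z^", "n^", "k", "g", "r", "l", "l^", "h",
              "c", "j"]
VOWELS = ["a", "e", "i", "o", "u"]


def tokenize(onset: str) -> List[str]:
    """Split a consonant string into units, keeping digraphs like 's^' together."""
    units: List[str] = []
    i = 0
    while i < len(onset):
        if i + 1 < len(onset) and onset[i + 1] == "^":
            units.append(onset[i:i + 2])
            i += 2
        else:
            units.append(onset[i])
            i += 1
    return units


def shape(syllable: str) -> str:
    """Classify an orthographic syllable as V, CV, sCV, CrV, ClV or 'other'."""
    # The onset is everything before the first vowel letter.
    first_vowel = next((i for i, ch in enumerate(syllable) if ch in VOWELS), None)
    if first_vowel is None:
        raise ValueError(f"no vowel in {syllable!r}")
    onset = tokenize(syllable[:first_vowel])
    unknown = [c for c in onset if c not in CONSONANTS]
    if unknown:
        raise ValueError(f"not in the consonant set: {unknown}")
    if not onset:
        return "V"
    if len(onset) == 1:
        return "CV"
    if len(onset) == 2:
        first, second = onset
        if first == "s":
            return "sCV"
        if second == "r":
            return "CrV"
        if second == "l":
            return "ClV"
    return "other"


for syl in ["o", "s^a", "spe", "tra", "pla", "stra"]:
    print(syl, shape(syl))
```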

For example, using dashes to separate "characters" (Dante's "Lasciate
ogni speranza, voi ch'entrate"):

la-s^a-te o-n^i spe-ra-n-ca vo-i ke-n-tra-te

Hope I got that right!
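The dashed transcription can be taken apart mechanically.  This Python
sketch assumes each character ends in at most one vowel letter and uses the
same ^ notation; it splits each character into a consonant part and a
vowel, with None where the vowel is absent.  The function names are
illustrative.

```python
from typing import List, Optional, Tuple

VOWELS = set("aeiou")

Char = Tuple[str, Optional[str]]  # (consonant part, vowel or None)


def split_char(char: str) -> Char:
    """Split one dashed 'character' into its consonant part and final vowel.

    Assumes at most one vowel letter, at the end; a character with no vowel
    (like the -n- in spe-ra-n-ca) gets None.
    """
    if char and char[-1] in VOWELS:
        return char[:-1], char[-1]
    return char, None


def decompose(line: str) -> List[List[Char]]:
    """Split a line into words on spaces, then into characters on dashes."""
    return [[split_char(c) for c in word.split("-")] for word in line.split()]


example = "la-s^a-te o-n^i spe-ra-n-ca vo-i ke-n-tra-te"
for word in decompose(example):
    print(word)
```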

I could imagine someone providing a separate way to handle syllable-final
n as well as a way to handle syllable-initial n.

This sort of analysis would need, e.g., lV s^V tV V n^V spV rV nV cV vV kV
trV.  Some sort of diacritic would indicate if V was not a (or whatever
the base vowel was) or if it was to be omitted, as with the -n- in
spe-ra-n-ca, or the -k- in mi-ra-bi-le di-k-tu.  Very likely spV would be
some systematic modification of pV, or, of course, a consistent approach
would suggest s-pV with two characters.  The same observations would apply
to trV.  However, I have the impression that the characteristics of the
script suggest a more complex approach.
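The list of required characters can be derived directly from the dashed
example.  In this Python sketch, assuming "a" as the base vowel, each
character is reduced to a template (its consonant part plus V) and a mark
saying whether the base vowel is read, replaced, or omitted; collecting
the distinct templates in order of first appearance reproduces the list
lV s^V tV V n^V spV rV nV cV vV kV trV given above.  The names are
illustrative.

```python
from typing import List, Tuple

VOWELS = set("aeiou")
BASE_VOWEL = "a"  # assumed inherent vowel


def analyse(char: str) -> Tuple[str, str]:
    """Return (template, mark) for one dashed character.

    template: consonant part + 'V', e.g. 'spV'; a bare vowel gives 'V'.
    mark: 'inherent' (base vowel read), 'kill' (no vowel), or the
    replacing vowel, i.e. the diacritic that would be needed.
    """
    if char and char[-1] in VOWELS:
        onset, vowel = char[:-1], char[-1]
    else:
        onset, vowel = char, None
    if vowel is None:
        mark = "kill"
    elif vowel == BASE_VOWEL:
        mark = "inherent"
    else:
        mark = vowel
    return onset + "V", mark


def required_templates(line: str) -> List[str]:
    """Distinct templates in order of first appearance."""
    seen: List[str] = []
    for word in line.split():
        for char in word.split("-"):
            template, _ = analyse(char)
            if template not in seen:
                seen.append(template)
    return seen


example = "la-s^a-te o-n^i spe-ra-n-ca vo-i ke-n-tra-te"
print(" ".join(required_templates(example)))
print([analyse(c) for c in "spe-ra-n-ca".split("-")])
```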

Anyway, this approach requires something under 25 C characters, times 6
modifications (probably systematic) for the different vowels or their
absence, times perhaps 4 modifications for C, sC, Cr and Cl, which amounts
to O(600) characters (25 x 6 x 4 = 600), more if codas are indicated, too.
If the systematic modifications are indicated with graphs that in many
cases appear to be separate characters, then, of course, it is a question
of O(600) syllables or character groups.  This is a ceiling: cases such as
plV with no following vowel wouldn't be needed.
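The arithmetic behind the ceiling can be checked in a few lines of Python.
This is a sketch under stated assumptions: the vowel marks are the five
vowels plus "no vowel", the onset types are C, sC, Cr and Cl, and the
optional filter generalizes the plV remark to all cluster onsets, which is
an assumption rather than something argued above.  Using the 23
consonants actually listed earlier (21 plus plain c and j) instead of the
round figure of 25 brings the total down a little.

```python
from itertools import product
from typing import Sequence

VOWEL_MARKS = ["a", "e", "i", "o", "u", "none"]  # five vowels plus "vowel omitted"
ONSET_TYPES = ["C", "sC", "Cr", "Cl"]


def inventory_size(n_consonants: int,
                   vowel_marks: Sequence[str] = VOWEL_MARKS,
                   onset_types: Sequence[str] = ONSET_TYPES,
                   drop_vowelless_clusters: bool = False) -> int:
    """Count consonant x vowel-mark x onset-type combinations."""
    count = 0
    for _, mark, onset in product(range(n_consonants), vowel_marks, onset_types):
        # Optional assumption: cluster onsets never occur without a following
        # vowel (generalizing the plV case to sC, Cr and Cl).
        if drop_vowelless_clusters and mark == "none" and onset != "C":
            continue
        count += 1
    return count


print(inventory_size(25))                                # 600, the ceiling
print(inventory_size(23))                                # 552, the 23 listed consonants
print(inventory_size(25, drop_vowelless_clusters=True))  # 525
```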

Note, of course, that this is all extremely hypothetical without some
proposed mapping to (or, rather, from) the Voynich script.  The very act
of trying to come up with such a mapping would to some extent permit
falsification of the hypothesis, and naturally transcribing samples
according to a proposed mapping would confirm or disconfirm a particular
mapping should any prove feasible.
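One way to run the test described above is to segment transcribed Voynich
words with a proposed glyph-group-to-syllable mapping and see how much of
the text parses.  The Python sketch below does this with a small
dynamic-programming segmenter.  Both the mapping and the sample words are
hypothetical: the keys are EVA-style glyph strings chosen only for
illustration, and the values are placeholder syllable labels, not a
proposed decipherment.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from EVA-style glyph groups to placeholder syllable labels.
MAPPING: Dict[str, str] = {"d": "S1", "aiin": "S2", "ch": "S3", "o": "S4", "l": "S5"}


def segment(word: str, mapping: Dict[str, str]) -> Optional[List[str]]:
    """Split word into mapped glyph groups; return the syllable labels, or None."""
    # best[i] holds one segmentation of word[:i], or None if none was found yet.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for i in range(len(word)):
        if best[i] is None:
            continue
        for glyphs, syllable in mapping.items():
            end = i + len(glyphs)
            if word.startswith(glyphs, i) and best[end] is None:
                best[end] = best[i] + [syllable]
    return best[len(word)]


def coverage(words: List[str], mapping: Dict[str, str]) -> float:
    """Fraction of words that the mapping can transcribe completely."""
    parsed = sum(1 for w in words if segment(w, mapping) is not None)
    return parsed / len(words) if words else 0.0


sample = ["daiin", "chol", "qokeedy"]
for w in sample:
    print(w, segment(w, MAPPING))
print(coverage(sample, MAPPING))
```

Low coverage would count against a mapping, but high coverage alone
proves little: a mapping with an entry for every single glyph parses
everything.  The transcribed output still has to look like the proposed
language.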

> PS: Here is my website on some syllabic scripts, including Sequoyah's.
> http://www.geocities.com/ctesibos/new-inv/

