Re: VMs: Voynichese as an Abugida

Hi John

Interesting post. I too have wondered about syllabic representations.

A few thoughts of encouragement. Don't be put off by the apparently large number of syllables in English. The figures may be high for text but nothing like so high for actual sounds. Couple this with the fact that naive intuitions regarding syllables can be good - the written form for Cherokee, if I recall correctly, was invented by a non-linguist and serves well - and it seems likely that someone of, say, Kelly's abilities would have no trouble devising a serviceable syllabary for, say, Latin or other European 'phonetic' languages.

We should note that the consonant/vowel distinctions in Semitic languages are managed independently for morphological reasons (patterns of vowels are morphemes, and interleave with patterns of consonants, which are also morphemes). We would not expect to find that in the VMS if it is a rendering of anything other than Hebrew/Arabic.

I'd go for consonants with distinct symbols, plus some sort of simple-minded coding for vowel sounds (or even their omission).

I'll take another look at Stolfi's grammar when I get home tonight.



On 26 Jul 2004, at 07:51, Koontz John E wrote:

I am wondering if anyone has looked at the Voynich ms. script as an
abugida (alphasyllabary) or abjad (consonantal script)? I've actually
encountered these terms only this evening myself, though the phenomena to
which they refer are not new to me and are probably familiar to most of
you reading this.

Abugida refers to scripts in which consonant symbols C indicate inherently
a particular CV syllable, usually Ca. The consonant symbols are combined
with or modified by additional marks to indicate vowels other than the
default, or to indicate the absence of a vowel, maybe the presence of
other modifiers, like nasalization, e.g., C<nasal-V>, possibly
representing CVN, and so on. The Brahmic scripts of India (e.g.,
Dev(a)nagari for Sanskrit) are familiar types of this approach.
Tolkien's Tengwar are also an abugida, with a systematically generated set
of C-symbols.

Abjad refers to consonant-only scripts, primarily the Semitic scripts, in
which the consonant symbols C are augmented with modifying marks that
indicate associated vowels.

The distinction between abugida and abjad may correspond fairly well to
historical development - syllabary > abjad > abugida, with true alphabets
falling somewhere in that sequence - but logically, perhaps one should say
cryptographically, it is somewhat moot, turning on whether the C-grade or
some CV-grade is the unmarked term in a series.

I'm aware that the possibility that the Voynich script is syllabic is
generally rejected on the grounds that there are not enough distinct
symbols to represent the syllables of the Western European languages
likely to underlie the script. I'm also aware that vowel-identification
procedures identify some of the characters in the text as likely vowels,
and therefore presumably the rest as consonants. I gather this sort of
analysis underlies the EVA transcriptions of the script.

However, I notice that the usual transcription tables and the more
sophisticated analysis in Stolfi's Grammar for Voynichese Words reveal a
system that lends itself to a tabular presentation of series (consonants?)
and grades (vowels?).

For example, Stolfi's similar R and N sets - in combination the EVA
characters d l r s n m x - I'll just call them R - occur alone and with
one to four i's preceding - I'll call these I. These really look like one
to four (once five) strokes with a distinguishing twiddle at the end,
which called to my mind Tolkien's approach with the Tengwar, if rather
vaguely, and so led to my speculations here. Conventionally, the last
stroke is taken with the twiddle as the R letter, in EVA transcription,
though I see that Frogguy transcription followed my instinct on this.
I'll stick with the EVA version, since that is what is used.

As Stolfi's grammar shows, the R series also occur with one or two of the
o a y characters preceding - I'll just call these O - or with o or a plus
one to three i's preceding - which I'll call OI. It seems that you can
have O(0:2)I(0:4)R(1:1) - where X(i:j) means i to j instances of an X.
If these were syllables, I'd assume they were RIO syllables, or CVN, where
C = syllable onset, V = syllable peak, and N = syllable coda, something
like pan or par. In other words, the syllables are inverted, though this
is perhaps more due to taking the constituents of a syllable code as
individual characters rather than a formulaic whole.
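
A minimal sketch of the O(0:2)I(0:4)R(1:1) template as a regular
expression over EVA characters. The names UNIT and parse_unit are
hypothetical, and the pattern is looser than Stolfi's grammar: for
instance it accepts y followed by i's, which the description above does
not mention.

```python
import re

# O = one of o a y (0 to 2 times), I = i (0 to 4 times),
# R = one of d l r s n m x (exactly once).
UNIT = re.compile(r"([oay]{0,2})(i{0,4})([dlrsnmx])")

def parse_unit(text):
    """Return (O, I, R) if text is exactly one O-I-R unit, else None."""
    m = UNIT.fullmatch(text)
    return m.groups() if m else None

print(parse_unit("aiin"))    # ('a', 'ii', 'n')
print(parse_unit("ol"))      # ('o', '', 'l')
print(parse_unit("iiiiin"))  # None: five i's exceed the 0:4 bound
```

Under the inverted reading suggested above, the third group would be the
onset C, the second the peak V, and the first the coda N.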

I am not clear whether the two-O sequences before R are always oo or aa
or yy or can be mixed arbitrarily. If only doubled, perhaps they
represent sporadic nn, rr, etc.

There are, of course, some holes in the pattern, e.g., only iiiin (once?)
has 4 i's and only id and ix occur in the i + d or x series, if I
understand the presentation. The holes should, of course, reflect rare or
impossible CV combinations.
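
One way to look for such holes is to tally which i-count/R pairs occur
in a list of units and report the ones that never do. This is only a
sketch: grade_table and holes are hypothetical helper names, and the
sample list is made up for illustration, not taken from a transcription.

```python
from collections import Counter

R_LETTERS = "dlrsnmx"

def grade_table(units):
    """Count occurrences of each (number of i's, R letter) pair."""
    counts = Counter()
    for u in units:
        if not u:
            continue
        stem, r = u[:-1], u[-1]
        # Keep only units of the form i{0,4} + R.
        if r in R_LETTERS and set(stem) <= {"i"} and len(stem) <= 4:
            counts[(len(stem), r)] += 1
    return counts

def holes(counts):
    """List the i{0,4} + R forms that never occur in the tally."""
    return ["i" * n + r
            for r in R_LETTERS
            for n in range(5)
            if counts[(n, r)] == 0]

# Invented sample, NOT real corpus data.
sample = ["l", "il", "iil", "iiil", "n", "in", "iin", "iiin", "iiiin",
          "d", "id", "x", "ix"]
table = grade_table(sample)
print(holes(table))
```

Run over a real transcription, the same tally would show whether the
missing forms cluster by R letter (a defective series) or by number of
i's (a rare grade).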

To return to the logic of the system, I'm suggesting that a series like l,
il, iil, iiil represents, e.g., pa pe pi po or p pa pi pu or pa pi pu p or
something like that. The set of grades in a series R IR IIR IIIR (once
IIIIR?) doesn't provide for many vowel distinctions. I'm assuming that
the O characters are something different - syllable codas - but perhaps
they simply augment the vowel set. There are languages with only three
(or four) vowels, but this is not typical of Western Europe, where the
Classical Languages have aeiou systems (with length and diphthongs), and
many of the modern languages added additional rounded front vowels.

The 7 R characters d l r s n m x also don't make a very large set of
series markers or consonants, especially since d and x are limited in
their combinations with i. In a typical European language we'd expect the
14 characters in the set p b t d k g f v s z m n r l w y at minimum and
maybe some from h th dh sh zh ch j kh ny ly too. I'm being fairly
imprecise in representing these sounds orthographically, I realize.

In regard to these lists, Latin is closer to the minimum, with most
medieval and modern languages in Europe having more consonantal

Of course, we'd expect a phonological analysis in line with the
orthographic traditions of the underlying language, if any, and not
necessarily in line with modern linguistic theory. For example w and y
might be handled as vowels and palatals might not be distinctly
represented - collapsed with velars and dentals plus certain vowels or
represented as geminates, clusters, etc. Nasal vowels would be likely to
be handled as vowel plus n or m, and so on.

Note that m is also a bit limited in its combinations with O (only o and a
plus m). If O letters are codas this is like allowing pan and par, but
not pal, to pick arbitrary examples.

In regard to the shortage of series I notice that there are a number of
additional patterned sets of characters that might provide additional
series. For example, the gallows sets p t cph cth and f k cfh ckh, in
those orders, each have 1 2 3 and 4 strokes reaching the base line, and in
that respect resemble the patterns in R series like l il iil iiil.
However, I don't see how to relate this to the ch and sh (looks like a
ligature or modification of ch), and the ch certainly looks like it is
involved in the cph cth cfh ckh forms.
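
The parallel can be laid out by grouping glyphs by their position in
each ordered set. This sketch simply encodes the stroke counts as
described above (it measures nothing), and by_strokes is a hypothetical
helper name.

```python
# Ordered sets as given in the text: position 1..4 = strokes reaching the baseline.
GALLOWS_SETS = [["p", "t", "cph", "cth"], ["f", "k", "cfh", "ckh"]]
R_SERIES = ["l", "il", "iil", "iiil"]

def by_strokes(sets):
    """Group glyphs by their position (1 to 4) within each ordered set."""
    groups = {}
    for ordered in sets:
        for strokes, glyph in enumerate(ordered, start=1):
            groups.setdefault(strokes, []).append(glyph)
    return groups

groups = by_strokes(GALLOWS_SETS + [R_SERIES])
for strokes in sorted(groups):
    print(strokes, groups[strokes])
```

Each printed row lists the gallows forms and the R-series form that
share a stroke count; ch and sh have no obvious place in such a table.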

Another possible case of series behavior involves e ee eee eeee and o a y
followed by e or ee, but the orthographic logic of the series here seems
somewhat different. Perhaps these are analogous to R iR iiR iiiR OR OOR?

Finally q and Oq - but only oq and yq occur - may be like the restricted d
and x series.

I have to admit that at the most optimistic I have perhaps 11 series here,
some rather defective, where I would expect upwards of 14.

I have not addressed the issue of complex syllable initials, which in
European languages are chiefly of the form SC (s or sh plus a simple
initial) and CR (stops plus r or l).

I haven't considered the possibility that the I distinctions might be
consonants and the R's vowels, because the numbers seem even worse.

Other weaknesses: It seems to me also that this analysis does nothing to
explain the repeated word phenomenon, or the rather restricted length of
words (ten characters maximum?). In regard to the latter, if anything
this makes words shorter, as syllables are encoded in somewhat longer
character sequences than an alphabetic system would employ.

Additionally, even if the script is on the basis suggested, it might be
naive to assume that the text is not enciphered in some way anyway.

John E. Koontz
