Re: VMs: Voynichese as an Abugida
Interesting post. I too have wondered about syllabic representations.
A few thoughts of encouragement. Don't be put off by the apparently
large number of syllables in English. The figures may be high for text
but nothing like so high for actual sounds. Couple this with the fact
that naive intuitions regarding syllables can be good - the written
form for Cherokee, if I recall correctly, was invented by a
non-linguist and serves well - and it seems likely that someone of,
say, Kelly's abilities would have no trouble devising a serviceable
syllabary for, say, Latin or other European 'phonetic' languages.
We should note that the consonant/vowel distinctions in Semitic
languages are managed independently for morphological reasons (patterns
of vowels are morphemes, and interleave with patterns of consonants, as
morphemes). We would not expect to find that in VMS if it is a
rendering of anything other than Hebrew/Arabic.
I'd go for consonants with distinct symbols, plus some sort of
simple-minded coding for vowel sounds (or even their omission).
I'll take another look at Stolfi's grammar when I get home tonight.
On 26 Jul 2004, at 07:51, Koontz John E wrote:
I am wondering if anyone has looked at the Voynich ms. script as an
abugida (alphasyllabary) or abjad (consonantal script)? I've actually
encountered these terms only this evening myself, though the phenomena to
which they refer are not new to me and are probably familiar to most of
you reading this.
Abugida refers to scripts in which consonant symbols C indicate
a particular CV syllable, usually Ca. The consonant symbols are combined
with or modified by additional marks to indicate vowels other than the
default, or to indicate the absence of a vowel, maybe the presence of
other modifiers, like nasalization, e.g., C<nasal-V>, possibly
representing CVN, and so on. The Brahmic scripts of India (e.g.,
Dev(a)nagari for Sanskrit) are familiar types of this approach.
Tolkien's Tengwar are also an abugida, with a systematically generated
Abjads refer to consonant-only scripts, primarily the Semitic scripts, in
which the consonant symbols C are augmented with modifying marks that
indicate associated vowels.
The distinction between abugida and abjad may correspond fairly well to
historical development - syllabary > abjad > abugida, with true alphabets
falling somewhere in that sequence - but logically, perhaps one should
cryptographically, it is somewhat moot, turning on whether the C-grade or
some CV-grade is the unmarked term in a series.
I'm aware that the possibility that the Voynich script is syllabic is
generally rejected on the grounds that there are not enough distinct
symbols to represent the syllables of the Western European languages
likely to underlie the script. I'm also aware that vowel-identification
procedures identify some of the characters in the text as likely vowels
and therefore presumably the rest as consonants. I gather this sort of
analysis underlies the EVA transcriptions of the script.
However, I notice that the usual transcription tables and the more
sophisticated analysis in Stolfi's Grammar for Voynichese Words reveal a
system that lends itself to a tabular presentation of series
and grades (vowels?).
For example, Stolfi's similar R and N sets - in combination the EVA
characters d l r s n m x - I'll just call them R - occur alone and with
one to four i's preceding - I'll call these I. These really look like one
to four (once five) strokes with a distinguishing twiddle at the end,
which called to my mind Tolkien's approach with the Tengwar, if rather
vaguely, and so led to my speculations here. Conventionally, the last
stroke is taken with the twiddle as the R letter, in EVA transcription,
though I see that Frogguy transcription followed my instinct on this.
I'll stick with the EVA version, since that is what is used.
As Stolfi's grammar shows, the R series also occur with one or two of
o a y characters preceding - I'll just call these O - or with o or a and
one to three i's preceding - which I'll call OI. It seems that you can
have O(0:2)I(0:4)R(1:1) - where X(i:j) means i to j instances of an X.
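
As a rough illustration, the O(0:2)I(0:4)R(1:1) shape can be written as a
regular expression over EVA letters. This is only a sketch: the names O_SET,
R_SET, TAIL and parse_tail are invented here, and the pattern deliberately
over-generates, since it ignores the restrictions noted in the message (for
example that OI forms begin only with o or a).

```python
import re

# Letter groups as named in the message; the variable names are hypothetical.
O_SET = "oay"        # the "O" letters
R_SET = "dlrsnmx"    # the "R" letters (Stolfi's R and N sets combined)

# O(0:2) I(0:4) R(1:1): up to two O letters, up to four i's, exactly one R.
TAIL = re.compile(rf"([{O_SET}]{{0,2}})(i{{0,4}})([{R_SET}])")

def parse_tail(s):
    """Return (O part, number of i's, R letter), or None if s does not fit."""
    m = TAIL.fullmatch(s)
    if m is None:
        return None
    return m.group(1), len(m.group(2)), m.group(3)

print(parse_tail("iiiin"))   # ('', 4, 'n')
print(parse_tail("oiir"))    # ('o', 2, 'r')
print(parse_tail("iiiiin"))  # None: five i's exceed I(0:4)
```

The strings passed in are only shapes that fit or miss the pattern, not
claims about which forms are attested in the manuscript.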
If these were syllables, I'd assume they were RIO syllables, or CVN,
C = syllable onset, V = syllable peak, and N = syllable coda, something
like pan or par. In other words, the syllables are inverted, though this
is perhaps more due to taking the constituents of a syllable code as
individual characters rather than a formulaic whole.
I am not clear whether the two of O sequences before R are always oo
or yy or can be mixed arbitrarily. If only doubled, perhaps they
represent sporadic nn, rr, etc.
There are, of course, some holes in the pattern, e.g., only iiiin
has 4 i's and only id and ix occur in the i + d or x series, if I
understand the presentation. The holes should, of course, reflect
impossible CV combinations.
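
One way to make the holes explicit is to lay out the full I(0:4) x R grid
and subtract the forms taken to occur. The sketch below hard-codes one
reading of the description above (d and x only bare or with a single i, the
other R letters with up to three i's, plus iiiin). That reading is an
assumption, not a tally from the transcription, and the function names are
invented for the example.

```python
R_SET = "dlrsnmx"   # the R letters from the message
MAX_I = 4           # I(0:4)

def attested_per_message():
    """Forms assumed to occur, following the description in the message."""
    forms = set()
    for r in R_SET:
        limit = 1 if r in "dx" else 3   # d and x: bare or one i only
        for n in range(limit + 1):
            forms.add("i" * n + r)
    forms.add("iiiin")                  # the only form with four i's
    return forms

def holes(attested):
    """Grid cells (0 to MAX_I i's before each R letter) that are not attested."""
    grid = {"i" * n + r for r in R_SET for n in range(MAX_I + 1)}
    return sorted(grid - attested, key=lambda f: (f[-1], len(f)))

print(holes(attested_per_message()))
```

Feeding in a set of forms counted from an actual transcription, instead of
attested_per_message(), would give the real holes.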
To return to the logic of the system, I'm suggesting that a series
l, il, iil, iiil represents, e.g., pa pe pi po or p pa pi pu or pa pi pu p or
something like that. The set of grades in a series R IR IIR IIIR (once
IIIIR?) doesn't provide for many vowel distinctions. I'm assuming that
the O characters are something different - syllable codas - but perhaps
they simply augment the vowel set. There are languages with only three
(or four) vowels, but this is not typical of Western Europe, where the
Classical Languages have aeiou systems (with length and diphthongs), and
many of the modern languages added additional rounded front vowels.
The 7 R characters d l r s n m x also don't make a very large set of
series markers or consonants, especially since d and x are limited in
their combinations with i. In a typical European language we'd expect
14 characters in the set p b t d k g f v s z m n r l w y at minimum and
maybe some from h th dh sh zh ch j kh ny ly too. I'm being fairly
imprecise in representing these sounds orthographically, I realize.
In regard to these lists, Latin is closer to the minimum, with most
medieval and modern languages in Europe having more consonantal
Of course, we'd expect a phonological analysis in line with the
orthographic traditions of the underlying language, if any, and not
necessarily in line with modern linguistic theory. For example w and y
might be handled as vowels and palatals might not be distinctly
represented - collapsed with velars and dentals plus certain vowels or
represented as geminates, clusters, etc. Nasal vowels would be likely to
be handled as vowel plus n or m, and so on.
Note that m is also a bit limited in its combinations with O (only o
plus m). If O letters are codas this is like allowing pan and par, but
not pal, to pick arbitrary examples.
In regard to the shortage of series I notice that there are a number of
additional patterned sets of characters that might provide additional
series. For example, the gallows sets p t cph cth and f k cfh ckh, in
those orders, each have 1 2 3 and 4 strokes reaching the base line, and in
that respect resemble the patterns in R series like l il iil iiil.
However, I don't see how to relate this to the ch and sh (looks like a
ligature or modification of ch), and the ch certainly looks like it is
involved in the cph cth cfh ckh forms.
Another possible case of series behavior involves e ee eee eeee and o
followed by e or ee, but the orthographic logic of the series here is
somewhat different. Perhaps these are analogous to R iR iiR iiiR OR
Finally q and Oq - but only oq and yq occur - may be like the d
and x series.
I have to admit that at the most optimistic I have perhaps 11 series,
some rather defective, where I would expect upwards of 14.
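
The shortfall can be put in rough numbers. The sketch below simply
multiplies series by grades, using the figures from the message: four grades
(R IR IIR IIIR), 7 R series or 11 at the most optimistic, against a minimum
of 14 consonants and the five aeiou vowels. It assumes every series takes
every grade and leaves the O letters out, so the script figures are upper
bounds under this reading; cv_capacity is a name invented for the example.

```python
def cv_capacity(series, grades):
    """Number of distinct CV codes if every series takes every grade."""
    return series * grades

r_series_only = cv_capacity(7, 4)       # the 7 R letters, 4 grades each
most_optimistic = cv_capacity(11, 4)    # perhaps 11 series in all
european_minimum = cv_capacity(14, 5)   # 14 consonants x aeiou

print(r_series_only, most_optimistic, european_minimum)  # 28 44 70
```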
I have not addressed the issue of complex syllable initials, which in
European languages are chiefly of the form SC (s or sh plus a simple
initial) and CR (stops plus r or l).
I haven't considered the possibility that the I distinctions might be
consonants and the R's vowels, because the numbers seem even worse.
Other weaknesses: It seems to me also that this analysis does nothing to
explain the repeated word phenomenon, or the rather restricted length of
words (ten characters maximum?). In regard to the latter, if anything
this makes words shorter, as syllables are encoded in somewhat longer
character sequences than an alphabetic system would employ.
Additionally, even if the script is on the basis suggested, it might be
naive to assume that the text is not encyphered in some way anyway.
John E. Koontz
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
Dr William Edmondson
School of Computer Science
University of Birmingham
Edgbaston B15 2TT