Re: WG: average word length in VMS
Jorge Stolfi wrote:
> If I understood Jacques's description correctly, Sukhotin's algorithm
> looks for two subsets C and V of the alphabet that maximize the
> frequency of CV and VC transitions in the words.
Yes. But Sukhotin also wrote that his solution was not optimal.
I say "who cares?": it is computationally unbelievably inexpensive.
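For anyone who wants to try it, here is a minimal sketch of the greedy procedure as it is usually described, not a transcription of Sukhotin's own formulation. The function name sukhotin_vowels and the sample word list are made up for illustration. It assumes each word is a string of single-character symbols, so a multiletter transcription would have to be recoded first.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Greedy vowel finder in the style of Sukhotin's algorithm.

    Assumption: every character of every word is one letter of the alphabet.
    Returns the letters classified as vowels, in the order they were picked.
    """
    # Symmetric adjacency counts; pairs of identical letters are ignored,
    # which keeps the diagonal of the matrix at zero.
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums: how often each letter stands next to a different letter.
    sums = {c: sum(adj[c].values()) for c in letters}

    vowels = []
    consonants = set(letters)  # everything starts out as a consonant
    while consonants:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(consonants), key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # no remaining letter looks like a vowel
        vowels.append(best)
        consonants.remove(best)
        # Contacts with a known vowel now count against the others.
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels


if __name__ == "__main__":
    print(sukhotin_vowels(["banana", "tomato"]))
```

The cost is one pass over the text plus a few passes over an alphabet-sized table, which is why it is so cheap to run.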
> I seem to recall that
> Sukhotin's algorithm applied to Voynichese produced only a few
> unconvincing results that led nowhere (probably echoes of the OKOKOKO
> model). Part of the problem may have been the multiletter
> Voynichese->EVA encoding, which tends to obscure the C-V alternations.
Not part of the problem. The whole problem, I'd say, or very close.
> But even if we were to use the true Voynichese alphabet, the KMC
> structure would probably prevent the algorithm from finding
> enough CV and VC transitions to call home about.
Not sure. I once tried the algorithm on a word list of a language
of Vanuatu which was 90% consonants -- but had 10 vowels. Every
vowel identified, none misidentified.
> On the other hand, the KMC structure is not unlike the structure of
> single *syllables* in Latin and other natural languages. Syllable
> boundaries are partly a matter of convention; but, off the top of my head, I
> would guess that the Latin syllable can be said to have the general
> structure SCRVVN where all letters are optional except for one V; and
> S, R, and N are specific subsets of the consonants:
>
> in prin ci pio cre a vit de us cae lum et te rram
> te rra au tem e rat i na nis et va cu a et te ne brae
> su per fa ci em a by ssi ...
>
> So it is tempting to identify the core letters K (gallows) of the KMC
> model with the main consonant C of the syllable; the mantle letters M (chairs)
> with the secondary consonants S and R; the crust letters C (dealers) with the
> vowels; and the final groups <iin>, <in>, etc. with the final consonants N.
Look, I know Latin, I know Chinese. The pattern you have uncovered looks
strikingly like Chinese. Latin? Let me scratch my head. Scratch...
scratch... scratch... scratch... er.... scratch... scratch... please
don't wait for me.
>
> This theory has some strengths; for instance it seems to fit the
> observation that dealers (=vowels) often occur alone, whereas gallows,
> tables, and finals (=consonants) almost never do.
If gallows are tonal marks they won't occur alone. Ditto if they
mark vowel/consonant length and/or aspiration. Ever seen a
spiritus asper occur alone in Greek, and a shaddah without
a letter underneath in Arabic? (Not to mention a Portuguese
tilde without an a, an o, or an e, but not an n!)
> Enter the Chinese hypothesis:
[...]
> Also Mandarin
> has only one or two finals, whereas Voynichese has three or four
> common ones, and a few rare ones.
Mandarin, until the turn of the century, had a glottal-stop
final, analysed by Sinologists as a tone (ru4-sheng1 "entrant tone").
It arose from the loss of earlier finals -p, -t, -k. Cantonese
has kept them, plus final -m, gone to -n in Mandarin. When you
take into account all the other Chinese "dialects" of which
I know only a few....
> but I see five diacritics, so perhaps there were
> more than 4 tones back then.
That is the ru4-sheng1, really a glottal stop or
perhaps an unreleased k.
> I also see a "chum", so there may have
> been more final sounds too.
Well, a final -m, still not assimilated to -n.
> I have been told that Manchu and Mongolian, although they are
> unrelated to Chinese, may fit the bill too. But the best candidate
> outside China may be Tibetan. It is a syllabic language with a
> rudimentary tone system, consonant clusters
The consonant clusters are only in the writing. They are
mostly etymological. Used to be pronounced a long time ago.
It's much like Gaelic spelling.
> and a modest set of final
> sounds.
More than Mandarin. Lots of vowels, too -- ten, I think.
> It has a native script, derived from some Indian model, which
> is alphabetic but is said to be extremely un-phonetic.
Etymological, as I just wrote.
> Curiously, the
> tones are denoted (inconsistently) by prefixing certain dummy
> consonants to the syllable, like b and r in "'byung rtsis"
> ("astronomy").
No, the tone is determined by the final consonant, which
is most often lost in pronunciation.
> (According to one source, the Pleiades are called "sMen-du's" in Tibetan.
> Can we match that to EVA <doaro>?)
With a great deal of imagination. I prefer to link them to Basque
mendi "mountain"!