
Re: WG: average word length in VMS



Jorge Stolfi wrote:
 
> If I understood Jacques's description correctly, Sukhotin's algorithm
> looks for two subsets C and V of the alphabet that maximize the
> frequency of CV and VC transitions in the words.

Yes. But Sukhotin also wrote that his solution was not optimal.
I say "who cares?": it is computationally unbelievably inexpensive.
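
For anyone who wants to try it, here is a minimal Python sketch of
the procedure as it is commonly described: count how often each pair
of different letters stands side by side, treat every letter as a
consonant to begin with, then repeatedly promote the letter with the
largest positive adjacency sum to vowel and discount its neighbours.
The function name and the toy word lists are illustrative
assumptions, not Sukhotin's own formulation, and the result is a
greedy approximation rather than a guaranteed optimum.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of letters a Sukhotin-style pass classifies as vowels."""
    # Symmetric adjacency counts; doubled letters are ignored (zero diagonal).
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Each letter starts as a consonant with the sum of its adjacency row.
    sums = {c: sum(adj[c].values()) for c in letters}
    consonants = set(letters)
    vowels = set()

    while consonants:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(consonants), key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # no remaining letter looks like a vowel
        vowels.add(best)
        consonants.remove(best)
        # Letters adjacent to the new vowel become less vowel-like.
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels


if __name__ == "__main__":
    print(sorted(sukhotin_vowels(["tiki", "kati", "taki", "kita"])))
```

On that toy list the sketch picks out "a" and "i". The cost is one
pass over the text plus a loop over the alphabet, which is why the
method is so cheap.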


> I seem to recall that
> Sukhotin's algorithm applied to Voynichese produced only a few
> unconvincing results that led nowhere (probably echoes of the OKOKOKO
> model). Part of the problem may have been the multiletter
> Voynichese->EVA encoding, which tends to obscure the C-V alternations.

Not part of the problem. The whole problem, I'd say, or very close.

> But even if we were to use the true Voynichese alphabet, the KMC
> structure would probably prevent the algorithm from finding
> enough CV and VC transitions to call home about.

Not sure. I once tried the algorithm on a word list of a language
of Vanuatu which was 90% consonants -- but had 10 vowels. Every
vowel identified, none misidentified.
 
> On the other hand, the KMC structure is not unlike the structure of
> single *syllables* in Latin and other natural languages. Syllable
> boundaries are partly a matter of convention; but, off the top of my
> head, I would guess that the Latin syllable can be said to have the
> general structure SCRVVN where all letters are optional except for
> one V; and S, R, and N are specific subsets of the consonants:
> 
>   in prin ci pio cre a vit de us cae lum et te rram
>   te rra au tem e rat i na nis et va cu a et te ne brae
>   su per fa ci em a by ssi ...
> 
> So it is tempting to identify the core letters K (gallows) of the KMC
> model with the main consonant C of the syllable; the mantle letters M (chairs)
> with the secondary consonants S and R; the crust letters C (dealers) with the
> vowels; and the final groups <iin>, <in>, etc. with the final consonants N.

Look, I know Latin, I know Chinese. The pattern you have uncovered
looks strikingly like Chinese. Latin? Let me scratch my head.
Scratch... scratch... scratch... scratch... er.... scratch...
scratch... please don't wait for me.
> 
> This theory has some strengths; for instance it seems to fit the
> observation that dealers (=vowels) often occur alone, whereas gallows,
> tables, and finals (=consonants) almost never do.

If gallows are tonal marks they won't occur alone. Ditto if they
mark vowel/consonant length and/or aspiration. Ever seen a
spiritus asper occur alone in Greek, or a shaddah without
a letter underneath in Arabic? (Not to mention a Portuguese
tilde without an a, an o, or an e, but not an n!)
 
> Enter the Chinese hypothesis:
 [...]
> Also Mandarin
> has only one or two finals, whereas Voynichese has three or four
> common ones, and a few rare ones.

Mandarin, until the turn of the century, had a glottal-stop
final, analysed by Sinologists as a tone (ru4-sheng1 "entrant tone").
It arose from the loss of earlier finals -p, -t, -k. Cantonese
has kept them, plus final -m, gone to -n in Mandarin. When you
take into account all the other Chinese "dialects", of which
I know only a few....

> but I see five diacritics, so perhaps there were
> more than 4 tones back then.

That is the ru4-sheng1, really a glottal stop or
perhaps an unreleased k.

> I also see a "chum", so there may have
> been more final sounds too.

Well, a final -m, still not assimilated to -n.
 
 
> I have been told that Manchu and Mongolian, although they are
> unrelated to Chinese, may fit the bill too. But the best candidate
> outside China may be Tibetan. It is a syllabic language with a
> rudimentary tone system, consonant clusters

The consonant clusters are only in the writing. They are 
mostly etymological. Used to be pronounced a long time ago.
It's much like Gaelic spelling.

> and a modest set of final
> sounds. 

More than Mandarin. Lots of vowels, too -- ten, I think.

> It has a native script, derived from some Indian model, which
> is alphabetic but is said to be extremely un-phonetic. 

Etymological, as I just wrote.

> Curiously, the
> tones are denoted (inconsistently) by prefixing certain dummy
> consonants to the syllable, like b and r in "'byung rtsis"
> ("astronomy"). 

No, the tone is determined by the final consonant, which
is most often lost in pronunciation.

 
> (According to one source, the Pleiades are called "sMen-du's" in Tibetan.
> Can we match that to EVA <doaro>?)

With a great deal of imagination. I prefer to link them to Basque
mendi "mountain"!