
Re: Dovetail battlements in Rome?

    > Can anybody explain how Sukhotin does the job of sorting out
    > vowels and consonants?
I recall that Jacques posted an explanation to the mailing list,
a couple of years ago.

The basis of the algorithm is the observation that in most languages
Vs and Cs have a tendency to alternate: Vs are mostly surrounded by
Cs and vice versa. So the algorithm tries to find a partition of the
alphabet into two classes X and Y that maximizes the number of XY and
YX pairs, and minimizes the number of XX and YY pairs.
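
For concreteness, here is a minimal Python sketch of the greedy
procedure as it is usually described; it is not taken from Jacques's
posting, and the function name `sukhotin` and the word-list input are
my own assumptions. Every letter starts out as a consonant, and the
letter that is most often adjacent to the remaining consonants is
repeatedly promoted to vowel until no candidate has a positive score.

```python
from collections import defaultdict

def sukhotin(words):
    """Greedy vowel/consonant split (a sketch, not a reference version)."""
    # Symmetric counts of adjacent letter pairs; doubled letters are ignored.
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every letter starts as a consonant; its score is its row sum.
    score = {c: sum(adj[c].values()) for c in letters}
    vowels = set()
    while True:
        consonants = sorted(letters - vowels)  # sorted for stable tie-breaks
        if not consonants:
            break
        best = max(consonants, key=lambda c: score[c])
        if score[best] <= 0:
            break
        vowels.add(best)
        # Adjacency to a vowel now counts against being a vowel too.
        for c in letters - vowels:
            score[c] -= 2 * adj[c][best]
    return vowels, letters - vowels

vowels, consonants = sukhotin(["banana", "cabana"])
print(sorted(vowels), sorted(consonants))
```

On that toy input only `a` ends up in the vowel class. Note that the
procedure always returns some bipartition, whether or not the text
really has an alternating structure.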

    > Did anybody apply that method to VMS, and if yes, were the
    > symbols in VMS reliably shown to be either vowels or consonants?
Jacques did, and I gather that the results were inconclusive.

I have tried to do roughly the same thing, by hand and by ad-hoc
algorithms. I did find some structure in the alphabet, which is now
part of the crust-mantle-core paradigm.

Basically, one can distinguish several classes of letters (gallows,
benches, dealers, etc.) which have similar digraph statistics; but
there doesn't seem to be any simple mapping of those classes to a
plausible `vowels and consonants' bipartition. Moreover, although
those statistical classes seem clear-cut, and are fairly compatible
with the morphological classification of the symbols, if I slightly
change the similarity measure, I get a very different set of classes
--- also clear-cut and compatible.

But two days ago I noticed another weird thing about the digraph
frequencies, which sort of explains that ambiguity (and why Sukhotin's
algorithm couldn't possibly work). Stay tuned...

All the best,