
Re: VMs: Re: Moot points, getting long



On Thu, 5 Aug 2004, Gabriel Landini wrote:
> Furthermore, as Elmar noted recently, considering those <e>, <i> and <ch>
> represented as single or double characters does not explain any unknowns any
> better:
> 1. word-frequency statistics remain exactly the same, and
>
> 2. character agglomeration does not increase the entropy to any values near
> those of natural languages.

And note that even though it might be more convenient to have all basic
elements in the script represented by single characters, natural scripts,
i.e., the attested alphabetic and quasi-alphabetic scripts of the world,
include many cases of digraphs and higher level agglomerations
representing "simple sounds," and of single graphs representing complex
sounds, not to mention various schemes that systematically omit some
sounds.

I am, however, worried that the discretization of elements in the
transcriptions might produce artifacts in the statistics or thinking about
them. For example, if a is ei, what does that do to the status of a, e,
and i?  If c is e, what about schemes that tell us c does one thing and e
another?  If b and/or o are en and n is in, using n strictly for the
ending graph, what does that do to the analysis of e, i, o and n?
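
To make this worry concrete, here is a minimal sketch in Python. It
assumes a toy word list and a hypothetical set of merged units ("ch" and
"ei"); neither is real transcription data, and the function names are
invented for illustration. It shows only that the single-unit entropy
depends on how the transcription is cut into units, while the word counts
do not.

```python
from collections import Counter
from math import log2

def tokenize(word, units):
    """Greedily split a word into units, preferring the longest listed unit."""
    out, i = [], 0
    ordered = sorted(units, key=len, reverse=True)
    while i < len(word):
        for u in ordered:
            if word.startswith(u, i):
                out.append(u)
                i += len(u)
                break
        else:
            out.append(word[i])  # unlisted character: treat it as its own unit
            i += 1
    return out

def unit_entropy(words, units=()):
    """First-order (single-unit) entropy in bits over all units in the text."""
    counts = Counter(u for w in words for u in tokenize(w, units))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# Hypothetical toy text, NOT real transcription data.
words = "chei chei ei ein chein ei".split()

split_h = unit_entropy(words)                        # every character separate
merged_h = unit_entropy(words, units=("ch", "ei"))   # digraphs as single units
word_freq = Counter(words)                           # same under either scheme

print(f"split:  {split_h:.3f} bits")
print(f"merged: {merged_h:.3f} bits")
print(word_freq.most_common())
```

On this toy list the two entropy figures differ, but which way they move
in a real transcription depends entirely on the text and on the merge
scheme chosen, so nothing general should be read into the numbers.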

> Let's not forget about known languages, where strange things do also happen.
> For instance, in Spanish the letter "q" is *always* followed by "u" and then
> only by "i" or "e".

Or in Teton Dakotan, b and g are essentially always followed by l, and b
(only) occurs with extreme rarity in a few other words, e.g., bebela
'baby', a rare French loan (with native diminutive), plus one or two
native words, e.g., kabu 'to beat a drum'.

G does not occur alone, strictly speaking, but g-dot (gamma) can be
written g with little chance of confusion.  Some sources do this fairly
regularly, but not consistently, e.g., one popular dictionary uses g-dot
in the Teton to English side and g in the English to Teton side.  Some
sources write g in truncated k-final roots for what is more or less k or
eng, e.g., s^uNgmaNniNtu 'wolf, coyote'.  In a few words gm and gn occur
as syllable initials, e.g., wagmuN 'cat' or gnaNs^ka 'frog'.  That is not
the case in s^uNgmaNniNtu, however, which is really s^uNk-maNniNtu.

Note that l does occur alone.

It also happens that Teton and related languages strongly tend to have
clusters and complex stops (aspirates, ejectives) only in certain
morphosyntactic positions, e.g., root initially, cf. ble 'lake', gleza
'spotted', etc., or in the (prefixal) inflection of certain kinds of
simple-initial roots, e.g., ble 'I go', le 'you go', ye 's/he goes', etc.
A vanishingly small set of words have bl root medially, e.g., waNbli
'eagle'.  It can be extremely difficult to find minimal pairs for some
sounds, because they don't occur in the same context in words.

I grant that I sincerely doubt that we have to deal with a Siouan language
in the VMs - I would go so far as to say the possibility could be rejected
a priori - but we should be careful in our assumptions about what is
natural and what is not.
