
Re: About Turkish



    > [Rene:] I'm still lacking a good explanation for the occurrence of
    > the character sequence 'ed' (in Eva) which only starts 
    > appearing little by little in the astro section, to become
    > regular in the B language part.
    
The letter "e" is a problem for word paradigms. It pretty much seems
to be a modifier for other letters. The big question is whether it is
a letter prefix, postfix, both, or neither.

There is one good argument for "e" as a letter prefix: the cosmo
diagram on f69r apparently shows the word "dolsedy" split as
d-o-l-s-ed-y. However, I was unable to build a satisfactory word
paradigm that handles "e" as a prefix. A subconscious block, perhaps?
Anyway, my paradigm still has "e" as a post-modifier for "benches"
(ch/sh/ee) and gallows.

Indeed, it turns out that almost all occurrences of "ed" in language-B
material are preceded by one of those letters. My interpretation is
that the "e" post-modifier, which was relatively rare in language A,
became much more common in language B. The appearance of "ed" digraphs
is only a conspicuous secondary effect of that change.
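
The claim about what precedes "ed" can be checked directly on word
counts. Here is a minimal sketch in Python, using the Biological counts
from the list further down as sample data. Several things are
assumptions of the sketch rather than part of the paradigm: a run of
e's before "d" is treated as one modifier and the letter before the
run is what gets classified, the gallows set is taken to be k/t/p/f
plus the platform forms, and the names are made up for illustration.

```python
import re
from collections import Counter

# Assumed letter classes (EVA). "ee" is not listed as a bench here because
# runs of "e" are skipped over before classifying (see E_RUN_D below).
BENCHES = ("ch", "sh")
GALLOWS = ("k", "t", "p", "f", "ckh", "cth", "cph", "cfh")

# One or more "e" followed by "d": covers both "ed" and "eed".
E_RUN_D = re.compile(r"e+d")

# Sample data: the Biological "new" words quoted in this message.
BIO = {
    "shedy": 251, "chedy": 218, "qokedy": 163, "qokeedy": 153,
    "qokeey": 86, "lchedy": 59, "otedy": 49, "qotedy": 47,
    "okedy": 45, "sheedy": 44,
}

def classify_ed_contexts(counts):
    """Tally, by token count, what comes right before each e-run + 'd'."""
    tally = Counter()
    for word, n in counts.items():
        for m in E_RUN_D.finditer(word):
            before = word[:m.start()]
            if before.endswith(BENCHES):
                kind = "bench"
            elif before.endswith(GALLOWS):
                kind = "gallows"
            else:
                kind = "other"
            tally[kind] += n
    return tally

if __name__ == "__main__":
    print(classify_ed_contexts(BIO))
```

On a full transcription the "other" bucket is the interesting one: it
would show how many "ed" tokens fall outside the bench/gallows context.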

Here are the "new" words in each major B section:

Stars:

    193 0.01792 chedy
    136 0.01263 qokeedy
    118 0.01096 shedy
     94 0.00873 okeey
     61 0.00566 otedy
     61 0.00566 oteey
     59 0.00548 qokedy
     56 0.00520 oteedy
     55 0.00511 lchedy
     49 0.00455 okeedy
     40 0.00371 qokey

Herbal-B:

     59 0.02034 chedy
     39 0.01344 qokedy
     32 0.01103 shedy
     24 0.00827 okedy
     22 0.00758 cheky
     20 0.00689 otedy
     13 0.00448 okeedy
     13 0.00448 ytedy
     12 0.00414 kedy
     12 0.00414 qotedy
 
Biological:

    251 0.03676 shedy
    218 0.03193 chedy
    163 0.02387 qokedy
    153 0.02241 qokeedy
     86 0.01260 qokeey
     59 0.00864 lchedy
     49 0.00718 otedy
     47 0.00688 qotedy
     45 0.00659 okedy
     44 0.00644 sheedy

Note that most of the "new" words of language B end
with "{ch|sh|k|t}{e|ee}dy".
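
That ending is easy to express as a regular expression. A minimal
sketch, run on the Stars list above; it takes the first group to be
ch, sh, k, or t (matching the listed words chedy, shedy, qokedy,
otedy), and the names are hypothetical. Since the input is only the
top of the "new" word list, the result says nothing about the section
as a whole.

```python
import re

# Assumed reading of the pattern: (ch|sh|k|t) + (e|ee) + "dy" at word end.
NEW_B_ENDING = re.compile(r"(ch|sh|k|t)(ee|e)dy$")

# Sample data: the Stars "new" words quoted in this message.
STARS = {
    "chedy": 193, "qokeedy": 136, "shedy": 118, "okeey": 94,
    "otedy": 61, "oteey": 61, "qokedy": 59, "oteedy": 56,
    "lchedy": 55, "okeedy": 49, "qokey": 40,
}

def tokens_matching(counts, pattern=NEW_B_ENDING):
    """Return (matching tokens, total tokens) for a word-count table."""
    total = sum(counts.values())
    hits = sum(n for word, n in counts.items() if pattern.search(word))
    return hits, total

if __name__ == "__main__":
    hits, total = tokens_matching(STARS)
    print(f"{hits} of {total} listed tokens end in the pattern")
```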

Note also that, besides all those "*edy" words, language B also 
displays frequent new words that have "e" but not "d", such as
"okeey" and "oteey" in Stars, "cheky" in Herbal-B, and "qokeey" in Bio.

Also "chey" and "shey", which do occur in herbal-A, are almost twice
as common in Stars. At the same time, other popular language-A words
like "chol" and "chor" disappear. 

So, if the difference between A and B can be described as a 
spelling change, I would rather say that it was a matter
of replacing "o" by "e" as a post-modifier for benches
and gallows.

However, I am afraid that the A/B split cannot be properly described
by a change in structure or spelling or whatever.  The difference 
seems to be a matter of vocabulary ---
which may well be due to a change of topic, nothing else.
In language B we see a bunch of "new" and quite popular
words. Most of them happen to use the "e" modifier, especially
the "Xedy" or "Xeedy" termination; but this is probably because 
the suffix has some special meaning or grammatical importance. 

By the way, I am getting increasingly disenchanted with n-gram based
analysis. It is like feeding all the Louvre paintings through a food
processor, room by room, and then trying to reconstruct the history of
art from the resulting piles of colored oatmeal. We would see all sorts
of interesting trends, sure. But a couple of big sunset paintings in
one room would suffice to make us date Picasso a little before
Giotto but well after Monet...

What I am trying to get at is that n-gram statistics are not only a
blurry shadow of word statistics, but are in fact dominated by a few
common words or word families. Thus we should be paying more attention
to whole words...
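
One way to see the effect is to ask which words actually supply the
occurrences of a given digraph. A minimal sketch, assuming a plain
word-count table as input; the function names are hypothetical, and
the sample data is just the ten Herbal-B words quoted above, so the
resulting share is illustrative only and not a measurement of the
section.

```python
from collections import Counter

# Sample data: the Herbal-B "new" words quoted in this message.
HERBAL_B = {
    "chedy": 59, "qokedy": 39, "shedy": 32, "okedy": 24, "cheky": 22,
    "otedy": 20, "okeedy": 13, "ytedy": 13, "kedy": 12, "qotedy": 12,
}

def digraph_sources(counts, digraph):
    """Map each word to the number of digraph tokens it contributes."""
    contrib = Counter()
    for word, n in counts.items():
        # Occurrences of the digraph inside one copy of the word.
        k = sum(1 for i in range(len(word) - 1) if word[i:i + 2] == digraph)
        if k:
            contrib[word] = k * n
    return contrib

def top_share(contrib, top):
    """Fraction of all digraph tokens supplied by the `top` biggest words."""
    total = sum(contrib.values())
    lead = sum(n for _, n in contrib.most_common(top))
    return lead / total if total else 0.0

if __name__ == "__main__":
    ed = digraph_sources(HERBAL_B, "ed")
    print(ed.most_common(3), round(top_share(ed, 3), 3))
```

If a handful of words account for most of a digraph's count, then a
shift in that digraph's frequency between sections may reflect nothing
more than those few words coming or going.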

All the best,

--stolfi