Re: About Turkish
> [Rene:] I'm still lacking a good explanation for the occurrence of
> the character sequence 'ed' (in Eva) which only starts
> appearing little by little in the astro section, to become
> regular in the B language part.
The letter "e" is a problem for word paradigms. It pretty much seems
to be a modifier for other letters. The big question is whether it is
a letter prefix, postfix, both, or neither.
There is one good argument for "e" as a letter prefix: the cosmo
diagram on f69r apparently shows the word "dolsedy" split as
d-o-l-s-ed-y. However, I was unable to build a satisfactory word
paradigm that handles "e" as a prefix. A subconscious block, perhaps?
Anyway, my paradigm still has "e" as a post-modifier for "benches"
(ch/sh/ee) and gallows.
Indeed, it turns out that almost all occurrences of "ed" in language-B
material are preceded by one of those letters. My interpretation is
that the "e" post-modifier, which was relatively rare in language A,
became much more common in language B. The appearance of "ed" digraphs
is only a conspicuous secondary effect of that change.
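For anyone who wants to check the "preceded by a bench or gallows" claim on their own transcription, here is a minimal sketch. It assumes EVA-encoded words, takes the benches to be "ch" and "sh" and the gallows to be "k", "t", "p", "f", and treats any run of "e" before the "d" as the modifier (so that "qokeedy" counts as k + ee + d). The function name and that reading of "preceded" are my own assumptions, not part of the paradigm itself.

```python
import re

# Assumption: a run of one or more "e" followed by "d" is one "ed" occurrence.
ED = re.compile(r"e+d")

# Assumption: EVA benches "ch"/"sh" and plain gallows "k", "t", "p", "f".
HOSTS = ("ch", "sh", "k", "t", "p", "f")


def ed_host_fraction(words):
    """Fraction of "ed" occurrences whose e-run directly follows a bench or gallows."""
    total = hosted = 0
    for w in words:
        for m in ED.finditer(w):
            total += 1
            # Look at whatever stands immediately before the run of e's.
            if w[:m.start()].endswith(HOSTS):
                hosted += 1
    return hosted / total if total else 0.0
```

Feeding it the tokens of one section at a time would give a per-section figure to compare between A and B. Words with no "ed" at all are simply ignored.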
Here are the "new" words in each major B section:
Stars:
193 0.01792 chedy
136 0.01263 qokeedy
118 0.01096 shedy
94 0.00873 okeey
61 0.00566 otedy
61 0.00566 oteey
59 0.00548 qokedy
56 0.00520 oteedy
55 0.00511 lchedy
49 0.00455 okeedy
40 0.00371 qokey
Herbal-B:
59 0.02034 chedy
39 0.01344 qokedy
32 0.01103 shedy
24 0.00827 okedy
22 0.00758 cheky
20 0.00689 otedy
13 0.00448 okeedy
13 0.00448 ytedy
12 0.00414 kedy
12 0.00414 qotedy
Biological:
251 0.03676 shedy
218 0.03193 chedy
163 0.02387 qokedy
153 0.02241 qokeedy
86 0.01260 qokeey
59 0.00864 lchedy
49 0.00718 otedy
47 0.00688 qotedy
45 0.00659 okedy
44 0.00644 sheedy
Note that most of the "new" words of language B end
with "{ch|sh|k|t}{e|ee}dy".
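The termination pattern can be tested mechanically. The sketch below is only an illustration: it reads the first alternative as the bench "ch" (which is what "chedy" and "lchedy" require), encodes the pattern as a regular expression anchored at the end of the word, and applies it to the Stars list above. The function name is hypothetical.

```python
import re

# {ch|sh|k|t}{e|ee}dy at the end of a word (assumption: first alternative is "ch").
EDY = re.compile(r"(?:ch|sh|k|t)e{1,2}dy$")


def split_by_termination(words):
    """Split words into those that end with the pattern and those that do not."""
    hits = [w for w in words if EDY.search(w)]
    misses = [w for w in words if not EDY.search(w)]
    return hits, misses


# The "new" words of the Stars section, as listed above.
stars = ["chedy", "qokeedy", "shedy", "okeey", "otedy", "oteey",
         "qokedy", "oteedy", "lchedy", "okeedy", "qokey"]

hits, misses = split_by_termination(stars)
print(len(hits), misses)  # 8 ['okeey', 'oteey', 'qokey']
```

The three misses are exactly the "e but no d" words discussed next.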
Note also that, besides all those "*edy" words, language B also
displays frequent new words that have "e" but not "d", such as
"okeey" and "oteey" in Stars, "cheky" in Herbal-B, and "qokeey" in Bio.
Also "chey" and "shey", which do occur in herbal-A, are almost twice
as common in Stars. At the same time, other popular language-A words
like "chol" and "chor" disappear.
So, if the difference between A and B can be described as a
spelling change, I would rather say that it was a matter
of replacing "o" by "e" as a post-modifier for benches
and gallows.
However, I am afraid that the A/B split cannot be properly described
by a change in structure or spelling or whatever. The difference
seems to be a matter of vocabulary, which may well be due to a
change of topic and nothing else.
In language B we see a bunch of "new" and quite popular
words. Most of them happen to use the "e" modifier, especially
the "Xedy" or "Xeedy" termination; but this is probably because
the suffix has some special meaning or grammatical importance.
By the way, I am getting increasingly disenchanted with n-gram based
analysis. It is like feeding all the Louvre paintings through a food
processor, room by room, and then trying to reconstruct the history of
art from the resulting piles of colored oatmeal. We would see all sorts
of interesting trends, sure. But a couple of big sunset paintings in
one room would suffice to make us date Picasso a little before
Giotto but well after Monet...
What I am trying to get at is that n-gram statistics are not only a
blurry shadow of word statistics, but are in fact dominated by a few
common words or word families. Thus we should be paying more attention
to whole words...
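To make the point concrete, here is a rough sketch of how one might ask which words an n-gram count actually comes from. It takes a word-to-count table and weights each word's n-gram occurrences by its frequency. The function names are made up for the illustration, and the input is only the ten Biological words listed above, not the whole section, so the share it prints says nothing about the full text.

```python
def ngram_count(word, ngram):
    """Occurrences of ngram in word, counting overlapping matches."""
    n = len(ngram)
    return sum(1 for i in range(len(word) - n + 1) if word[i:i + n] == ngram)


def ngram_sources(word_counts, ngram):
    """(word, contribution) pairs for the n-gram, largest contribution first."""
    contrib = {w: c * ngram_count(w, ngram) for w, c in word_counts.items()}
    return sorted(((w, k) for w, k in contrib.items() if k), key=lambda p: -p[1])


def top_share(word_counts, ngram, k):
    """Share of the n-gram's total count that comes from its k biggest source words."""
    src = ngram_sources(word_counts, ngram)
    total = sum(c for _, c in src)
    return sum(c for _, c in src[:k]) / total if total else 0.0


# Counts copied from the Biological list above (ten words only).
bio = {"shedy": 251, "chedy": 218, "qokedy": 163, "qokeedy": 153,
       "qokeey": 86, "lchedy": 59, "otedy": 49, "qotedy": 47,
       "okedy": 45, "sheedy": 44}

print(ngram_sources(bio, "ed")[:2])        # [('shedy', 251), ('chedy', 218)]
print(round(top_share(bio, "ed", 2), 3))   # 0.456
```

Even within this short list, two words supply nearly half of the "ed" count. Run on a full word-frequency table, the same breakdown would show how much of any n-gram trend is really the trend of a few words.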
All the best,
--stolfi