Re: About Turkish
> [Rene:] If Voynichese is a language that has articles, they are attached
> to the words (as in Arabic), unless Voynichese is a highly verbose
> encryption of a language. Czech doesn't use articles (at least in
> the official language), so by gross extrapolation I'm assuming
> that this is true for more (most/all?) Slavonic languages of
> the late middle ages. Most Romance and Germanic languages do
> use articles (except of course Latin).
IIRC, Romanian is also exceptional in that its definite article is
suffixed to the noun.
> But, it is also worth mentioning that Turkish is one of the most
> regular languages in the world. There are hardly any exceptions
> to the grammatical rules, if at all (!)
Hm... wasn't the author of the claim a Turk, perchance? 8-)
(But I have seen it stated elsewhere, so I suppose it is true...)
> I can see how Turkish written in an Arabic script might share
> a number of statistical oddities with Voynichese.
Perhaps, but that is not what I was thinking of. See below.
BTW, here are the numbers, days, and months in Turkish. In this
ASCII transcription, I stands for dotless "ı", S for "ş", and G for "ğ":
  1 = bir       10 = on        100 = yüz
  2 = iki       20 = yirmi     200 = iki yüz
  3 = üç        30 = otuz
  4 = dört      40 = kIrk
  5 = beS       50 = elli
  6 = altI      60 = altmIS
  7 = yedi      70 = yetmiS
  8 = sekiz     80 = seksen
  9 = dokuz     90 = doksan
 11 = on bir
 12 = on iki
... = ...
999 = dokuz yüz doksan dokuz
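The compositional pattern in the table (multiplier before "yüz", tens
before units, plain "yüz" for 100) fits in a few lines. This is a
minimal sketch that covers only 1 to 999 and uses the same ASCII
transcription; the function name turkish_number is my own:

```python
# Hypothetical helper: spell out 1..999 in Turkish, in the ASCII
# transcription used above (I = dotless i, S = s-cedilla).
ONES = ["", "bir", "iki", "üç", "dört", "beS", "altI", "yedi", "sekiz", "dokuz"]
TENS = ["", "on", "yirmi", "otuz", "kIrk", "elli", "altmIS", "yetmiS", "seksen", "doksan"]


def turkish_number(n: int) -> str:
    """Return the Turkish name of n, for 1 <= n <= 999."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 are covered by the table")
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    words = []
    if hundreds:
        # "yüz" alone is 100; 200..900 put the multiplier first ("iki yüz").
        if hundreds > 1:
            words.append(ONES[hundreds])
        words.append("yüz")
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return " ".join(words)


for n in (1, 11, 12, 200, 999):
    print(n, "=", turkish_number(n))
```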
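The fraction X/Y is expressed as "Y-IN X", where "-IN" is the
locative suffix -{t|d}V2. (Here V2 is a vowel with two-way harmony,
e or a, and V4 is one with four-way harmony, i, I, u, or ü; in both
cases the vowel agrees with the last vowel of the stem. The consonant
is t after a voiceless consonant, d otherwise.)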
2/3 = "üçte iki" ("in 3, 2")
15/100 = "yüzde on beS" ("in 100, 15")
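A minimal sketch of the locative rule and the fraction pattern,
assuming the transcription above (I = ı, S = ş) and treating
ç, f, h, k, p, s, S, t as the voiceless consonants; the helper
names are hypothetical:

```python
# Sketch of the locative suffix -{t|d}V2 and the "Y-IN X" fraction pattern.
FRONT = set("eiöü")
BACK = set("aIou")
VOICELESS = set("çfhkpsSt")


def last_vowel(word: str) -> str:
    """Return the last vowel of word (used for vowel harmony)."""
    for ch in reversed(word):
        if ch in FRONT or ch in BACK:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def locative(word: str) -> str:
    """Attach -de/-da/-te/-ta to the end of word (or of a multi-word numeral)."""
    consonant = "t" if word[-1] in VOICELESS else "d"
    vowel = "e" if last_vowel(word) in FRONT else "a"
    return word + consonant + vowel


def fraction(numerator: str, denominator: str) -> str:
    """X/Y is said "Y-IN X", literally "in Y, X"."""
    return f"{locative(denominator)} {numerator}"


print(fraction("iki", "üç"))      # 2/3
print(fraction("on beS", "yüz"))  # 15/100
```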
The ordinal Nth is formed with the suffix -(V4)ncV4:
1st = birinci
2nd = ikinci
3rd = üçüncü
4th = dördüncü
...
11th = on birinci
...
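The ordinal suffix can be sketched the same way. This assumes the
four-way harmony mapping (e, i -> i; ö, ü -> ü; a, I -> I; o, u -> u),
drops the bracketed (V4) after a final vowel, and special-cases dört,
whose final t softens to d before the suffix (dördüncü); the function
names are my own:

```python
# Sketch of the ordinal suffix -(V4)ncV4.
# Four-way harmony: map the stem's last vowel to a high vowel.
V4 = {"e": "i", "i": "i", "ö": "ü", "ü": "ü",
      "a": "I", "I": "I", "o": "u", "u": "u"}


def harmonic_v4(word: str) -> str:
    """High vowel (i, I, ü, u) agreeing with the last vowel of word."""
    for ch in reversed(word):
        if ch in V4:
            return V4[ch]
    raise ValueError(f"no vowel in {word!r}")


def ordinal(word: str) -> str:
    """Only the last element takes the suffix: "on bir" -> "on birinci"."""
    if word.endswith("dört"):
        word = word[:-1] + "d"  # dört -> dörd- before a vowel
    v = harmonic_v4(word)
    link = "" if word[-1] in V4 else v  # (V4) is dropped after a vowel
    return word + link + "nc" + v


for w in ("bir", "iki", "üç", "dört", "on bir"):
    print(w, "->", ordinal(w))
```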
The distributive is formed with -(S)V2r:
1 at a time = birer
2 at a time = ikiSer
3 at a time = üçer
...
6 at a time = altISar
...
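Likewise for the distributive: a sketch assuming two-way harmony
(front e, i, ö, ü -> e; back a, I, o, u -> a), the buffer letter S
only after a final vowel, and the same dört -> dörd- softening
(dörder); names are hypothetical:

```python
# Sketch of the distributive suffix -(S)V2r.
FRONT = set("eiöü")
BACK = set("aIou")


def last_vowel(word: str) -> str:
    """Return the last vowel of word (used for vowel harmony)."""
    for ch in reversed(word):
        if ch in FRONT or ch in BACK:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def distributive(word: str) -> str:
    """Attach -(S)er/-(S)ar: "bir" -> "birer", "iki" -> "ikiSer"."""
    if word.endswith("dört"):
        word = word[:-1] + "d"  # dört -> dörd- before a vowel
    vowel = "e" if last_vowel(word) in FRONT else "a"
    buffer = "S" if word[-1] in FRONT | BACK else ""  # (S) only after a vowel
    return word + buffer + vowel + "r"


for w in ("bir", "iki", "üç", "altI"):
    print(w, "->", distributive(w))
```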
Days of the week (usually followed by "günü" = "day-of")
Sunday      pazar
Monday      pazartesi
Tuesday     salI
Wednesday   çarSamba
Thursday    perSembe
Friday      cuma
Saturday    cumartesi
Month names (older names in parentheses)
January     ocak     (ikinci kânun, son kânun)
February    Subat
March       mart
April       nisan
May         mayIs
June        haziran
July        temmuz
August      aGustos
September   eylûl
October     ekim     (birinci teSrin, ilk teSrin)
November    kasIm    (ikinci teSrin, son teSrin)
December    aralIk   (birinci kânun, ilk kânun)
> My guess from the above is that Turkish would have a tendency to
> form long words.
Yes. This is moderately true of modern Turkish, and I
have seen hints that it was much more so of classical Turkish.
I should try to get hold of a sample and do some syllable
counting.
> These would be split up by the orthographic breaks
> caused by the script. ...
>
Turkish seems to have a fairly well-defined notion of word,
namely a stem followed by its suffixes (stems and suffixes form
fairly distinct sets). So I see three natural spacing rules for it:
(0) no spaces
(1) spaces between words
(2) spaces between elements (stems and suffixes)
The modern script uses rule (1). I don't know which rule
was used with the Arabic script (nor even whether
the question makes sense at all).
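To make the three rules concrete, here is a toy sketch that
represents each word as a list of elements (stem first, then
suffixes) and joins them under each rule. The segmentation of
"yüzde on beS" is my own hand analysis, and the function name
apply_spacing is hypothetical:

```python
# A word is a list of elements: the stem first, then its suffixes.
Word = list[str]


def apply_spacing(words: list[Word], rule: int) -> str:
    """Join words under rule (0) no spaces, (1) between words, (2) between elements."""
    if rule == 0:
        return "".join("".join(w) for w in words)
    if rule == 1:
        return " ".join("".join(w) for w in words)
    if rule == 2:
        return " ".join(element for w in words for element in w)
    raise ValueError("rule must be 0, 1, or 2")


phrase = [["yüz", "de"], ["on"], ["beS"]]  # "yüzde on beS" = 15/100
for rule in (0, 1, 2):
    print(rule, apply_spacing(phrase, rule))
```

Under rule (2) the stems and suffixes come out as separate short
tokens, which is the property compared with the VMS words below.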
> An unanswerable question is, of course, whether one should expect
> that if someone 'invented' a new script for (in this case)
> Turkish, he would stick to the same orthographic breaks.
> I would say that the answer is yes if the VMs writer couldn't
> actually understand the input text he was copying/converting.
I believe that the length and structure of the VMS words
pretty much rule out (1) and are consistent with (2).
Moreover, (2) but not (1) seems to offer an explanation
for the long runs in the gallows-bit sequence.
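For reference, the gallows bit of a word (1 if it contains one of
the EVA gallows letters k, t, p, f, else 0) and the run lengths of
that bit sequence can be computed as below. The word list is made up
for illustration, not taken from any transcription. Under a
memoryless 50-50 model the mean run length would be about 2 words,
so much longer runs suggest dependence between neighbouring words:

```python
from itertools import groupby

# EVA gallows letters; the benched gallows (cth, ckh, cph, cfh) contain them too.
GALLOWS = set("ktpf")


def gallows_bits(words):
    """1 if the word contains a gallows letter, else 0."""
    return [int(any(ch in GALLOWS for ch in w)) for w in words]


def run_lengths(bits):
    """Lengths of maximal runs of equal bits, e.g. [1, 1, 0] -> [2, 1]."""
    return [len(list(group)) for _, group in groupby(bits)]


def mean_run_length(bits):
    runs = run_lengths(bits)
    return sum(runs) / len(runs)


# Made-up word list, for illustration only.
words = ["qokeedy", "qokedy", "daiin", "chol", "shedy", "otaiin", "chedy", "okal"]
bits = gallows_bits(words)
print(bits, run_lengths(bits), mean_run_length(bits))
```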
> What is missing in this model [...] is the tendency of
> Voynichese words to follow strict patterns.
It seems that almost all Turkish suffixes are single syllables,
which, according to the textbook, can have only the following
structures:
V      e.g. "o"    = "he/she/it"
VC     e.g. "ak"   = "white"
CV     e.g. "ve"   = "and"
CVC    e.g. "daG"  = "mountain"
VCC    e.g. "üst"  = "top"
CVCC   e.g. "genç" = "young"
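The six patterns are easy to check mechanically. This sketch maps
each letter of the transcription to C or V (treating a, e, I, i, o,
ö, u, ü and the circumflexed â, î, û as vowels, which is my
assumption about the transcription) and tests membership in the
textbook's list; the names are hypothetical:

```python
# Syllable shapes allowed by the textbook, written as C/V patterns.
VOWELS = set("aeIioöuüâîû")
ALLOWED = {"V", "VC", "CV", "CVC", "VCC", "CVCC"}


def cv_pattern(syllable: str) -> str:
    """Map each letter to V (vowel) or C (consonant)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable)


def is_allowed(syllable: str) -> bool:
    return cv_pattern(syllable) in ALLOWED


for s in ("o", "ak", "ve", "daG", "üst", "genç", "stra"):
    print(s, cv_pattern(s), is_allowed(s))
```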
Is it possible to match these patterns to the known Voynichese
word structure? Duh...
Let me try... presumably the gallows are part of the vowel, but not
all of it; so we could guess that (core and mantle) = vowel, crust =
consonants. But this does not seem to work: the crust letters are
almost always found after the core-mantle, whereas, on a quick scan,
Turkish suffixes seem to be mostly of the CV, CVC, or CVCC types,
which begin with a consonant.
Besides, Voynichese words seem too complicated, and there seem to be
too many of them. So if they are indeed Turkish syllables, the
correspondence is not that simple...
> For this I can see one other explanation: Voynichese as a
> result of numbers written in the Arabic script.
>
> For a moment I thought that the recent discussion about the
> 50% probability of having a gallows character in a word was
> a confirmation of this, but it doesn't really fit.
> Stolfi suggested that the gallows could be part
> of 'low-bit' information (which of course is not the same as
> saying that Voynichese is a binary code!).
Indeed, it was the 50-50 split that made me think of a codebook cipher.
But I don't believe in that explanation any more; I can't see how a
codebook cipher could generate those long runs in the gallows-bit
sequence. If Voynichese words are indeed numbers, the
sequence must involve some non-trivial algorithm with memory.
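A toy simulation of that point, under my own assumptions: both
sources emit 0 and 1 equally often overall, but the first picks each
bit independently, while the second (a symmetric two-state Markov
chain, a hypothetical stand-in for "an algorithm with memory")
repeats the previous bit with probability stay. Only the source with
memory produces long runs; its mean run length is 1/(1 - stay):

```python
import random
from itertools import groupby


def mean_run_length(bits):
    runs = [len(list(group)) for _, group in groupby(bits)]
    return sum(runs) / len(runs)


def memoryless_bits(n, rng):
    """Each bit is 0 or 1 independently, with probability 1/2."""
    return [rng.randrange(2) for _ in range(n)]


def markov_bits(n, stay, rng):
    """Repeat the previous bit with probability `stay`, else flip it."""
    bits = [rng.randrange(2)]
    for _ in range(n - 1):
        bits.append(bits[-1] if rng.random() < stay else 1 - bits[-1])
    return bits


rng = random.Random(1)
print(mean_run_length(memoryless_bits(100_000, rng)))   # close to 2
print(mean_run_length(markov_bits(100_000, 0.8, rng)))  # close to 1/(1 - 0.8) = 5
```

Because the chain is symmetric, it still shows the 50-50 split
overall, so the split by itself cannot distinguish the two sources;
the run lengths can.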
> I'm still lacking a good explanation for the occurrence of
> the character sequence 'ed' (in Eva)
This topic deserves a separate message...
All the best,
--stolfi