
Re: About Turkish

To: voynich@xxxxxxxx
Subject: Re: About Turkish
From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
Date: Sun, 18 Jun 2000 23:14:26 -0300 (EST)
Delivered-to: reeds@research.att.com
In-reply-to: <394C7E12.916FCE99@voynich.nu>
References: <200006172339.UAA28582@coruja.dcc.unicamp.br> <394C7E12.916FCE99@voynich.nu>
Reply-to: stolfi@xxxxxxxxxxxxx
Sender: jim@xxxxxxxxxxxxx

    > [Rene:] If Voynichese is a language that has articles, they are attached
    > to the words (as in Arabic), unless Voynichese is a highly verbose
    > encryption of a language. Czech doesn't use articles (at least in
    > the official language), so by gross extrapolation I'm assuming
    > that this is true for more (most/all?) Slavic languages of
    > the late middle ages. Most Romance and Germanic languages do
    > use articles (except of course Latin).

IIRC, Romanian is also exceptional in that its definite article is
suffixed to the noun.

    > But, it is also worth mentioning that Turkish is one of the most
    > regular languages in the world. There are hardly any exceptions
    > to the grammatical rules, if at all (!)

Hm... wasn't the author of the claim a Turk, perchance? 8-)
(But I have seen it stated elsewhere, I suppose it is true...)

    > I can see how Turkish written in an Arabic script might share
    > a number of statistical oddities with Voynichese.
    
Perhaps, but that is not what I was thinking of.  See below.

BTW, here are the numbers, days of the week, and months in Turkish:

      1 = bir     10 = on      100 = yüz     
      2 = iki     20 = yirmi   200 = iki yüz 
      3 = üç      30 = otuz  
      4 = dört    40 = kIrk  
      5 = beS     50 = elli  
      6 = altI    60 = altmIS
      7 = yedi    70 = yetmiS
      8 = sekiz   80 = seksen
      9 = dokuz   90 = doksan

     11 = on bir
     12 = on iki
        = ...
    999 = dokuz yüz doksan dokuz
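
The pattern in the table above can be sketched in a few lines of Python.
This is only an illustration built from the entries listed here; the
function name `spell` is made up for the example, and the words keep the
same ASCII convention as the table (I, S, G for the dotless i, s-cedilla
and soft g).

```python
# Number words copied from the table above.
UNITS = ["", "bir", "iki", "üç", "dört", "beS", "altI", "yedi", "sekiz", "dokuz"]
TENS = ["", "on", "yirmi", "otuz", "kIrk", "elli",
        "altmIS", "yetmiS", "seksen", "doksan"]


def spell(n):
    """Spell an integer in 1..999, following the pattern of the table."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 is covered by the table")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    parts = []
    if hundreds == 1:
        parts.append("yüz")                  # 100 is plain "yüz"
    elif hundreds > 1:
        parts += [UNITS[hundreds], "yüz"]    # 200 = "iki yüz", etc.
    if tens:
        parts.append(TENS[tens])
    if units:
        parts.append(UNITS[units])
    return " ".join(parts)


print(spell(12))    # on iki
print(spell(999))   # dokuz yüz doksan dokuz
```

The point of the sketch is how little machinery is needed: two lookup
lists and no exceptions apart from the bare "yüz" for 100.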

    The fraction X/Y is expressed as "Y-IN X"
    where "-IN" is the locative suffix -{t|d}V2:
    
      2/3    = "üçte iki" ("in 3, 2")
      15/100 = "yüzde on beS" ("in 100, 15")
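
As a rough sketch of the fraction rule, assuming the usual reading of
-{t|d}V2 (t after a voiceless consonant, d otherwise; e after a front
vowel, a after a back vowel). The helper names `locative` and `fraction`
are invented for this example, and the inputs are number words already
spelled out in the ASCII convention used above.

```python
FRONT = set("eiöü")                    # front vowels -> suffix vowel "e"
BACK = set("aIou") | {"ı"}             # back vowels  -> suffix vowel "a"
VOICELESS = set("çfhkpsSt") | {"ş"}    # voiceless finals -> "t", else "d"


def locative(word):
    """Attach the locative suffix -{t|d}V2 to a word."""
    last_vowel = next(ch for ch in reversed(word) if ch in FRONT or ch in BACK)
    vowel = "e" if last_vowel in FRONT else "a"
    consonant = "t" if word[-1] in VOICELESS else "d"
    return word + consonant + vowel


def fraction(x_words, y_words):
    """Express X/Y as 'Y-IN X'."""
    return locative(y_words) + " " + x_words


print(fraction("iki", "üç"))       # üçte iki
print(fraction("on beS", "yüz"))   # yüzde on beS
```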
    
    The ordinal Nth  is formed with the suffix -(V4)ncV4:
    
      1st = birinci
      2nd = ikinci
      3rd = üçüncü
      4th = dördüncü
      ...
      11th = on birinci
      ...
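
The ordinal rule can be sketched the same way, assuming the standard
four-way vowel harmony for V4 (a/I give I, e/i give i, o/u give u, ö/ü
give ü) and that the parenthesized vowel is dropped after a stem ending
in a vowel. The function name `ordinal` is hypothetical, and the voicing
of the final t in "dört" is handled as a hard-coded special case rather
than as a general rule.

```python
# Four-way harmony: the suffix vowel is chosen by the last vowel of the stem.
V4 = {"a": "I", "I": "I", "e": "i", "i": "i",
      "o": "u", "u": "u", "ö": "ü", "ü": "ü"}


def ordinal(number_words):
    """Attach the ordinal suffix -(V4)ncV4 to a spelled-out number."""
    stem = number_words
    if stem.endswith("dört"):
        stem = stem[:-1] + "d"         # special case: dört -> dörd-
    last_vowel = next(ch for ch in reversed(stem) if ch in V4)
    v = V4[last_vowel]
    link = "" if stem[-1] in V4 else v  # linking vowel only after a consonant
    return stem + link + "nc" + v


for words in ["bir", "iki", "üç", "dört", "on bir"]:
    print(words, "->", ordinal(words))
```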
      
    The distributive is formed with -(S)V2r:
    
      1 at a time = birer
      2 at a time = ikiSer
      3 at a time = üçer
      ...
      6 at a time = altISar
      ...
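
A similar sketch for the distributive, assuming two-way harmony for V2
(e after front vowels, a after back vowels) and that the parenthesized S
appears only after a stem ending in a vowel. The name `distributive` is
made up for the example.

```python
# Two-way harmony: front vowels take "e", back vowels take "a".
V2 = {"a": "a", "I": "a", "o": "a", "u": "a",
      "e": "e", "i": "e", "ö": "e", "ü": "e"}


def distributive(number_words):
    """Attach the distributive suffix -(S)V2r to a spelled-out number."""
    last_vowel = next(ch for ch in reversed(number_words) if ch in V2)
    v = V2[last_vowel]
    link = "S" if number_words[-1] in V2 else ""  # buffer S only after a vowel
    return number_words + link + v + "r"


for words in ["bir", "iki", "üç", "altI"]:
    print(words, "->", distributive(words))
```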
    
  Days of the week (usually followed by "günü" = "day-of")
   
     sunday    pazar      
     monday    pazartesi  
     tuesday   salI       
     wednesday çarSamba   
     thursday  perSembe   
     friday    cuma       
     saturday  cumartesi  
   
  Month names (older names in parentheses)
  
     january    ocak (ikinci kânun, son kânun)
     february   Subat
     march      mart
     april      nisan
     may        mayIs
     june       haziran
     july       temmuz
     august     aGustos
     september  eylûl
     october    ekim (birinci teSrin, ilk teSrin)
     november   kasIm (ikinci teSrin, son teSrin)
     december   aralIk (birinci kânun, ilk kânun)


    > My guess from the above is that Turkish would have a tendency to
    > form long words.
    
Yes.  This is moderately true of modern Turkish, and I 
have seen hints that it was much more so for classical Turkish.

I should try to get hold of a sample, and do some syllable
counting.  

    > These would be split up by the orthographic breaks
    > caused by the script.  ...
    > 

Turkish seems to have a fairly well-defined notion of word,
namely a stem and its suffixes (which are fairly distinct 
sets).  So I see three natural spacing rules for it:

   (0) no spaces
   
   (1) spaces between words
   
   (2) spaces between elements (stems and suffixes)
   
The modern script uses rule (1). I don't know which rule
was used with the Arabic script (nor even whether
the question makes sense at all).
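
To make the three rules concrete, here is a small sketch. It assumes the
text has already been segmented by hand into words, each word being a
list of elements (stem first, then suffixes); the function name `render`
and the data layout are invented for the illustration.

```python
def render(words, rule):
    """Write out a segmented text under spacing rule 0, 1 or 2."""
    if rule == 0:       # (0) no spaces
        return "".join("".join(w) for w in words)
    if rule == 1:       # (1) spaces between words
        return " ".join("".join(w) for w in words)
    if rule == 2:       # (2) spaces between elements
        return " ".join(element for w in words for element in w)
    raise ValueError("rule must be 0, 1 or 2")


# "üçte iki" (2/3) from the table above: stem üç + locative te, then iki.
phrase = [["üç", "te"], ["iki"]]
for rule in (0, 1, 2):
    print(rule, render(phrase, rule))
```

Under rule (2) every token is a single stem or suffix, so tokens come
out shorter and more uniform than under rule (1).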

    > An unanswerable question is, of course, whether one should expect
    > that if someone 'invented' a new script for (in this case)
    > Turkish, he would stick to the same orthographic breaks.
    > I would say that the answer is yes if the VMs writer couldn't
    > actually understand the input text he was copying/converting.

I believe that the length and structure of the VMS words 
pretty much rules out (1) and is consistent with (2).
Moreover, (2) but not (1) seems to offer an explanation 
for the long runs in the gallows-bit sequence.

    > What is missing in this model [...] is the tendency of 
    > Voynichese words to follow strict patterns.

It seems that almost all Turkish suffixes are single syllables,
which, according to the textbook, can have only the following
structures:

  V     e.g. "o" = "he/she/it"
  VC    e.g. "ak" = "white"
  CV    e.g. "ve" = "and"
  CVC   e.g. "daG" = "mountain"
  VCC   e.g. "üst" = "top"
  CVCC  e.g. "genç" = "young"
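
A quick way to check candidate syllables against this list is to reduce
each one to its consonant/vowel shape. The sketch below assumes the
ASCII convention used in this message (I for the dotless i, S and G as
consonants); the names `shape` and `is_allowed` are made up for the
example.

```python
VOWELS = set("aeIioöuü") | {"ı", "â", "î", "û"}
ALLOWED = {"V", "VC", "CV", "CVC", "VCC", "CVCC"}


def shape(syllable):
    """Map each letter to V (vowel) or C (anything else)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable)


def is_allowed(syllable):
    """True if the syllable has one of the six structures listed above."""
    return shape(syllable) in ALLOWED


for s in ["o", "ak", "ve", "daG", "üst", "genç"]:
    print(s, shape(s), is_allowed(s))
```

Since every allowed shape contains exactly one V, counting vowels in a
word also gives its syllable count under this scheme.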
  
Is it possible to match these patterns to the known Voynichese
word structure?  Duh...

Let me try... presumably the gallows are part of the vowel, but not
all of it; so we could guess that (core and mantle) = vowel, crust =
consonants. But this does not seem to work --- the crust letters are
almost always found after the core-mantle; while, on a quick scan,
Turkish suffixes seem to be mostly of the CV, CVC, or CVCC types.

Besides, Voynichese words seem too complicated, and there seem to be
too many of them. So if they are indeed Turkish syllables, the
correspondence is not that simple... 

    > For this I can see one other explanation: Voynichese as a
    > result of numbers written in the Arabic script.
    > 
    > For a moment I thought that the recent discussion about the
    > 50% probability of having a gallows character in a word was
    > a confirmation of this, but it doesn't really fit.
    > Stolfi suggested that the gallows could be part
    > of 'low-bit' information (which of course is not the same as
    > saying that Voynichese is a binary code!). 
    
Indeed, it was the 50-50 split that made me think of a codebook cipher.
But I don't believe in that explanation any more; I can't see how a
codebook cipher can generate those long runs in the gallows-bit
sequence.  If Voynichese words are indeed numbers, the 
sequence must involve some non-trivial algorithm with memory.
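
For concreteness, here is a minimal sketch of what is meant by the
gallows-bit sequence and its runs. It assumes the bit for a word is 1
when the word contains any of the EVA gallows letters t, k, p, f and 0
otherwise; the function names and the five sample words are purely
illustrative and are not taken from any particular page.

```python
from itertools import groupby

GALLOWS = set("tkpf")   # EVA gallows letters


def gallows_bits(words):
    """1 if the word contains a gallows letter, else 0."""
    return [int(any(ch in GALLOWS for ch in w)) for w in words]


def runs(bits):
    """Collapse a bit sequence into (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


sample = ["daiin", "chol", "qokeedy", "okaiin", "shedy"]
bits = gallows_bits(sample)
print(bits)         # [0, 0, 1, 1, 0]
print(runs(bits))   # [(0, 2), (1, 2), (0, 1)]
```

If the bits were independent with a 50-50 split, runs of length n would
occur with probability about 2^-n, so an excess of long runs in the real
text is what points to some kind of memory in the process.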
    
    > I'm still lacking a good explanation for the occurrence of
    > the character sequence 'ed' (in Eva)
    
This topic deserves a separate message...

All the best,

--stolfi
