Fw: Gallows bit sequences
Jorge accidentally sent this to my address alone vice the entire list:
----- Original Message -----
From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
To: John Grove <John@xxxxxxxxxxxx>
Sent: Tuesday, June 13, 2000 11:39 AM
Subject: Re: Gallows bit sequences
>
> > [Jim Gillogly:] Without having done any computation, it looks
> > like there'll be more long runs of gallows and non-gallows than
> > a coin-flipping model would suggest, and that they're more
> > likely in the first line of a "paragraph" than
>
> Indeed. How could we explain them?
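The coin-flipping comparison can be made concrete. What follows is only a minimal sketch, not the computation anyone on the list actually ran: it splits a bit string into runs and compares the number of runs with the expectation for independent flips. The sample string is made up for illustration; real input would be one bit per VMS word (1 = word contains a gallows, 0 = it does not).

```python
from itertools import groupby

def run_lengths(bits):
    """Return the lengths of the maximal runs of identical symbols in `bits`."""
    return [len(list(group)) for _, group in groupby(bits)]

def expected_runs(n, p=0.5):
    """Expected number of runs in n independent flips with P(1) = p.

    Each of the n - 1 adjacent pairs differs with probability 2*p*(1-p),
    and every such change starts a new run.
    """
    if n == 0:
        return 0.0
    return 1 + 2 * (n - 1) * p * (1 - p)

# Hypothetical gallows-bit string, NOT taken from the manuscript.
sample = "1110000011110100"
lengths = run_lengths(sample)
print("run lengths:  ", lengths)
print("observed runs:", len(lengths))
print("expected runs:", expected_runs(len(sample)))
```

Fewer runs than expected (equivalently, longer runs) would point away from the coin-flipping model, which is the pattern described above.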
>
> Perhaps we are seeing the result of some automaton-like code (like
> those Dee tables that Jim Reeds wrote about recently.) I will leave
> this line of speculation to the experts; let me instead suggest a
> linguistic explanation.
>
> As far as I know, Turkish words usually consist of a stem (usually 1-3
> syllables) followed by a string of suffixes (usually 1 syllable each).
> Turkish has a rich set of suffixes, which often translate as separate
> words in Indo-European languages.
>
> Sometimes the string of suffixes can be quite long. A recent thread
> in sci.lang discussed the "Guinness record" Turkish word
>
> Cekoslovakyalilastiramadiklarimizdanmiymissiniz
>
> which, if I recall correctly, means "are you one of those whom we were
> unable to Chekoslovakize?".
>
> Moreover, Turkish has this peculiar rule of "vowel harmony". The
> vowels are divided into two symmetric classes, "front" and "back".
> Generally speaking, in every Turkish stem, all vowels have the same
> class; every suffix has a "front" and "back" version; and one may
> only use suffixes that belong to the same vowel class as the stem.
>
> So, suppose that Voynichese is Turkish, each VMS "word" is actually a
> Turkish stem or a suffix, and the gallows letters are vowel markers
> for the front/back quality. This theory could explain the long runs of
> 0's and 1's in the "gallows bit" strings.
>
> Perhaps the same case could be made for Hungarian, which I believe has
> similar rules for suffixing and vowel harmony (isn't it remotely
> related to Turkish?). Then there are other Turkic languages in Asia
> (Uzbek, Chechen, ... ?)
>
> If I am not mistaken, in the 1400's Turkish was commonly written in the
> Arabic script. That script is not as well suited to Turkish as it is
> to Arabic (which is one of the reasons why the country switched to the
> Roman alphabet early this century). So the VMS author would have had a
> good excuse for inventing a new alphabet, rather than using the
> standard one --- especially if he/she was a "cultural transplant"
> (a Turk in Europe, or a European in Turkey).
>
> By the way, I gather that, until quite recently, public baths were
> quite popular in Turkey (for both sexes), at least as much as in
> classical Rome.
>
> One obvious check for this theory is to see whether the ratio of front
> and back words in Turkish is close enough to the gallows/no-gallows
> ratio of Voynichese.
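One way to run this check is sketched below, under stated assumptions: the input is modern Turkish orthography, the front vowels are e, i, ö, ü and the back vowels are a, ı, o, u, and words mixing both classes (mostly loanwords) are counted separately. The function names and the five-word sample are illustrative only; a real test would need a sizeable Turkish word list.

```python
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")

def vowel_class(word):
    """Classify a word as 'front', 'back', 'mixed', or None (no vowels)."""
    # Map the Turkish capitals by hand: str.lower() would turn "I" into "i",
    # but in Turkish the lowercase of "I" is the dotless back vowel "ı".
    w = word.replace("İ", "i").replace("I", "ı").lower()
    has_front = any(ch in FRONT_VOWELS for ch in w)
    has_back = any(ch in BACK_VOWELS for ch in w)
    if has_front and has_back:
        return "mixed"
    if has_front:
        return "front"
    if has_back:
        return "back"
    return None

def class_counts(words):
    """Count how many words fall into each vowel class."""
    counts = {"front": 0, "back": 0, "mixed": 0}
    for word in words:
        cls = vowel_class(word)
        if cls is not None:
            counts[cls] += 1
    return counts

# Tiny illustrative sample, not a corpus.
sample_words = ["evler", "okullar", "gözler", "kitaplar", "Irmak"]
print(class_counts(sample_words))
```

The front:back ratio from such counts could then be set beside the gallows/no-gallows ratio of Voynichese.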
>
> Does anyone know the Turkish names for the planets? Or the main star in
> Pisces? Or how one would say "left kidney" and "right kidney"? ;-)
>
> > [John Grove:] I believe the labels are extremely heavy in favour
> > of using Gallows. Just a quick glance at a zodiac page shows a
> > significantly higher number of words with Gallows than without.
>
> This is not that relevant now... but indeed, the ratio is about 3:2
> for labels. However labels are few compared to the text, so they have
> little effect on the overall ratio.
>
> The token counts for all sections are
>
>                 gallows in word
>         -----------------------------
>             ?     0     1     2     3 |   tot    SD
>         ----- ----- ----- ----- ----- | ----- -----
> text     2257 17363 17439   323     3 | 37385    96
> labels    149   386   590    29     0 |  1154    16
>         ----- ----- ----- ----- ----- | ----- -----
> both     2406 17749 18029   352     3 | 38539    98
>             - 49.1% 49.9%  1.0%  0.0%
>
> (The percentages are over the good words only.)
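The figures in the table can be rechecked with a few lines of code. This is a sketch only: the counts are copied from the table, and the SD formula is a guess. The post does not say how SD was computed; sqrt(tot)/2, the standard deviation of a fair-coin binomial count, happens to reproduce the printed values when truncated to an integer.

```python
from math import sqrt

# Token counts copied from the table; "?" marks the words not counted as good.
counts = {
    "text":   {"?": 2257, 0: 17363, 1: 17439, 2: 323, 3: 3},
    "labels": {"?": 149,  0: 386,   1: 590,   2: 29,  3: 0},
}
both = {k: counts["text"][k] + counts["labels"][k] for k in counts["text"]}

def summarize(row):
    """Return (total, good, percentages over good words, assumed SD)."""
    total = sum(row.values())
    good = total - row["?"]
    pct = {k: 100.0 * v / good for k, v in row.items() if k != "?"}
    sd = sqrt(total) / 2  # ASSUMPTION: fair-coin binomial SD, not stated in the post
    return total, good, pct, sd

for name, row in [("text", counts["text"]), ("labels", counts["labels"]), ("both", both)]:
    total, good, pct, sd = summarize(row)
    shares = "  ".join(f"{k}: {v:.1f}%" for k, v in pct.items())
    print(f"{name:<7} tot={total:>5}  SD~{int(sd):>3}  {shares}")

# Ratio of one-gallows to no-gallows labels, quoted as "about 3:2".
print("labels 1:0 ratio =", round(counts["labels"][1] / counts["labels"][0], 2))
```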
>
> All the best,
>
> --stolfi
>