
Re: Curious coincidence

To: voynich@xxxxxxxx
Subject: Re: Curious coincidence
From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
Date: Sun, 11 Jun 2000 20:07:30 -0300 (EST)
Delivered-to: reeds@research.att.com
In-reply-to: <39440A46.BF2E8E2C@voynich.nu>
References: <200006100037.VAA22543@coruja.dcc.unicamp.br> <39440A46.BF2E8E2C@voynich.nu>
Reply-to: stolfi@xxxxxxxxxxxxx
Sender: jim@xxxxxxxxxxxxx

    > [Rene:] If there really is a 50% chance of having a gallows or
    > not, how close are the numbers allowed to be? A difference of 80
    > seems almost too small.

Well, the variance of a fair 0-1 coin toss is 1/4, right? So the standard
deviation of the sum of N = 34806 independent coin tosses should be
sqrt(N/4) = 93.

Thus a deviation of 40 (= 80/2) from an even split is a bit better than
what we would expect, but still not suspiciously too good, I would say.
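
The estimate can be checked numerically. This is only a minimal sketch,
assuming each token independently contains a gallows with probability
p = 1/2, so that the count of gallows tokens is Binomial(N, p) with
standard deviation sqrt(N p (1-p)). The variable names are mine.

```python
import math

N = 34806            # number of tokens, as quoted in this message
p = 0.5              # assumed probability that a token contains a gallows
deviation = 80 / 2   # distance of each count from N/2 when the counts differ by 80

# Standard deviation of a Binomial(N, p) count.
sigma = math.sqrt(N * p * (1 - p))

# Observed deviation in units of sigma.
z = deviation / sigma

print(f"sigma = {sigma:.1f}, z = {z:.2f}")
```

A deviation well under one sigma is unremarkable in either direction.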

(Beware that there *is* noise in my data, at the level of 100-200
tokens if not more. So even if the original text had a perfect 50-50
split, my counts would be only approximately equal.)

    > By the way, I presume that 'gallows' also include the pedestalled
    > ones....

It doesn't matter for this particular statistic, since in either
case the tabulated variable is the presence or absence of [ktpf].

    > Does your count include the labels (and other non-flowing text)?

It includes circular and "radial" text from the diagrams, but not
labels proper (such as the zodiac star labels), nor the key-like
sequences.

    > The three options above don't really explain why it should be
    > 50/50 and not, say, 40/60, unless you go to some kind of binary
    > encoding, as you suggest also.
    
It is not necessary to assume a full binary encoding. For instance,
suppose the units-place decimal digits are encoded as

  0=nothing  1=k  2=e  3=ke  4=ch  5=kch  6=sh  7=tch  8=ee  9=tee
  
Encoding a string of largish numbers (e.g. entries from a codebook) with
this encoding would result in an even split between words with gallows
and words without. Again, it is *not* necessary that the codebook be
"random", as long as it is independent of the plaintext.
  
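As a rough illustration of the digit table above, here is a minimal
sketch that counts how many code numbers would yield a gallows in the
units place. The codebook range 1000..9999 is purely hypothetical, and
the count assumes every entry is used equally often; only the digit
table itself comes from the example.

```python
GALLOWS = set("ktpf")

# Units-digit table from the example above; "" stands for "nothing".
UNITS = {0: "", 1: "k", 2: "e", 3: "ke", 4: "ch",
         5: "kch", 6: "sh", 7: "tch", 8: "ee", 9: "tee"}

def has_gallows(code_number: int) -> bool:
    """True if the encoded units digit contains one of k, t, p, f."""
    return any(c in GALLOWS for c in UNITS[code_number % 10])

# Hypothetical codebook: consecutive entry numbers 1000..9999.
entries = range(1000, 10000)
with_gallows = sum(has_gallows(n) for n in entries)
without_gallows = len(entries) - with_gallows

print(with_gallows, without_gallows)
```

In this table the gallows fall exactly on the odd digits, so the split
is even whenever odd and even entry numbers are used about equally.
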
By the way, I recall a couple of letters in Kircher's correspondence
about his "universal language".  (I believe one of them was from 
Don Caramuel y Lobkowicz, Czech-born bishop/cardinal of Naples (?),
who was of course a close friend of our close friend Marci. 8-).

I got the impression that Kircher's language was some sort of codebook
scheme, where the word codes were written in roman numerals. Do you
happen to know something more about it?
    
    > It would have to mean also, that each word is 'constructed'. Assuming
    > for the moment a word-by-word (or by syllable) translation of some 
    > source text, then whether or not a gallows appears depends on some
    > property of the original word.
    > A 50% chance could appear in many circumstances, e.g. depending on the
    > number of characters in the original word (odd/even)
    
Ah yes, I didn't think of that. More generally, a "pseudo-random"
encoding that is applied to each word individually (as opposed to the
whole text as a single string) could also explain the 50-50 split,
without messing up the Zipfian word frequencies and the peculiar word
structure.
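
To make the idea concrete, here is a minimal sketch of a per-word
scheme. Everything in it is an assumption for illustration: the bit is
taken from a SHA-256 hash of the word, a "gallows" is marked by a
capital "K" prefix, and the sample words are arbitrary lowercase
strings. It is not a proposal for the actual mechanism.

```python
import hashlib
from collections import Counter

def gallows_bit(word: str) -> int:
    """Deterministic pseudo-random bit that depends on the word alone."""
    return hashlib.sha256(word.encode("utf-8")).digest()[0] & 1

def encode(word: str) -> str:
    # Hypothetical marker: prefix "K" when the bit is 1, else leave the word as is.
    # With all-lowercase input words this mapping is one-to-one.
    return "K" + word if gallows_bit(word) else word

# Arbitrary sample text with repeated words (illustration only).
plain = "the cat saw the dog and the dog saw the bird".split()
coded = [encode(w) for w in plain]

# Same word, same code: the frequency profile is unchanged.
plain_counts = sorted(Counter(plain).values())
coded_counts = sorted(Counter(coded).values())
print(plain_counts == coded_counts)
```

Because the bit depends only on the word, each word type keeps its
frequency. Over a large vocabulary roughly half the types would get the
marker, although the split over running tokens also depends on which
types happen to be the frequent ones.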

    > stress on odd or even syllable, etc, etc. (This will not always
    > lead to 50% chance either).
    
Indeed.  So, if it's not a coincidence, we seem to be left with a
codebook scheme, word-by-word encryption, or random noise...

All the best,

--stolfi
