
Re: VMs: Arguments against a code book?



  > [Nick Pelling:] if Voynichese were a simple-minded code-book [...]
  > I think we might expect to see (1) rather more obvious matches
  > between labels and text (etc) than we do.
  
It has been several years since I looked at that question, but 
my recollection is that there *is* a modest amount of correlation
between the labels and the text.  It is not obvious that there 
should be more. For example, imagine you have a one-page article 
about Brazil, next to a map of the country with city and river
names: how many of those names do you expect to show up in the
article (or in the whole book)?

One interesting fact is that most labels occur only once in the 
whole book, although some of the multi-word labels contain 
common words.
  
  > (2) a language-like distribution of word frequency
  
That it does: Voynichese follows Zipf's law fairly closely.

I am trying to write up a page comparing the word frequency
distribution of Voynichese to those of several languages. (Curiously,
among my samples modern English is the language that best follows
Zipf's law. Other Indo-European languages deviate from it somewhat at
the high-frequency end. Arabic, Hebrew, and Geez deviate a lot more.
Asian monosyllabic languages deviate still more, at both ends.) I will
not say here what Voynichese does 8-), but its deviations from ideal
Zipf are well within the range of variation of natural languages.
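
For readers who want to try this kind of comparison themselves, here
is a minimal sketch in Python. It is not the analysis behind the
planned page; it only assumes that "following Zipf's law" is measured
by fitting a straight line to log(frequency) against log(rank), where
an ideal Zipf distribution gives a slope near -1. The function names
and the synthetic corpus are hypothetical, made up for illustration.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(count) versus log(rank).

    Assumption: an ideal Zipf distribution has count proportional to
    1/rank, so the fitted slope would be close to -1.
    """
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def synthetic_zipf_corpus(n_types=30, top_count=6000):
    """Hypothetical test corpus: the word of rank r occurs top_count // r times."""
    words = []
    for rank in range(1, n_types + 1):
        words.extend([f"w{rank}"] * (top_count // rank))
    return words


if __name__ == "__main__":
    print(zipf_slope(synthetic_zipf_corpus()))
```

To use it on a real text, pass in the list of word tokens from a
transcription. A single slope hides exactly the kind of deviations at
the high- and low-frequency ends discussed above, so plotting the
points is more informative than the fitted number alone.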

  > (3) uniform text structure across the text (unlike the differences
  > between Currier A & B we do see)

I do not see such a big difference; IMHO, Currier got that impression
only because he looked at two parts of the manuscript (Herbal A and B)
which seem to be at opposite ends of what is better described as a
spectrum. 

According to the only handwriting expert report we have, the VMS
script is in one hand throughout. Word frequency plots like those
by Rene and myself here show fairly clear differences in word
frequency between sections, but many of the words do occur throughout
the book -- something that is not usually seen with different
languages, even closely related ones like Spanish and Portuguese.
Moreover the differences do not seem systematic, as they typically are
in natural languages (e.g. Portuguese "ção" -> Spanish "ción" ->
Italian "zione" -> English "tion" etc.)

Under the codebook hypothesis, the wordfreq differences between
sections are due to different subject matters; and the differences in
digraph frequencies may be due to the need to use new character
combinations as the author had to use new words and assigned
increasing numbers to them. E.g., if one used Roman numerals in
sequence, the letter pair "CD" would arise only after the codebook
grew to 400 entries, "DC" after 600, "CM" after 900, etc.
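
The Roman numeral example can be checked mechanically. The sketch
below assumes the standard subtractive notation (IV, IX, XL, XC, CD,
CM) and records, for each letter pair, the smallest number whose
numeral contains it. The helper names are hypothetical.

```python
def to_roman(n):
    """Roman numeral for n >= 0 in standard subtractive notation.

    Assumption: 0 is represented by the empty string.
    """
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def first_appearance(limit):
    """Map each adjacent letter pair to the first number in 1..limit using it."""
    first = {}
    for n in range(1, limit + 1):
        numeral = to_roman(n)
        for a, b in zip(numeral, numeral[1:]):
            first.setdefault(a + b, n)
    return first


if __name__ == "__main__":
    first = first_appearance(999)
    for pair in ("CD", "DC", "CM"):
        print(pair, first[pair])
```

The same bookkeeping could in principle be applied to Voynichese
digraphs in reading order, to see whether new character pairs keep
turning up as the text progresses.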

One strong argument for the codebook hypothesis is that the percentage
of words with gallows is surprisingly close to 50%, and ditto for the
words with "benches" (EVA "sh", "ch", "ee"). Moreover the two traits
seem to be almost completely independent, and words with two gallows
seem to be "exceptional" in many other ways. I do not know whether
such statistics can arise in natural languages (after my egregious
mistake about the binomial word lengths, I would not risk any guess on
this). However such stats do arise in Roman numerals: exactly 50% of
the numbers in 0-999 have a "V", exactly 50% have an "L", none have two
"V"s or two "L"s, and there is zero correlation between these two
traits...
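
These Roman numeral figures are easy to verify by brute force. The
sketch below assumes standard subtractive notation and treats 0 as an
empty numeral (an assumption, since Roman numerals have no zero). It
counts the numbers in 0-999 that contain "V", that contain "L", and
that contain both. The helper names are hypothetical.

```python
def to_roman(n):
    """Roman numeral for n >= 0 in standard subtractive notation.

    Assumption: 0 is represented by the empty string.
    """
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def letter_stats(first_letter, second_letter, limit=1000):
    """Count numerals in 0..limit-1 containing each letter, and both."""
    numerals = [to_roman(n) for n in range(limit)]
    with_first = sum(first_letter in r for r in numerals)
    with_second = sum(second_letter in r for r in numerals)
    with_both = sum(first_letter in r and second_letter in r for r in numerals)
    most_repeats = max(
        max(r.count(first_letter), r.count(second_letter)) for r in numerals
    )
    return with_first, with_second, with_both, most_repeats


if __name__ == "__main__":
    print(letter_stats("V", "L"))
```

The two traits are independent because "V" depends only on the units
digit (4 through 8) and "L" only on the tens digit (4 through 8), so
the fraction with both is the product of the two fractions.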

All the best,

--stolfi