VMs: Re: Inks and retouching
Jorge, et al,

Something I feel it is needless to point out, but do every so often out of
impulse, is that modern word lists have little in common with word lists
from the time of the Voynich. Even using printed books from 1450 to 1550 is
hazardous, because the spellings were driven by the ease of typesetting, done
by a typesetter who was paid by the sheet (4 pages). One only has to look
at different editions of the same book to see this. Some editions were set
letter for letter: the typesetter viewed the copy and set the type from it.
In others, the page was read aloud to the typesetter while he worked, so the
edition is word for word, but not letter for letter. Some groundbreaking
work was done on this by scholars studying Shakespeare's First Folio, and
I've seen their findings apply to many other books as well. None of the
typeset books I know of reflect the author's original spellings and
language, with the possible lone exception of the Monas Hieroglyphica,
where John Dee stayed with the printer until the work was completed. He
probably saw the poor quality of typesetting and needed to be certain his
work did not suffer such damage. All one has to do is look at the various
editions of Nostradamus to see how badly the printers dealt with them; no
two agree on virtually anything. My study is of course English-oriented,
but the Nostradamus example is a clear indication that this problem was
widespread.

When you look at the English and the spelling used in Anthony Askham's
translation of Sacro Bosco's "De Sphaera", written in 1526/27 (MS 337 in
Beinecke's catalogue), you see exactly the problem I'm addressing. When the
manuscript is compared with the two printed books known to have been written
by him, one in 1548 and the other in 1550, the printed spellings have been
modernized, and not even the dialect is reflected in the printed works. In
short, the two sets are not even in the same "language". I've seen this in
other places as well, which is why I try not to use printed books to form
dictionaries, though I've done so for English herbals because print is my
only available source for them. I own a copy of MS 337, which I use for
statistical purposes, but Beinecke's catalogue entry gives you a good sense
of the dialect, if you care to view it.

Another problem one runs into when attempting to compile word lists for
statistical analysis is the movement throughout Europe, in each country at
various times, against the Latinization of the mother tongue. In England,
for instance, during the 1530s John Cheke championed the revival of Greek,
but at the same time strictly opposed the Latinization of English. He went
so far as to devise a phonetic form of spelling for English, to set it
apart from other languages. Cheke and Shakespeare would not have gotten
along, since Shakespeare introduced over 1,500 Latin words into common
English usage.

Then there's a third problem, something I call a monkish polyglot, though
there are probably better names for this phenomenon. In areas where the
Catholic Church had a foothold (universally throughout Europe up to the
early 1500s), the common church language was Latin, so one might find monks
from many countries in one place, each with his own tongue. This led to
strange mixtures of Latin and words from other languages, sometimes within
a single document. Since the Voynich is a private document, this is
something to be considered.

I have five different manuscript works from five different parts of England,
composed within a decade of each other, with widely varying statistics on
spelling, word length, and word usage, and I don't think I have enough to
cover all the written dialects of England proper, much less Wales, Ireland,
etc. What does this say about other countries around the time the Voynich
was made?
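Word-length statistics of the kind compared across these manuscripts can be tallied with a simple histogram per transcription. A minimal sketch, assuming a "word" is any run of letters (punctuation and digits act as separators); the two snippets are invented stand-ins, not text from any manuscript mentioned here:

```python
import re
from collections import Counter


def word_length_histogram(text: str) -> Counter:
    """Count word tokens of each length; a word is a maximal run of letters."""
    words = re.findall(r"[^\W\d_]+", text)
    return Counter(len(w) for w in words)


def mean_length(hist: Counter) -> float:
    """Average word length weighted by token count."""
    total = sum(hist.values())
    return sum(k * n for k, n in hist.items()) / total


# Hypothetical snippets standing in for two real transcriptions.
texts = {
    "sample_a": "the sonne and the mone be in the heven",
    "sample_b": "the sun and the moon are in the heaven",
}
for name, text in texts.items():
    h = word_length_histogram(text)
    print(name, dict(sorted(h.items())), round(mean_length(h), 2))
```

The same two functions can be run over each transcription in turn, so that spelling differences show up directly as shifts in the histograms.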
Just a few thoughts,
GC
----- Original Message -----
From: "Jorge Stolfi" <stolfi@xxxxxxxxxxxxx>
To: <vms-list@xxxxxxxxxxx>
Sent: Wednesday, July 21, 2004 4:54 PM
Subject: VMs: Inks and retouching
>
> > [Eric:] I also plotted word length and... I got a binomial plot
> > for it???.
>
> This is Eric's data:
>
> length  words
> 00 0
> 01 31
> 02 168
> 03 1342
> 04 4719
> 05 10199
> 06 16818
> 07 21118
> 08 22302
> 09 20426
> 10 16409
> 11 11697
> 12 7566
> 13 4451
> 14 2342
> 15 1158
> 16 479
> 17 250
> 18 81
> 19 32
> 20 14
> 21 4
> 22 1
> 23 2
>
> The distribution is "single-humped" but not binomial. Even without
> plotting you can see that it falls off more slowly at the high
> end than at the low end. (The peak is around 8; compare 3 with 13.)
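The asymmetry described above can be checked directly on Eric's counts. A minimal sketch: it finds the mode, compares the counts five lengths below and above it (3 and 13), and computes the sample skewness, where a positive value indicates the longer upper tail:

```python
# Eric's counts as quoted above: word length -> number of words of that length.
counts = {0: 0, 1: 31, 2: 168, 3: 1342, 4: 4719, 5: 10199, 6: 16818,
          7: 21118, 8: 22302, 9: 20426, 10: 16409, 11: 11697, 12: 7566,
          13: 4451, 14: 2342, 15: 1158, 16: 479, 17: 250, 18: 81,
          19: 32, 20: 14, 21: 4, 22: 1, 23: 2}

total = sum(counts.values())
mode = max(counts, key=counts.get)  # length with the highest count
mean = sum(k * n for k, n in counts.items()) / total
var = sum(n * (k - mean) ** 2 for k, n in counts.items()) / total
# Third standardized moment; 0 for a symmetric distribution.
skew = sum(n * (k - mean) ** 3 for k, n in counts.items()) / total / var ** 1.5

print("mode:", mode)
print("5 below / 5 above the mode:", counts[mode - 5], counts[mode + 5])
print("mean %.2f, skewness %.3f" % (mean, skew))
```

A mean above the mode together with a positive skewness is the numerical form of "falls off more slowly at the high end".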
>
> Contrast it with the Voynichese curve, which is not only symmetrical
> but matches C*choose(9,k-1) almost to the pixel.
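The reference curve C*choose(9,k-1) is a binomial with 9 trials shifted by one, so it is nonzero only for lengths 1 to 10 and symmetric about 5.5. A minimal sketch of that curve, scaled so the expected counts add up to a given total, plus a total-variation distance for comparing any observed length histogram against it. The function names are hypothetical, and the Voynichese counts themselves are not given in this post:

```python
from math import comb


def binomial_reference(total: int, n: int = 9, shift: int = 1) -> dict[int, float]:
    """Expected counts C*choose(n, k - shift) for word lengths k.

    With the defaults this is C*choose(9, k-1). Since the choose(9, j) sum
    to 2**9, C = total / 2**9 makes the expected counts add up to `total`.
    """
    c = total / 2 ** n
    return {k: c * comb(n, k - shift) for k in range(shift, shift + n + 1)}


def total_variation(observed: dict[int, float], expected: dict[int, float]) -> float:
    """Half the L1 distance between the normalized histograms (0 = same shape)."""
    n_obs = sum(observed.values())
    n_exp = sum(expected.values())
    keys = set(observed) | set(expected)
    return 0.5 * sum(abs(observed.get(k, 0) / n_obs - expected.get(k, 0) / n_exp)
                     for k in keys)


ref = binomial_reference(total=1024)
print({k: round(v) for k, v in ref.items()})
# prints {1: 2, 2: 18, 3: 72, 4: 168, 5: 252, 6: 252, 7: 168, 8: 72, 9: 18, 10: 2}

# Symmetric about k = 5.5: lengths 1 and 10, 2 and 9, ... have equal counts.
assert all(ref[k] == ref[11 - k] for k in range(1, 11))
```

Any observed histogram (a dict from length to count) can be passed as `observed`; a distance of 0 means the two shapes coincide exactly.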
>
> Still, the near symmetry of the plot above is quite puzzling. You have
> seen my plots: for English (as for many other languages) I get a much
> longer tail, visibly a second hump. (I don't get that tail with Quran
> Arabic, or the Towneley Plays, or the Asian languages.)
>
> I suspected that the second hump could be due to joined words, but the
> source texts are of fairly good quality, and I spent many hours
> cleaning them up - uniformizing punctuation, disambiguating the "." of
> abbreviation, marking off foreign language bits, etc.
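The cleanup step of disambiguating the "." of abbreviation can be sketched as a tokenizer that keeps a listed abbreviation's period attached and strips other punctuation from the edges of each token. This is a minimal sketch; the abbreviation list and the function name are hypothetical, not the actual procedure used on these texts:

```python
# Hypothetical abbreviations whose trailing "." is part of the word.
ABBREVIATIONS = {"mr.", "dr.", "st.", "etc.", "viz."}

# Characters treated as punctuation when they sit at either edge of a token.
EDGE_PUNCT = ".,;:!?\"'()[]"


def tokenize(text: str) -> list[str]:
    """Split on whitespace, keeping the period of listed abbreviations."""
    tokens = []
    for raw in text.split():
        if raw.lower() in ABBREVIATIONS:
            tokens.append(raw)  # the "." belongs to the abbreviation
            continue
        word = raw.strip(EDGE_PUNCT)
        if word:  # drop tokens that were pure punctuation
            tokens.append(word)
    return tokens


print(tokenize("Mr. Smith came, viz. on Tuesday; then left."))
# prints ['Mr.', 'Smith', 'came', 'viz.', 'on', 'Tuesday', 'then', 'left']
```

An abbreviation followed by other punctuation (for example "etc.,") is not in the list as written, so this sketch would strip its period; a real cleanup pass would need to handle such cases.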
>
> However the UWA list is very "dirty" - it has many foreign words and
> proper names, acronyms, obscure words, etc. It is also very irregular
> in its coverage of plurals and other derived words:
>
> VOL VOLAR VOLATILIZE VOLCANICITY VOLCANOS
> VOL-AU-VENT VOLARY VOLATIZE VOLCANISM VOLE
> VOLAGE VOLATIC VOLBORTHITE VOLCANIST VOLENT
> VOLANT VOLATILE VOLCAN VOLCANIZE VOLES
> VOLANTE VOLATILITIES VOLCANIAN VOLCANO VOLET
> VOLAPUK VOLATILITY VOLCANIC VOLCANOES VOLGOGRAD
>
> Note the lack of VOLATILES, VOLATIZED, VOLCANISTS, etc.
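Gaps like these can be flagged mechanically. A rough sketch, assuming simple regular English plural rules (a hypothetical heuristic that ignores irregular plurals and will also flag adjectives and verbs, which have no plural): it lists entries of the quoted excerpt for which no guessed plural appears anywhere in the list, skipping entries that are themselves a guessed plural of another entry:

```python
# The VOL- excerpt of the UWA list quoted above.
UWA_EXCERPT = set("""
VOL VOLAR VOLATILIZE VOLCANICITY VOLCANOS
VOL-AU-VENT VOLARY VOLATIZE VOLCANISM VOLE
VOLAGE VOLATIC VOLBORTHITE VOLCANIST VOLENT
VOLANT VOLATILE VOLCAN VOLCANIZE VOLES
VOLANTE VOLATILITIES VOLCANIAN VOLCANO VOLET
VOLAPUK VOLATILITY VOLCANIC VOLCANOES VOLGOGRAD
""".split())


def plural_guesses(word: str) -> set[str]:
    """Very rough regular English plurals (heuristic only)."""
    if word.endswith("Y") and len(word) > 1 and word[-2] not in "AEIOU":
        return {word[:-1] + "IES"}  # VOLATILITY -> VOLATILITIES
    if word.endswith(("S", "X", "Z", "CH", "SH", "O")):
        return {word + "ES", word + "S"}  # VOLCANO -> VOLCANOES or VOLCANOS
    return {word + "S"}  # VOLE -> VOLES


def entries_without_plural(words: set[str]) -> list[str]:
    """Sorted entries none of whose guessed plurals appear in `words`."""
    plural_forms = set().union(*(plural_guesses(w) for w in words))
    return sorted(
        w for w in words
        if w not in plural_forms and not (plural_guesses(w) & words)
    )


print(entries_without_plural(UWA_EXCERPT))
```

Most flagged entries will be adjectives or verbs, so the output is only a starting point for manual review; a similar table of suffix rules (for example -IZE to -IZED) could be used to look for the missing verb derivatives.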
>
> My guess is that the UWA list was in large part derived from dictionaries
> rather than actual texts. A dictionary entry -- especially a minor one --
> will typically list the root word but omit the regular derivatives,
> so you will get VOLATILIZE but not VOLATILIZATION, which in actual
> texts may be even more common than the verb itself.
>
> All the best,
>
> --stolfi
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list