Re: Doubled words

Jorge Stolfi

> [Philip Neal:] If the current word is qokeey, there is a 6%
> chance that the next word will be qokeey. - [This] distribution
> is not characteristic of names, is very characteristic of all
> the high frequency Voynich words, and is strong evidence for
> Currier's view that the words are not words at all.

The repetitions of "qokeey" are indeed exceptional, but they don't prove
the conclusion. After all, only a few VMS words behave like that.
Moreover, repetitive names *do* occur in some languages: "Sing Sing",
"Bora Bora", "Ping-Ping" (the name of a Chinese friend of mine), ...
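The 6% figure quoted above reads as a conditional probability: of the occurrences of qokeey that have a following word, the share followed by qokeey itself. Whether a repeat rate is "exceptional" depends on comparing it with the word's overall share of the text, which is what you would expect if neighbouring words were independent. A minimal sketch, assuming the transcription has already been split into a list of word tokens; the function names are hypothetical and not taken from any existing Voynich tool.

```python
from collections import Counter


def self_repeat_rate(words, target):
    """Estimate P(next word == target | current word == target).

    Counts overlapping adjacent pairs, so a run of three copies of
    `target` contributes two (target, target) pairs.
    """
    # Only occurrences that have a following word can be tested.
    followed = sum(1 for w in words[:-1] if w == target)
    if followed == 0:
        return 0.0
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b == target)
    return repeats / followed


def unigram_rate(words, target):
    """Share of all tokens equal to `target`: the repeat rate expected
    if consecutive words were statistically independent."""
    return Counter(words)[target] / len(words) if words else 0.0


if __name__ == "__main__":
    toy = "qokeey chedy qokeey qokeey ol qokeey shedy".split()
    print("repeat rate:", self_repeat_rate(toy, "qokeey"))
    print("baseline:   ", unigram_rate(toy, "qokeey"))
```

A repeat rate well above the baseline is what marks a word as clustering in doublets.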

I can't disprove the Chinese hypothesis: it has some plausibility given the structure of Voynichese, but I don't think it is probable historically. I will think it through one of these days.

However, I maintain my belief that the distribution of doublets in
the Voynich B language is such that the 'words' cannot be words
of a European language enciphered in their normal order.

I take it that your statistics refer to the entire Takahashi
transcription excepting the labels. Given that there may be more
than one language in the VMS, I prefer to concentrate on short
samples of text apparently in one language or dialect. The
following table is based on a sample of 4089 words of the
biological section. It lists the 10 most frequent words with

A their absolute frequency,
B their frequency as doublets,
C their frequency as triplets

	A	B	C
chey	61	1	0
qokeey	71	1	0
qokaiin	72	1	0
qokal	82	2	0
ol	109	11	1
qokedy	110	6	2
qokain	127	2	0
qokeedy	135	9	0
chedy	141	4	0
shedy	155	6	0

Here are the same statistics for a 4156 word extract from a Latin
text by Francis Bacon.

	A	B	C
de	31	0	0
aut	33	0	0
ad	37	0	0
est	42	0	0
quae	42	0	0
non	43	0	0
ut	44	0	0
atque	45	0	0
in	79	0	0
et	232	0	0
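Columns A, B and C in both tables can be computed from any tokenized sample along the following lines. This is a minimal sketch, not the procedure actually used for the tables: it assumes the text is already a list of word tokens, and since the post does not say how runs longer than two were counted, it assumes a maximal run of exactly two identical words counts once toward B and a run of three or more counts once toward C. The function name `repeat_table` is hypothetical.

```python
from collections import Counter
from itertools import groupby


def repeat_table(words, top_n=10):
    """Return (word, A, B, C) rows for the top_n most frequent words.

    A = absolute frequency of the word in the sample.
    B = number of maximal runs of exactly two consecutive occurrences.
    C = number of maximal runs of three or more consecutive occurrences
        (an assumed convention for longer runs).
    """
    freq = Counter(words)
    doublets = Counter()
    triplets = Counter()
    # groupby collapses consecutive identical tokens into a single run.
    for word, run in groupby(words):
        length = sum(1 for _ in run)
        if length == 2:
            doublets[word] += 1
        elif length >= 3:
            triplets[word] += 1
    top = freq.most_common(top_n)
    # Sort ascending by frequency, matching the layout of the tables.
    top.sort(key=lambda item: item[1])
    return [(w, a, doublets[w], triplets[w]) for w, a in top]


if __name__ == "__main__":
    sample = "ol ol shedy qokedy qokedy qokedy chedy ol shedy".split()
    for row in repeat_table(sample, top_n=3):
        print(*row, sep="\t")
```

Counting maximal runs rather than overlapping pairs keeps B and C disjoint, so a triplet is not also counted as two doublets.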

I think that this table on its own shows that Voynich Bio is not a
word for word encipherment of Latin. Note that all ten Latin words
are particles with no meaning in isolation. A decipherment of the
VMS into Latin which did not give words of this type very high
frequency would not be plausible. But equally, it is in the nature
of grammatical particles that they do not cluster in doublets like
the most frequent Voynich Bio words. Of the ten, only 'quae' and
'non' have any plausible meaning as the doublets 'quaequae' and
'non non'. As for triplets, what would 'ut ut ut' or 'ad ad ad'
mean? Sporadic doubling can be explained on the lines of your
'Amen Amen' or 'Lord Lord', but not regular doubling of very frequent
words.

In the past I have generated similar statistics for a sample of
Luther's German showing that frequent words are mostly grammatical
particles and mostly do not form doublets. I have not got the results
at hand, but they looked similar to my table above. I am fairly
confident that you would get the same result for languages such as
English, French and Italian.

I shall continue to take the approach that the words are not words,
and that 'qokeey' is some kind of fingerprint of a concrete word.

In saying this, I do not mean to suggest that your work is without
merit. I think that this kind of structural analysis is the key to
the problem.

Best wishes

Philip Neal

