Re: VMs: Noise or data ?
I believe Philip Neale has looked into this question, and hit problems
with finding a mechanism for one-way encryption which will produce the
linguistic features of Voynichese (especially with regard to final "m").
It was Prescott Currier in the 1970s who first showed that the frequency
of certain words and certain letters of Voynichese is dependent on their
position in the line: it was he who drew attention to final "m".
What I have said (and I think I was one of the first people to say it) is
that the monotonous internal structure of the words can be simplified if
you assume that they contain unwritten blanks: for instance
qokey
qokeedy
okeedy
opedy
etc can be seen as representing underlying forms
qo_k_e___y
qo_k_E__dy
_o_k_E__dy
_o_p_e__dy
etc. I have suggested various underlying word grammars at different
times: not all Voynichese words fit them, but I can claim to account for
about 90% of word tokens on these principles.
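As a rough illustration of how such a coverage figure could be checked,
here is a minimal Python sketch. The slot pattern is a toy inferred only
from the handful of examples in this message (q, o, one of k/t/p/f, sh or
ch, e or ee, d, y, each optional); it is an assumption for illustration,
not one of the actual word grammars, and the names SLOTS, fits and
token_coverage are hypothetical.

```python
import re

# Toy slot grammar: every slot is optional, slots occur in a fixed order.
# Inferred only from the examples quoted in this message (an assumption).
SLOTS = re.compile(r"^(q?)(o?)([ktpf]?)(sh|ch)?(ee|e)?(d?)(y?)$")

def fits(word):
    """True if the word can be read as filled and blank slots, in order."""
    return bool(word) and SLOTS.match(word) is not None

def token_coverage(tokens):
    """Fraction of word tokens (not word types) that the grammar accepts."""
    if not tokens:
        return 0.0
    return sum(fits(t) for t in tokens) / len(tokens)
```

Run over a transcription split into word tokens, token_coverage gives
the proportion of tokens the pattern accounts for; a word with material
outside the slots (say one ending in ain) is simply counted as a miss.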
If that were all there was to it, I think it would be very easy to
generate Voynichese stochastically on the lines Gordon has been trying.
You would simply need what is called a regular expression, a sort of
flow chart in which at each point you select one character or none from
a set of choices specific to that point.
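A minimal Python sketch of that kind of generator follows. The slot
table and the uniform choice at each point are assumptions made for
illustration (the table is built only from example words quoted in this
message, and real letter frequencies are certainly not uniform); the
names SLOT_CHOICES, generate_word and generate_line are hypothetical.

```python
import random

# One entry per point in the "flow chart": the choices available there.
# The empty string stands for "select none". Illustrative table only.
SLOT_CHOICES = [
    ["q", ""],
    ["o", ""],
    ["k", "t", "p", "f", ""],
    ["sh", "ch", ""],
    ["e", "ee", ""],
    ["d", ""],
    ["y", ""],
]

def generate_word(rng):
    """Walk the slots once, picking one choice (or none) at each point."""
    while True:
        word = "".join(rng.choice(choices) for choices in SLOT_CHOICES)
        if word:  # discard the case where every slot came up blank
            return word

def generate_line(rng, n_words):
    """Join independently generated words; no line-level structure at all."""
    return " ".join(generate_word(rng) for _ in range(n_words))
```

Calling generate_line(random.Random(1), 8) yields a line of
Voynichese-looking words. Note that each word is generated without any
knowledge of where it sits in the line or paragraph.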
The trouble is the existence of other constraints on the placing of
letters within the word and the Currier results about differential
frequencies at different points in the line. It seems to me that some of
these cannot be explained *by regular expressions*. A stochastic
explanation may still be possible, but it would involve a more
complicated kind of state space (word grammar, line grammar) which I
think we still have not got. Gordon has suggested various possibilities
in off-list communications to me.
If anyone has put together a list of features along these lines, it would
be very interesting to see them, and might help identify fruitful areas
for further research.
The ones known to me are these (I am not claiming a general priority
here; many of these have been known for years):
pktf are sometimes in free variation with q at the beginning of a word,
but this is more frequently the case where the word in question is the
first word of a line, and nearly obligatory where it is the first word
in a paragraph. The first word in a paragraph often contains two
characters from this set, far more often than words elsewhere in a
paragraph.
y, d, s are sometimes in free variation with q at the beginning of a
word, and more frequently so when it is the first word in a line, but
*not* when it is the first word in a paragraph.
forms such as shedy, chedy, shey, chey (which I analyse as ____Se__dy,
____Ce__dy etc) are most frequently found as the second or third words
in a line, seldom as the first word.
ktpf are in free variation with each other after initial qo, qol, qor,
o, ol, or. k and t are more frequent than p and f. Normally, k is more
frequent than t, but as Rene Zandbergen pointed out on the list a year
or two ago, there are continuous sections of text where t is more
frequent than k.
the sequences ke, te are common but the sequences pe, fe are rare (even
allowing for the fact that p and f are less common than k and t).
final p and f occur very occasionally: where they do, it is usually in
the middle of the first line of a paragraph.
the sequences el, er, eel, eer are rare.
the sequences an, ain, al, ar, ol, or are common, but on and oin are
disproportionately infrequent.
am is in free variation with an, ain, al, ar at the end of a word, but
this is more frequently the case where the word in question is the last
or nearly the last in a line.
s is in free variation with y at the end of a word (e.g. oteey, otees,
qokey, qokes). This more commonly occurs after ee than after e: final s
is common in some parts of the manuscript (though never more common than
final y) and uncommon in other sections.
isolated words like the star labels seldom begin with q.
sequences of three consecutive tokens of the same word (e.g. qokeey
qokeey qokeey) occur more often than you would expect in natural
language.
triplets of consecutive tokens of three different words have fewer
repeated occurrences than you would expect in natural language (e.g.
there are no triplets like 'and of the' which occur together again and
again in English text).
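The last two observations are easy to measure on any tokenised text. A
minimal Python sketch, assuming the transcription is already available
as a flat list of word tokens; the function names are hypothetical, and
what counts as "more than you would expect" still needs a comparison
text, which this does not supply.

```python
from collections import Counter

def immediate_triple_repeats(tokens):
    """Count positions where the same word occurs three times running."""
    return sum(
        1
        for a, b, c in zip(tokens, tokens[1:], tokens[2:])
        if a == b == c
    )

def repeated_distinct_trigrams(tokens):
    """Triplets of three different words that occur more than once."""
    counts = Counter(
        t
        for t in zip(tokens, tokens[1:], tokens[2:])
        if len(set(t)) == 3  # all three words differ
    )
    return {t: n for t, n in counts.items() if n > 1}
```

Running both functions on a Voynichese transcription and on a natural
language text of similar length would put numbers on the comparison.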
Observations like these cannot be explained purely in terms of a regular
expression: they involve what linguists call dependencies, which are
usually (in connection with natural languages) analysed using tree
structures or bracketed lists. Which brings us back to the phenomenon of
the line and paragraph as a structural unit. It was Currier who noticed
this, but neither he nor anyone since has explained why this should be
so.
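For anyone who wants to test the line as a unit against a transcription,
a minimal Python sketch of one such measurement (final m by position in
the line) is given here. It assumes the text is held as a list of lines,
each a list of words; the function name and that layout are assumptions.

```python
def final_m_rate_by_position(lines):
    """Return (rate of final m in line-final words, rate in other words).

    lines: list of lines, each a list of word strings.
    """
    last_hits = last_total = other_hits = other_total = 0
    for words in lines:
        for i, word in enumerate(words):
            hit = word.endswith("m")
            if i == len(words) - 1:  # last word of the line
                last_total += 1
                last_hits += hit
            else:
                other_total += 1
                other_hits += hit
    last_rate = last_hits / last_total if last_total else 0.0
    other_rate = other_hits / other_total if other_total else 0.0
    return last_rate, other_rate
```

A large gap between the two rates is the kind of line-position effect
Currier reported; the same loop can be adapted to initial p, t, k, f in
the first word of a line or paragraph.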
Philip Neal