
Re: VMs: Noise or data ?

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Noise or data ?
From: "Philip Neal" <philipneal_vms@xxxxxxxxxxx>
Date: Wed, 21 May 2003 14:11:29 +0000
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

> I believe Philip Neale has looked into this question, and hit problems
> with finding a mechanism for one-way encryption which will produce the
> linguistic features of Voynichese (especially with regard to final "m").


It was Prescott Currier in the 1970s who first showed that the frequency
of certain words and certain letters of Voynichese is dependent on their
position in the line: it was he who drew attention to final "m".

What I have said (and I think I was one of the first people to say it) is
that the monotonous internal structure of the words can be simplified if
you assume that they contain unwritten blanks: for instance
qokey
qokeedy
okeedy
opedy

etc. can be seen as representing the underlying forms

qo_k_e___y
qo_k_E__dy
_o_k_E__dy
_o_p_e__dy

etc. I have suggested various underlying word grammars at different times. Not all Voynichese words fit them, but I can claim to account for about 90% of word tokens on these principles.
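
The padding can be mechanised roughly as follows. This is only a sketch: the ten-slot template is an assumption chosen to reproduce the examples quoted in this message, not one of the word grammars referred to above, and slots 2, 6 and 7 are left empty because the examples do not show what fills them.

```python
# Hypothetical ten-slot template. Each slot maps a written string to the
# one-character symbol used in the padded form (E = ee, S = sh, C = ch).
# Slots 2, 6 and 7 are left empty: the examples do not show their fillers.
SLOTS = [
    {"q": "q"},
    {"o": "o"},
    {},
    {"k": "k", "t": "t", "p": "p", "f": "f"},
    {"sh": "S", "ch": "C"},
    {"ee": "E", "e": "e"},
    {},
    {},
    {"d": "d"},
    {"y": "y"},
]


def pad(word):
    """Return the underscore-padded form of word, or None if it does not fit."""
    out = []
    pos = 0
    for slot in SLOTS:
        # Try longer fillers first so that "ee" wins over "e".
        for written in sorted(slot, key=len, reverse=True):
            if word.startswith(written, pos):
                out.append(slot[written])
                pos += len(written)
                break
        else:
            # Nothing in this slot matched: treat it as an unwritten blank.
            out.append("_")
    # The word fits only if every written character was consumed.
    return "".join(out) if pos == len(word) else None


for w in ["qokey", "qokeedy", "okeedy", "opedy", "shedy"]:
    print(w, pad(w))
```

A word that leaves characters unconsumed returns None, which gives a crude way of counting how many tokens a candidate template fails to cover.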

If that were all there was to it, I think it would be very easy to generate Voynichese stochastically along the lines Gordon has been trying. You would simply need what is called a regular expression: a sort of flow chart in which, at each point, you select one character or none from a set of choices specific to that point.
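
As a rough sketch of such a flow chart, assuming a made-up set of slots and a uniform choice at each point (neither is fitted to the manuscript, and the names SLOTS, PATTERN and generate_word are purely illustrative):

```python
import random
import re

# Hypothetical slot choices; None means "write nothing at this point".
# Both the slots and the uniform choice among them are illustrative assumptions.
SLOTS = [
    ["q", None],
    ["o", None],
    ["k", "t", "p", "f", None],
    ["ch", "sh", None],
    ["e", "ee", None],
    ["d", None],
    ["y", None],
]

# The same grammar written as a conventional regular expression:
# one optional group of alternatives per slot.
PATTERN = re.compile("".join(
    "(?:" + "|".join(c for c in slot if c) + ")?" for slot in SLOTS
))


def generate_word(rng):
    """Walk the slots once, picking one choice (or nothing) at each."""
    while True:
        word = "".join(rng.choice(slot) or "" for slot in SLOTS)
        if word:  # discard the empty word and try again
            return word


rng = random.Random(0)
words = [generate_word(rng) for _ in range(8)]
print(" ".join(words))
```

Every generated word matches PATTERN by construction, which is exactly the limitation: each slot is chosen independently of its neighbours and of the word's position in the line.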

The trouble is the existence of other constraints on the placing of letters within the word and the Currier results about differential frequencies at different points in the line. It seems to me that some of these cannot be explained *by regular expressions*. A stochastic explanation may still be possible, but it would involve a more complicated kind of state space (word grammar, line grammar) which I think we still have not got. Gordon has suggested various possibilities in off-list communications to me.
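
Line-position effects of the kind Currier reported can be checked with a simple tally. A minimal sketch, assuming a transcription held as a list of lines, each a list of words; the three sample lines are invented for illustration and are not manuscript data.

```python
from collections import Counter


def position_class(i, n):
    """Classify word i of an n-word line as first, last or middle."""
    if i == 0:
        return "first"
    if i == n - 1:
        return "last"
    return "middle"


def rate_by_position(lines, predicate):
    """Fraction of words satisfying predicate in each line position."""
    hits, totals = Counter(), Counter()
    for line in lines:
        for i, word in enumerate(line):
            cls = position_class(i, len(line))
            totals[cls] += 1
            hits[cls] += predicate(word)  # True counts as 1, False as 0
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Invented toy lines, NOT manuscript data.
lines = [
    "opedy qokeey shedy okam".split(),
    "okeedy chedy qokey otam".split(),
    "qokeedy shey chey oteey".split(),
]
rates = rate_by_position(lines, lambda w: w.endswith("m"))
print(rates)
```

The same function can be reused with other predicates, for example words beginning with q, to compare first, middle and last positions.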

> If anyone has put together a list of features along these lines, it would
> be very interesting to see them, and might help identify fruitful areas
> for further research.

The ones known to me are these (I am not claiming general priority here; many of these have been known for years):

pktf are sometimes in free variation with q at the beginning of a word, but this is more frequently the case where the word in question is the first word of a line, and nearly obligatory where it is the first word in a paragraph. The first word in a paragraph often contains two characters from this set, far more often than words elsewhere in a paragraph.

y, d, s are sometimes in free variation with q at the beginning of a word, and more frequently so when it is the first word in a line, but *not* when it is the first word in a paragraph.

forms such as shedy, chedy, shey, chey (which I analyse as ____Se__dy, ____Ce__dy etc) are most frequently found as the second or third words in a line, seldom as the first word.

ktpf are in free variation with each other after initial qo, qol, qor, o, ol, or. k and t are more frequent than p and f. Normally, k is more frequent than t, but as Rene Zandbergen pointed out on the list a year or two ago, there are continuous sections of text where t is more frequent than k.

the sequences ke, te are common but the sequences pe, fe are rare (even allowing for the fact that p and f are less common than k and t)

final p and f occur very occasionally: where they do, it is usually in the middle of the first line of a paragraph

the sequences el, er, eel, eer are rare

the sequences an, ain, al, ar, ol, or are common, but on and oin are disproportionately infrequent.

am is in free variation with an, ain, al, ar at the end of a word, but this is more frequently the case where the word in question is the last or nearly the last in a line.

s is in free variation with y at the end of a word (e.g. oteey, otees, qokey, qokes). This more commonly occurs after ee than after e. Final s is common in some parts of the manuscript (though never more common than final y) and uncommon in others.

isolated words like the star labels seldom begin with q

sequences of three consecutive tokens of the same word (e.g. qokeey qokeey qokeey) occur more often than you would expect in natural language.

sequences of three consecutive tokens of three different words recur less often than you would expect in natural language (e.g. there are no triplets which occur together again and again, the way 'and of the' does in English text).
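
The last two observations are straightforward to measure. A minimal sketch, assuming the text is available as a flat list of word tokens; the sample is invented, and no natural-language baseline is computed here.

```python
from collections import Counter


def count_triple_repeats(tokens):
    """Count positions where one word occurs three times in a row."""
    return sum(
        1 for a, b, c in zip(tokens, tokens[1:], tokens[2:]) if a == b == c
    )


def recurring_mixed_trigrams(tokens):
    """Trigrams of three different words that occur more than once."""
    trigrams = Counter(
        t for t in zip(tokens, tokens[1:], tokens[2:]) if len(set(t)) == 3
    )
    return {t: n for t, n in trigrams.items() if n > 1}


# Invented sample, NOT manuscript data.
tokens = "qokeey qokeey qokeey chedy shedy okeedy chedy shedy okeedy".split()
print(count_triple_repeats(tokens))
print(recurring_mixed_trigrams(tokens))
```

To judge "more often than you would expect", the same two counts would have to be taken on a natural-language text of comparable length and vocabulary.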

Observations like these cannot be explained purely in terms of a regular expression: they involve what linguists call dependencies, which are usually (in connection with natural languages) analysed using tree structures or bracketed lists. Which brings us back to the phenomenon of the line and the paragraph as structural units. It was Currier who noticed this, but neither he nor anyone since has explained why this should be so.

Philip Neal
