Re: VMs: Random Text Generation
My theory goes as follows -
If an algorithm can be created that approximates Voynichese so well that
the generated text matches real Voynichese to, say, 90% accuracy (based on
letter, bigraph, trigraph, word frequencies, Zipf's law, etc.) AND it is
simultaneously found to be impossible to create a similar algorithm with
the same accuracy for a real language, the implication would be that the
VMS's scribe(s) used some sort of random word generation system.
Is there any fundamental flaw in this logic? I'm not out to prove one way
or the other whether the VMS is gibberish or not, but it seems to be a
means of determining this important fact.
An analogy: the approach I've always liked to use when writing computer
games is to approximate constrained decision making (i.e., AI) with bounded
randomness. Of course, it's not true AI (though let's not get into a
semantic discussion about what that would actually be :-) ), but it can be
very convincing... iff you tweak the parameters enough. :-)
I think that what you're suggesting may well be driving down the same road:
creating a random walk through a Markov chain (or whatever) will, for sure,
give you plausible-looking results that can be made to get ever closer
to Voynichese (or whatever language you're trying to simulate) as the
complexity of your model increases.
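
For anyone who wants to play with this, here is a minimal sketch of such a
random walk, written in Python. It is only an illustration under stated
assumptions: the word list is a toy sample of EVA-style words, the names
build_model and generate_word are made up for this example, and a
letter-level chain of order 2 is just one of many possible models.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each `order`-letter context to the letters seen after it."""
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"   # ^ marks word start, $ marks word end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate_word(model, order=2, rng=random):
    """Random walk through the chain until the end-of-word marker is drawn."""
    context, out = "^" * order, []
    while True:
        nxt = rng.choice(model[context])
        if nxt == "$":
            return "".join(out)
        out.append(nxt)
        context = (context + nxt)[-order:]

# Toy training sample (an assumption; a real test needs a full transcription).
sample = ["daiin", "dain", "chedy", "qokeedy", "shedy", "qokain", "chol", "ol"]
model = build_model(sample, order=2)
rng = random.Random(0)               # fixed seed so runs are repeatable
words = [generate_word(model, 2, rng) for _ in range(10)]
print(" ".join(words))
```

Raising the order makes the output hug the training words more closely,
which is exactly the "ever closer as complexity increases" effect.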
And I suppose this is my point: what you are essentially proposing is to
construct a generative model of Voynichese, by which plausible phrases (and
even sentences) may be output randomly. There have been a number of these
proposed already, that seem to be consistent with a large part of the
corpus of the text... yet give no obvious indication why some words exist
and others don't.
The problem is that many features of Voynichese seem to be somewhat at odds
with the structure of existing languages - for example, the large number of
unique (yet only slightly different) words relative to the size of the
text. Why should some words (in the model's word-space) exist and others
not? Tricky. :-|
One could argue, I suppose, that any language could be modelled with an
algorithm, as long as no limit was set on the complexity of the algorithm.
Hmmm... I can see the exam question now: "Language is a social algorithm of
indeterminate complexity. Discuss." :-)
Our VMS scribe(s), however, would have been limited to something very
simple like a table of letter patterns and a knuckle-bone die or two.
Though I don't recall this kind of idea (for example, of using dice to
prompt an encipherer whether to insert a null in a ciphertext) as having
been proposed or used in this general era: yes, it *is* possible... though,
unless there's specific evidence to suggest otherwise, I'd say fairly
May I ask the more crypto-oriented participants here: has any statistical
survey of null placements in C15-C18 ciphertexts been done? It's the
kind of thing that might have once come up in a Cryptologia footnote, I
My opinion is that you would need this kind of statistical bridge (between
randomness and empirical cryptographic evidence) to exist to support your
general proposition. :-o
But at the same time, trying to reconcile this with the (apparently)
binomial distribution of VMS word lengths (as pointed out by Jorge Stolfi
at the beginning of the year) is quite challenging:-
> Actually, the fact itself still stands: the Voynichese word length
> distribution is almost exactly like the binomial distribution
> binom(9,k-1) (i.e. the number of distinct words with k letters
> is proportional to the chance of obtaining k-1 heads in 9 coin
> and quite unlike that of Latin, English, or any other "ordinary"
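
To make the quoted claim concrete, the expected proportions under
binom(9,k-1) can be tabulated in a few lines of Python. This is a sketch
only: it assumes a fair coin, and the function name
binomial_length_weights is invented for this example.

```python
from math import comb

def binomial_length_weights(n=9):
    """Share of distinct words with k letters under binom(n, k-1), fair coin."""
    total = 2 ** n
    return {k: comb(n, k - 1) / total for k in range(1, n + 2)}

weights = binomial_length_weights()
for k, p in weights.items():
    print(f"{k:2d} letters: {p:.4f}")
```

Under this model the distribution is symmetric, peaking at 5 and 6
letters, with single-letter words making up only 1/512 of distinct words.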
Did the encipherer *really* flip a coin 9 times to get the word-length
(erm, minus one)? This sounds fairly unlikely (especially as this would
probably preclude "dain" from appearing so often, but not "d" or "ain" on
However, note that the alternative would appear to be an enciphering (or
similar transformation) process whereby this kind of distribution comes out
as a side-effect. If my dating of 1450-1460 is even relatively close, there
were few conceptual possibilities in the air then that would match this
However, the simplest system I can think of that may come close to fitting
this single criterion would be a form of numbering scheme based (in part)
on Roman numerals. As I've mentioned before, it would not surprise me in
the slightest if the very core of the VMS' code were to prove to be no more
than an embellished number code.
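
As a rough first check on this idea, one could look at how long Roman
numerals actually are when written out. The Python sketch below does that
under assumptions that are mine, not established facts about the VMS: it
uses modern subtractive notation (IV, IX, ...), which 15th Century scribes
did not always follow, and it arbitrarily covers the numbers 1 to 3999.

```python
from collections import Counter

NUMERALS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Convert 1..3999 to a Roman numeral in modern subtractive notation."""
    out = []
    for value, symbol in NUMERALS:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

# Number of distinct numerals per length, for an assumed range of 1..3999.
lengths = Counter(len(to_roman(n)) for n in range(1, 4000))
for k in sorted(lengths):
    print(f"{k:2d} symbols: {lengths[k]}")
```

Whether the resulting shape is anywhere near binomial is precisely what
would need checking, ideally with a more historically faithful notation.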
I have sometimes wondered, though, if some of Jorge's apparent binomial
distribution arises in part from the EVA stroke transcription. If compound
letters (such as EVA "aii", "aiii", "cc", and even "ccc") were to be
converted into a single symbol each, would the resulting set of word
lengths still conform to a binomial distribution?
Perhaps someone might like to test this out...? :-)
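
One way such a test might be sketched in Python is shown here. The list
of compound groups is taken only from the examples above and is certainly
incomplete, the helper names are invented, and the word list would have
to come from a real EVA transcription file (not included).

```python
import re
from collections import Counter

# Hypothetical compound groups from the examples above; longest first so
# that "aiii" is matched before "aii", and "ccc" before "cc".
COMPOUNDS = ["aiii", "aii", "ccc", "cc"]
PATTERN = re.compile("|".join(COMPOUNDS))

def collapsed_length(word):
    """Length of an EVA word when each compound group counts as one symbol."""
    return len(PATTERN.sub("#", word))

def length_distribution(words, length=len):
    """Count distinct words (types, not tokens) at each length."""
    return Counter(length(w) for w in set(words))

# Toy input; replace with the words of a full transcription.
words = ["daiin", "dain", "daiiin", "chedy", "daiin"]
print("raw EVA:  ", sorted(length_distribution(words).items()))
print("collapsed:", sorted(length_distribution(words, collapsed_length).items()))
```

The two histograms could then be compared against the binom(9,k-1) shape
to see how much of the fit survives the change of alphabet.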
Cheers, .....Nick Pelling.....
PS: I don't currently know if any of the 15th Century number codes
mentioned in the crypto literature (such as that used for the most part by
the Strozzi) were ever written in Roman numerals - does anyone here know of
any? This would seem to be a sensible place to start looking for possible
structural similarities with the VMS' code...