Re: VMs: Random Text Generation

Hi Rob,

My theory goes as follows -

If an algorithm can be created that approximates Voynichese so well that the generated text matches real Voynichese to, say, 90% accuracy (based on letter, bigraph, trigraph, word frequencies, zipf laws, etc) AND it is simultaneously found to be impossible to create a similar algorithm with the same accuracy for a real language, the implication would be that the VMS's scribe(s) used some sort of random word generation system.
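One hypothetical way to make the "matches to, say, 90%" test concrete is to compare n-gram frequency profiles of the generated and the real text. The sketch below uses cosine similarity between n-gram count vectors. The choice of measure, the function names and the toy word lists are all assumptions for illustration, not something proposed in this thread.

```python
from collections import Counter
from math import sqrt

def ngram_counts(words, n):
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (1.0 = same profile)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy stand-ins for a real transcription and a generated text (assumption).
real = "daiin chedy qokeedy shedy qokain chol".split()
fake = "daiin shedy qokedy chol okain chey".split()

for n in (1, 2, 3):
    score = cosine_similarity(ngram_counts(real, n), ngram_counts(fake, n))
    print(f"{n}-gram similarity: {score:.3f}")
```

A single score like this hides a lot (word order, Zipf behaviour, line position effects), so it would only be one ingredient of any "accuracy" figure.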

Is there any fundamental flaw in this logic? I'm not out to prove one way or the other whether the VMS is gibberish or not, but it seems to be a means of determining this important fact.

An analogy: the approach I've always liked to use when writing computer games is to approximate constrained decision making (i.e., AI) with bounded randomness. Of course, it's not true AI (though let's not get into a semantic discussion about what that would actually be :-) ), but it can be very convincing... iff you tweak the parameters enough. :-)

I think that what you're suggesting may well be driving down the same road: a random walk through a Markov chain (or whatever) will - for sure - give you plausible-looking results, and these can be made to get ever closer to Voynichese (or whatever language you're trying to simulate) as the complexity of your model increases.
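As a minimal sketch of such a random walk, the following builds a letter-level Markov chain from a word list and walks it to produce new words. The order-2 context, the `^`/`$` boundary markers, the function names and the toy word list are all assumptions chosen for illustration; a real experiment would feed in a full transcription.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each `order`-letter context to the letters seen right after it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"  # ^ pads the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate_word(model, order=2, rng=random):
    """Random walk: pick a follower of the current context until $ is drawn."""
    context = "^" * order
    letters = []
    while True:
        nxt = rng.choice(model[context])
        if nxt == "$":
            return "".join(letters)
        letters.append(nxt)
        context = (context + nxt)[-order:]

# Toy word list standing in for a real transcription (assumption).
sample = "daiin chedy qokeedy shedy qokain chol daiin okaiin".split()
model = build_model(sample, order=2)
rng = random.Random(0)  # fixed seed so runs are repeatable
print([generate_word(model, 2, rng) for _ in range(5)])
```

Raising `order` makes the output hug the training words more closely, which is the "ever closer as complexity increases" effect: at a high enough order the walk can only reproduce words it has already seen.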

And I suppose this is my point: what you are essentially proposing is to construct a generative model of Voynichese, by which plausible phrases (and even sentences) may be output randomly. A number of these have been proposed already that seem to be consistent with a large part of the corpus of the text... yet they give no obvious indication why some words exist and others don't.

The problem is that many features of Voynichese seem to be somewhat at odds with the structure of existing languages - for example, the large number of unique (yet only slightly different) words relative to the size of the text. Why should some words (in the model's word-space) exist and others not? Tricky. :-|

One could argue, I suppose, that any language could be modelled with an algorithm, as long as no limit was set on the complexity of the algorithm.

Hmmm... I can see the exam question now: "Language is a social algorithm of indeterminate complexity. Discuss." :-)

Our VMS scribe(s), however, would have been limited to something very simple like a table of letter patterns and a knuckle-bone dice or two.

I don't recall this kind of idea (for example, using dice to prompt an encipherer whether to insert a null into a ciphertext) having been proposed or used in this general era. Yes, it *is* possible... though, unless there's specific evidence to suggest otherwise, I'd say fairly unlikely. :-o

May I ask the more crypto-oriented participants here: has any statistical survey of null placements in C15-C18 ciphertexts been done? It's the kind of thing that might have once come up in a Cryptologia footnote, I guess. :-)

My opinion is that you would need this kind of statistical bridge (between randomness and empirical cryptographic evidence) to exist to support your general proposition. :-o

But at the same time, trying to reconcile this with the (apparently) binomial distribution of VMS word lengths (as pointed out by Jorge Stolfi at the beginning of the year) is quite challenging:-

> Actually, the fact itself still stands: the Voynichese word length
> distribution is almost exactly like the binomial distribution
> binom(9,k-1) (i.e. the number of distinct words with k letters
> is proportional to the chance of obtaining k-1 heads in 9 coin tosses),
> and quite unlike that of Latin, English, or any other "ordinary"
> language.

Did the encipherer *really* flip a coin 9 times to get the word-length (erm, minus one)? This sounds fairly unlikely (especially as this would probably preclude "dain" from appearing so often, but not "d" or "ain" on their own).
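For reference, the shape that binom(9,k-1) predicts can be tabulated directly. This is a small sketch assuming a fair coin (p = 1/2), as in the coin-toss description quoted above; the function name is made up for illustration.

```python
from math import comb

def binomial_word_length_share(k, n=9):
    """Expected share of distinct words with k letters under binom(n, k-1)."""
    if not 1 <= k <= n + 1:
        return 0.0
    return comb(n, k - 1) / 2 ** n

# One row per word length: length, binomial coefficient, share of all words.
for k in range(1, 11):
    print(k, comb(9, k - 1), f"{binomial_word_length_share(k):.4f}")
```

The coefficients run 1, 9, 36, 84, 126, 126, 84, 36, 9, 1 (out of 512), so the model peaks jointly at 5 and 6 letters and allows nothing longer than 10.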

However, note that the alternative would appear to be an enciphering (or similar transformation) process whereby this kind of distribution comes out as a side-effect. If my dating of 1450-1460 is even relatively close, there were few conceptual possibilities in the air then that would match this pattern.

However, the simplest system I can think of that may come close to fitting this single criterion would be a form of numbering scheme based (in part) on Roman numerals. As I've mentioned before, it would not surprise me in the slightest if the very core of the VMS' code were to prove to be no more than an embellished number code.
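Anyone wanting to see how such a scheme behaves could start by tabulating the lengths of plain Roman numerals. The sketch below assumes the modern subtractive convention ("iv", "ix") and an arbitrary range of 1 to 999. Both are assumptions: period scribes often wrote additive forms such as "iiii", and nothing here is claimed to be the VMS system.

```python
from collections import Counter

# Value/symbol pairs in descending order, modern subtractive style (assumption).
NUMERALS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]

def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    parts = []
    for value, symbol in NUMERALS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

def numeral_length_histogram(limit):
    """Count how many of the numerals 1..limit have each length."""
    return Counter(len(to_roman(n)) for n in range(1, limit + 1))

histogram = numeral_length_histogram(999)
for length in sorted(histogram):
    print(length, histogram[length])
```

The resulting counts can then be set beside the binom(9,k-1) weights to see how near or far the two shapes are.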

I have sometimes wondered, though, if some of Jorge's apparent binomial distribution arises in part from the EVA stroke transcription. If compound letters (such as EVA "aii", "aiii", "cc", and even "ccc") were to be converted into a single symbol each, would the resulting set of word lengths still conform to a binomial distribution?

Perhaps someone might like to test this out...? :-)
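A rough sketch of that test: collapse each compound group into one placeholder symbol, then recount the lengths of the distinct words. The group list is taken from the paragraph above; the digit placeholders, the function names and the toy word list are assumptions, and a real run would need a full EVA transcription.

```python
import re
from collections import Counter

# Compound groups named above, each mapped to one hypothetical placeholder.
SYMBOLS = {"aiii": "1", "aii": "2", "ccc": "3", "cc": "4"}

# Longest alternatives first, so "aiii" is tried before "aii".
PATTERN = re.compile("|".join(sorted(SYMBOLS, key=len, reverse=True)))

def collapse(word):
    """Replace every compound group in `word` with its single symbol."""
    return PATTERN.sub(lambda match: SYMBOLS[match.group(0)], word)

def length_histogram(words):
    """Count distinct words (types, not tokens) by length."""
    return Counter(len(word) for word in set(words))

# Toy word list standing in for a real transcription (assumption).
words = "daiin daiin okaiin qokaiiin chol dain".split()
print("raw:      ", sorted(length_histogram(words).items()))
print("collapsed:", sorted(length_histogram([collapse(w) for w in words]).items()))
```

Either histogram can then be compared against the binom(9,k-1) weights, or against a binomial with a smaller n if the collapsed words turn out shorter overall.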

Cheers, .....Nick Pelling.....

PS: I don't currently know if any of the 15th Century number codes mentioned in the crypto literature (such as that used for the most part by the Strozzi) were ever written in Roman numerals - does anyone here know of any? This would seem to be a sensible place to start looking for possible structural similarities with the VMS' code...
