I have been working on precisely this test, and it has revealed some interesting results. I anticipate my report will be ready in a few weeks. It will not make anybody happy.

Rob

________________________________

From: owner-vms-list@xxxxxxxxxxx on behalf of Jacques Guy
Sent: Thu 27/01/2005 20:07
To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Another method different from Cardano Grilles

27/01/2005 6:16:20 PM, Elmar Vogt <elvogt@xxxxxxxxxxx> wrote:

>Jacques Guy wrote:
>> For how many words which DO NOT occur will those five wheels
>> reconstruct?

>Brilliant!

It just came to me like that, without thinking about it.

>I'm not one to discard the hoax theories so quickly, but Jacques, this
>appears to be an excellent test for any hoax hypothesis: Are there words
>which do not appear in the VM which the discs (or grilles or whatever we
>use) ought to be producing in sufficient quantities?

There is a much simpler and statistically valid test.

Take the Voynich manuscript. Make a list of all the different words in it. If you are not sure about the validity of the definition of "word" (because you are not sure of what makes a boundary, for instance), make a list of all the trigraphs in it (or n-graphs, doesn't matter).

Now calculate the probability of occurrence of each word (or trigraph or whatever) using the Cardano grilles, or the wheels, or whatever.

For instance: <qok>

We know that there are, say, 100 "letters" on the outer wheel, and <q> occurs twice. So, the probability of spinning a <q> is 2/100.

Next, look at the next wheel inside. Say, 60 "letters" and <o> occurs 10 times. Probability of spinning an <o> = 10/60.

Finally, <k> on the inner wheel. Using the same method again, say we get 8/40.

So, the probability of spinning <qok> is:

2/100 * 10/60 * 8/40 = 1/(50*6*5) = 1/1500

We have counted, say, 80,000 trigraphs in the VMS, so we should expect 80,000/1500 = about 53 occurrences of <qok>. But we see, let's say, 734 (completely off the top of my head, I have no idea at all of the real figure).

Let's record that:

<qok>   expected 53   observed 734

And we continue, doing the same for every trigraph (or word, if you prefer words).

Next you calculate chi-squared. From it you get the probability that a text really generated by your wheel (or Cardano grilles, or whatever method) would deviate from the expected counts at least as much as the actual text of the VMS does. In other words: the smaller that probability, the less plausible it is that the VMS was generated by your wheel.

(You can also calculate phi if you want to know how far it deviates from your wheel: phi = sqrt(chi2/N), N being the total number of trigraphs counted.)

Now, chi-squared is valid only when all expected absolute frequencies are >=5. In most statistics textbooks they advise you to merge some rows and columns (here, it's columns) until they are all >=5. You can do that here, but I think that just IGNORING the trigraphs or words with expected frequencies below 5 is much better. Thus if, say, <qoteedy> has an expected frequency of 3 (as calculated from the layout of the wheel), you just act as if it were not there.
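
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anybody's actual analysis: the wheel layouts reuse the made-up figures from the message (with "x" standing in for every other letter), the observed counts are invented, each "letter" is assumed to be a single character, and the function names ngram_probability and chi_squared_vs_wheels are hypothetical. Note that 80,000/1500 works out to about 53.3.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def ngram_probability(ngram, wheels):
    """Chance of spinning `ngram`, one letter per wheel, outermost wheel first.

    Each wheel is a Counter: letter -> number of positions carrying it.
    Assumes every position on a wheel is equally likely and that the
    wheels are spun independently of each other.
    """
    if len(ngram) != len(wheels):
        return 0.0
    return prod(wheel[letter] / sum(wheel.values())
                for letter, wheel in zip(ngram, wheels))


def chi_squared_vs_wheels(observed, wheels, total, min_expected=5.0):
    """Compare observed n-gram counts with what the wheels predict.

    Returns (chi2, phi, cells): the chi-squared statistic,
    phi = sqrt(chi2 / N) with N taken to be `total`, and the number of
    n-grams that entered the sum. N-grams whose expected count is below
    `min_expected` are ignored rather than merged.
    """
    chi2 = 0.0
    cells = 0
    # Walk over everything the wheels CAN produce, so that n-grams which
    # never occur in the text (observed count 0) count against the wheels.
    for letters in product(*wheels):
        ngram = "".join(letters)
        expected = total * ngram_probability(ngram, wheels)
        if expected < min_expected:
            continue
        chi2 += (observed.get(ngram, 0) - expected) ** 2 / expected
        cells += 1
    return chi2, sqrt(chi2 / total), cells


# Made-up wheel layouts; "x" is a placeholder for all other letters.
wheels = [
    Counter({"q": 2, "x": 98}),   # outer wheel, 100 positions
    Counter({"o": 10, "x": 50}),  # middle wheel, 60 positions
    Counter({"k": 8, "x": 32}),   # inner wheel, 40 positions
]

# Invented observed trigraph counts, chosen to add up to 80,000.
observed = {
    "qok": 734, "qox": 200, "qxk": 250, "qxx": 1000,
    "xok": 2500, "xox": 10000, "xxk": 13000, "xxx": 52316,
}
total = sum(observed.values())

p_qok = ngram_probability("qok", wheels)   # 1/1500
expected_qok = total * p_qok               # about 53.3

chi2, phi, cells = chi_squared_vs_wheels(observed, wheels, total)
print(f"P(qok) = {p_qok:.6f}, expected = {expected_qok:.1f}, observed = {observed['qok']}")
print(f"chi2 = {chi2:.1f}, phi = {phi:.3f}, n-grams used = {cells}")
```

The loop runs over every n-gram the wheels can produce rather than only over those found in the text, which also covers the question of words that ought to appear but do not. N-grams seen in the text that the wheels cannot produce at all have an expected count of 0 and so fall under the "ignore below 5" rule in this sketch. Turning the statistic into a probability needs the chi-squared distribution with the appropriate degrees of freedom (for instance scipy.stats.chi2.sf); that step is left out here.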