VMs: How many Rugg tables for the WotW
> [Stolfi:] Or one could generate the entire WotW novel (78000 words),
> with all the subtle asymmetries and long-range correlations noted
> above, with less than 160 Rugg tables.
> [Dennis:] How did you calculate this?
Well, with a single table you can reproduce an arbitrary 520 word
text, and 78000/520 = 150. (The "160" was a quick mental calculation,
sorry...)
The first table could be, say
    *:no :*    ¦    *:on :e    ¦    *:wou:ld   ¦   ha:v  :e    ¦
   be:lie:ved  ¦    *:in :*    ¦    *:th :e    ¦    *:las:t    ¦
    y:ear:s    ¦    *:of :*    ¦    *:th :e    ¦ nine:tee:nth  ¦
  cen:tur:y    ¦    *:th :at   ¦    *:th :is   ¦    *:wor:ld   ¦
    *:was:*    ¦    *:be :ing  ¦   wa:tch:ed   ¦    k:een:ly   ¦
    *:and:*    ¦   cl:ose:ly   ¦    *:by :*    ¦intel:lig:ences¦
   gr:eat:er   ¦    *:th :an   ¦    *:man:'s   ¦    *:and:*    ¦
    *:yet:*    ¦    *:as :*    ¦    *:mor:tal  ¦    *:as :*    ¦
      ...             ...             ...             ...
  tem:per:ate  ¦    z:on :es   ¦    *:th :at   ¦    *:las:t    ¦
to be scanned with a trivial grille (all three slots on the same row).
The splitting points above are basically random. Obviously they have
no effect on the first pass; they would be relevant only if the same
table were scanned a second time, with a different grille.
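
To make the scanning concrete, here is a minimal sketch of one pass
over such a table. It is only an illustration under my own simplifying
assumptions: the table is modeled as a linear list of
(prefix, midfix, suffix) triples, the grille as three entry offsets,
and positions wrap around cyclically. The names scan, grille, start
and step are hypothetical, not Rugg's.

```python
# Sketch of one pass over a Rugg-style table (illustrative names only).
# Each entry is a (prefix, midfix, suffix) triple; the "*" used in the
# table above for an empty part is represented here as "".

def scan(table, grille=(0, 0, 0), start=0, step=1):
    """Return the tokens produced by one pass over the table.

    grille holds the entry offsets of the prefix, midfix and suffix
    slots relative to the current position; (0, 0, 0) is the trivial
    grille. Cyclic wrap-around is an assumption of this sketch.
    """
    n = len(table)
    dp, dm, ds = grille
    tokens = []
    for k in range(n):
        pos = start + k * step
        prefix = table[(pos + dp) % n][0]
        midfix = table[(pos + dm) % n][1]
        suffix = table[(pos + ds) % n][2]
        tokens.append(prefix + midfix + suffix)
    return tokens

# First two rows of the table above.
table = [
    ("", "no", ""), ("", "on", "e"), ("", "wou", "ld"), ("ha", "v", "e"),
    ("be", "lie", "ved"), ("", "in", ""), ("", "th", "e"), ("", "las", "t"),
]

first_pass = scan(table)                      # trivial grille
second_pass = scan(table, grille=(0, 1, 2))   # splits now matter
```

With the trivial grille the output is just the original words, whatever
the splitting points; with the second grille the same entries yield
hybrids such as "onld" and "halie", which depend on where each word
was cut.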
> this is the strongest argument against Rugg. A hoaxer would
> probably have needed an implausibly large number of tables
> and grilles. More likely, he would have used relatively
> fewer tables, and then we would see more repetitions
> than in fact we do.
It is my feeling too.
In principle, the first 520 words generated by Rugg's method are
completely arbitrary. So, assuming a word entropy of 10 bits (typical
of many languages, including Voynichese), the first 520 tokens of the
output should contain about 5200 bits of information.
The three-way splitting of the table entries, which is not relevant
for the first pass, is largely expressed in the output of the second
pass. There are (K+1)(K+2)/2 ways to split a word with K letters; so,
assuming that the typical word has 6 letters, the next 520 tokens of
the output should contain about 5 bits per word, or about 2600 bits of
"new" information (not given by the first pass). However, some of this
information may remain hidden in the second pass, to be revealed only
on later passes.
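
The arithmetic above can be checked with a few lines of code. This is
just a sketch that enumerates the two cut points of a K-letter word and
plugs in the figures assumed above (6 letters per word, 10 bits per
word, 520 entries); the function and variable names are mine.

```python
import math

def count_splits(k):
    """Count ways to cut a k-letter word into prefix, midfix and suffix
    (any part may be empty) by enumerating the two cut points."""
    return sum(1 for a in range(k + 1) for b in range(a, k + 1))

k = 6                    # typical word length assumed above
n_entries = 520          # table size used above
word_entropy_bits = 10   # per-word entropy assumed above

splits = count_splits(k)
assert splits == (k + 1) * (k + 2) // 2    # closed form from the text

bits_per_split = math.log2(splits)         # a little under 5 bits
first_pass_bits = n_entries * word_entropy_bits
second_pass_bits = n_entries * round(bits_per_split)

print(splits, round(bits_per_split, 2), first_pass_bits, second_pass_bits)
```

For K = 6 there are 28 possible splits, i.e. about 4.8 bits per word,
which rounds to the 5 bits used above.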
From the third pass onwards, however, the "new" information is limited
to the placement of the holes in the grille --- i.e. to 6--8 bits for
each batch of 520 words. Even if the movement of the grille is varied
among a few simple patterns (by rows or by columns, direct or reverse,
skipping k entries at each step, etc.), this choice amounts to another
handful of bits.
Variations on Rugg's basic proposal could insert more information
into the output text, but only a very limited amount.
Obviously the actual table need not have 520 entries, but the choice
of that parameter cannot provide much more than 10 bits. Starting
each pass with the grille at a random position (i.e., applying a
cyclic shift to the table entries before each pass) would add another
10 bits per pass.
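
For a sense of scale, the sketch below adds up the per-pass choices
listed above and compares them with the information in an arbitrary
520-word text. The upper figure of 8 bits for the grille comes from the
range given above; counting the "handful" of scan-pattern bits as 4 is
purely my assumption for the sake of the example.

```python
import math

n_entries = 520
word_entropy_bits = 10

# Choices available on each pass from the third pass onward.
grille_bits = 8                           # upper end of "6--8 bits"
scan_pattern_bits = 4                     # "a handful"; 4 is an assumption
start_shift_bits = math.log2(n_entries)   # cyclic shift: 1 of 520 positions

new_bits_per_pass = grille_bits + scan_pattern_bits + start_shift_bits
arbitrary_text_bits = n_entries * word_entropy_bits
ratio = new_bits_per_pass / arbitrary_text_bits

print(f"new information per pass: about {new_bits_per_pass:.0f} bits")
print(f"arbitrary 520-word text:  {arbitrary_text_bits} bits")
print(f"fraction: {ratio:.2%}")
```

Under these assumptions each later pass carries roughly 21 bits of new
information, well under one percent of what 520 freely chosen words
would carry.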
Thus, in theory, we could verify whether the VMS was produced with
Rugg's grille method by analyzing a stretch containing at least two
scans of the same table, preferably three or more. First, we must
guess the table size N. Then, since we cannot be sure that the first
token of the sample is the start of a new pass, we must guess a number
M between 0 and 35,000, and discard the first M tokens of the text.
Now, if those guesses are correct, the first N tokens of the remaining
sample ("Batch 1") are the result of a single pass. Note that we can
always assume that the grille used in that pass is the trivial one
(with the three slots on the same row), and that the table is scanned
in the trivial way (step 1, starting from the first entry). Therefore
the N tokens of Batch 1 give us the entries of the table, in order,
except for the prefix-midfix-suffix splits.
Next we guess the parameters of the second scan (positions of the
grille slots, direction of scan, starting entry, etc.), and we try to
find splitting points for the table entries that, with those
parameters, would produce the next N tokens of the sample ("Batch 2").
This search may not be that difficult, since comparison of the
prefixes and suffixes of Batch 2 tokens with those of Batch 1 should
quickly exclude most possibilities.
If this search succeeds, it will probably give most if not all of the
splitting points. The next N tokens ("Batch 3") should provide an easy
check for this analysis, and fix most of the splittings that could not
be determined from Batch 2 (if any). For this pass, the only unknowns
are the grille slots and the scanning path.
Although the number of bits to be guessed (a couple dozen per pass) is
not trivial, I believe that the search could be optimized to the point
of becoming feasible. For example, one could make a list P of all
pairs (i,j) such that token i starts with the same letter as token j,
and i-j is between 100 and 1000; and a similar list S for final
letters. Then build the list of differences (i-i', j-j') for all (i,j)
and (i',j') in P, and ditto for S; and look for anomalies in the
histograms.
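
A direct, unoptimized sketch of that last idea follows. It builds the
pair lists and the histogram of differences exactly as described, with
no attempt at efficiency: the histogram step is quadratic in the number
of pairs, so it is only practical on a short sample. Function names and
the lo/hi parameters are mine.

```python
from collections import Counter

def letter_pairs(tokens, index, lo=100, hi=1000):
    """Pairs (i, j) with lo <= i - j <= hi whose tokens share the letter
    at `index` (0 for the first letter, -1 for the last)."""
    pairs = []
    for i, tok_i in enumerate(tokens):
        for j in range(max(0, i - hi), i - lo + 1):
            if tok_i and tokens[j] and tok_i[index] == tokens[j][index]:
                pairs.append((i, j))
    return pairs

def difference_histogram(pairs):
    """Histogram of (i - i2, j - j2) over all ordered pairs of distinct
    elements of `pairs`. Quadratic in len(pairs)."""
    hist = Counter()
    for (i, j) in pairs:
        for (i2, j2) in pairs:
            if (i, j) != (i2, j2):
                hist[(i - i2, j - j2)] += 1
    return hist

# Tiny demonstration with a narrowed window; a real run would use the
# default window on a tokenized transcription.
tokens = ["ab", "cd", "ax", "cy"]
P = letter_pairs(tokens, 0, lo=1, hi=3)     # same first letter
S = letter_pairs(tokens, -1, lo=1, hi=3)    # same last letter
print(difference_histogram(P).most_common(5))
```

What counts as an "anomaly" in the resulting histograms is left open
here; one would presumably compare against the same statistic on a
shuffled version of the sample.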
I am not going to try this analysis myself, because in my view the
general Hoax Theory, and Rugg's version in particular, are only
slightly more likely than the UFO Theory; and I know of much better
ways to waste my time. Moreover, the expected negative result would
not convince the believers: I am sure they would simply "enhance" the
grille method by adding another free parameter which I failed to
consider (like, allow zig-zag scans), and thus claim that the analysis
proves nothing. Or worse, they could postulate that the grille was
scanned in some unconstrained "random" sequence: while this would add
about 4500 bits of "fresh" information per scan, it is no different
from saying that the words were picked from a Voynichese dictionary in
arbitrary order --- which unfortunately is how one writes a
*meaningful* text!
In other words, a Hoax Theory that requires about 10 bits worth of
unconstrained choices per word is a non-falsifiable theory that
includes almost any text, meaningful or not; and one that requires
far fewer bits should leave a detectable signature. So it is the
believers in the latter who owe us this sort of analysis --- and they
should please report back only when (if) they succeed.
All the best,
--stolfi