Re: VMs: How many Rugg tables for the WotW
Well said, Jorge. Good work, Mark, Gabriel et al.
Ok, I guess the ball is in their court now...
Who'll prepare the rebuttal piece for Sci-Am?
Is it too early? Or perhaps too late?
Luis
>
> From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
> Date: 2004/10/05 Tue AM 08:03:56 EDT
> To: vms-list@xxxxxxxxxxx
> Subject: VMs: How many Rugg tables for the WotW
>
>
> > [Stolfi:] Or one could generate the entire WotW novel (78000 words),
> > with all the subtle asymmetries and long-range correlations noted
> > above, with fewer than 160 Rugg tables.
>
> > [Dennis:] How did you calculate this?
>
> Well, with a single table you can reproduce an arbitrary 520 word
> text, and 78000/520 = 150. (The "160" was a quick mental calculation,
> sorry...)
>
> The first table could be, say
>
> *:no :* ¦ *:on :e ¦ *:wou:ld ¦ ha:v :e ¦
> be:lie:ved ¦ *:in :* ¦ *:th :e ¦ *:las:t ¦
> y:ear:s ¦ *:of :* ¦ *:th :e ¦ nine:tee:nth ¦
> cen:tur:y ¦ *:th :at ¦ *:th :is ¦ *:wor:ld ¦
> *:was:* ¦ *:be :ing ¦ wa:tch:ed ¦ k:een:ly ¦
> *:and:* ¦ cl:ose:ly ¦ *:by :* ¦intel:lig:ences¦
> gr:eat:er ¦ *:th :an ¦ *:man:'s ¦ *:and:* ¦
> *:yet:* ¦ *:as :* ¦ *:mor:tal ¦ *:as :* ¦
> ... ... ... ...
> tem:per:ate ¦ z:on :es ¦ *:th :at ¦ *:las:t ¦
>
> to be scanned with a trivial grille (all three slots on the same row).
>
> The splitting points above are basically random. Obviously they have
> no effect on the first pass; they would be relevant only if the same
> table were scanned a second time, with a different grille.
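A minimal Python sketch of one scanning pass, to make the table notation above concrete. The data layout is an assumption, not Rugg's or Stolfi's: each cell is a (prefix, midfix, suffix) triple with `*` read as empty, the grille is three hypothetical (row, column) offsets, one per slot, and offsets wrap around the table edges. With the trivial grille (all three offsets zero) a pass simply reads the words back in order.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]   # (prefix, midfix, suffix); "*" in the table becomes ""
Offset = Tuple[int, int]       # (row, column) offset of one grille slot
Grille = Tuple[Offset, Offset, Offset]


def parse_entry(cell: str) -> Entry:
    """Parse a cell written as 'be:lie:ved' or '*:in :*' into its three parts."""
    pre, mid, suf = (p.strip() for p in cell.split(":"))
    return tuple("" if p == "*" else p for p in (pre, mid, suf))  # type: ignore[return-value]


def parse_table(rows: List[str]) -> List[List[Entry]]:
    """Each row is a string of cells separated by the '¦' character."""
    return [[parse_entry(c) for c in row.split("¦") if c.strip()] for row in rows]


def scan(table: List[List[Entry]], grille: Grille) -> List[str]:
    """One pass: slide the grille over every cell in row-major order.

    The prefix is read through slot 0, the midfix through slot 1 and the
    suffix through slot 2; offsets wrap around the table edges (an assumption).
    """
    n_rows, n_cols = len(table), len(table[0])
    words = []
    for r in range(n_rows):
        for c in range(n_cols):
            parts = []
            for part, (dr, dc) in enumerate(grille):
                parts.append(table[(r + dr) % n_rows][(c + dc) % n_cols][part])
            words.append("".join(parts))
    return words


table = parse_table([
    "*:no :* ¦ *:on :e ¦ *:wou:ld ¦ ha:v :e ¦",
    "be:lie:ved ¦ *:in :* ¦ *:th :e ¦ *:las:t ¦",
])
TRIVIAL: Grille = ((0, 0), (0, 0), (0, 0))
print(" ".join(scan(table, TRIVIAL)))  # no one would have believed in the last
```

A non-trivial grille such as `((0, 0), (0, 1), (1, 0))` combines parts of different cells, and only then do the split points inside each cell start to matter.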
>
> > this is the strongest argument against Rugg. A hoaxer would
> > probably have needed an implausibly large number of tables
> > and grilles. More likely, he would have used relatively
> > few tables, and then we would see more repetitions
> > than we actually do.
>
> It is my feeling too.
>
> In principle, the first 520 words generated by Rugg's method are
> completely arbitrary. So, assuming an entropy of 10 bits per word
> (typical of many languages, including Voynichese), the first 520
> tokens of the output should contain about 5200 bits of information.
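The 10-bit figure is an assumption of the post. As a hedged illustration of how one might check it against a transcription, here is a plug-in (maximum-likelihood) estimate of per-token entropy; the function names are hypothetical, and this estimator tends to underestimate the true entropy on small samples, so the number it gives is only rough.

```python
import math
from collections import Counter
from typing import Sequence


def word_entropy_bits(tokens: Sequence[str]) -> float:
    """Plug-in estimate of per-token entropy, in bits (biased low on small samples)."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def first_pass_bits(table_size: int, bits_per_word: float) -> float:
    """Every entry of the first pass is a free choice, so the bits simply add up."""
    return table_size * bits_per_word


print(word_entropy_bits("a b a c".split()))  # 1.5
print(first_pass_bits(520, 10.0))            # 5200.0
```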
>
> The three-way splitting of the table entries, which is not relevant
> for the first pass, is largely expressed in the output of the second
> pass. There are (K+1)(K+2)/2 ways to split a word with K letters
> (one for each choice of two cut points 0 <= a <= b <= K); so,
> assuming that the typical word has 6 letters (28 splits, or about
> 4.8 bits), the next 520 tokens of the output should contain about 5
> bits per word, or about 2600 bits of "new" information (not given by
> the first pass). However, some of this
> information may remain hidden in the second pass, to be revealed only
> on later passes.
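The split count can be checked by brute force. In this short sketch (helper names are mine), enumerating the two cut points 0 <= a <= b <= K gives exactly (K+1)(K+2)/2 splits. For a 6-letter word that is 28 splits, or about 4.8 bits, so the 520 entries of one table carry roughly 2500 bits of splitting information. The post's 2600 comes from rounding to 5 bits per word.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Tuple


def splits(word: str) -> List[Tuple[str, str, str]]:
    """All (prefix, midfix, suffix) cuts of word; any part may be empty."""
    k = len(word)
    return [(word[:a], word[a:b], word[b:])
            for a, b in combinations_with_replacement(range(k + 1), 2)]


def split_count(k: int) -> int:
    """Closed form for len(splits(w)) when len(w) == k."""
    return (k + 1) * (k + 2) // 2


assert len(splits("keenly")) == split_count(6) == 28
bits_per_word = math.log2(split_count(6))
print(round(bits_per_word, 2), round(520 * bits_per_word))  # 4.81 2500
```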
>
> From the third pass onwards, however, the "new" information is limited
> to the placement of the holes in the grille --- i.e. to 6--8 bits for
> each batch of 520 words. Even if the movement of the grille is varied
> among a few simple patterns (by rows or by columns, direct or reverse,
> skipping k entries at each step, etc.), this choice amounts to another
> handful of bits.
>
> Variations on Rugg's basic proposal could insert more information
> into the output text, but only a very limited amount.
> Obviously the actual table need not have 520 entries, but the choice
> of that parameter cannot provide much more than 10 bits. Starting
> each pass with the grille at a random position (i.e., applying a
> cyclic shift to the table entries before each pass) would add another
> 10 bits per pass.
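Putting these estimates together, here is a rough information budget for one table scanned repeatedly. The per-pass figures for the grille (`grille_bits`, taken as 7 from the 6 to 8 quoted above) and for the choice of scan pattern (`pattern_bits`, here 3, standing in for "a handful") are illustrative guesses. The sketch only shows that the total grows with the number of passes rather than with the number of words, as a free text's would.

```python
import math


def rugg_bits(n_tokens: int, table_size: int = 520, word_bits: float = 10.0,
              letters: int = 6, grille_bits: float = 7.0,
              pattern_bits: float = 3.0) -> float:
    """Rough upper bound on the free choices behind n_tokens words of
    Rugg-style output from a single table (parameters are illustrative)."""
    passes = math.ceil(n_tokens / table_size)
    bits = math.log2(table_size)                 # choice of table size
    bits += table_size * word_bits               # pass 1: the table contents
    if passes >= 2:                              # pass 2: the split points
        bits += table_size * math.log2((letters + 1) * (letters + 2) / 2)
    # every pass: grille holes, scan pattern and random starting shift
    bits += passes * (grille_bits + pattern_bits + math.log2(table_size))
    return bits


def free_text_bits(n_tokens: int, word_bits: float = 10.0) -> float:
    """A text whose every word is a free choice, e.g. a meaningful one."""
    return n_tokens * word_bits


for n in (520, 5200, 35000):
    print(n, round(rugg_bits(n)), round(free_text_bits(n)))
```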
>
> Thus, in theory, we could verify whether the VMS was produced with
> Rugg's grille method by analyzing a stretch containing at least two
> scans of the same table, preferably three or more. First, we must
> guess the table size N. Then, since we cannot be sure that the first
> token of the sample is the start of a new pass, we must guess a number
> M between 0 and 35,000, and discard the first M tokens of the text.
>
> Now, if those guesses are correct, the first N tokens of the remaining
> sample ("Batch 1") are the result of a single pass. Note that we can
> always assume that the grille used in that pass is the trivial one
> (with the three slots on the same row), and that the table is scanned
> in the trivial way (step 1, starting from the first entry). Therefore
> the N tokens of Batch 1 give us the entries of the table, in order,
> except for the prefix-midfix-suffix splits.
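The bookkeeping for one guess of (N, M) is simple. Below is a sketch with hypothetical helpers: `batches` drops the first M tokens and keeps only complete batches of N tokens, and `hypotheses` enumerates the guesses over caller-supplied ranges. Under a given guess, the first batch is Batch 1, i.e. the table entries in scan order with their splits still unknown.

```python
from typing import Iterator, List, Sequence, Tuple


def batches(tokens: Sequence[str], n: int, m: int) -> List[List[str]]:
    """Drop the first m tokens and cut the rest into consecutive full batches of n."""
    rest = list(tokens[m:])
    return [rest[i:i + n] for i in range(0, len(rest) - n + 1, n)]


def hypotheses(tokens: Sequence[str], sizes: Sequence[int],
               max_shift: int) -> Iterator[Tuple[int, int, List[List[str]]]]:
    """Enumerate the (N, M) guesses, yielding the batches for each one.

    A real search would abandon most guesses early; this only lists them.
    """
    for n in sizes:
        for m in range(max_shift + 1):
            b = batches(tokens, n, m)
            if len(b) >= 2:          # need at least two scans of the same table
                yield n, m, b
```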
>
> Next we guess the parameters of the second scan (positions of the
> grille slots, direction of scan, starting entry, etc.), and we try to
> find splitting points for the table entries that, with those
> parameters, would produce the next N tokens of the sample ("Batch 2").
> This search may not be that difficult, since comparison of the
> prefixes and suffixes of Batch 2 tokens with those of Batch 1 should
> quickly exclude most possibilities.
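One way to organize this search, sketched under a simplifying assumption of mine: treat the table as a 1-D list of N entries, so that token k of Batch 2 is prefix(entry i0) + midfix(entry i1) + suffix(entry i2) with i_j = (start + k*step + offset_j) mod N. Each entry starts with all its possible splits, and the loop repeatedly discards any split that cannot take part in producing some Batch 2 token (plain constraint propagation). The names and parameters are hypothetical; a return value of `None` means the guessed scan parameters are inconsistent with the data.

```python
from itertools import combinations_with_replacement, product
from typing import Dict, Optional, Sequence, Set, Tuple

Split = Tuple[str, str, str]


def splits(word: str) -> Set[Split]:
    """All (prefix, midfix, suffix) cuts of word; any part may be empty."""
    k = len(word)
    return {(word[:a], word[a:b], word[b:])
            for a, b in combinations_with_replacement(range(k + 1), 2)}


def solve_splits(entries: Sequence[str], batch2: Sequence[str],
                 offsets: Tuple[int, int, int], start: int = 0,
                 step: int = 1) -> Optional[Dict[int, Set[Split]]]:
    """Prune the candidate splits of each entry against Batch 2 for one guessed scan.

    Token k is assumed to be prefix(entries[i0]) + midfix(entries[i1]) + suffix(entries[i2]),
    with i_j = (start + k * step + offsets[j]) % N (simplified 1-D table model).
    """
    n = len(entries)
    cand: Dict[int, Set[Split]] = {i: splits(w) for i, w in enumerate(entries)}
    changed = True
    while changed:
        changed = False
        for k, token in enumerate(batch2):
            idx = [(start + k * step + d) % n for d in offsets]
            slots = sorted(set(idx))          # an entry seen by two slots uses one split
            support: Dict[int, Set[Split]] = {i: set() for i in slots}
            for choice in product(*(sorted(cand[i]) for i in slots)):
                pick = dict(zip(slots, choice))
                if pick[idx[0]][0] + pick[idx[1]][1] + pick[idx[2]][2] == token:
                    for i in slots:
                        support[i].add(pick[i])
            for i in slots:
                if not support[i]:
                    return None               # no split works: the guess is wrong
                if support[i] != cand[i]:
                    cand[i] = support[i]
                    changed = True
    return cand


entries = ["believed", "in", "the", "last", "years", "of"]
true = [("be", "lie", "ved"), ("", "in", ""), ("th", "", "e"),
        ("la", "s", "t"), ("y", "ear", "s"), ("", "of", "")]
offsets = (0, 1, 3)
batch2 = [true[k][0] + true[(k + 1) % 6][1] + true[(k + 3) % 6][2] for k in range(6)]
result = solve_splits(entries, batch2, offsets)
if result is not None:
    print({entries[i]: len(s) for i, s in result.items()})  # candidates left per entry
```

The inner loop is brute force (up to 28^3 combinations per token for 6-letter words); in practice one would first filter on the shared prefixes and suffixes, as the paragraph above suggests.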
>
> If this search succeeds, it will probably give most if not all of the
> splitting points. The next N tokens ("Batch 3") should provide an easy
> check for this analysis, and fix most of the splittings that could not
> be determined from Batch 2 (if any). For this pass, the only unknowns
> are grille slots and the scanning path.
>
> Although the number of bits to be guessed (a couple dozen per pass) is
> not trivial, I believe that the search could be optimized to the point
> of becoming feasible. For example, one could make a list P of all
> pairs (i,j) such that token i starts with the same letter as token j,
> and i-j is between 100 and 1000; and a similar list S for final
> letters. Then build the list of differences (i-i', j-j') for all (i,j)
> and (i',j') in P, and ditto for S; and look for anomalies in the
> histograms.
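A direct sketch of the P and S lists and the difference histogram. `lo` and `hi` default to the 100 and 1000 given above; `max_gap`, which only compares pairs whose first indices are close, is my own addition to keep the otherwise quadratic pair-of-pairs step manageable. The post leaves open what counts as an anomaly. One plausible thing to look for (my guess) is an excess of counts where the two differences are equal, since a fixed grille re-reading a fixed table might produce runs of matching pairs at a constant lag.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]


def matching_pairs(tokens: Sequence[str], key: Callable[[str], str],
                   lo: int = 100, hi: int = 1000) -> List[Pair]:
    """All (i, j) with key(tokens[i]) == key(tokens[j]) and lo <= i - j <= hi."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(max(0, i - hi), i - lo + 1):
            if key(tokens[i]) == key(tokens[j]):
                pairs.append((i, j))
    return pairs


def difference_histogram(pairs: List[Pair], max_gap: int = 20) -> Counter:
    """Counts of (i' - i, j' - j) over pairs of pairs with 0 < i' - i <= max_gap."""
    pairs = sorted(pairs)
    hist: Counter = Counter()
    for a, (i, j) in enumerate(pairs):
        for b in range(a + 1, len(pairs)):
            i2, j2 = pairs[b]
            if i2 - i > max_gap:
                break                     # pairs are sorted by i, so none further qualify
            if i2 > i:
                hist[(i2 - i, j2 - j)] += 1
    return hist


def first_letter(t: str) -> str:
    return t[:1]                          # empty string for an empty token


def last_letter(t: str) -> str:
    return t[-1:]


tokens = "qokeedy shedy daiin qokaiin chedy".split() * 40
P = matching_pairs(tokens, first_letter, lo=10, hi=30)
S = matching_pairs(tokens, last_letter, lo=10, hi=30)
print(difference_histogram(P, max_gap=5).most_common(3))
```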
>
> I am not going to try this analysis myself, because in my view the
> general Hoax Theory, and Rugg's version in particular, are only
> slightly more likely than the UFO Theory; and I know of much better
> ways to waste my time. Moreover, the expected negative result would
> not convince the believers: I am sure they would simply "enhance" the
> grille method by adding another free parameter which I failed to
> consider (like, allow zig-zag scans), and thus claim that the analysis
> proves nothing. Or worse, they could postulate that the grille was
> scanned in some unconstrained "random" sequence: while this would add
> about 4500 bits of "fresh" information per scan, it is no different
> from saying that the words were picked from a Voynichese dictionary in
> arbitrary order, which unfortunately is how one writes a
> *meaningful* text!
>
> In other words, a Hoax Theory that requires about 10 bits worth of
> unconstrained choices per word is a non-falsifiable theory that
> includes almost any text, meaningful or not; and one that requires
> far fewer bits should leave a detectable signature. So it is the
> believers in the latter who owe us this sort of analysis, and they
> should please report back only when (if) they succeed.
>
> All the best,
>
> --stolfi
>
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
>