
Re: VMs: How many Rugg tables for the WotW



Well said, Jorge. Good work, Mark, Gabriel et al.
Ok, I guess the ball is in their court now...
Who'll prepare the rebuttal piece for Sci-Am?
Is it too early? Or perhaps too late?

Luis

> 
> From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
> Date: 2004/10/05 Tue AM 08:03:56 EDT
> To: vms-list@xxxxxxxxxxx
> Subject: VMs: How many Rugg tables for the WotW
> 
> 
>   > [Stolfi:] Or one could generate the entire WotW novel (78000 words),
>   > with all the subtle asymmetries and long-range correlations noted
>   > above, with fewer than 160 Rugg tables.
> 
>   > [Dennis:] How did you calculate this?
> 
> Well, with a single table you can reproduce an arbitrary 520 word
> text, and 78000/520 = 150. (The "160" was a quick mental calculation,
> sorry...)
> 
> The first table could be, say
> 
>     *:no :*    ¦    *:on :e    ¦    *:wou:ld   ¦   ha:v  :e    ¦
>    be:lie:ved  ¦    *:in :*    ¦    *:th :e    ¦    *:las:t    ¦
>     y:ear:s    ¦    *:of :*    ¦    *:th :e    ¦ nine:tee:nth  ¦
>   cen:tur:y    ¦    *:th :at   ¦    *:th :is   ¦    *:wor:ld   ¦
>     *:was:*    ¦    *:be :ing  ¦   wa:tch:ed   ¦    k:een:ly   ¦
>     *:and:*    ¦   cl:ose:ly   ¦    *:by :*    ¦intel:lig:ences¦
>    gr:eat:er   ¦    *:th :an   ¦    *:man:'s   ¦    *:and:*    ¦
>     *:yet:*    ¦    *:as :*    ¦    *:mor:tal  ¦    *:as :*    ¦
>       ...             ...             ...             ...       
>   tem:per:ate  ¦    z:on :es   ¦    *:th :at   ¦    *:las:t    ¦
>   
> to be scanned with a trivial grille (all three slots on the same row).
> 
> The splitting points above are basically random. Obviously they have
> no effect on the first pass; they would be relevant only if the same
> table were scanned a second time, with a different grille.
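
For anyone who wants to play with this, here is a minimal Python sketch of the table-and-grille scan described above. The names `TABLE`, `part` and `scan` are made up for illustration, and modelling the grille as three entry offsets is a simplifying assumption, not necessarily Rugg's actual layout.

```python
# Hypothetical model: each table entry is a (prefix, midfix, suffix) triple,
# with "*" marking an empty part, as in the sample table above.
TABLE = [
    ("*", "no", "*"), ("*", "on", "e"), ("*", "wou", "ld"), ("ha", "v", "e"),
    ("be", "lie", "ved"), ("*", "in", "*"), ("*", "th", "e"), ("*", "las", "t"),
]


def part(s):
    """Return the text of one table cell, treating "*" as empty."""
    return "" if s == "*" else s


def scan(table, grille=(0, 0, 0), start=0, step=1):
    """One pass over the table.

    grille = (dp, dm, ds): entry offsets of the prefix, midfix and suffix
    slots relative to the current position. (0, 0, 0) is the trivial grille.
    Assumption: step is coprime with len(table), so every entry is visited.
    """
    n = len(table)
    dp, dm, ds = grille
    words = []
    for k in range(n):
        pos = start + k * step
        words.append(
            part(table[(pos + dp) % n][0])
            + part(table[(pos + dm) % n][1])
            + part(table[(pos + ds) % n][2])
        )
    return words


print(scan(TABLE))                    # first pass: the plain text
print(scan(TABLE, grille=(0, 1, 2)))  # second pass: the splits now matter
```

With the trivial grille the output is the original text whatever the splitting points are; only the second call depends on them.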
> 
>   > this is the strongest argument against Rugg.  A hoaxer would
>   > probably have needed an implausibly large number of  tables
>   > and grilles.  More likely, he would have used relatively
>   > fewer tables, and we would then see more repetitions
>   > than in fact we do.  
> 
> It is my feeling too. 
> 
> In principle, the first 520 words generated by Rugg's method are
> completely arbitrary. So, assuming a word entropy of 10 bits (typical
> of many languages, including Voynichese), the first 520 tokens of the 
> output should contain about 5200 bits of information.
> 
> The three-way splitting of the table entries, which is not relevant
> for the first pass, is largely expressed in the output of the second
> pass. There are (K+1)(K+2)/2 ways to split a word with K letters; so,
> assuming that the typical word has 6 letters, the next 520 tokens of
> the output should contain about 5 bits per word, or about 2600 bits of
> "new" information (not given by the first pass). However, some of this
> information may remain hidden in the second pass, to be revealed only
> on later passes.
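
The arithmetic above can be checked in a few lines. This is only a sketch: the 520 entries, 10 bits per word and 6 letters per word are the figures assumed in the post, and the function names are hypothetical.

```python
import math


def split_count(k):
    """Ways to cut a k-letter word into prefix, midfix, suffix (parts may be empty)."""
    return (k + 1) * (k + 2) // 2


def split_count_brute(k):
    """Same count, by enumerating the cut positions 0 <= a <= b <= k."""
    return sum(1 for a in range(k + 1) for b in range(a, k + 1))


N = 520          # table size used in the post
WORD_BITS = 10   # assumed word entropy, from the post
K = 6            # assumed typical word length, from the post

first_pass_bits = N * WORD_BITS
bits_per_split = math.log2(split_count(K))
second_pass_bits = N * bits_per_split

print(first_pass_bits, round(bits_per_split, 2), round(second_pass_bits))
```

A 6-letter word has 28 possible splits, and log2(28) is about 4.8, so the unrounded product is closer to 2500 bits; the 2600 in the post comes from rounding up to 5 bits per word.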
> 
> From the third pass onwards, however, the "new" information is limited
> to the placement of the holes in the grille --- i.e. to 6--8 bits for
> each batch of 520 words. Even if the movement of the grille is varied
> among a few simple patterns (by rows or by columns, direct or reverse,
> skipping k entries at each step, etc.), this choice amounts to another
> handful of bits.
> 
> Variations on Rugg's basic proposal could insert more information
> into the output text, but only a very limited amount.
> Obviously the actual table need not have 520 entries, but the choice
> of that parameter cannot provide much more than 10 bits.  Starting
> each pass with the grille at a random position (i.e., applying a
> cyclic shift to the table entries before each pass) would add another
> 10 bits per pass.
> 
> Thus, in theory, we could verify whether the VMS was produced with
> Rugg's grille method by analyzing a stretch containing at least two
> scans of the same table, preferably three or more. First, we must
> guess the table size N. Then, since we cannot be sure that the first
> token of the sample is the start of a new pass, we must guess a number
> M between 0 and 35,000, and discard the first M tokens of the text.
> 
> Now, if those guesses are correct, the first N tokens of the remaining
> sample ("Batch 1") are the result of a single pass. Note that we can
> always assume that the grille used in that pass is the trivial one
> (with the three slots on the same row), and that the table is scanned
> in the trivial way (step 1, starting from the first entry). Therefore
> the N tokens of Batch 1 give us the entries of the table, in order,
> except for the prefix-midfix-suffix splits.
> 
> Next we guess the parameters of the second scan (positions of the
> grille slots, direction of scan, starting entry, etc.), and we try to
> find splitting points for the table entries that, with those
> parameters, would produce the next N tokens of the sample ("Batch 2").
> This search may not be that difficult, since comparison of the
> prefixes and suffixes of Batch 2 tokens with those of Batch 1 should
> quickly exclude most possibilities.  
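
The pruning step might look like the sketch below. It assumes we have already guessed which Batch 2 token borrows the prefix of a given table entry and which one borrows its suffix; `all_splits` and `compatible_splits` are hypothetical names, and the test is only a necessary condition, not a full solver.

```python
def all_splits(word):
    """All (a, b) with 0 <= a <= b <= len(word): prefix word[:a], suffix word[b:]."""
    n = len(word)
    return [(a, b) for a in range(n + 1) for b in range(a, n + 1)]


def compatible_splits(word, token_using_prefix, token_using_suffix):
    """Splits of a Batch 1 word that agree with two Batch 2 tokens.

    Keeps (a, b) only if word[:a] begins the token that borrows this entry's
    prefix and word[b:] ends the token that borrows its suffix.
    """
    return [
        (a, b)
        for a, b in all_splits(word)
        if token_using_prefix.startswith(word[:a])
        and token_using_suffix.endswith(word[b:])
    ]


# Toy case: "would" really split as *:wou:ld, in a second pass where its
# prefix went into the token "vved" and its suffix into the token "onld".
print(len(all_splits("would")))
print(compatible_splits("would", "vved", "onld"))
```

In this toy case two tokens are enough to cut the 21 possible splits of "would" down to three, one of which is the true one.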
> 
> If this search succeeds, it will probably give most if not all of the
> splitting points. The next N tokens ("Batch 3") should provide an easy
> check for this analysis, and fix most of the splittings that could not
> be determined from Batch 2 (if any). For this pass, the only unknowns
> are the grille slots and the scanning path.
> 
> Although the number of bits to be guessed (a couple dozen per pass) is
> not trivial, I believe that the search could be optimized to the point
> of becoming feasible. For example, one could make a list P of all
> pairs (i,j) such that token i starts with the same letter as token j,
> and i-j is between 100 and 1000; and a similar list S for final
> letters. Then build the list of differences (i-i', j-j') for all (i,j)
> and (i',j') in P, and ditto for S; and look for anomalies in the
> histograms.
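
A rough Python sketch of the P and S lists and the difference histogram follows. It assumes `tokens` is a plain list of non-empty word strings taken from some transcription; all function names are hypothetical, and the brute-force loops are only practical on modest samples.

```python
from collections import Counter
from itertools import combinations


def matching_pairs(tokens, key, lo=100, hi=1000):
    """Pairs (i, j) with lo <= i - j <= hi whose tokens agree under `key`."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(max(0, i - hi), i - lo + 1):
            if key(tokens[i]) == key(tokens[j]):
                pairs.append((i, j))
    return pairs


def difference_histogram(pairs):
    """Histogram of (i' - i, j' - j) over all unordered pairs of pairs.

    Quadratic in len(pairs), so keep the sample small.
    """
    return Counter(
        (i2 - i1, j2 - j1) for (i1, j1), (i2, j2) in combinations(pairs, 2)
    )


def first_letter(tok):
    return tok[:1]


def last_letter(tok):
    return tok[-1:]


# Tiny made-up sample, with a narrow lag window so that something matches.
tokens = ["ab", "cd", "ax", "cy"]
P = matching_pairs(tokens, first_letter, lo=1, hi=3)
S = matching_pairs(tokens, last_letter, lo=1, hi=3)
print(P, difference_histogram(P), S)
```

If I read the proposal right, a rescanned table would show up as an excess of counts at a few particular differences, compared with the same histogram for a shuffled token list.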
> 
> I am not going to try this analysis myself, because in my view the
> general Hoax Theory, and Rugg's version in particular, are only
> slightly more likely than the UFO Theory; and I know of much better
> ways to waste my time. Moreover, the expected negative result would
> not convince the believers: I am sure they would simply "enhance" the
> grille method by adding another free parameter which I failed to
> consider (like, allow zig-zag scans), and thus claim that the analysis
> proves nothing. Or worse, they could postulate that the grille was
> scanned in some unconstrained "random" sequence: while this would add
> about 4500 bits of "fresh" information per scan, it is no different
> from saying that the words were picked from a Voynichese dictionary in
> arbitrary order --- which unfortunately is how one writes a
> *meaningful* text!
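
As a sanity check on the size of that number, here is a small sketch under two different readings of "unconstrained". The table size of 520 is from the post; the two readings and the variable names are my own assumptions.

```python
import math

N = 520  # table size used in the post

# Reading 1: entries visited in an arbitrary order, each exactly once.
# Information per scan is log2(N!), computed via the log-gamma function.
bits_permutation = math.lgamma(N + 1) / math.log(2)

# Reading 2: any entry may be picked at each step, repeats allowed.
# Information per scan is N * log2(N).
bits_free_choice = N * math.log2(N)

print(round(bits_permutation), round(bits_free_choice))
print(round(bits_permutation / N, 1), round(bits_free_choice / N, 1))
```

Both readings come out at several thousand bits per scan, the same order of magnitude as the figure quoted above, which works out to roughly 8 to 9 bits per word.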
> 
> In other words, a Hoax Theory that requires about 10 bits worth of
> unconstrained choices per word is a non-falsifiable theory that
> includes almost any text, meaningful or not; and one that requires
> far fewer bits should leave a detectable signature. So it is the
> believers in the latter who owe us this sort of analysis --- and they
> should please report back only when (if) they succeed.
> 
> All the best,
> 
> --stolfi
> 
> 
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
> 
