
Re: VMs: Number crunching the Fincher window



Quoting Koontz John E <John.Koontz@xxxxxxxxxxxx>:

> On Tue, 14 Sep 2004 elvogt@xxxxxxxxxxx wrote:
> ...
> > Any good ideas?
> 
> I specialize in bad ideas, 

Aah, my kind of man!

To summarize, my interpretation of the Fincher algorithm is:

1) You prepare a master table with a number n of master sentences in 
Voynichese, all of approximately equal length, one sentence per line.

2) You place a piece of cardboard with a window cut in it somewhere on the 
master table, and copy the visible letters to the VM. The window would be one 
line high, and approx. x characters wide. (x is probably not a strict value -- 
why would it be?) I'll call the letters copied in one go a "batch".

3) The window might project over the left or the right edge of the master 
table. In this case, you copy only the visible letters to the VM.

4) You repeat steps 2 and 3 until Rudolph wets his pants.
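For experimenting, steps 1 to 4 can be simulated. The following is only a minimal Python sketch under my own assumptions: the window lands on a uniformly random row and column, its width varies within a range (14 to 18 by default, after John's estimate), and the function name `fincher_window` is made up for illustration.

```python
import random

def fincher_window(master, n_batches, width_range=(14, 18), seed=None):
    """Hypothetical simulation of the Fincher window procedure (steps 1-4).

    master      -- list of master sentences, one per table row (step 1)
    n_batches   -- how many times the window is placed (step 4)
    width_range -- window width in characters, allowed to vary (step 2)
    Random placement is an assumption, not part of the original description.
    """
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        line = rng.choice(master)            # row the window is placed on
        width = rng.randint(*width_range)    # width is not a strict value
        # The left edge may lie beyond either table edge, as long as at
        # least one character remains visible (step 3).
        start = rng.randint(-width + 1, len(line) - 1)
        visible = line[max(start, 0):min(start + width, len(line))]
        batches.append(visible)              # one "batch"
    return batches

master = [
    "This is a first master sequence sentence",
    "And this is a second sentence, all of which",
    "I will use to dumbfound Rudolph, ha ha!",
]
batches = fincher_window(master, 6, width_range=(8, 8), seed=1)
print(batches)
print("".join(batches))  # continuous text; batch borders are no longer visible
```

Joining the batches without separators mimics writing straight on, so the batch borders vanish from the output just as they would in the VM.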

So, under this assumption, a new VM line wouldn't necessarily coincide with a 
new batch.

> ...
> My suggestion would be look only at sequences of length c. 16, or whatever
> the hypothetical Fincher Window width usually amounted to in "EVA glyph
> widths."  Say, 14 to 18.  One would have to assume that Fincher Windows
> would produce either sequences that were unique, or that overlapped other
> sequences where windows overlapped.

I don't quite understand. Actually, IMHO Fincher should produce comparatively 
few unique sequences, since it starts from a limited set of building blocks, 
doesn't it?
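One way to put a number on "few unique sequences" is to count, for each window-sized length k, what share of the distinct k-letter substrings occurs only once. A minimal sketch, assuming the transcription is available as one plain string; the names `kgram_counts`, `unique_fraction` and `vm_text` are hypothetical. If Fincher copying really reuses a limited stock of material, I would expect this share to be lower than for freely composed text of similar length, but that is an expectation to test, not a result.

```python
from collections import Counter

def kgram_counts(text, k):
    """Count every overlapping length-k substring of text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def unique_fraction(text, k):
    """Share of distinct k-grams that occur exactly once in text."""
    counts = kgram_counts(text, k)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Toy run; on real data, loop over k = 14..18 on a hypothetical
# transcription string vm_text instead.
print(unique_fraction("first mato dumbfhich will us", 4))
```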

>  (Do lines flow over into the next
> line in the production process, or do we assume each line is a new
> start?)  

The simplest idea would be simply to keep writing past the end of an MS line.

> If a sequence overlaps another
> sequence by more than some number of characters n considered reasonable,
> you have a longer sequence.  I suspect n > 2 is a minimum.

But the problem is that I don't know where batch borders are, i.e. where the 
author moved his grille and started with a new copying run.

Let's assume his master sequences were:

This is a first master sequence sentence
And this is a second sentence, all of which
I will use to dumbfound Rudolph, ha ha!

He then copies, say, 8-letter batches:

first ma/to dumbf/hich/ will us

The third batch is shorter, since the window ran over the right edge of the master table.

The generated VM section then reads:

first mato dumbfhich will us

Now let's assume I compare two sequences, "mato dum" and "dumbfhic" (anywhere 
in the VM). They overlap and I've got a match, so I reconstruct the master 
sequence to contain "mato dumbfhich", which of course is wrong.
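The false match can be reproduced in a few lines. This sketch uses the toy master table and batches from above; `merge_on_overlap` is a hypothetical helper that joins two fragments on their longest end overlap, with a minimum of 3 characters following John's "n > 2".

```python
master = [
    "This is a first master sequence sentence",
    "And this is a second sentence, all of which",
    "I will use to dumbfound Rudolph, ha ha!",
]
batches = ["first ma", "to dumbf", "hich", " will us"]
vm = "".join(batches)  # the VM section: "first mato dumbfhich will us"

def merge_on_overlap(a, b, min_overlap=3):
    """Join a and b on the longest suffix(a) == prefix(b) of at least
    min_overlap characters; return None if there is none."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return a + b[n:]
    return None

# Both fragments really occur in the VM text and overlap on "dum" ...
merged = merge_on_overlap("mato dum", "dumbfhic")
print(merged)
# ... but the merged sequence occurs in no master sentence.
print(any(merged in line for line in master))
```

The overlap test alone cannot tell a genuine master-table continuation from one that straddles a batch border.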

> 
> If you have a "match," mark the two fragments as having 1 additional match
> (at the matching ends) (you may want to look at this information later)
> and place the matched whole in the next round's set of pieces.  Merge it
> with any identical piece in that set.  Loop through the current set
> matching each piece to all other pieces at both ends.  Repeat the process
> with the next set of pieces. Repeat until no more matches are found.
> Sort the last set of pieces first by length (longest first) and then,
> within that, alphabetically and see what you have.
> 
> I'm not sure this is a correct or foolproof algorithm, but perhaps it's a
> start?
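John's round-based merging might be sketched as follows. This is my reading of his description, not a tested method: each round matches every piece against every other at both ends, only merged wholes go into the next round (identical ones collapse), and the last set is sorted longest first, then alphabetically. The names `merge_on_overlap` and `merge_rounds`, the 3-character minimum overlap and the `max_rounds` guard are assumptions.

```python
from collections import Counter

def merge_on_overlap(a, b, min_overlap=3):
    """Join a and b on the longest suffix(a) == prefix(b) of at least
    min_overlap characters; return None if there is none."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return a + b[n:]
    return None

def merge_rounds(pieces, min_overlap=3, max_rounds=20):
    """Repeat merge rounds until one produces no match; return that last
    set (longest first, then alphabetical) and a per-fragment match tally."""
    current = set(pieces)
    matches = Counter()
    for _ in range(max_rounds):      # guard: repeats can grow pieces forever
        nxt = set()                  # a set merges identical pieces
        for a in current:
            for b in current:        # ordered pairs cover both ends
                if a == b:
                    continue
                m = merge_on_overlap(a, b, min_overlap)
                if m is not None:
                    matches[a] += 1  # "1 additional match" for each fragment
                    matches[b] += 1
                    nxt.add(m)
        if not nxt:
            break
        current = nxt
    return sorted(current, key=lambda s: (-len(s), s)), matches

print(merge_rounds(["abcde", "cdefg"]))
```

As written, pieces that never match anything drop out after the first round; keeping them around would be an easy change, but then "no more matches" needs a different stopping rule.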

The problem is that departures from the master sequences can occur at any 
point, so I'll have to take a stochastic approach, relying on the most 
_probable_ sequences. (More often than not, a letter will be followed by the 
next letter of the same master sequence rather than by a batch boundary.)

So, provided "dumbfound " is unique within the master table, I would assume it is 
always followed by the "R" of "Rudolph", _unless_ the copying batch happened to 
end after "dumbfound ". If the sequence occurs several times in the master 
table (which is not unreasonable), the situation gets more complicated. 
Likewise, "dumbfound " could switch to a different batch at any other place -- 
like after "dumb" or "dumbf", wherever the edge of the window happened to lie.

I feel it would be necessary to build a "tree" of occurring sequences and their 
branches, but tracking 20,000 entries by hand looks awfully messy, and I don't 
have a good idea how to get a firm grip on this by number crunching.
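Such a tree need not be tracked by hand: a depth-limited prefix tree (trie) of all substrings, with a count on every node, records each sequence together with its branches. A minimal sketch, assuming a plain string transcription; `Node`, `build_trie`, `branch_points` and the `min_count` threshold are hypothetical names and parameters. `branch_points` lists every context seen at least `min_count` times that continues in more than one way, i.e. the places where "dumbfound " might switch to another batch.

```python
class Node:
    """Trie node: how often this prefix occurs, and its continuations."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}

def build_trie(text, depth):
    """Insert every substring of length <= depth, counting occurrences."""
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:i + depth]:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root

def branch_points(node, prefix="", min_count=2):
    """Yield (context, {next_char: count}) for frequent contexts that
    continue in more than one way."""
    for ch, child in sorted(node.children.items()):
        ctx = prefix + ch
        if child.count >= min_count and len(child.children) > 1:
            yield ctx, {c: n.count for c, n in sorted(child.children.items())}
        yield from branch_points(child, ctx, min_count)

for ctx, nexts in branch_points(build_trie("abcabd", 3)):
    print(repr(ctx), nexts)
```

On a real transcription the depth would be set around the assumed window width; long contexts that branch with very uneven counts would be the first places to look for batch borders.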

Cheers,

   Elmar

