# Re: VMs: Number crunching the Fincher window

```
Quoting Koontz John E <John.Koontz@xxxxxxxxxxxx>:

> On Tue, 14 Sep 2004 elvogt@xxxxxxxxxxx wrote:
> ...
> > Any good ideas?
>
> I specialize in bad ideas,

Aah, my kind of man!

To summarize, my interpretation of the Fincher algorithm is:

1) You prepare a master table with a number n of master sentences in
Voynichese, all of approximately equal length, one sentence per line.

2) You place a piece of cardboard with a window cut in it somewhere on the
master table, and copy the visible letters to the VM. The window would be one
line high, and approx. x characters wide. (x is probably not a strict value --
why would it be?) I'll call the letters copied in one go a "batch".

3) The window might project over the left or the right edge of the master
table. In this case, you copy only the visible letters to the VM.

4) You repeat steps 2 and 3 until Rudolph gets wet in his pants.

So, under this assumption, a new VM line wouldn't necessarily coincide with a
new batch.

> ...
> My suggestion would be look only at sequences of length c. 16, or whatever
> the hypothetical Fincher Window width usually amounted to in "EVA glyph
> widths."  Say, 14 to 18.  One would have to assume that Fincher Windows
> would produce either sequences that were unique, or that overlapped other
> sequences where windows overlapped.

I don't quite understand. Actually, IMHO Fincher should produce comparatively
few unique sequences, since it starts from a limited set of building blocks,
doesn't it?

>  (Do lines flow over into the next
> line in the production process, or do we assume each line is a new
> start?)

The simplest idea would be just to go on writing over the end of a MS line.

> If a sequence overlaps another
> sequence by more than some number of characters n considered reasonable,
> you have a longer sequence.  I suspect n > 2 is a minimum.

But the problem is that I don't know where batch borders are, i.e. where the
author moved his grille and started with a new copying run.

Let's assume his master sequences were:

This is a first master sequence sentence
And this is a second sentence, all of which
I will use to dumbfound Rudolph, ha ha!

He then copies, say, 8-letter batches:

first ma/to dumbf/hich/ will us

The third batch is shorter, since it went over the edge of the master table.

The VM section generated looks something like

first mato dumbfhich will us

Now let's assume I compare two sequences, "mato dum" and "dumbfhic" (anywhere
in the VM). They overlap and I've got a match, so I reconstruct the master
sequence to contain "mato dumbfhich", which of course is wrong.

>
> If you have a "match," mark the two fragments as having 1 additional match
> (at the matching ends) (you may want to look at this information later)
> and place the matched whole in the next round's set of pieces.  Merge it
> with any identical piece in that set.  Loop through the current set
> matching each piece to all other pieces at both ends.  Repeat the process
> with the next set of pieces. Repeat until no more matches are found.
> Sort the last set of pieces first by length (longest first) and then,
> within that, alphabetically and see what you have.
>
> I'm not sure this is a correct or foolproof algorithm, but perhaps it's a
> start?

The problem is that departures from the master sequences can occur at any
point, so I'll have to employ a stochastic approach, relying on the most
_probable_ sequences. (More often than not, a letter will be followed by
the next letter of the same master sequence, rather than by a batch
boundary.)

So, provided "dumbfound " is unique in the master table, I would assume it is
always followed by the "R" of "Rudolph", _unless_ the copying batch happened to
end after "dumbfound ". If the sequence occurs several times in the master
table (which is not unreasonable), the situation gets more complicated.
Likewise, "dumbfound " could switch to a different batch at any other place --
like after "dumb" or "dumbf", wherever the edge of the window happened to lie.

I feel it would be necessary to create a "tree" of occurring sequences and
branches, but tracking 20000 entries manually looks awfully messy, and I
don't have a good idea how to get a firm grasp on this by number crunching.

Cheers,

Elmar

```
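For anyone who wants to experiment with the procedure described in the message above, here is a minimal Python sketch. It is only one reading of the steps and makes several assumptions: the window position is drawn uniformly at random, the width is fixed at 8, and the three English lines from the example stand in for the master table. The names `visible`, `generate` and `successor_counts` are made up for this illustration. The last function is a crude stand-in for the "tree" of sequences and branches: for each k-letter context it counts which letter follows.

```python
import random
from collections import Counter, defaultdict

# Stand-in master table: the three example lines from the message.
MASTER = [
    "This is a first master sequence sentence",
    "And this is a second sentence, all of which",
    "I will use to dumbfound Rudolph, ha ha!",
]

def visible(line, start, width):
    """Letters seen through a window of `width` columns whose left edge is at
    column `start`; the window may project over either edge of the line."""
    return line[max(start, 0):max(start + width, 0)]

def generate(master, n_batches, width=8, seed=0):
    """Concatenate batches copied at random window positions (an assumption:
    the message does not say how the positions were chosen)."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        line = rng.choice(master)
        # Allow overhang on both sides, but keep at least one letter visible.
        start = rng.randint(-(width - 1), len(line) - 1)
        batches.append(visible(line, start, width))
    return "".join(batches)

def successor_counts(text, k):
    """Map each k-letter context to a Counter of the letters following it."""
    table = defaultdict(Counter)
    for i in range(len(text) - k):
        table[text[i:i + k]][text[i + k]] += 1
    return table

if __name__ == "__main__":
    # The four batches from the example in the message.
    example = [visible(MASTER[0], 10, 8), visible(MASTER[2], 11, 8),
               visible(MASTER[1], 39, 8), visible(MASTER[2], 1, 8)]
    print("/".join(example))

    text = generate(MASTER, 5000)
    print(successor_counts(text, 4)["dumb"].most_common(3))
```

With enough batches, the dominant successor of a context such as "dumb" should be the letter that follows it in the master table, while the rarer successors are the first letters of whatever batch came next. Ranking successors by frequency like this does not find batch boundaries directly, but it gives a way to prefer the most probable continuation at each point.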