
Re: VMs: Number crunching the Fincher window

On Tue, 14 Sep 2004 elvogt@xxxxxxxxxxx wrote:
> Unfortunately, at this point ugly facts began to rear their heads, because,
> looking at some 20000 sequences, I'm a bit stumped on how to assess them with
> either reasonable manual effort, or limited programming time.
> Any good ideas?

I specialize in bad ideas, but, in general, I think this problem is not
unlike the problem of sequencing DNA based on random length fragments of
the original.  This requires major computing power, but I think the
algorithms must be described somewhere.  I haven't really ever looked at
this beyond the black box description of the problem, I'm afraid.

My suggestion would be to look only at sequences of length c. 16, or whatever
the hypothetical Fincher Window width usually amounted to in "EVA glyph
widths."  Say, 14 to 18.  One would have to assume that Fincher Windows
would produce either sequences that were unique, or sequences that
overlapped other sequences where windows overlapped.  (Do lines flow over
into the next line in the production process, or do we assume each line is
a new start?)  If a sequence overlaps another sequence by more than some
number of characters n considered reasonable, you have a longer sequence.
I suspect n > 2 is a minimum.
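
A minimal sketch of the overlap test described above, in Python.  The
function names (overlap, join), the default minimum of 3 characters (taken
from "n > 2"), and the sample strings are illustrative assumptions, not
anything worked out on real transcription data.  It only checks whether
the end of one fragment matches the start of another.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the longest such overlap is shorter than `min_len`
    (hypothetical default of 3, i.e. n > 2).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def join(a: str, b: str, min_len: int = 3):
    """Return the longer sequence formed by `a` followed by `b`, or None."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


# Made-up strings, not real EVA data.
print(overlap("qokeedy", "edydaiin"))
print(join("qokeedy", "edydaiin"))
print(join("abc", "cde"))  # overlap of 1 is below the minimum
```

To test both ends of a pair, call it twice with the arguments swapped.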

If you have a "match," mark the two fragments as having 1 additional match
(at the matching ends) (you may want to look at this information later)
and place the matched whole in the next round's set of pieces.  Merge it
with any identical piece in that set.  Loop through the current set
matching each piece to all other pieces at both ends.  Repeat the process
with the next set of pieces. Repeat until no more matches are found.
Sort the last set of pieces first by length (longest first) and then,
within that, alphabetically and see what you have.
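
The rounds described above could be sketched roughly as follows, again in
Python.  This is one possible reading of the procedure, not a tested
assembler: the names (assemble, matches), the min_len default, and the
sample pieces are assumptions.  The max_rounds cap is an addition of the
sketch, because repetitive text can make two pieces overlap each other in
both directions and keep growing forever.

```python
from collections import Counter


def overlap(a, b, min_len=3):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if under min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(pieces, min_len=3, max_rounds=20):
    """Merge overlapping pieces round by round.

    Returns (final pieces sorted longest first, then alphabetically,
             Counter of how many matches each piece took part in).
    """
    current = set(pieces)          # a set merges identical pieces
    matches = Counter()
    for _ in range(max_rounds):    # cap is an assumption, see lead-in
        next_set, used = set(), set()
        for a in current:
            for b in current:      # ordered pairs cover both ends
                if a == b:
                    continue
                k = overlap(a, b, min_len)
                if k:
                    matches[a] += 1
                    matches[b] += 1
                    next_set.add(a + b[k:])
                    used.update((a, b))
        if not used:               # no more matches found
            break
        # Pieces that matched nothing are carried into the next round.
        current = next_set | (current - used)
    return sorted(current, key=lambda s: (-len(s), s)), matches


# Made-up pieces, not real EVA data.
result, matches = assemble(["abcdef", "defghi", "ghijkl", "xyz"])
print(result)
print(matches["defghi"])
```

Every piece is compared with every other piece in each round, so the cost
grows with the square of the number of pieces; for some 20000 sequences an
index on prefixes would probably be needed.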

I'm not sure this is a correct or foolproof algorithm, but perhaps it's a