Re: VMs: VBScript for finding repeating strings
> Could you provide a really simple slow description of what we are
> looking at in the triangular plot?
> How do you get from VMS to that image - step by step?
Gladly, but you'll have to set your mail reader to a fixed-width font
such as Courier, otherwise it will come out as gibberish :-)
Suppose the input file is:
abc.cde.ab.fg
Then the algorithm will shift the file against itself in successive steps:
abc.cde.ab.fg
-abc.cde.ab.fg
distance = 1, no matches
abc.cde.ab.fg
--abc.cde.ab.fg
distance = 2, one match
the character "c" at positions 2 and 4 (counting from 0), length = 1
abc.cde.ab.fg
---abc.cde.ab.fg
distance = 3, one match
the "." at positions 7 and 10, length = 1
etcetera ...
abc.cde.ab.fg
--------abc.cde.ab.fg
distance = 8, one match
string "ab" at positions 0 and 8
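For anyone who wants to try the shifting step themselves, here is a minimal sketch in Python. It is not the VBScript from this thread, just my reading of the description above: for each distance, line the text up against a shifted copy of itself and record every unbroken run of matching characters. The name find_repeats and the tuple layout are made up for illustration.

```python
def find_repeats(text):
    """Shift text against itself and collect every run of matching characters.

    Returns tuples (pos1, pos2, distance, length, string), with positions
    counted from 0. Hypothetical helper, not the original script.
    """
    matches = []
    n = len(text)
    for distance in range(1, n):
        i = 0
        while i + distance < n:
            if text[i] == text[i + distance]:
                start = i
                # Extend the run for as long as the two copies keep agreeing.
                while i + distance < n and text[i] == text[i + distance]:
                    i += 1
                length = i - start
                matches.append((start, start + distance, distance,
                                length, text[start:start + length]))
            else:
                i += 1
    return matches


for row in find_repeats("abc.cde.ab.fg"):
    print(*row)
```

How the real script treats runs that overlap their own shifted copy (for example a long row of identical characters) is not stated in the thread, so the behaviour of this sketch in that case is an assumption.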
If these were the only matches, each one would give one dot, with x the
position of the first occurrence and y the position of the second:
x=2, y=4
x=7, y=10
x=0, y=8
Actually, I ran the script and it found more matches than I had spotted
by eye:
File : abc.txt
Lines: 1
Chars: 15
String1  String2  Distance  Length  String
      2        4         2       1  |c|
      7       10         3       1  |.|
      3        7         4       1  |.|
      3       10         7       1  |.|
      0        8         8       2  |ab|
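If we make an x-y plot of these matches (x = String1, running left to
right from 0; y = String2, running from 10 in the top row down to 1 in
the bottom row), it looks like this: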
,..X.,.X..,....,
,....,....,....,
X....,....,....,
,..X.,....,....,
,....,....,....,
,....,....,....,
,.X..,....,....,
,....,....,....,
,....,....,....,
,....,....,....,
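The little plot above can be reproduced with a few lines of Python. This is only a sketch: the function name render_plot is made up, and the grid size (16 columns, rows 10 down to 1) and the "," tick in every fifth column are simply read off the picture.

```python
def render_plot(points, width=16, top=10, bottom=1):
    """Draw an ASCII scatter plot with one 'X' per (x, y) point.

    Rows run from y=top at the top to y=bottom at the bottom; every
    fifth column gets a ',' as a tick mark. Hypothetical helper.
    """
    marked = set(points)
    rows = []
    for y in range(top, bottom - 1, -1):
        row = ""
        for x in range(width):
            if (x, y) in marked:
                row += "X"
            elif x % 5 == 0:
                row += ","
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)


# (String1, String2) pairs from the script output above
points = [(2, 4), (7, 10), (3, 7), (3, 10), (0, 8)]
print(render_plot(points))
```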
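The second position is always larger than the first, so all dots fall on
one side of the diagonal, which is why the plot is a triangle.
Now if you take a bigger input file, you get a bigger triangle.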
In the above example I set the cutoff at 0, so every match is accepted.
In my VMS calculation I set the cutoff at 12, so only strings longer
than 12 characters are printed.
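To make the cutoff concrete, here is a small Python sketch that filters the five matches from the table above. It assumes the cutoff is a strict "longer than" test, which fits both statements (cutoff 0 keeps everything, cutoff 12 keeps only strings > 12). The name apply_cutoff is made up, and the real script may well apply the cutoff during the search rather than afterwards.

```python
# (String1, String2, Distance, Length, String) rows from the output above
rows = [
    (2, 4, 2, 1, "c"),
    (7, 10, 3, 1, "."),
    (3, 7, 4, 1, "."),
    (3, 10, 7, 1, "."),
    (0, 8, 8, 2, "ab"),
]


def apply_cutoff(rows, cutoff):
    """Keep only matches whose length is strictly greater than cutoff."""
    return [row for row in rows if row[3] > cutoff]


print(len(apply_cutoff(rows, 0)))  # 5
print(apply_cutoff(rows, 1))       # [(0, 8, 8, 2, 'ab')]
```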