Re: VMs: VBScript for finding repeating strings
Hi Petr and all,
I am new to the VMs list, although I have played around with VM for
some time already.
Petr, I think your analysis of repeated strings is very interesting.
I have two questions or comments. First, in your analysis, strings that
occur more than twice will produce several entries in the output
(1st with 2nd, 1st with 3rd, 2nd with 3rd...). Thus, very common
strings will produce many dots, obscuring the more interesting
structure made by the rarer ones. An interesting modification
would be to use the information about the number of occurrences, for
example by coloring the dots in the charts according to the number of
their occurrences, or by completely removing the strings that occur more
often than a certain limit. Of course, many other manipulations are
possible.
Second, I think the resulting structure is really interesting.
Unfortunately, the dots are too dense to observe the fine structure
inside the clusters, but even in the gaps one can observe distinct
vertical lines - possibly positions that contain some general (not
specialized) text, parts of which repeat at many other positions.
Extremely interesting is the only HORIZONTAL line, close to y=14000.
It means that there is a part of the text where many of the previously
used expressions are repeated, but these expressions are not the
common ones (they do not lie on distinct vertical lines). What does
it mean? This part is located at the end of part two, and the line
extends exactly over part two. Is it a kind of summary of the whole
part? What folio (or folios) does it correspond to?
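One way to pin such a line down numerically, again only as a sketch under my own assumptions (dots given as (x, y) position pairs; the names and the toy data are invented), would be to count the dots per row and per column and then list the x positions covered by the busiest row:

```python
from collections import Counter

def line_profiles(dots):
    """Count dots per column (x) and per row (y).

    A tall peak in the row counts corresponds to a horizontal line,
    a tall peak in the column counts to a vertical one.
    """
    column_counts = Counter(x for x, _ in dots)
    row_counts = Counter(y for _, y in dots)
    return column_counts, row_counts

# Toy dots, not taken from the real chart.
dots = [(0, 5), (1, 5), (2, 5), (3, 5), (0, 1)]
columns, rows = line_profiles(dots)

busiest_row, hits = rows.most_common(1)[0]
# x positions covered by that row, i.e. how far the horizontal line extends
extent = sorted(x for x, y in dots if y == busiest_row)
```

Mapping the resulting y position and x extent back to folios would still need the position-to-folio index used when the chart was made.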
By the way, do you have anything to do with Czechia or Prague? I have seen
some Czech texts on your web site. I am from Prague.