
Re: VMs: VBScript for finding repeating strings

Hi Petr and all,
I am new to the VMs list, although I have played around with VM for 
some time already. 

Petr, I think your analysis of repeated strings is very interesting. 
I have two questions or comments. In your analysis, strings that 
occur more than twice will produce several entries in the output 
(1st with 2nd, 1st with 3rd, 2nd with 3rd...). Thus, very common 
strings will produce many dots, obscuring the more interesting 
structure made by the rarer ones. An interesting modification 
would be to use the information about the number of occurrences, for 
example by coloring the dots in the charts according to the number of 
their occurrences, or by completely removing the strings that occur more 
times than a certain limit. Of course, many other manipulations are
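
The modification described above could be sketched roughly as follows. This is 
Python rather than the VBScript under discussion, and it is only an 
illustration: the function name repeat_pairs, the word-based n-gram matching, 
and the parameters n and max_count are all assumptions, not part of the 
original script.

```python
from collections import defaultdict
from itertools import combinations


def repeat_pairs(tokens, n=2, max_count=None):
    """Return (x, y, count) for every pair of positions sharing an n-gram.

    x and y are the start positions of two occurrences (x < y) and count is
    the total number of occurrences of that n-gram in the text.
    Hypothetical helper: n-grams occurring more than max_count times are
    dropped entirely when max_count is given.
    """
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)

    dots = []
    for pos in positions.values():
        count = len(pos)
        if count < 2:
            continue  # not a repeat
        if max_count is not None and count > max_count:
            continue  # too common: would only clutter the chart
        for x, y in combinations(pos, 2):
            dots.append((x, y, count))
    return dots


tokens = "a b c a b d a b c".split()
all_dots = repeat_pairs(tokens)                # every repeated pair
rare_dots = repeat_pairs(tokens, max_count=2)  # very common strings removed
```

The third element of each tuple is the occurrence count, so it could be 
handed to whatever charting tool is used as the color value of the dot.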

Second, I think the resulting structure is really interesting. 
Unfortunately the dots are too dense to observe the fine structure 
inside the clusters, but even in the gaps one can observe distinct 
vertical lines - possibly positions that contain some general (not 
specialized) text, parts of which repeat at many other positions. 
Extremely interesting is the only HORIZONTAL line, close to y=14000. 
It means that there is a part of the text where many of the previously 
used expressions are repeated, but these expressions are not the 
common ones (they do not lie on distinct vertical lines). What does 
this mean? This part is located at the end of part two, and the line 
extends exactly over part two. Is it a kind of summary of the whole 
part? What folio (or folios) does it correspond to?
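
One rough way to locate such a horizontal line numerically, instead of by 
eye, is to bin the dots and look for rows whose dots are spread over many 
different columns. The sketch below is in Python and is only an assumption 
about how this could be done: the function name, bin_size, and min_columns 
are made up for illustration, and each dot is taken to be an (x, y) pair of 
text positions.

```python
from collections import defaultdict


def horizontal_line_candidates(dots, bin_size=100, min_columns=3):
    """Return (row_start, n_columns) for rows that look like horizontal lines.

    A row is a band of bin_size consecutive y positions. It is reported when
    its dots fall into at least min_columns distinct x bands, i.e. when that
    part of the text repeats material from many different earlier places.
    """
    columns_per_row = defaultdict(set)
    for x, y in dots:
        columns_per_row[y // bin_size].add(x // bin_size)
    return sorted(
        (row * bin_size, len(cols))
        for row, cols in columns_per_row.items()
        if len(cols) >= min_columns
    )


# Made-up positions, only to show the call; not taken from the real charts.
example_dots = [(10, 14000), (250, 14010), (900, 14050), (10, 500), (20, 510)]
candidates = horizontal_line_candidates(example_dots)
```

The reported row_start values would then have to be mapped back to folios 
using whatever position-to-folio index the transcription provides.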



By the way, do you have anything to do with Czechia or Prague? I have seen 
some Czech texts on your web site. I am from Prague.
