Re: VMs: VBScript for finding repeating strings
> strings that
> occur more than twice will produce several entries in the output.
> Thus, very common
> strings will produce many dots obscuring the more interesting
> structure made by the more rare ones. (***)
Yes, that's right. I'll make a plot where the most frequent strings are
removed and see what happens.
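A minimal sketch of that filtering step, in Python rather than the original VBScript. It assumes the raw output can be read into a table mapping each repeated string to the list of positions where it occurs, and that every dot is a pair (earlier position, later position) of the same string. The names `dot_pairs` and `filter_frequent` and the toy data are made up for illustration; they are not taken from the actual script.

```python
from itertools import combinations

def dot_pairs(occurrences):
    """Yield one (x, y) dot for every pair of positions of the same string.

    A string that occurs n times yields n*(n-1)/2 dots, which is why
    very common strings flood the chart.
    """
    for positions in occurrences.values():
        for x, y in combinations(sorted(positions), 2):
            yield (x, y)

def filter_frequent(occurrences, max_count):
    """Keep only strings that occur at most max_count times (assumed cutoff)."""
    return {s: p for s, p in occurrences.items() if len(p) <= max_count}

# Toy data, invented for this example: string -> positions in the text.
occurrences = {
    "aaa": [3, 40, 77, 120],   # common string: 4 occurrences -> 6 dots
    "bbb": [15, 300],          # rare string:   2 occurrences -> 1 dot
}

all_dots = list(dot_pairs(occurrences))
rare_dots = list(dot_pairs(filter_frequent(occurrences, max_count=2)))
print(len(all_dots), rare_dots)
```

The cutoff `max_count` is a free parameter; trying a few values should show how much of the clutter comes from the common strings.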
> example by coloring the dots in the charts
I would like to, but I haven't discovered how to change the appearance of the
dots in the Excel chart. Is there a wizard on the list who knows how? I would
also like to make the dots smaller, so the picture is less cluttered.
> Unfortunately the dots are too dense to observe the fine structure
> inside the clusters
That's what I mean :-)
> Extremely interesting is the only HORIZONTAL line - close to y=14000.
> It means that a part of the text exists, where many of the previously
> used expressions are repeated, but these expressions are not the
> common ones (they do not lie on distinct vertical lines). What does
> it mean? This part is located at the end of part two and the line
> extends exactly over the part two. Kind of a summary of the whole
> part? what folio (folios) does it correspond to?
I think you grasp the meaning of the output better than I do :-) but I think
I'll try to locate the strings in the text. And I'll publish the raw output,
so others can play around with it.
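Locating a string could be sketched roughly like this, again in Python and only as an illustration. It assumes the transcription is available as one long string; `find_all` is a hypothetical helper, and mapping the offsets back to folios would need a separate table that is not shown here.

```python
def find_all(text, needle):
    """Return every start offset of needle in text, overlapping matches included."""
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start)
        # Continue one character further so overlapping hits are not skipped.
        start = text.find(needle, start + 1)
    return positions

# Toy text, invented for this example.
sample = "abcabcab"
print(find_all(sample, "abc"))
print(find_all(sample, "ab"))
```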
In the meantime I've been looking at the strings that the algorithm produces
and doing some statistics on them. They look interesting, but some of what I
see may be an artifact of the effect you describe above (see ***).
> Btw.: Do you have anything to do with Czechia or Prague? I have seen
> some Czech texts on our web site. I am from Prague.
I was born in Prague, but when I was 7 my parents emigrated with me to the
Netherlands. That was in '69. I'm Dutch, but I still speak Czech and visit
Prague regularly. I love big cities, but there are no real big cities in the
Netherlands. Amsterdam is just an approximation :-)
Greetings, Petr Kazil