Re: VMs: VBScript for finding repeating strings



It took just 17 hours of number crunching. The full output is now here:
http://uair01.xs4all.nl/Voynich/String_Analysis/Repeated_strings2.html

After all these years I'm still not familiar enough with the VMS to tell
whether this makes sense or not. Does the VMS indeed fall into three
different parts? Am I seeing the same segmentation as Stolfi does here:
http://www.dcc.unicamp.br/~stolfi/voynich/98-06-19-page-plots/plots.html

There is also a list of Voynichese prefixes and suffixes that emerge
spontaneously from this approach. Again, I'm completely in the dark about
what it means, if it means anything :-) But it looks familiar.

> I am trying to understand the "A" images. Referring to Test 2B, I
> think the relatively open vertical bars indicate that few new
> qualifying strings were found in the interval (x-axis ca. 12000) and
> the relatively open horizontal bars show a lapse in the text in which
> there are few repetitions (y-axis ca. 48000).

I think you're mostly right. I say "mostly" because I myself don't yet know
what "right" is. But I think about it the same way as you.

If these are matching strings in a document:

-----xxxxxxxxxx---------xxxxxxxxxx------------

Then the X coordinate is the starting point of the first string, and Y the
starting point of the second string.
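To make the mapping concrete, here is a minimal Python sketch. It is not the VBScript used for the plots; the helper name repeat_points and the fixed window length k are assumptions for illustration. Every pair of start positions of an identical k-character string becomes one (x, y) dot with x < y, so all dots fall above the diagonal, and a longer shared string shows up as a short diagonal run of dots.

```python
from collections import defaultdict


def repeat_points(text: str, k: int) -> list[tuple[int, int]]:
    """Return (x, y) pairs where the k-character string starting at x
    occurs again starting at y, with x < y.

    Hypothetical helper: a fixed window length k is an assumption; the
    original script may use a different matching rule.
    """
    positions = defaultdict(list)  # k-gram -> list of start offsets
    for i in range(len(text) - k + 1):
        positions[text[i:i + k]].append(i)

    points = []
    for starts in positions.values():
        # Each pair of occurrences gives one dot above the diagonal:
        # x = first occurrence, y = later occurrence.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                points.append((starts[a], starts[b]))
    return sorted(points)
```

For example, repeat_points("abcdabcd", 3) gives [(0, 4), (1, 5)]: the repeated "abcd" turns into a two-dot diagonal run starting at x = 0, y = 4.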

> Different subject matter?

The dark triangles are stretches with strong internal coherence. That may
mean the same subject matter (see for example test07, where you can clearly
see three C++ modules), but it may also mean the same language (see for
example test03, where you can see the difference between Dutch and English).

> The sparsely-filled open bars would form borders of
> triangles. Does the lower edge show first occurrences of qualifying
> strings? But do I see dots above gaps in the lower edge?

At the moment your guess is as good as mine, but it's something along
those lines.

> How would a "random" text for comparison be constructed? Line shuffling?

I have used random input in test05, and as is to be expected there is no
apparent pattern.
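On constructing a comparison text: one option, sketched below in Python under my own assumptions (shuffled_control and its unit parameter are hypothetical names), is to shuffle the real text at a chosen granularity. Shuffling lines or words keeps the vocabulary, so short word-level repeats survive while long-range order is destroyed; shuffling characters keeps only the letter frequencies, which is closer to purely random input.

```python
import random


def shuffled_control(text: str, unit: str = "word", seed: int = 0) -> str:
    """Return a control text containing the same units as `text`
    in random order.

    unit="line": shuffle whole lines (the "line shuffling" option).
    unit="word": shuffle whitespace-separated words.
    unit="char": shuffle individual characters.
    A fixed seed makes the control text reproducible.
    """
    rng = random.Random(seed)
    if unit == "line":
        lines = text.splitlines()
        rng.shuffle(lines)
        return "\n".join(lines)
    if unit == "word":
        words = text.split()
        rng.shuffle(words)
        return " ".join(words)
    if unit == "char":
        chars = list(text)
        rng.shuffle(chars)
        return "".join(chars)
    raise ValueError(f"unknown unit: {unit!r}")
```

Running the same repeat search on the real text and on one or more such controls would show which of the triangles survive when word order is destroyed.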

> In the tests, can strings be partially on one line and partially on
> the next line?

Yes, they can. I first join all lines by replacing each newline with a
space. And just to be sure, I scan for double spaces and remove them before
the analysis.
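A minimal Python sketch of that preprocessing (the name normalize is hypothetical). It uses a regular expression so that a run of any length collapses to a single space. A single plain replace of two spaces by one does not guarantee this: in Python, "   ".replace("  ", " ") still leaves two spaces. So if the script removes double spaces in one pass, it may be worth repeating the pass until none remain.

```python
import re


def normalize(text: str) -> str:
    """Join all lines with spaces and collapse runs of spaces to one."""
    # Treat Windows line endings as plain newlines, then turn every
    # newline into a space.
    joined = text.replace("\r\n", "\n").replace("\n", " ")
    # Collapse any run of two or more spaces, not just exact pairs.
    return re.sub(r" {2,}", " ", joined).strip()
```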

I have one strange effect that I don't understand. If I scan a small part of
the VMS (test04), I get a recognizable pattern. Then if I remove all the
spaces (test08), I find far fewer matching strings. I wouldn't expect spaces
to matter; my first guess would be that I'd find the same matching strings,
only with the spaces removed. But I get a really different result. Is it a
bug in my script, or a bug in my reasoning?
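One way to separate a script bug from a real effect is an independent cross-check: count repeated strings of a fixed length in both versions with a deliberately simple method and compare the trend with the script's output. The Python sketch below assumes purely character-based matching with a fixed length k; the helper names are hypothetical. In the toy example, the same word spaced inconsistently ("ol chedy" vs "olchedy") gives more repeats once spaces are removed, not fewer, so a large drop in test08 might be worth checking against a count like this.

```python
from collections import Counter


def repeated_kgrams(text: str, k: int) -> int:
    """Count distinct k-character substrings occurring at least twice."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return sum(1 for c in counts.values() if c >= 2)


def compare_spacing(text: str, k: int) -> tuple[int, int]:
    """Repeated k-gram counts for the text with and without spaces."""
    return repeated_kgrams(text, k), repeated_kgrams(text.replace(" ", ""), k)
```

Here compare_spacing("ol chedy olchedy", 5) returns (1, 3): only "chedy" repeats in the spaced text, while "olche", "lched" and "chedy" all repeat once the spaces are gone.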

I certainly hope I'm not wasting your time with a faulty or irrelevant
result :-)

Greetings, Petr

______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list