
VMs: RE: Numbercrunching "word" tuples



Hi Petr,
I always use "awk" for pattern analysis. It's easy to use and quite
fast. Scanning the entire VMS EVA transcription (about 4100 lines of text,
only the <...F> lines) takes about 1 sec. I think there is nothing gained by
ignoring "spaces", as they really seem to be "word" separators. But when I
have time, I can do this analysis too. As there is nothing like sentence
marks to be seen in the VMS (AFAIK that was common at this time), you can make
up such double words in any language, like "what is this? This is...." (in
Japanese, for example: Niwa niwa niwa niwa tori desu.)
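
For anyone without awk at hand: pair counts like the ones quoted further
down ("24 daiin daiin") can be reproduced with a few lines of Python. This is
only a minimal sketch, not the awk script used here. It assumes the text has
already been reduced to plain words separated by whitespace; the function name
count_pairs is made up for illustration.

```python
from collections import Counter

def count_pairs(text):
    """Count adjacent word pairs in whitespace-separated text.

    Assumption: meta symbols are already removed, and pairs are counted
    across line ends because the whole text is treated as one word stream.
    """
    words = text.split()
    # zip(words, words[1:]) yields every word together with its successor
    return Counter(zip(words, words[1:]))

if __name__ == "__main__":
    sample = "daiin daiin chol chol daiin daiin"
    for (first, second), n in count_pairs(sample).most_common():
        print(n, first, second)
```

Doubled words such as "daiin daiin" simply show up as pairs whose two members
are equal, so no special handling is needed for them.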

My string analysis is finished, and to my surprise I found that, with the
original text, the longest string which occurs more than once was 22 chars
long. And I believe that some chars can be replaced by others, because their
behaviour is exactly the same (and they somehow look alike, or are maybe
"typos"). With replacement of: che by she, qo by o, eed and eee by ed (and
maybe cthy=dy), I got the result below (formatted by me).
Now the longest string is 27 chars long!
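
The two steps above (apply the replacements, then look for the longest string
that occurs more than once) can be sketched in Python roughly as follows. This
is an illustration under assumptions, not the awk code that produced the
numbers: the order in which the replacements are applied, the counting of
spaces as characters, and allowing overlapping occurrences are all guesses.
The names normalize and longest_repeated are made up, and the uncertain
cthy=dy replacement is left out.

```python
# Replacements named in the message. The order of application is an assumption.
REPLACEMENTS = [("che", "she"), ("qo", "o"), ("eee", "ed"), ("eed", "ed")]

def normalize(text):
    """Apply each replacement in turn to the whole text."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text

def repeated_of_length(text, length):
    """Return some substring of the given length seen at least twice, else None."""
    seen = set()
    for i in range(len(text) - length + 1):
        piece = text[i:i + length]
        if piece in seen:
            return piece
        seen.add(piece)
    return None

def longest_repeated(text):
    """Return a longest substring that occurs at least twice (overlaps allowed).

    If a repeat of length k exists, a repeat of length k-1 exists too,
    so the maximum length can be found by binary search.
    """
    lo, hi, best = 1, len(text) - 1, ""
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = repeated_of_length(text, mid)
        if hit is not None:
            best, lo = hit, mid + 1
        else:
            hi = mid - 1
    return best

if __name__ == "__main__":
    line = normalize("chedy qokaiin qokeedy chedy qokaiin chol")
    print(repr(longest_repeated(line)))
```

Run on the full transcription, normalize would be applied to the cleaned text
first and longest_repeated to its result; comparing the length with and
without normalize corresponds to the 22 versus 27 comparison above.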

The result of my longest string analysis (all meta symbols removed):

Longest strings:

      hedy ol      shedy okaiin  chcth
                      hy okaiin  chol   kaiin chckh
              aiin she y okaiin  chal    aiin
         y ot aiin shedy okaiin  shed
       edy ot aiin shedy okaiin  sh
        dy ot aiin shedy okaiin  she
              aiin shedy okaiin  shedy ol      shedy
                   shedy okaiin  shedy okedy  l
                    hedy otaiin  shedy okaiin  s
              aiin shedy otaiin  shedy ok
              otey shedy okal    shedy okaii
                   shedy okedy   shedy okedy  sh
                     edy okedy   shey  okedy okedy
                dy okedy okedy   shedy okaii
              kedy okedy okedy   shedy oka
                    kedy okedy   shey  okedy   oked
                   shedy okedy   okedy okedy   shedy
                      dy okedy   okedy okedy   sheda
                      ey okedy   okedy otedy l shed
                   shedy okedy   okedy okedy   sh
                     hey okedy   okedy otedy l she
                    kedy okedy   okedy okedy   she
                     edy okedy   okedy shedy   okai
                    kedy otedy   okedy okedy   oke
                     edy otedy   okedy okedy   oked
                      dy otedy   okedy okedy   okedy
                dy shedy ote y   shedy okaiin
               edy shedy ote y   shedy okaii
              aiin shedy ok aiin shecthy
                      dy ok aiin shedy shedy  tedy

Please look at the text with a non-proportional font, and you will see a
structure which I believe shows something about the grammatical structure
of the VMS language. These "sentences" occur at least twice in the
VMS. Up to now I have looked only at strings with length > 20, but I think the
structure will be consistent.

> -----Original Message-----
> From:	Petr Kazil [SMTP:kazil@xxxxxxxxxx]
> Sent:	Monday, March 25, 2002 6:21 PM
> To:	voynich@xxxxxxxxxxxxxx
> Subject:	VMs:  Numbercrunching "word" tuples
> 
> Very interesting for me, being a newbie. What tools and input transcription
> did you use? And how long is the input transcription? (If you give me a
> pointer I'll look up the rest myself.) Your list is very interesting. I
> can't but wonder about coincidences like the following:
> 
> 23 chedy qokaiin *
> 20 qokaiin chedy
> 
> 24 daiin daiin **
> 22 chol chol
> 15 qokeedy qokeedy
> 
> 19 shedy qokeedy ***
> 19 shedy qokedy
> 
> I don't have the tools yet to check it myself, but where would patterns like
> this appear frequently in a natural language? I tried to find these
> patterns in a random English book but found few at first sight:
> 
> * I think that pairs like "if that" / "that if" might be the most frequent.
> ** At the moment I can only think of an example in Dutch that always
> confuses my spelling checker. It goes something like this: "Alle
> aandachtspunten die onderzocht zijn, zijn in orde bevonden." And then
> there's a comma in between the two "zijn".
> *** This one is easier: "of this", "of the" and "of a".
> 
> Not that this yields many insights, but it's an amusing exercise. And this
> kind of analysis could even be applied to Chinese characters :-)
> 
> I am still inclined to write a "long pattern seeker" that would find long
> repeating patterns ignoring the spaces. Maybe longer patterns than 4 words
> would emerge?
> 
> Greetings, Petr