VMs: RE: Numbercrunching "word" tuples
Hi Petr,
I always use "awk" for pattern analysis. It is easy to use and quite
fast. Scanning the entire VMS EVA transcription (about 4100 lines of text,
only the <...F> lines) takes about 1 sec. I think there is nothing gained by
ignoring "spaces", as they really seem to be "word" separators. But when time
is available, I can do this analysis too. As there is nothing like sentence
marks to be seen in the VMS (AFAIK that was common at this time), you can
make up such double words in any language, like "what is this? This is...."
(in Japanese, for example, Niwa niwa niwa niwa tori desu.)
My string analysis is finished, and to my surprise I found that, with the
original text, the longest string which occurs more than once was 22 chars
long. And I believe that some chars can be replaced by others, because their
behaviour is exactly the same (and they somehow look alike, or are maybe
"typos"). With replacement of: che by she, qo by o, eed and eee by ed (and
maybe cthy=dy), I got the following (formatted by me):
Now: the longest string is 27 chars long!
The result of my longest string analysis (all meta symbols removed):
Longest strings:
hedy ol shedy okaiin chcth
hy okaiin chol kaiin chckh
aiin she y okaiin chal aiin
y ot aiin shedy okaiin shed
edy ot aiin shedy okaiin sh
dy ot aiin shedy okaiin she
aiin shedy okaiin shedy ol shedy
shedy okaiin shedy okedy l
hedy otaiin shedy okaiin s
aiin shedy otaiin shedy ok
otey shedy okal shedy okaii
shedy okedy shedy okedy sh
edy okedy shey okedy okedy
dy okedy okedy shedy okaii
kedy okedy okedy shedy oka
kedy okedy shey okedy oked
shedy okedy okedy okedy shedy
dy okedy okedy okedy sheda
ey okedy okedy otedy l shed
shedy okedy okedy okedy sh
hey okedy okedy otedy l she
kedy okedy okedy okedy she
edy okedy okedy shedy okai
kedy otedy okedy okedy oke
edy otedy okedy okedy oked
dy otedy okedy okedy okedy
dy shedy ote y shedy okaiin
edy shedy ote y shedy okaii
aiin shedy ok aiin shecthy
dy ok aiin shedy shedy tedy
Please look at the text in a non-proportional font, and you will see a
structure which I believe shows something about the grammatical structure
of the VMS language. These "sentences" occur at least twice in the VMS.
Up to now I have looked only at strings with length > 20, but I think the
structure will be consistent.
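
For anyone who wants to repeat this without awk, here is a minimal Python sketch of the same idea. It is not the script used for the list above. It assumes the transcription has already been reduced to plain text with single spaces between "words"; the names normalize and longest_repeats are made up for this example, and the order in which the replacements are applied is an assumption (cthy=dy is left out because it was only a "maybe").

```python
# Replacements proposed in the message; the order of application is an assumption.
REPLACEMENTS = [("che", "she"), ("qo", "o"), ("eed", "ed"), ("eee", "ed")]


def normalize(text):
    """Apply the character replacements one after another."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


def longest_repeats(text):
    """Return (length, strings) for the longest substrings occurring more than once.

    Occurrences may overlap. Uses a binary search on the length, which works
    because a repeated string of length n implies repeated strings of length n-1.
    """
    def repeated(n):
        seen, dup = set(), set()
        for i in range(len(text) - n + 1):
            piece = text[i:i + n]
            if piece in seen:
                dup.add(piece)
            else:
                seen.add(piece)
        return dup

    lo, hi, best = 0, len(text) - 1, set()
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = repeated(mid)
        if found:
            lo, best = mid, found
        else:
            hi = mid - 1
    return lo, sorted(best)


sample = normalize("chedy qokaiin shedy okaiin")
print(longest_repeats(sample))
```

To try the variant that ignores "spaces", pass text.replace(" ", "") instead of text. Keeping every substring in a set is fine for a few thousand lines, but a suffix array would be the more economical choice for much longer input.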
> -----Original Message-----
> From: Petr Kazil [SMTP:kazil@xxxxxxxxxx]
> Sent: Monday, March 25, 2002 6:21 PM
> To: voynich@xxxxxxxxxxxxxx
> Subject: VMs: Numbercrunching "word" tuples
>
> Very interesting for me, being a newbie. What tools and input
> transcription did you use? And how long is the input transcription? (If
> you give me a pointer I'll look up the rest myself.) Your list is very
> interesting. I can't but wonder about coincidences like the following:
>
> 23 chedy qokaiin *
> 20 qokaiin chedy
>
> 24 daiin daiin **
> 22 chol chol
> 15 qokeedy qokeedy
>
> 19 shedy qokeedy ***
> 19 shedy qokedy
>
> I don't have the tools yet to check it myself, but where would patterns
> like this appear frequently in a natural language? I tried to find these
> patterns in a random English book but found few at first sight:
>
> * I think that pairs like "if that" / "that if" might be the most
> frequent.
> ** At the moment I can only think of an example in Dutch that always
> confuses my spelling checker. It goes something like this: "Alle
> aandachtspunten die onderzocht zijn, zijn in orde bevonden." And then
> there's a comma in between the two "zijn".
> *** This one is easier: "of this", "of the" and "of a".
>
> Not that this yields many insights, but it's an amusing exercise. And this
> kind of analysis could even be applied to Chinese characters :-)
>
> I am still inclined to write a "long pattern seeker" that would find long
> repeating patterns ignoring the spaces. Maybe longer patterns than 4 words
> would emerge?
>
> Greetings, Petr
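
The pair counts quoted above (for example "23 chedy qokaiin" next to "20 qokaiin chedy", or "24 daiin daiin") can be reproduced with a few lines of Python. This is only a sketch under assumptions: the input is a list of text lines with "words" separated by spaces, pairs are not counted across line breaks, and the function names word_pairs, mirrored and doubled are invented for this example.

```python
from collections import Counter


def word_pairs(lines):
    """Count adjacent word pairs within each line (no pairs across line breaks)."""
    counts = Counter()
    for line in lines:
        words = line.split()
        counts.update(zip(words, words[1:]))
    return counts


def mirrored(counts):
    """Pairs that occur in both orders, mapped to (count a-b, count b-a)."""
    return {
        (a, b): (n, counts[(b, a)])
        for (a, b), n in counts.items()
        if a < b and (b, a) in counts
    }


def doubled(counts):
    """Words that directly follow themselves, with how often that happens."""
    return {a: n for (a, b), n in counts.items() if a == b}


lines = ["chedy qokaiin chedy", "daiin daiin chol chol chol"]
pairs = word_pairs(lines)
print(mirrored(pairs))
print(doubled(pairs))
```

Running the same functions over an English or Dutch text of similar length would give a direct comparison for the "if that" / "that if" and "zijn, zijn" cases.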