RE: VMs: truncated repeating sequences
John,
Thanks for the thoughts.
I agree it is important not to jump the gun. There does appear
to be strong evidence here of a repeated method, but as you point
out it may not be the *only* method. What I have done so far is a set
of "fast and loose" experiments just to throw up patterns; these now
need to be worked through properly.
One feature of a window technique as outlined would be a mixture
of a few very common words which are fully represented in the
underlying text such as 'qokeedy' below:
______________
"aiin.shed|y.qokeedy.qo|tedy.qo"
^^^^^^^^^^^^^^
...and much rarer words which have to be constructed from
consecutive extractions, such as 'shedol':
______________
| "aiin.shed|y.qokeedy.qotedy.qo"
^^^^^^^^^^^^^^
followed by:
______________
"aiin.cthor.ch|ol.chor." |
^^^^^^^^^^^^^^
If this were true, all words in the VMs vocab should fall into one of
two categories: (a) single extraction and (b) constructed.
We need to check whether there is any evidence of these categories
in the frequencies of VMs words.
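To make the two categories concrete, here is a rough Python sketch of the
window idea. The function names, window offsets and widths are my own
assumptions for illustration only; the two master sequences are the ones
quoted above. It labels a word "constructed" if it straddles the join
between two consecutive extractions, and "single" otherwise. Note that it
does not yet detect fragments clipped at the outer edges of a window.

```python
# Master sequences quoted in this message; offsets and widths below are
# illustrative guesses, not measured values.
M1 = "aiin.shedy.qokeedy.qotedy.qo"
M2 = "aiin.cthor.chol.chor."


def extract(master, start, width):
    """Return the characters of `master` visible through a window."""
    return master[start:start + width]


def classify(extractions):
    """Join consecutive extractions and label each resulting word.

    A word is "constructed" if it spans a join between two extractions,
    otherwise it is "single" (it came from one extraction only).
    """
    text = "".join(extractions)

    # Positions in `text` where one extraction ends and the next begins.
    joins = set()
    pos = 0
    for piece in extractions[:-1]:
        pos += len(piece)
        joins.add(pos)

    labels = []
    offset = 0
    for word in text.split("."):
        start, end = offset, offset + len(word)
        if word:
            straddles = any(start < j < end for j in joins)
            labels.append((word, "constructed" if straddles else "single"))
        offset = end + 1  # skip the "." separator
    return labels


# The 'qokeedy' case: one window, the word is fully represented.
first = extract(M1, 9, 12)

# The 'shedol' case: "aiin.shed" followed by "ol.chor."
result = classify([extract(M1, 0, 9), extract(M2, 13, 8)])
```

With labels like these attached to every word produced from a set of
candidate master sequences, the frequency of each category could then be
compared against the actual VMs word counts.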
I fully take your point about the size and nature of the window.
At the moment it is just my impression, looking at some of the sequences,
that quite often in practice it amounted to between 10 and 16 chars
wide, but the sizes of the window and text below may not have been
aligned on "cells", and I have no doubt that multiple windows would
have been used overall.
Yet another aspect to be formally worked through is whether this
technique can create the word repetitions and near-repetitions that
we see in the VMs (and in the right frequencies), although most of
the master sequences I have found so far (30) do have similar words in
them.
I find myself actually wondering if each master sequence _was_ written
to look like a valid sentence or phrase. Given that the author knew he
would be extracting only pieces from each line, he may have thought there
was no point in each line looking sensible in its entirety. Maybe
some of the lines were actually just a collection of similar words that
he wanted to be used in a fashion similar to a dictionary or vocabulary
list. This would certainly lead to lots of word-similar repetitions
if a line such as "burk.bark.bank.book.bunch." were used.
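As a first step towards formalising the repetition check, something like
the following Python sketch could count exact repeats and near-repeats
between adjacent words in a line. The function names and the threshold
(an edit distance of 1 counts as a "near-repetition") are assumptions of
mine, not an established measure for the VMs.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings `a` and `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def adjacent_similarity(line, max_dist=1):
    """Count (exact repeats, near repeats) among adjacent word pairs.

    Words are separated by "."; a near repeat is a pair whose edit
    distance is between 1 and `max_dist` inclusive.
    """
    words = [w for w in line.split(".") if w]
    exact = near = 0
    for left, right in zip(words, words[1:]):
        d = edit_distance(left, right)
        if d == 0:
            exact += 1
        elif d <= max_dist:
            near += 1
    return exact, near


counts = adjacent_similarity("burk.bark.bank.book.bunch.")
```

Running the same count over text generated from candidate master
sequences and over the VMs itself would show whether the frequencies are
in the same range.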
However, I am certain that the underlying text is NOT a random scattering
of chars.
This email's too long already...! :-)
Marke