
Re: VMs: VMS Word context similarities





Koontz John E wrote:

We expect two or three levels of organization in a text.

Yes. First you have a plaintext, then you cipher it, and then you rewrite (and maybe re-cipher) it with these fancy symbols.



- We expect the ordering imposed by the syntax of the language.

Even if there's ciphered text under this writing...



- We expect that letters (or really the underlying sounds) - We expect a

I may be annoying to you veterans (and I might be wrong besides), so bear with me.


First, you should ignore everything about language, treat the text as raw data, and handle it with maths.

(Obviously any of these assumptions may be defeated if the text is not
based on a phonological representation of a typical human language.)

I point this out because one of the difficulties of collocational (such as
Mark has done) or syntactic analysis (notably, Jorge Stolfi, of course) on
the VMs is that unless we can guarantee that we can properly distinguish
letters and words our analyses may involve uncertain or mingled levels.
For example, in the present context, we can't tell if a set reflects
perhaps a set of copulas coming between subjects and predicates, or a set
of common prefixes, or even vowels (between consonants).  Hypothetically a
collocation might even reflect a combination of phrase initial words,
prefixes, and word-initial letters, if the VMs is cleverly enough encoded.

Alternatively, in discussing difficulties with taking the "letter-space"
separated elements (EVA characters) as letters in the past I've pointed
out that if we don't know if the EVA letters are the actual
letter-elements, then a grammar of them might mingle canonical form and
morphology.

Still, assuming that we are dealing with a phonetic text and more or less
natural language, then if the VMs words represent something other than
words per se, they would probably still be more or less ordered.  For
generality we might want to allow that the perceived EVA characters and
perceived word divisions represent variably more or less than letters or
words, but are nevertheless ordered.  Or we might want to allow local
reordering (inverted, halves swapped, Pig-Latined, etc.).
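To make the three kinds of local reordering concrete, here is a minimal Python sketch. The function names and the vowel set are my own assumptions for illustration, and the Pig Latin rule is the simple "move the leading consonants to the end" variant; nothing here is a claim about how the VMs was actually written.

```python
def invert(word: str) -> str:
    """Reverse the order of the characters in a word."""
    return word[::-1]


def swap_halves(word: str) -> str:
    """Move the first half of the word behind the second half."""
    mid = len(word) // 2
    return word[mid:] + word[:mid]


def pig_latin(word: str, vowels: str = "aeiou") -> str:
    """Move the leading consonant cluster to the end and append 'ay'.

    The vowel set is an assumption; it only makes sense for a Latin-alphabet
    plaintext and would need replacing for any other symbol inventory.
    """
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found: keep the order, just add the suffix
```

All three keep every original symbol of the word, so a unit that has been reordered this way is still "ordered" in the sense above, just not in the plaintext order.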

In any event, if the VMs encodes a text in some language, then one way or
another we need to start by identifying the letter and word units.

Ciphered text doesn't necessarily show the structure of the language used to write the plaintext.


Repeated experiments of the sort Mark and others report suggest that we're
somehow off a bit in this respect, but right in assuming the ordered text.

A question that occurs to me is whether all VMs words can be accounted for
in terms of sequences of shorter words.  I think someone must have looked
at this.
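One way to test this question mechanically is sketched below in Python, assuming the vocabulary is available as a set of transcribed word strings. The function names are hypothetical, and the toy vocabulary in the usage note is made up, not taken from any transcription.

```python
def split_into_shorter_words(word, vocabulary):
    """Return one way to spell `word` as a sequence of shorter vocabulary
    words, or None if no such split exists."""
    n = len(word)
    # best[i] holds a list of pieces that spell word[:i], or None if
    # word[:i] cannot be built from vocabulary words.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            # The piece must be a known word and must not be the whole word.
            if best[start] is not None and piece in vocabulary and piece != word:
                best[end] = best[start] + [piece]
                break
    return best[n]


def decomposable_words(vocabulary):
    """Map each vocabulary word that can be split to one such split."""
    result = {}
    for word in vocabulary:
        pieces = split_into_shorter_words(word, vocabulary)
        if pieces:
            result[word] = pieces
    return result
```

With the toy vocabulary {"ab", "cd", "abc", "abcd"}, only "abcd" is reported, as "ab" followed by "cd". On a real word list, very short words will make many splits possible by chance, so the share of decomposable words would need comparing against a shuffled or random baseline before reading anything into it.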

It occurs to me that letter-frequency lists don't usually list the word
separator!
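A frequency list that does count the separator is easy to produce. The Python sketch below assumes the text is a plain string with words separated by whitespace; the function name and the choice to collapse each run of whitespace into a single separator are my assumptions.

```python
from collections import Counter


def symbol_frequencies(text, separator=" "):
    """Relative frequency of every symbol, word separator included."""
    # Collapse each run of whitespace into exactly one separator symbol,
    # so line breaks and double spaces do not inflate its count.
    normalized = separator.join(text.split())
    counts = Counter(normalized)
    total = sum(counts.values())
    # most_common() orders the symbols from most to least frequent.
    return {symbol: n / total for symbol, n in counts.most_common()}
```

Treating the separator as one more symbol lets it be ranked against the letters like any other character.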

I was thinking that, to begin with, all words that occur only once (or are very rare) in the word frequency list should be ignored, in order to get at the word atoms.
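As a rough sketch of that filtering step in Python: the function name and the default threshold of two occurrences are assumptions, and where to draw the line for "very rare" is a judgment call.

```python
from collections import Counter


def frequent_words(words, min_count=2):
    """Keep only the words that occur at least `min_count` times."""
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}
```

For example, frequent_words("a b a c a b".split()) keeps "a" and "b" and drops "c", which occurs only once. Raising min_count discards more of the rare words.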