Re: VMs: qo-words MORE
Dear Nick,
there are many natural languages with very strange spacing:
(1) Arabic, where spaces occur because of the letter shapes: for example,
"w", "r" and "d" always have a small space after them
(2) sub-sentence-oriented spacing, as in Chinese or Japanese
(3) long-word-construction rules, where you can make words of unlimited
length, as in Turkish and German
I think that the additional spacing rule (if it exists) should produce some
change in the grammar and, if we suppose that it is an unknown natural
language, we can find such a new "improved" grammar.
> Dear Akinori,
>
> >I'm not an expert in cryptography, so can you tell me if 15th Century
> >cryptologists knew about attack through statistical analysis?
>
> Yes - in the very early 15th Century, Italian code-makers started to use
> multiple characters for vowels (and for frequently used letters in
> general). This is an indication that code-breakers were using vowels as a
> "lever" to break open the code, and that code-makers were responding - it
> was a cryptographic "arms race" back then. This corresponds to "attack
> through tacit statistical analysis".
>
> In 1474, Cicco Simonetta wrote the first known (though quite short)
> specifically cryptologic paper: "Regule ad extrahendum litteras ziferatas,
> sine exemplo" - which corresponds to "attack through explicit statistical
> analysis". You can see it here (in the Ciphers section):-
> http://www.library.yale.edu/Ilardi/il-toc.htm
>
> Cicco Simonetta was quite an extraordinary man - like Vladimir Putin (and
> George Bush Sr), he was a statesman who'd reached the top having originally
> run a state's secret service (Milan's, in Simonetta's case, which
> necessarily involved a lot of exposure to codes and ciphers). He also
> became extraordinarily rich - but was executed in 1480 after falling foul
> of a power struggle within the Sforza family.
>
> >My analysis was theory-driven, that was: VMS `words' are really
> >words. (No theory is a kind of theory :-)
> >This assumption was rejected by the observation. Now, there seem to be
> >the following (or more) possibilities about VMS `words':
> >
> >(1) VMS is just a bunch of nonsense. (I don't want to believe it)
> >(2) Word order is shuffled in some way, as someone pointed out in this list
> > (I tested it by shuffling English text. The contextual property of the
> > randomly shuffled text was very similar to that of VMS)
> >(3) Some meaningless garbage characters are mixed into words.
> > (For example, i/ii/iii are identical)
>
> Perhaps just as important is the observation that the apparent word-length
> has an artificial-looking distribution that you probably wouldn't get from
> real languages - this has been discussed quite extensively on-list in the
> past.
>
> For me, when you combine (a) the extremely small alphabet, (b) the tendency
> for certain letters to appear at the beginning and end of "words" and (c)
> the artificial word-length stats, it seems to imply one hypothesis quite
> strongly: that spaces are probably inserted in a stream of characters by
> following some kind of *encoding rule*.
>
> I'm thinking of a superficial (ie, non-semantic) rule like: "insert a space
> after <in> or <ir> (etc), or before <q>, <of> or <f> (etc)... if it looks
> nice."
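
A minimal sketch of what such a superficial rule could look like in code, assuming the text is held as an unspaced string in a one-character-per-letter transcription. The function name `insert_spaces` and the two rule lists are hypothetical, taken only from the examples in the paragraph above; the "if it looks nice" part is left out because it is not a mechanical rule.

```python
# Hypothetical rule lists, copied from the examples in the message above.
AFTER = ("in", "ir")        # insert a space after these groups
BEFORE = ("q", "of", "f")   # insert a space before these groups

def insert_spaces(stream, after=AFTER, before=BEFORE):
    """Insert spaces into an unspaced character stream using purely
    letter-based (non-semantic) context rules."""
    out = []
    for i, ch in enumerate(stream):
        if i > 0:
            left, right = stream[:i], stream[i:]
            # A gap goes here if the left context ends with an "after"
            # group or the right context starts with a "before" group.
            if left.endswith(after) or right.startswith(before):
                out.append(" ")
        out.append(ch)
    return "".join(out)

print(insert_spaces("daiinqokar"))
```

Overlapping rules (for example <of> and <f>) can each fire, so a real rule set would probably need some priority order; this sketch does not attempt that.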
>
> This kind of thing would give the apparent (but misleading) structure to
> the text that we see - and that (hence) VMS "words" are merely superficial
> coding artefacts, and have no intrinsic meaning.
>
> That's not to say that there probably (IMO) isn't a deep structure to
> Voynichese - rather, that spaces are designed both to beautify the text and
> to misdirect code-breakers, and that the deep structure lies elsewhere. :-)
>
> Here are some possible ideas to test this general hypothesis:-
> (1) Given a table with Currier-style entries on both axes, representing
> pairs of letters [A,B], what is the ratio of (# of <AB> instances) to (# of
> <A B> instances) in the text? ie, given a left context and a right
> context, how likely is it that an artificial space would be inserted
> between them?
> (2) Given a corpus of VMS text with spaces removed, what proportion of the
> spacing as observed can be generated by a simple set of (purely
> letter-based) rules? ie, how much *generative* information is in the spaces?
> (3) Are there statistical differences in the role of "space" between
> sections, between languages, or between individual pages?
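
Idea (1) above could be sketched roughly as follows. This assumes a transcription in which every character stands for one letter and words are separated by whitespace; the function names are hypothetical. Instead of the raw ratio of <AB> to <A B> it reports the fraction of occurrences that are split, which avoids dividing by zero for pairs that are never split.

```python
from collections import Counter

def pair_space_counts(text):
    """For each letter pair (A, B), count how often it occurs joined (<AB>)
    and how often it straddles a word break on the same line (<A B>)."""
    joined, split = Counter(), Counter()
    for line in text.splitlines():
        words = line.split()
        for w in words:
            # Adjacent letters inside one word
            for a, b in zip(w, w[1:]):
                joined[a, b] += 1
        # Last letter of a word followed by first letter of the next word
        for w1, w2 in zip(words, words[1:]):
            split[w1[-1], w2[0]] += 1
    return joined, split

def split_fraction(joined, split):
    """Fraction of each pair's occurrences that are separated by a space."""
    pairs = set(joined) | set(split)
    return {p: split[p] / (joined[p] + split[p]) for p in pairs}

sample = "daiin qokar\nokair qotchy"
joined, split = pair_space_counts(sample)
fractions = split_fraction(joined, split)
```

Running the same counts separately per section, per Currier language or per page would give a first handle on idea (3).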
>
> Plenty to think about! :-)
>
> Best regards, .....Nick Pelling.....
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list