
Re: VMs: qo-words MORE



Dear Nick,
 there are many natural languages with very unusual spacing:
(1) Arabic, where gaps occur because certain letter shapes
(for example "w", "r", "d") are always followed by a small space
(2) sub-sentence-oriented spacing, as in Chinese or Japanese
(3) long-word-construction rules, where you can build words of unlimited 
length, as in Turkish and German

I think that an additional spacing rule (if one exists) should change the 
apparent grammar in some way and, if we suppose that the text is an unknown 
natural language, we should be able to find that new "improved" grammar.


> Dear Akinori,
> 
> >I'm not an expert in cryptography, so can you tell me if 15th Century
> >cryptologists knew about attack through statistical analysis?
> 
> Yes - in the very early 15th Century, Italian code-makers started to use 
> multiple characters for vowels (and for frequently used letters in 
> general). This is an indication that code-breakers were using vowels as a 
> "lever" to break open the code, and that code-makers were responding - it 
> was a cryptographic "arms race" back then. This corresponds to "attack 
> through tacit statistical analysis".
> 
> In 1474, Cicco Simonetta wrote the first known (though quite short) 
> specifically cryptologic paper: "Regule ad extrahendum litteras ziferatas, 
> sine exemplo" - which corresponds to "attack through explicit statistical 
> analysis". You can see it here (in the Ciphers section):-
>          http://www.library.yale.edu/Ilardi/il-toc.htm
> 
> Cicco Simonetta was quite an extraordinary man - like Vladimir Putin (and 
> George Bush Sr), he was a statesman who'd reached the top having originally 
> run a state's secret service (Milan's, in Simonetta's case, which 
> necessarily involved a lot of exposure to codes and ciphers). He also 
> became extraordinarily rich - but was executed in 1480 after falling foul 
> of a power struggle within the Sforza family.
> 
> >My analysis was theory-driven, that was: VMS `words' are really
> >words. (No theory is a kind of theory :-)
> >This assumption was rejected by the observation. Now, there seems to be
> >the following (or more) possibilities about VMS `words':
> >
> >(1) VMS is just a bunch of nonsense. (I don't want to believe it)
> >(2) Word order is shuffled in some way, as someone pointed out in this list
> >     (I tested it by shuffling English text. The contextual property of the
> >      randomly shuffled text was very similar to that of VMS)
> >(3) Some meaningless garbage characters are mixed into words.
> >     (For example, i/ii/iii are identical)
> 
> Perhaps just as important is the observation that the apparent word-length 
> has an artificial-looking distribution that you probably wouldn't get from 
> real languages - this has been discussed quite extensively on-list in the 
> past.
> 
> For me, when you combine (a) the extremely small alphabet, (b) the tendency 
> for certain letters to appear at the beginning and end of "words" and (c) 
> the artificial word-length stats, it seems to imply one hypothesis quite 
> strongly: that spaces are probably inserted in a stream of characters by 
> following some kind of *encoding rule*.
> 
> I'm thinking of a superficial (ie, non-semantic) rule like: "insert a space 
> after <in> or <ir> (etc), or before <q>, <of> or <f> (etc)... if it looks 
> nice."
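
A minimal sketch of what such a superficial rule could look like in practice, assuming (for illustration only) that each transcription character is one letter, and ignoring the "if it looks nice" discretion. The rule lists come only from the examples quoted above; the function name and the sample strings are made up and are not real VMS lines:

```python
import re

# Zero-width pattern: it matches positions, not characters.
#   (?<=in) / (?<=ir)  -> position right after <in> or <ir>
#   (?=q) / (?=of)     -> position right before <q> or <of>
#   (?<!o)(?=f)        -> position before <f>, unless that <f> belongs to <of>
BREAK = re.compile(r"(?<=in)|(?<=ir)|(?=q)|(?=of)|(?<!o)(?=f)")

def insert_spaces(stream: str) -> str:
    """Insert a space at every position where one of the letter rules fires."""
    spaced = BREAK.sub(" ", stream)
    # Rules firing at the very start or end leave stray spaces; drop them.
    return spaced.strip()

print(insert_spaces("qokeedinqokain"))
print(insert_spaces("darofchedy"))
```

Even this tiny rule set produces "words" that regularly begin with the same letters and end with the same letter groups, which is the kind of apparent structure under discussion.
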
> 
> This kind of thing would give the apparent (but misleading) structure to 
> the text that we see - and that (hence) VMS "words" are merely superficial 
> coding artefacts, and have no intrinsic meaning.
> 
> That's not to say that there probably (IMO) isn't a deep structure to 
> Voynichese - rather, that spaces are designed both to beautify the text and 
> to misdirect code-breakers, and that the deep structure lies elsewhere. :-)
> 
> Here are some possible ideas to test this general hypothesis:-
> (1) Given a table with Currier-style entries on both axes, representing 
> pairs of letters [A,B], what is the ratio of (# of <AB> instances) to (# of 
> <A B> instances) in the text? ie, given a left context and  a right 
> context, how likely is it that an artificial space would be inserted 
> between them?
> (2) Given a corpus of VMS text with spaces removed, what proportion of the 
> spacing as observed can be generated by a simple set of (purely 
> letter-based) rules? ie, how much *generative* information is in the spaces?
> (3) Are there statistical differences in the role of "space" between 
> sections, between languages, or between individual pages?
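
Idea (1) can be sketched as follows, under some loud assumptions: the transcription uses one character per letter, line breaks are treated as ordinary spaces, and the result is reported as the fraction of [A,B] occurrences that have a space between them rather than as a raw ratio (this avoids dividing by zero when <AB> never occurs). The function name and the toy input are hypothetical:

```python
from collections import Counter

def space_probability(text: str) -> dict:
    """For each letter pair (A, B), estimate how often a space separates them."""
    words = text.split()          # assumption: line breaks count as spaces
    joined = Counter()            # occurrences of <AB> inside a word
    split = Counter()             # occurrences of <A B> across a space
    for word in words:
        joined.update(zip(word, word[1:]))
    for left, right in zip(words, words[1:]):
        split[(left[-1], right[0])] += 1
    pairs = set(joined) | set(split)
    return {p: split[p] / (split[p] + joined[p]) for p in pairs}

# Toy input, not VMS text: <ab> occurs once joined and once split.
table = space_probability("ab a b")
print(table[("a", "b")])
```

Running the same function separately on each section or page would give a first look at idea (3) as well, since the per-pair tables can then be compared directly.
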
> 
> Plenty to think about! :-)
> 
> Best regards, .....Nick Pelling.....
> 
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list

