[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Brute Force attack on VMS

To: Jacques Guy <jguy@xxxxxxxxxxxxxxxx>
Subject: Re: Brute Force attack on VMS
From: mskala@xxxxxxxxxxxxxxxxx
Date: Thu, 31 Aug 2000 19:30:42 -0700 (PDT)
Cc: voynich@xxxxxxxx
Delivered-to: reeds@research.att.com
In-reply-to: <39AEFCC1.8977C3BA@alphalink.com.au>
Sender: jim@xxxxxxxxxxxxx

On Fri, 1 Sep 2000, Jacques Guy wrote:
> is feasible. Now, of all the billions of alternative decipherments it
> produces, how do we tell which is the right one, short of examining
> them? Not only that, but since we really do not know the language of
> the VMS... it is like having ten trillion roomfuls of monkeys hacking

The overwhelming majority of decipherments won't make sense in any
language and can thus be ignored.  If we're using a computer, it's easy to
program it to recognize things that might make sense, at least well enough
to narrow the problem down to few enough candidates that humans can
examine them all.  A more significant problem is what happens where there
are several decipherments that all appear to make sense.  I think that
problem sorts itself out when we consider the question in information
theoretic terms.

When we make a guess about how the manuscript should be deciphered, that
can be seen as our supplying a certain amount of information that wasn't
there to begin with.  If the output of our attempted decipherment contains
very much more information than we supplied, then we can guess that it
must be correct - the extra information must have come from the
ciphertext.  On the other hand, if the amount of information in the
decipherment is comparable to the amount we supplied, it doesn't look good
for the decipherment being correct.

I may be slightly abusing the mathematical definition of information here,
but I'm not sure of a better word to describe what I mean.  Let me give an
example.  Suppose I have this cipher:

  "gppcbs"

I could say "Well, that's a Vigenere cipher, with the key 'klpkxh'; the
plaintext is 'weasel'."  But the amount of information I'm supplying
("it's a Vigenere cipher" and "the key is 'klpkxh'") is a significant
fraction of (indeed, quite a bit more than) the amount of information in
'weasel'.

On the other hand, I could say "It's a Ceasar-shift, each letter of the
ciphertext is the one after the corresponding letter of plaintext; the
plaintext is 'foobar'."  The amount of information in "Caesar shift by one
place" seems to be much smaller.  I think it should be obvious which of
these decipherments is preferable.

We can quibble about the exact amount of information added by the
decipherer in these two decihperments, and even though the information
theorists have a definition it's not much help because it requires that we
assign probabilities to the events we're measuring and we could never
agree on those probabilities.  But I feel certain that if we ever find a
correct decipherment, the amount of information it needs to supply will be
so much smaller than any other candidate decipherments as to make the
argument brief.

This criterion is the reason I'm not convinced by claims of decipherments
that require the manuscript to be written in some natural language with
the vowels removed.  There the amount of information the decipherer must
add (by putting the vowels back in) is much too big a fraction of the
information content of the final product.

Matthew Skala
mskala@xxxxxxxxxxxxxxxxx              I'm recording the boycott industry!
http://www.islandnet.com/~mskala/

References:
- Re: Brute Force attack on VMS
  - From: Jacques Guy

Prev by Date: Re: Brute Force attack on VMS
Next by Date: Entropy VS Word Length
Previous by thread: Re: Brute Force attack on VMS
Next by thread: Re: Brute Force (+n cipher)
Index(es):
- Date
- Thread