Re: John Chadwick (Linear B) of corpus size. Comments invited.
I've been away on a trip, and only just saw this correspondence
about Chadwick's formula. I agree with Jacques, it would be
horrible if "Chadwick's formula" were to be presented to the
world in a widely distributed popular book. I think Andras
is right about the genesis of the n^2 formula: if one is to
make use of bigram counts (as Kober and Ventris did) one needs
enough data that the bigram count distribution is distinguishable
from some uninteresting null hypothesis distribution, and the
traditional rule of thumb for this is that the expected count per
cell should be 5 or more. (I am sure this is a pessimistic rule
of thumb, but the principle that the bigram distribution be
noticeably different from the null hypothesis is sound.)
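To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch. It assumes the null hypothesis spreads N observed bigrams evenly over the n^2 cells of the bigram table, so the expected count per cell is N / n^2 and the rule asks for N >= 5 * n^2. The function name and the sign counts in the usage lines (26 letters, and roughly 87 syllabic signs for Linear B) are my own illustrative assumptions, and this is not a statement of Chadwick's formula itself.

```python
import math


def min_bigram_corpus(n_symbols: int, min_expected: float = 5.0) -> int:
    """Smallest bigram count N such that N / n_symbols**2 >= min_expected.

    Assumes a uniform null hypothesis over all n_symbols**2 bigram cells;
    a real script has very uneven sign frequencies, so this is only a
    rough order-of-magnitude guide.
    """
    return math.ceil(min_expected * n_symbols ** 2)


# Illustrative sign counts (assumptions, not taken from the thread):
print(min_bigram_corpus(26))  # 26-letter alphabet
print(min_bigram_corpus(87))  # roughly the size of a Linear B syllabary
```

The quadratic growth is the whole point: a syllabary of several dozen signs needs tens of thousands of bigrams before every cell reaches an expected count of 5, which is why the rule looks pessimistic in practice.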
On May 12, 15:22, Karl Kluge wrote:
> Subject: Re: John Chadwick (Linear B) of corpus size. Comments invited.
>
> I seem to recall Jim Reed's paper on Trithemus containing reference to
> the number of characters one needs (on info theoretic grounds) to solve
> a monoalphabetic. Jim?
This is the "unicity distance formula" of Claude Shannon,
"Communication Theory of Secrecy Systems", Bell System Technical
Journal, vol 28, 1949, pp. 656-715; see section 15. A very clear
explanation of this is in a little book about information theory
by Gordon Raisbeck, which I seem to have mislaid or lost. It
is cited in the Friedmans' "The Shakespearean Ciphers Examined",
pp. 22-26. They refer to Shannon's section 16, which is about
validity of solution, which cites as examples of invalid
cryptographic solutions the Bacon-Shakespeare ciphers and the
Voynich MS. See also essays by Cy Deavours and myself on
the "unicity distance" in Cryptologia, vol. 1, numbers 1, and
respectively, both 1977. And there is another by Martin
Hellman, in IEEE Trans Info Theory, May 1977. (The Deavours
and Reeds essays are anthologized in "Cryptology Yesterday, Today,
and Tomorrow", Artech House, 1987.)
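For Karl's monoalphabetic case, the unicity distance can be sketched numerically as U = H(K) / D, where H(K) is the key entropy in bits and D is the redundancy of the language in bits per character. The sketch below assumes a 26-letter alphabet with all 26! substitution keys equally likely, and an English entropy of about 1.5 bits per letter. That entropy figure is an assumption for illustration, and published estimates vary, so the result should be read as a rough figure only.

```python
import math


def unicity_distance(key_count: int, alphabet_size: int,
                     entropy_per_char: float) -> float:
    """Approximate unicity distance U = H(K) / D, in characters.

    H(K) = log2(key_count), assuming equally likely keys.
    D    = log2(alphabet_size) - entropy_per_char, the redundancy of the
           language in bits per character.
    """
    key_entropy = math.log2(key_count)
    redundancy = math.log2(alphabet_size) - entropy_per_char
    if redundancy <= 0:
        raise ValueError("language has no redundancy; unicity distance is unbounded")
    return key_entropy / redundancy


# Simple substitution on a 26-letter alphabet; 1.5 bits/letter is an
# assumed entropy for English, not a figure taken from this thread.
u = unicity_distance(math.factorial(26), 26, 1.5)
print(round(u, 1))
```

Under these assumptions the answer comes out a little under 30 letters. Raising the assumed entropy lowers the redundancy and so lengthens the unicity distance.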
--
Jim Reeds, AT&T Labs - Research
Shannon Laboratory, Room C229, Building 103
180 Park Avenue, Florham Park, NJ 07932-0971, USA
reeds@xxxxxxxxxxxxxxxx, phone: +1 973 360 8414, fax: +1 973 360 8178