
Re: VMs: The Key -- [case against "qo"]



Hi Jonathan,

At 00:59 15/02/2004 -0800, Jonathan Lopez wrote:
One of the more interesting things I have noticed
when translating certain texts is that the English
language has a lot of redundant letters. For example,
"Q", "C", and "K" are all useless letters. Looking at the VMS, I
get the feeling that a lot of the "useless" letters have been
removed.

FWIW Peter Fenwick's (1997) classic text compression paper "Symbol Ranking Text Compression with Shannon Recodings" briefly describes Claude Shannon's 1951 experiments into the predictability (information content) of English:-


In 1951 C.E. Shannon published his classic paper on the information content of
English text, establishing the well-known bounds of 0.6 - 1.3 bits per letter [Shannon
51]. What is perhaps less recognised is the method by which he obtained those results,
and it is that which is used here as the basis of a text compressor.


Shannon actually describes two methods. In both of them a person is asked to predict
letters of a passage of English text. Shannon also shows that the responses to the
predictions are equivalent to the original text and that an "identical twin" or its
mathematical equivalent could be used to recover the original input. In both cases the
person effectively prepares a ranked list of the probable symbols, most probable first,
and presents this list to the comparator.


1. In the first method, the person predicts the letter and is then told "correct", or is
told the correct answer.


2. In the second method, the person must continue predicting until the correct
answer is obtained. The output is effectively the position of the symbol in the
list, with the sequence of "NO" and the final "YES" responses a unary-coded
representation of that rank or position.
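
As a rough illustration of the second method, here is a minimal Python sketch. It is not Shannon's experiment or Fenwick's compressor: the human guesser is replaced by a toy frequency-count predictor, and the names Predictor, encode, decode and unary, the 27-symbol alphabet, and the two-letter context are all assumptions made for this example. The decoder runs an identical copy of the predictor (the "identical twin"), so the list of ranks alone is enough to recover the text.

```python
from collections import Counter, defaultdict

# Assumed alphabet: lowercase letters plus space. Any other character
# in the input will raise ValueError in encode().
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


class Predictor:
    """Toy stand-in for the human guesser: ranks symbols by how often
    they have followed the current context so far."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)

    def _context(self, history):
        # Last `order` symbols of what has been seen so far.
        return history[-self.order:] if self.order else ""

    def ranked(self, history):
        seen = self.counts[self._context(history)]
        # Most frequent first; ties fall back to alphabet order.
        return sorted(ALPHABET, key=lambda s: (-seen[s], ALPHABET.index(s)))

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def encode(text, order=2):
    """Replace each symbol by its 1-based position in the ranked list."""
    predictor, history, ranks = Predictor(order), "", []
    for ch in text:
        ranks.append(predictor.ranked(history).index(ch) + 1)
        predictor.update(history, ch)
        history += ch
    return ranks


def decode(ranks, order=2):
    """The 'identical twin': same predictor, driven by ranks alone."""
    predictor, history = Predictor(order), ""
    for rank in ranks:
        ch = predictor.ranked(history)[rank - 1]
        predictor.update(history, ch)
        history += ch
    return history


def unary(rank):
    """Rank as a run of NO answers ended by one YES."""
    return "N" * (rank - 1) + "Y"


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the hat"
    ranks = encode(sample)
    print(ranks)
    print(" ".join(unary(r) for r in ranks))
    print(decode(ranks) == sample)
```

The better the predictor, the more of the ranks come out as 1, and it is that skew towards small numbers that a compressor can then exploit.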


Cheers, .....Nick Pelling....

