Re: VMs: The Key -- [case against "qo"]
At 00:59 15/02/2004 -0800, Jonathan Lopez wrote:
One of the more interesting things I have noticed
when translating certain texts is that the English
language has a lot of redundant letters. For example,
"Q, C, and K" are all useless letters. Looking at the VMS I
get the feeling that a lot of the "useless" letters are
FWIW Peter Fenwick's (1997) classic text compression paper "Symbol Ranking
Text Compression with Shannon Recodings" briefly describes Claude Shannon's
1951 experiments into the predictability (information content) of English:-
In 1951 C.E. Shannon published his classic paper on the information
content of English text, establishing the well-known bounds of 0.6 – 1.3
bits per letter [Shannon 51]. What is perhaps less recognised is the
method by which he obtained those results, and it is that which is used
here as the basis of a text compressor.
Shannon actually describes two methods. In both of them a person is
asked to predict letters of a passage of English text. Shannon also
shows that the responses to the predictions are equivalent to the
original text and that an "identical twin" or its mathematical
equivalent could be used to recover the original input. In both cases
the person effectively prepares a ranked list of the probable symbols,
most probable first, and presents this list to the comparator.
1. In the first method, the person predicts the letter and is then told
"correct", or is told the correct answer.
2. In the second method, the person must continue predicting until the
correct answer is obtained. The output is effectively the position of
the symbol in the list, with the sequence of "NO" and the final "YES"
responses a representation of that rank or position.
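For anyone who wants to play with the second method, here is a minimal
Python sketch of the idea. It is not Fenwick's compressor or Shannon's
actual procedure: the human guesser is replaced by a crude adaptive
model that ranks symbols by how often they have followed the previous
letter so far, and the names ALPHABET, ranked_guesses, encode and decode
are my own, purely for illustration. The decoder plays the part of the
"identical twin", rebuilding the same ranked lists to recover the text
from the ranks alone.

```python
from collections import Counter, defaultdict

# Assumed alphabet for this sketch: lowercase letters plus space.
# Any other character in the input raises ValueError in encode().
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def ranked_guesses(model, context):
    """List every symbol, most probable first, given the previous symbol."""
    counts = model[context]
    # Descending count; ties fall back to alphabet order so that the
    # encoder and decoder always build the identical list.
    return sorted(ALPHABET, key=lambda s: (-counts[s], ALPHABET.index(s)))

def encode(text):
    """Replace each symbol by the number of guesses needed to reach it."""
    model = defaultdict(Counter)
    prev = " "
    ranks = []
    for ch in text:
        guesses = ranked_guesses(model, prev)
        ranks.append(guesses.index(ch) + 1)  # 1 means the first guess was right
        model[prev][ch] += 1                 # learn only after predicting
        prev = ch
    return ranks

def decode(ranks):
    """The 'identical twin': rebuild the same lists and pick by rank."""
    model = defaultdict(Counter)
    prev = " "
    out = []
    for r in ranks:
        ch = ranked_guesses(model, prev)[r - 1]
        out.append(ch)
        model[prev][ch] += 1
        prev = ch
    return "".join(out)
```

Calling decode(encode(text)) should give back text unchanged for any
input drawn from ALPHABET. The better the predictor, the more the ranks
pile up at 1, and it is that skew which a compressor can then exploit.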
Cheers, .....Nick Pelling....