identify a text's author or language
Dear all,
there is someone at an Italian university who claims to identify a text's
author and/or its language by using the ZIP algorithm.
1. Take any "known" text longer than n bytes and compress it with ZIP.
2. Append the "unknown" text to it and compress the combined text as well.
3. Compare the lengths of the compressed output from steps 1 and 2. If the
difference is minimal, they claim, the "unknown" text is written in the same
language as the "known" text, or even by the same author.
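
One possible reading of the three steps above, sketched in Python. This is only an illustration under assumptions, not the Italian group's actual code: it uses zlib (DEFLATE, the compression method commonly used inside ZIP files) rather than a ZIP archive, and the names compressed_size, extra_bytes and closest_reference are made up for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the DEFLATE-compressed data (zlib, maximum level)."""
    return len(zlib.compress(data, 9))


def extra_bytes(known: str, unknown: str) -> int:
    """Steps 1 to 3: how many more compressed bytes the appended text costs.

    Assumption: "add more text" means appending the unknown text to the
    known text and compressing the concatenation.
    """
    k = known.encode("utf-8")
    u = unknown.encode("utf-8")
    return compressed_size(k + u) - compressed_size(k)


def closest_reference(references: dict, unknown: str) -> str:
    """Name of the known text for which the unknown text adds the fewest bytes.

    `references` maps a label (language or author) to a known text.
    """
    return min(references, key=lambda name: extra_bytes(references[name], unknown))
```

A call might look like closest_reference({"italian": italian_text, "english": english_text}, fragment), where both reference texts are supplied by the user. Note that DEFLATE only looks back 32 KB, so for a long "known" text only its tail can influence how well the appended text compresses.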
This procedure reminds me of the "entropy test", which was done on the VMS
years ago.
Any comments?
Claus