Re: identify a text's author or language
At 11:01 PM 1/29/02 +0000, Jacques Guy wrote:
29/01/02 11:50:17, "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx> wrote:
>1. Take any text greater than n bytes and compress it with ZIP; this is the "known" text.
>2. Add more text and compress that too; this is the "unknown" text.
>3. Compare the lengths of the compressed texts from steps 1 and 2. If you
>get a minimum difference, they claim, the "unknown" text is derived from
>the "known" text's language or even from the same author.
I would say "congruent with" or "drawn from the same corpus", rather
than "derived from". But this is nit-picking.
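The three quoted steps can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's zlib (DEFLATE, the method ZIP normally uses) instead of an actual ZIP archive, it reads step 2 as "append the unknown text to the known text", and the function names extra_bytes and closest_corpus are made up for this example.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the DEFLATE-compressed data (maximum compression)."""
    return len(zlib.compress(data, 9))


def extra_bytes(known: str, unknown: str) -> int:
    """Steps 1-3: how many more compressed bytes the unknown text costs
    when it is appended to the known text."""
    k = known.encode("utf-8")
    u = unknown.encode("utf-8")
    return compressed_size(k + u) - compressed_size(k)


def closest_corpus(corpora: Mapping[str, str], unknown: str) -> str:
    """Name of the known text for which the unknown text adds the fewest
    compressed bytes (the "minimum difference" of step 3)."""
    return min(corpora, key=lambda name: extra_bytes(corpora[name], unknown))
```

Calling closest_corpus({"english": english_text, "german": german_text}, sample) returns the label whose text makes the sample cheapest to compress. Note that DEFLATE only looks back 32 KB, so for long known texts only the tail actually influences the figure.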
I'd agree. It would also be useless for something taken from physical and
oral transmission and put into text,
or based on something secret or esoteric,
e.g. Carlos Castaneda or L. Ron Hubbard.
The question: how small is "minimum"?
I would also say that producing the zipped files is unnecessary, and,
in fact, amounts to throwing out a great deal of information, since
you end up with a single figure. It would be far more informative
to compare the two Huffman trees computed in the first stage of
(All this is off the top of my head, before I forget it)
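One possible reading of the Huffman suggestion, sketched under assumptions: build a Huffman code from the character frequencies of each text and compare the code lengths symbol by symbol, rather than reducing everything to one number. This is not what ZIP does internally (DEFLATE applies its Huffman coding to the output of an LZ77 pass, not to raw characters), and the names huffman_code_lengths, code_length_differences, cross_cost and the unseen_penalty parameter are invented for the example.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def huffman_code_lengths(text: str) -> Dict[str, int]:
    """Code length in bits that a Huffman tree assigns to each character."""
    freq = Counter(text)
    if not freq:
        raise ValueError("text must not be empty")
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def code_length_differences(text_a: str, text_b: str) -> List[Tuple[str, int]]:
    """Per-symbol difference in code length for symbols both texts share,
    largest absolute differences first."""
    la, lb = huffman_code_lengths(text_a), huffman_code_lengths(text_b)
    shared = set(la) & set(lb)
    return sorted(((s, la[s] - lb[s]) for s in shared),
                  key=lambda pair: (-abs(pair[1]), pair[0]))


def cross_cost(model_text: str, sample_text: str, unseen_penalty: int = 16) -> float:
    """Average bits per character when sample_text is coded with the tree
    built from model_text; unseen characters get an arbitrary fixed penalty."""
    lengths = huffman_code_lengths(model_text)
    total = sum(lengths.get(ch, unseen_penalty) for ch in sample_text)
    return total / len(sample_text)
```

code_length_differences shows which characters the two texts weight differently, which is the information a single compressed-size figure throws away; cross_cost collapses it back to one number when that is all that is wanted.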
I like the sentence structure analysers. The shareware ones are adequate, but
I'd like to have the industrial-grade ones used by the three letter
a more in-depth analysis, because it can find sentence deviations which can be
cut-and-pastes or emotional content. There is a nice balance between
and psychological analysis, not just informational analysis.