
Re: identify a text's author or language

To: voynich@xxxxxxxx
Subject: Re: identify a text's author or language
From: Jacques Guy <jguy@xxxxxxxxxxxxxxxx>
Date: Tue, 29 Jan 2002 23:01:42 -0000
Reply-to: jguy@xxxxxxxxxxxxxxxx

29/01/02 11:50:17, "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx> wrote:

>1. take any text greater than n Bytes, compress it with ZIP "known text"
>2. Add more text and compress it too - this is the "unknown" text
>3. compare the difference in length of the compressed text in steps 1 and 2. If you
>get a minimal difference, they claim, the "unknown" text is derived from
>the "known" text's language or even from the same author.

I would say "congruent with" or "drawn from the same corpus", rather
than "derived from". But this is nit-picking.

The question: how small is "minimum"?
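
For concreteness, the quoted procedure can be sketched as below. This is only an illustration under stated assumptions: Python's zlib (DEFLATE, the method ZIP normally uses) stands in for the ZIP program, and the function names compressed_size and extra_bytes are made up for the example. Note that DEFLATE only looks back 32 KiB, so only the tail of a long "known" text can influence the result.

```python
import zlib

def compressed_size(data):
    """Length in bytes of the DEFLATE-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))

def extra_bytes(known, unknown):
    """Steps 1-3 of the quoted recipe: how many compressed bytes does
    appending `unknown` to `known` cost?  (Hypothetical helper name.)"""
    return compressed_size(known + unknown) - compressed_size(known)

if __name__ == "__main__":
    known = b"the quick brown fox jumps over the lazy dog " * 50
    candidates = {
        "same vocabulary": b"the lazy dog jumps over the quick brown fox ",
        "other language": "zwölf Boxkämpfer jagen Viktor quer über den Deich ".encode("utf-8"),
    }
    for label, text in candidates.items():
        print(label, extra_bytes(known, text))
```

The figure printed for each candidate is the single number the method relies on; it says nothing by itself about how small a "minimum" has to be before it counts.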

I would also say that producing the zipped files is unnecessary, and,
in fact, amounts to throwing out a great deal of information, since
you end up with a single figure. It would be far more informative
to compare the two Huffman trees computed in the first stage of
the algorithm.
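
One possible way to compare two Huffman trees is sketched below. It is a simplification, not what ZIP does internally: DEFLATE builds its Huffman codes over literal/length and distance symbols, whereas this sketch builds a code over raw byte frequencies, with add-one smoothing (an assumption made here) so that every byte value gets a code. The comparison measure, the extra bits per byte paid when the "unknown" text is coded with the "known" text's tree instead of its own, is likewise just one choice among many, and all function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Huffman code length (in bits) for each of the 256 byte values,
    built from add-one-smoothed byte counts of `data`."""
    counts = Counter(data)
    # Heap entries: (weight, unique tiebreak, symbols in this subtree).
    heap = [(counts.get(b, 0) + 1, b, [b]) for b in range(256)]
    heapq.heapify(heap)
    lengths = [0] * 256
    tiebreak = 256
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

def coded_bits(data, lengths):
    """Total bits needed to code `data` with the given code lengths."""
    return sum(lengths[b] for b in data)

def excess_bits_per_byte(known, unknown):
    """Extra bits per byte when `unknown` is coded with the tree of `known`
    rather than with its own tree (hypothetical comparison measure)."""
    own = coded_bits(unknown, huffman_code_lengths(unknown))
    borrowed = coded_bits(unknown, huffman_code_lengths(known))
    return (borrowed - own) / len(unknown)

if __name__ == "__main__":
    known = b"the quick brown fox jumps over the lazy dog " * 20
    print(excess_bits_per_byte(known, b"lazy dog jumps over the quick brown fox the " * 20))
    print(excess_bits_per_byte(known, b"0123456789" * 20))
```

Because the per-symbol code lengths are kept, one can also inspect which symbols account for the mismatch, which is exactly the information a single compressed-length figure discards.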

(All this is off the top of my head, before I forget it)
