
Re: Duplicate word search

To: Claus Anders <Claus_Anders@xxxxxxxxxxx>
Subject: Re: Duplicate word search
From: mskala@xxxxxxxxxxxxxxxxx
Date: Sat, 10 Mar 2001 08:04:37 -0800 (PST)
Cc: voynich@xxxxxxxx
In-reply-to: <KBELLPCIOGPHOGBGEOHGGEFBCAAA.Claus_Anders@t-online.de>

On Sat, 10 Mar 2001, Claus Anders wrote:
> To compute the hash value for each word difference, I used the formula:
> d=sum(sqrt(c1n*c1n-c2n*c2n)) with c1n the nth character of word 1 and c2n
> the same for word 2. If d was lower than a certain e, the word with the
> higher frequency was chosen.
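
The quoted formula can be sketched in Python to make its weak point concrete. This is a minimal sketch that assumes characters are compared by their ord() code points and only up to the length of the shorter word. Neither assumption is stated in the quoted message. The function name claus_distance is hypothetical.

```python
import math


def claus_distance(w1: str, w2: str) -> float:
    """d = sum(sqrt(c1n*c1n - c2n*c2n)) over character positions n.

    Assumptions for illustration: characters map to ord() code points,
    and zip() stops at the end of the shorter word.
    """
    total = 0.0
    for a, b in zip(w1, w2):
        c1n, c2n = ord(a), ord(b)
        # math.sqrt raises ValueError whenever c2n > c1n, because the
        # argument c1n*c1n - c2n*c2n is then negative.
        total += math.sqrt(c1n * c1n - c2n * c2n)
    return total
```

With the sample strings "qokeey" and "qokedy", chosen for illustration, the call succeeds in one argument order ('e' > 'd') and raises ValueError in the other. Because of this, the caller has to decide which word is c1.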

I think you might get better results with
sqrt(sum((c1n-c2n)*(c1n-c2n))).  That's the much more conventional
(Euclidean) "distance" measurement, and it has the advantage that you don't
need to do any "choosing" of which word should be c1 and which should be
c2 - the argument of sqrt() is a sum of squares, so it is guaranteed to be
nonnegative.
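
The suggested distance can be combined with the threshold-and-frequency rule from the quoted message. The following is a minimal sketch, not either poster's actual code. It assumes that characters map to ord() code points and that the shorter word is padded with code point 0, so both words have equal length. The functions euclidean_distance and pick_representative, the freq table, and the threshold e are hypothetical names chosen for illustration.

```python
import math
from typing import Dict, Optional


def euclidean_distance(w1: str, w2: str) -> float:
    """sqrt(sum((c1n - c2n)**2)) over character positions n.

    Assumption for illustration: the shorter word is padded with
    code point 0 so that both words have the same length.
    """
    n = max(len(w1), len(w2))
    a = [ord(c) for c in w1.ljust(n, "\0")]
    b = [ord(c) for c in w2.ljust(n, "\0")]
    # Each squared difference is >= 0, so the sqrt argument is never negative
    return math.sqrt(sum((x - y) * (x - y) for x, y in zip(a, b)))


def pick_representative(w1: str, w2: str, freq: Dict[str, int],
                        e: float) -> Optional[str]:
    """Return the more frequent word if the two are closer than e, else None.

    Ties in frequency keep w1 (an arbitrary choice for this sketch).
    """
    if euclidean_distance(w1, w2) < e:
        return w1 if freq[w1] >= freq[w2] else w2
    return None
```

For example, euclidean_distance("qokedy", "qokeey") and euclidean_distance("qokeey", "qokedy") both return 1.0. With freq = {"qokedy": 5, "qokeey": 12} and e = 2.0, pick_representative returns "qokeey" in either argument order. The distance is symmetric, so only the frequency comparison decides which word is kept. The zero-padding for unequal lengths is just one possible convention.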

Matthew Skala
mskala@xxxxxxxxxxxxxxxxx                   :CVECAT DELENDA EST
http://www.islandnet.com/~mskala/
