
AW: Duplicate word search

To: <mskala@xxxxxxxxxxxxxxxxx>
Subject: AW: Duplicate word search
From: Claus_Anders@xxxxxxxxxxx (Claus Anders)
Date: Sun, 11 Mar 2001 16:45:18 +0100
Cc: "vms Liste" <voynich@xxxxxxxx>
Importance: Normal
In-reply-to: <Pine.LNX.4.21.0103100801100.29375-100000@diamond.ansuz.sooke.bc.ca>

Matthew,
you are absolutely right; this is exactly what I did. My mistake was in
describing the algorithm, not in implementing it. One of the interesting
results was the elimination of most "*" and some "typos".
Claus

-----Original Message-----
From: mskala@xxxxxxxxxxxxxxxxx [mailto:mskala@xxxxxxxxxxxxxxxxx]
Sent: Saturday, 10 March 2001 17:05
To: Claus Anders
Cc: voynich@xxxxxxxx
Subject: Re: Duplicate word search

On Sat, 10 Mar 2001, Claus Anders wrote:
> To compute the hash value for each word difference, I used the formula:
> d=sum(sqrt(c1n*c1n-c2n*c2n)) with c1n the nth character of word 1 and c2n
> the same for word 2. If d was lower than a certain e, the word with the
> higher frequency was chosen.

I think you might get better results with
sqrt(sum((c1n-c2n)*(c1n-c2n))).  That's a much more conventional
"distance" measurement, and has the advantage that you don't need to do
any "choosing" of which word should be c1 and which should be c2 - the
argument of sqrt() is guaranteed to be nonnegative.
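
As a rough illustration of the distance above, here is a minimal Python
sketch. It is not the code either of us actually ran. The names
word_distance and merge_variants, the use of ord() as the character value,
the rule that words of different length never match, and the sample counts
and threshold are all assumptions made for the example.

```python
import math


def word_distance(w1, w2):
    """Euclidean distance between two words, treating each character
    code (ord) as one coordinate."""
    if len(w1) != len(w2):
        # Assumption: the formula is only defined position by position,
        # so words of different length are treated as infinitely far apart.
        return math.inf
    return math.sqrt(sum((ord(a) - ord(b)) ** 2 for a, b in zip(w1, w2)))


def merge_variants(counts, eps):
    """Map every word to the most frequent word closer than eps.

    counts: dict mapping word -> frequency
    eps:    distance threshold (the "certain e" from the quoted message)
    """
    # Visit words from most to least frequent, so the higher-frequency
    # spelling always becomes the representative.
    by_freq = sorted(counts, key=lambda w: (-counts[w], w))
    kept = []
    canonical = {}
    for w in by_freq:
        target = next((k for k in kept if word_distance(w, k) < eps), None)
        if target is None:
            kept.append(w)
            target = w
        canonical[w] = target
    return canonical


# Hypothetical frequencies, for illustration only.
counts = {"daiin": 10, "daiim": 1, "chedy": 5}
mapping = merge_variants(counts, eps=1.5)
```

Because each squared difference is nonnegative, word_distance(a, b) equals
word_distance(b, a) and sqrt() never receives a negative argument, which is
the point made above.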

Matthew Skala
mskala@xxxxxxxxxxxxxxxxx                   :CVECAT DELENDA EST
http://www.islandnet.com/~mskala/
