
Re: Duplicate word search



Matthew,
you are absolutely right; this is exactly what I did. My mistake was in
describing the algorithm, not in implementing it. One of the interesting
results was the elimination of most "*" and some "typos".
Claus

-----Original Message-----
From: mskala@xxxxxxxxxxxxxxxxx [mailto:mskala@xxxxxxxxxxxxxxxxx]
Sent: Saturday, 10 March 2001 17:05
To: Claus Anders
Cc: voynich@xxxxxxxx
Subject: Re: Duplicate word search


On Sat, 10 Mar 2001, Claus Anders wrote:
> To compute the hash value for each word difference, I used the formula:
> d=sum(sqrt(c1n*c1n-c2n*c2n)) with c1n the nth character of word 1 and c2n
> the same for word 2. if d was lower than a certain e , the word with the
> higher frequency was chosen.

I think you might get better results with
sqrt(sum((c1n-c2n)*(c1n-c2n))).  That's a much more conventional
"distance" measurement, and has the advantage that you don't need to do
any "choosing" of which word should be c1 and which should be c2 - the
argument of sqrt() is guaranteed to be nonnegative.
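
As a rough illustration only, the distance above could be sketched in
Python as follows. The thread does not say how characters are mapped to
numbers or how words of different lengths are handled, so this sketch
assumes ord() character codes and equal-length words. The names
word_distance and pick_canonical, and the frequency table freq, are
hypothetical and not taken from the original program.

```python
import math


def word_distance(w1, w2):
    """Euclidean distance: sqrt(sum((c1n - c2n)^2)) over character codes.

    Assumption: characters are compared by their ord() values and both
    words have the same length; the thread specifies neither.
    """
    if len(w1) != len(w2):
        raise ValueError("words must have the same length")
    return math.sqrt(sum((ord(a) - ord(b)) ** 2 for a, b in zip(w1, w2)))


def pick_canonical(w1, w2, freq, e):
    """Return the more frequent word if the two are closer than e, else None.

    freq is a hypothetical dict mapping each word to its count.
    """
    if word_distance(w1, w2) < e:
        return w1 if freq[w1] >= freq[w2] else w2
    return None
```

Because every term is a square, the result does not depend on which word
is taken as c1 and which as c2. With the difference-of-squares form, a
single character pair with c1n < c2n already makes the argument of sqrt()
negative.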

Matthew Skala
mskala@xxxxxxxxxxxxxxxxx                   :CVECAT DELENDA EST
http://www.islandnet.com/~mskala/