Duplicate word search

To: "Jim Comegys" <Comegys_J@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, <voynich@xxxxxxxx>
Subject: Duplicate word search
From: Claus_Anders@xxxxxxxxxxx (Claus Anders)
Date: Sat, 10 Mar 2001 11:28:27 +0100
Importance: Normal

Jim Comegys wrote:

>Dear Voynicheros,

>If one of you is computer capable and has a bit of free time, I am doing a
>comparison of duplicates and near duplicates in the VMS, sequences like EVA
>ytchal ytchal and cphor ytchor and the like.  It is slow and dull to search
>these things out visually, does anyone have a program to seek out these? Or
>better yet a list because maybe you have studied the matter?

>While we are on the matter, where is a good list of every Voynich word and
>its frequency? I can no longer access the old Mik Clarke site.

>Thank you very much, and have a good week-end.

>Jim Comegys, Madera, California

Hi Jim,
I have something similar: I wrote a little awk script (yes, again) which
assigned a number to each word (like a hash algorithm) and computed for each
'word' the 'distance' to the others. The result was that a few (~100) words
could easily be misspellings of others (with higher frequency). The only
problem was defining the hash code, which I did visually, i.e. 'characters'
with a similar look were coded with close numbers and very different ones got
a large number difference.
To compute the distance between two words, I used the formula:
d=sum(sqrt(c1n*c1n-c2n*c2n)) with c1n the code of the nth character of word 1
and c2n the same for word 2. If d was lower than a certain threshold e, the
word with the higher frequency was chosen.
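
A minimal Python sketch of the idea above, not the original awk script. The
glyph codes in GLYPH_CODE are made-up illustrative values, and the function
names are hypothetical. Two further assumptions: the absolute value is taken
inside the square root, since the formula as written is undefined whenever
c1n < c2n, and only words of equal length are compared, since the formula
pairs characters position by position.

```python
import math
from collections import Counter

# Hypothetical glyph codes: visually similar EVA characters get close numbers.
# These values are illustrative only, not those used in the original script.
GLYPH_CODE = {
    "o": 1, "a": 2, "y": 3,
    "c": 10, "h": 11, "e": 12,
    "l": 20, "r": 21,
}


def distance(word1, word2):
    """Sum of sqrt(|c1n^2 - c2n^2|) over character positions n.

    Assumption: words of different length are treated as infinitely far
    apart. A character missing from GLYPH_CODE raises KeyError.
    """
    if len(word1) != len(word2):
        return math.inf
    total = 0.0
    for ch1, ch2 in zip(word1, word2):
        c1, c2 = GLYPH_CODE[ch1], GLYPH_CODE[ch2]
        total += math.sqrt(abs(c1 * c1 - c2 * c2))
    return total


def suggest_corrections(words, threshold):
    """Map each word to its nearest strictly more frequent word, if d < threshold."""
    freq = Counter(words)
    corrections = {}
    for rare in freq:
        best = None  # (distance, candidate) of the closest match so far
        for common in freq:
            if freq[common] <= freq[rare]:
                continue  # only a more frequent word can be the "correct" form
            d = distance(rare, common)
            if d < threshold and (best is None or d < best[0]):
                best = (d, common)
        if best is not None:
            corrections[rare] = best[1]
    return corrections


if __name__ == "__main__":
    sample = ["chol"] * 4 + ["chor", "chal"]
    print(suggest_corrections(sample, threshold=5.0))
```

With these particular codes, squaring makes a difference between two large
codes count for more than the same difference between two small ones, so the
choice of numbering matters as much as the threshold.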

So, most of the * characters in the transcription could be eliminated.

Claus
