Hi Robert,
here is a proposal for the pseudo-code:
1. Make a list of all tokens (strip any non-letter characters).
2. Compare each token against every other token (excluding the token itself).
3. Test whether both tokens have the same length.
4. If not: test whether the lengths differ by 1 and the shorter token is completely embedded in the longer one. If so, increase the count for the first char of the longer token when the shorter token starts at its 2nd char, or for the last char when the shorter token starts at its 1st char. If the shorter token is not embedded, or the length difference is > 1, discard the comparison.
5. If the lengths are the same: compare the chars from left to right. At the first difference, mark both chars as candidates. Keep testing until a second difference is found (discard the comparison) or the token is completely processed. In the latter case, increase the count for both candidates.
6. When all tokens are processed, you'll end up with a count for each char, telling you how often that char distinguishes one similar token from another.
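In case it helps, the six steps above could be sketched in Python roughly as follows. This is only a sketch under a few assumptions of mine: the function name distinguishing_counts is made up, tokens are runs of ASCII letters, each distinct token is used once, and each pair of tokens is compared once rather than in both directions.

```python
import re
from collections import Counter
from itertools import combinations

def distinguishing_counts(text):
    """Count how often each char is the only difference between two tokens."""
    # Step 1: keep runs of letters only (assumption: ASCII letters),
    # and use each distinct token once.
    tokens = sorted(set(re.findall(r"[A-Za-z]+", text)))
    counts = Counter()
    # Step 2: compare every token with every other token, once per pair.
    for a, b in combinations(tokens, 2):
        if len(a) == len(b):
            # Step 5: same length, collect the positions that differ.
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                counts.update(diffs[0])  # count both candidate chars
        elif abs(len(a) - len(b)) == 1:
            # Step 4: the shorter token must be embedded in the longer one.
            short, long_ = sorted((a, b), key=len)
            if long_[1:] == short:
                counts[long_[0]] += 1   # extra char at the start
            elif long_[:-1] == short:
                counts[long_[-1]] += 1  # extra char at the end
        # Any other pair is discarded.
    return counts

counts = distinguishing_counts("okol okoy okcy kcy")
for ch in "oklyc":
    print(ch, counts[ch])
```

For the four-word example in the original message this gives a count of 2 for o, 1 each for l, y and c, and 0 for k. Note that, like the pseudo-code, it only catches an extra char at the very start or end of a token, not one inserted in the middle.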
This can easily be implemented with gawk, but it will take some time.
Is that what you wanted to accomplish?
Cheers
Claus
-----Original Message-----
From: Robert Antony Hicks [mailto:rob_hicks_vms@xxxxxxxxxxx]
Sent: Friday, 21 February 2003 12:07
To: vms-list@xxxxxxxxxxx
Subject: VMs: Trying to create a test for letter 'value'
I wish to create a test for the 'value' of letters within the VMS. I have
the basic programming skills to make a utility, but I need a bit of help
establishing what algorithm to use. I'd better explain what I mean by
value-
I define the value, V, of a particular letter as a measure of how frequently
the letter is the difference between two otherwise identical words.
For example,
Consider a text containing just four words : okol okoy okcy kcy
There are 5 different letters in the text : o, k, l, y and c.
Letter o has V=2/4=0.5 because it distinguishes between okoy and okcy and
between okcy and kcy, out of the four words.
Letter k has V=0 because it does not distinguish between any of the words.
Letter l has V=1/4=0.25 because it distinguishes between okol and okoy out
of the four words.
Letter y has V=1/4=0.25 because it distinguishes between okol and okoy out
of the four words.
Letter c has V=1/4=0.25 because it distinguishes between okoy and okcy out
of the four words.
Thus, a value 'ranking' for the five letters is -
o V=0.5
lyc V=0.25
k V=0
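To make the arithmetic behind this ranking explicit, here is a small Python sketch. It assumes the per-letter counts from the example are already known (2 for o, 1 each for l, y and c, 0 for k) and that V is the count divided by the number of words, as in the fractions above. The name rank_by_value is made up for illustration.

```python
from itertools import groupby

# Counts and word total taken from the four-word example (okol okoy okcy kcy).
counts = {"o": 2, "k": 0, "l": 1, "y": 1, "c": 1}
n_words = 4

def rank_by_value(counts, n_words):
    """Return (letters, V) groups, highest V first."""
    # V = how often the letter distinguishes two words / number of words.
    values = {ch: n / n_words for ch, n in counts.items()}
    # Stable sort by descending V, so ties keep their original order.
    ordered = sorted(values.items(), key=lambda kv: -kv[1])
    # Letters with the same V share one rank.
    return [("".join(ch for ch, _ in group), v)
            for v, group in groupby(ordered, key=lambda kv: kv[1])]

for letters, v in rank_by_value(counts, n_words):
    print(letters, "V=%g" % v)
```

This prints the same three ranks as listed above: o with V=0.5, lyc with V=0.25, and k with V=0.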
I hope this makes some semblance of sense. I need help with creating an
algorithm for this - can anybody assist with pseudo-code?
Even better - does such a test already exist, and if so, what is it called
and where can I find it?
Rob