
Re: VMs: Some methods - do you know them ?



30/10/2003 11:31:00 PM, "PK#01" <pklist01@xxxxxxxxx> wrote:

>I'm reading "Deciphering the Indus Script" by Asko Parpola and on pages
>97-100 he mentions some methods for attacking relationships within unknown
>languages:

>- the syntagmatic approach that studies the probabilities of sign
>co-occurrence

Hmm...

>- an approach based on the work by Zellig Harris

I can only think of what Harris proposed for the segmentation
of continuous text into its constituent morphs. Let me call it
"the surprise factor". You read an English text. So far, you
have read "fea". You are not very surprised when the next letter
turns out to be 't': "feat". You are even less surprised when
the next is 'h': "feath", and from then on, there are no surprises
until you have read "feather". At that point, you are no longer
so sure. The next letter could be an 's', or it could be
anything, being the beginning of another word. That is what
Harris wrote, just translated into plain, pedestrian English.
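
A minimal sketch of that idea, in Python. It is not Harris's own
procedure, only the usual "successor variety" reading of it: for
each prefix, count how many different letters can follow. The
function name and the toy word list are made up for illustration,
and I assume a list of words rather than continuous text to keep
it short.

```python
from collections import defaultdict

def successor_variety(corpus, word):
    """For each prefix of `word`, count the distinct symbols that follow it in `corpus`."""
    followers = defaultdict(set)
    for w in corpus:
        for i in range(len(w)):
            followers[w[:i]].add(w[i])  # letter seen right after the prefix w[:i]
        followers[w].add("#")           # end-of-word counts as one possible successor
    return [(word[:i], len(followers[word[:i]])) for i in range(1, len(word) + 1)]

# Toy corpus (hypothetical); a real test needs far more data.
corpus = ["feather", "feathers", "feathered", "feathery",
          "feat", "feature", "fear", "feast", "fee"]

for prefix, variety in successor_variety(corpus, "feathers"):
    print(prefix, variety)
```

The count stays low while you are inside "feath", "feathe", and
rises again once "feather" is complete. A peak of that kind is
where one would guess a morph boundary.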

>- the paradigmatic approach based on the degree of similarity between two
>signs - he gives an example that finds synonyms and antonyms in an English
>text

You mean, I suppose (I haven't read Parpola's book) the degree of
similarity between the contexts of two signs? I tried that many
years ago (about 20-25 years) on other data, and it worked nicely.
Stephen (or Steven?) Finch did it too in his PhD thesis, on
computer-generated text, and it worked nicely too.
But to do it on the VMs, you need to segment the text first,
which no-one has done. I did try a text segmentation algorithm,
in 1977 or so, and it worked... sort of. Not reliably enough, 
though. It was based on Sukhotin's segmentation algorithm,
but I had added a measure of the "surprise factor" which 
relied on the hypergeometric distribution (I remember having
reinvented Pascal's triangle for the purpose).
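
For what "similarity between the contexts of two signs" can mean
in practice, here is a rough sketch in Python. It is not my old
program nor Finch's method, just one common way to do it: describe
each sign by the counts of its immediate left and right neighbours,
then compare two signs by the cosine of those count vectors. The
function names and the sample text are invented, and the text is
assumed to be already segmented into tokens.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(tokens):
    """Map each token to a Counter of its immediate left (L) and right (R) neighbours."""
    vecs = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if i > 0:
            vecs[tok][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vecs[tok][("R", tokens[i + 1])] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical, already-segmented sample text.
text = "the cat sat on the mat . the dog sat on the rug . a cat ran . a dog ran ."
vecs = context_vectors(text.split())

print(cosine(vecs["cat"], vecs["dog"]))  # signs sharing the same neighbours
print(cosine(vecs["cat"], vecs["on"]))   # signs with no neighbours in common
```

Signs that turn up in the same surroundings score high, which is
why the method tends to group synonyms and antonyms alike. It also
shows why segmentation has to come first: without token boundaries
there are no neighbours to count.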

>Before I start looking into the algorithms (they're not in the book, I have
>to dig deeper in the literature) I would like to know if someone ever tried
>this on the VMS.

You'll find all the necessary algorithms in the archives, adapted (by
yours truly) from the French, itself translated from the Russian.
Well, I think they are all there. It was a long time ago.


