VMs: Text clustering approaches (finding synonyms & paragraphs in text)

To: <vms-list@xxxxxxxxxxx>
Subject: VMs: Text clustering approaches (finding synonyms & paragraphs in text)
From: "PK#01" <pklist01@xxxxxxxxx>
Date: Wed, 24 Dec 2003 12:00:15 +0100
References: <Pine.LNX.4.44.0312241047030.15122-100000@lin.ehi.ee>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

> So, the people do not care very much, is it
> written "olodabas", "oladbas", "olodobos", maybe more. And the poor
> scientist 500 years later believes, these all are different words...

This might be tested by using the methods described in:

Unsupervised discovery of morphologically related words based on
orthographic and semantic similarity
http://www.cogsci.ed.ac.uk/sigphon/papers/BaroniMatiasekTrost02.pdf

Finding Semantically Related Words in Large Corpora
http://nlp.fi.muni.cz/publications/tsd2001_smrz_pary/tsd2001_smrz_pary.pdf

However, I tried this approach on Lovecraft's "At the Mountains of Madness"
without much success. I was able to separate two classes of words (colors
versus numbers), but I wasn't able to form useful clusters from a random
sample of words. So my first guess is that the VMS is too small for
statistical clustering algorithms. But I may be wrong.
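
To make the idea concrete, here is a minimal Python sketch of combining
spelling similarity with similarity of context, in the spirit of the first
paper above but not a reimplementation of it. The function names, the
one-word context window, and both thresholds are assumptions chosen for
illustration only. Spelling similarity is taken from difflib's ratio and
context similarity is the cosine of neighbor-word counts.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from math import sqrt


def context_vectors(tokens, window=1):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def variant_candidates(tokens, min_spelling=0.7, min_context=0.5):
    """Return (word1, word2, spelling_sim, context_sim) for every pair of
    distinct words that passes both (assumed) thresholds."""
    vectors = context_vectors(tokens)
    words = sorted(vectors)
    pairs = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            spelling = SequenceMatcher(None, w1, w2).ratio()
            if spelling < min_spelling:
                continue  # spelled too differently to be variants
            context = cosine(vectors[w1], vectors[w2])
            if context >= min_context:
                pairs.append((w1, w2, spelling, context))
    return pairs
```

Calling variant_candidates() on a list of transcribed words returns
candidate pairs of spelling variants. On a short text most context vectors
are very sparse, which is one reason the results may be poor for a corpus
of this size.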

+ new idea: text clustering

While looking for the above article I found this interesting approach. I'll
have to think about it; it's not easy stuff. But it might help us to find
sentences or paragraphs in the VMS, if they exist:

Detecting Subject Boundaries Within Text: A Language
Independent Statistical Approach
http://acl.ldc.upenn.edu/W/W97/W97-0305.pdf

MULTI-PARAGRAPH SEGMENTATION OF EXPOSITORY TEXT
http://www.sims.berkeley.edu/~hearst/papers/tiling-acl94/acl94.html
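
A rough Python sketch of the kind of lexical-cohesion segmentation the
second paper describes: compare the vocabulary of adjacent blocks of words
and look for deep valleys in the similarity curve. This is a simplified
illustration, not Hearst's algorithm as published. The block size of 20
words, the function names, and the cutoff (mean plus half a standard
deviation of the depth scores) are assumptions made for this sketch. It
needs no dictionary or stop list, so it could be run on a word-level
transcription.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(tokens, block=20):
    """Similarity between the `block` words before and after each gap.

    Gaps are placed every `block` words; returns (position, similarity) pairs.
    """
    sims = []
    for pos in range(block, len(tokens) - block + 1, block):
        left = Counter(tokens[pos - block:pos])
        right = Counter(tokens[pos:pos + block])
        sims.append((pos, cosine(left, right)))
    return sims


def depth_scores(sims):
    """Depth of each valley: how far similarity rises to the nearest peak
    on the left plus how far it rises to the nearest peak on the right."""
    values = [s for _, s in sims]
    depths = []
    for i, v in enumerate(values):
        left = v
        for x in reversed(values[:i]):
            if x < left:
                break
            left = x
        right = v
        for x in values[i + 1:]:
            if x < right:
                break
            right = x
        depths.append((left - v) + (right - v))
    return depths


def boundaries(tokens, block=20):
    """Word positions whose depth score exceeds the (assumed) cutoff."""
    sims = gap_similarities(tokens, block)
    if not sims:
        return []
    depths = depth_scores(sims)
    cutoff = mean(depths) + pstdev(depths) / 2
    return [pos for (pos, _), d in zip(sims, depths) if d > cutoff]
```

boundaries() returns word offsets where the vocabulary changes sharply.
Whether such offsets line up with page or paragraph breaks in the VMS is
something that would have to be checked against the transcription.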


References:
- Re: VMs: left & right word entropy
  - From: Mart Vabar
