VMs: And another one ...

It will take some hard study to grasp this, I fear.
That's why I'm asking first whether you think it's worth the effort:


Abstract. This paper presents a simple unsupervised learning algorithm for
recognizing synonyms, based on statistical data acquired by querying a Web search
engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information
(PMI) and Information Retrieval (IR) to measure the similarity of pairs of
words. PMI-IR is empirically evaluated using 80 synonym test questions from
the Test of English as a Foreign Language (TOEFL) and 50 synonym test
questions from a collection of tests for students of English as a Second
Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR
is contrasted with Latent Semantic Analysis (LSA), which achieves a score of
64% on the same 80 TOEFL questions. The paper discusses potential
applications of the new unsupervised learning algorithm and some implications
of the results for
LSA and LSI (Latent Semantic Indexing).
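To get a feel for the core idea before deciding on the effort, here is a minimal sketch of PMI-based synonym picking. It is not the paper's implementation: it assumes a tiny hypothetical in-memory corpus of documents (sets of words) in place of a Web search engine, and the helper names hits, pmi and pick_synonym are made up for illustration. PMI is estimated as log2(p(w1,w2) / (p(w1) p(w2))) from document co-occurrence counts, and the choice with the highest PMI to the problem word wins.

```python
import math

def hits(corpus, *words):
    """Count documents containing every given word.
    Hypothetical stand-in for search-engine hit counts."""
    return sum(1 for doc in corpus if all(w in doc for w in words))

def pmi(corpus, w1, w2):
    """Pointwise mutual information log2(p(w1,w2) / (p(w1) * p(w2))),
    with probabilities estimated as fractions of documents."""
    n = len(corpus)
    joint = hits(corpus, w1, w2)
    if joint == 0:
        return float("-inf")  # never co-occur: treat as least similar
    p_joint = joint / n
    p1 = hits(corpus, w1) / n
    p2 = hits(corpus, w2) / n
    return math.log2(p_joint / (p1 * p2))

def pick_synonym(corpus, problem, choices):
    """Answer a TOEFL-style question: the choice with the highest PMI wins."""
    return max(choices, key=lambda c: pmi(corpus, problem, c))

# Toy corpus, invented purely for illustration.
corpus = [
    {"big", "large", "house"},
    {"big", "large", "dog"},
    {"big", "huge", "tree"},
    {"small", "tiny", "dog"},
    {"small", "house"},
    {"red", "car"},
    {"large", "field"},
    {"tiny", "red", "car"},
]

print(pick_synonym(corpus, "big", ["large", "tiny", "red"]))  # large
```

Since p(problem) is the same for every candidate, ranking by hits(problem AND choice) / hits(choice) gives the same order as full PMI, which is why only co-occurrence and per-word counts really matter in this setup.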

