Re: VMs: word length counts & Computational Linguistics
Thanks... I will try to dig into it...
On Tue, 8 Jul 2003, Brett Cotton wrote:
> Matt,
>
> I took a while to get round to replying to this one; I had to check a couple of things. On suffixes, there is a bit of info in the tables at the back of D'Imperio, but I am not sure about statistical info. However ...
> I read an interesting article in the journal Computational Linguistics from a few years back (I'll dig out the reference). The basic idea is that John Goldsmith describes a method for computer analysis of an unknown (but Indo-European?) language that generates a morphological analysis, i.e. grammar, suffixes, etc. He has implemented this as a Windows proggy called Linguistica 2001 that can be downloaded here:
> http://humanities.uchicago.edu/faculty/goldsmith/Linguistica2000/
> and there is a pdf of his paper there as well. (Get a copy of "Easy PDF Converter" to convert to .txt :) )
>
> I have extracted the abstract and this follows:
>
> This study reports the results of using minimum description length (MDL) analysis to model
> unsupervised learning of the morphological segmentation of European languages, using corpora
> ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly
> develop a probabilistic morphological grammar, and use MDL as our primary tool to determine
> whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar
> matches well the analysis that would be developed by a human morphologist.
> In the final section, we discuss the relationship of this style of MDL grammatical analysis to
> the notion of evaluation metric in early generative grammar.
>
> *** end of abstract
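
To make the MDL idea in the abstract a bit more concrete, here is a minimal sketch. It is NOT Goldsmith's algorithm or anything taken from Linguistica; it only illustrates the comparison of two description lengths. The function names, the 26-letter alphabet size, the bit-cost model, and the sample tokens are all assumptions made up for illustration, not real transcription data.

```python
import math

def plain_cost(words, alphabet_size=26):
    """Bits needed to list every word type letter by letter, plus an end marker each."""
    bits_per_char = math.log2(alphabet_size + 1)  # +1 for the end marker (assumed model)
    return sum((len(w) + 1) * bits_per_char for w in set(words))

def split_cost(words, suffixes, alphabet_size=26, min_stem=2):
    """Bits needed for a stem list, a suffix list, and one pointer pair per word type."""
    bits_per_char = math.log2(alphabet_size + 1)
    stems, used = set(), set()
    n_types = 0
    for w in set(words):
        # Pick the longest candidate suffix that still leaves a long-enough stem.
        best = ""
        for s in suffixes:
            if w.endswith(s) and len(w) - len(s) >= min_stem and len(s) > len(best):
                best = s
        stems.add(w[:len(w) - len(best)])
        used.add(best)
        n_types += 1
    list_bits = sum((len(x) + 1) * bits_per_char for x in stems)
    list_bits += sum((len(x) + 1) * bits_per_char for x in used)
    # Each word type is encoded as a pointer to its stem plus a pointer to its suffix.
    pointer_bits = n_types * (math.log2(len(stems)) + math.log2(len(used)))
    return list_bits + pointer_bits

# Hypothetical EVA-like tokens, invented for this example only.
sample = "qokedy qokeey qokain okedy okeey okain daiin dain chedy chey".split()

print("unanalysed:", round(plain_cost(sample), 1))
print("with suffixes dy/ey/in:", round(split_cost(sample, ["dy", "ey", "in"]), 1))
```

A proposed set of suffixes is "adopted" in this toy model only if split_cost comes out lower than plain_cost. The real method in the paper uses a probabilistic grammar and a set of heuristics to propose the modifications, which this sketch does not attempt.
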
>
> Cheers,
> Brett
>
> Mart Vabar <mesinik@xxxxxx> wrote:
>
>
> On Fri, 4 Jul 2003, GC wrote:
>
> > 96 pages
> > 31,412 glyphs or characters
> > 8,175 words
> > 2,940 unique words
>
> How much do these counts change if we cut a character or a pair from the longer words?
> Has anybody counted how many suffixes the VMS has?
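
One way to try the experiment asked about above, as a rough sketch: drop the last one or two characters from the longer words and recount the unique words, then tally which endings are most common. The function names, the length thresholds, and the sample tokens are assumptions for illustration only; a real run would read a transcription file instead of the made-up list.

```python
from collections import Counter

def truncation_profile(words, max_cut=2, min_len=5):
    """For each cut size, count distinct word types after dropping that many
    trailing characters from words of at least min_len characters."""
    profile = {}
    for cut in range(max_cut + 1):
        types = set()
        for w in words:
            if cut and len(w) >= min_len:
                types.add(w[:-cut])   # cut is at least 1 here, so the slice is safe
            else:
                types.add(w)          # short words are left untouched
        profile[cut] = len(types)
    return profile

def suffix_counts(words, max_len=2, min_stem=3):
    """Count how many distinct word types end in each candidate suffix."""
    counts = Counter()
    for w in set(words):
        for n in range(1, max_len + 1):
            if len(w) - n >= min_stem:
                counts[w[-n:]] += 1
    return counts

# Hypothetical EVA-like tokens, invented for this example only.
sample = "qokedy qokeey qokain okedy okeey okain daiin dain chedy chey".split()

print(truncation_profile(sample))
print(suffix_counts(sample).most_common(5))
```

If the number of unique words drops sharply after cutting one or two characters, that would hint that many words differ only in their endings. It does not by itself show that those endings are grammatical suffixes.
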
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
--
contact info and phone numbers:
http://www.ehi.ee/~mesinik