
Re: VMs: word length counts & Computational Linguistics




Thanks... I will try to dig into it.


On Tue, 8 Jul 2003, Brett Cotton wrote:

> Matt,
>  
> I took a while to get round to replying to this one; I had to check a couple of things. On suffixes, there is a bit of info in the tables at the back of D'Imperio, but I am not sure about statistical info. However ...
> I read an interesting article in the Journal of Computational Linguistics from a few years back (I'll dig out the reference). The basic idea is that John Goldsmith describes a method for computer analysis of an unknown (but Indo-European?) language that generates a morphological analysis, i.e. grammar, suffixes, etc. He has implemented this as a Windows proggy called Linguistica 2001 that can be downloaded here:
> http://humanities.uchicago.edu/faculty/goldsmith/Linguistica2000/
> and there is a pdf of his paper there as well. (Get a copy of "Easy PDF Converter" to convert to .txt :) ) 
>  
> I have extracted the abstract and this follows:
>  
> This study reports the results of using minimum description length (MDL) analysis to model
> unsupervised learning of the morphological segmentation of European languages, using corpora
> ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly
> develop a probabilistic morphological grammar, and use MDL as our primary tool to determine
> whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar
> matches well the analysis that would be developed by a human morphologist.
> In the final section, we discuss the relationship of this style of MDL grammatical analysis to
> the notion of evaluation metric in early generative grammar.
>  
> *** end of abstract
>  
> Cheers,
> Brett
> 
> Mart Vabar <mesinik@xxxxxx> wrote:
> 
> 
> On Fri, 4 Jul 2003, GC wrote:
> 
> > 96 pages
> > 31,412 glyphs or characters
> > 8,175 words
> > 2,940 unique words
> 
> How much does it change if we cut a character or a pair from the longer words?
> Has anybody counted how many suffixes the VMS has?
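
One rough way to try the "cut a character or a pair" question is to trim the end of every longer word and recount the unique words. The sketch below is only an illustration: the function name unique_after_trim, the min_len threshold of 5 and the sample word list are all assumptions made up here, not taken from any VMS transcription or from Linguistica.

```python
def unique_after_trim(words, cut, min_len=5):
    """Count distinct words after dropping the last `cut` characters
    from every word that has at least `min_len` characters.

    `min_len` is an arbitrary, assumed threshold for what counts as
    a "longer" word; shorter words are left untouched.
    """
    trimmed = set()
    for w in words:
        if cut > 0 and len(w) >= min_len:
            trimmed.add(w[:-cut])  # cut the assumed suffix off the end
        else:
            trimmed.add(w)         # short word, or nothing to cut
    return len(trimmed)


# Toy English-like tokens, invented for illustration only.
sample = ["walked", "walking", "walks", "walk",
          "talked", "talking", "to", "a"]

for cut in (0, 1, 2):
    print(cut, unique_after_trim(sample, cut))
```

If the unique-word count drops sharply when one or two final characters are cut, that would hint that many words differ only in their endings. On a real transcription the word list would have to be read from the transcription file instead of the toy sample.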
> 
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
> 
> 
> 
> 
> 

-- 
contact info and phone numbers:
http://www.ehi.ee/~mesinik
