Re: VMs: word database and binomial distribution



--- Jorge Stolfi <stolfi@xxxxxxxxxxxxx> wrote:
> I suspect that the "second hump" of most of my samples -- which
> incidentally are all *literary* works -- comes precisely from long
> derived words like "unsympathetic", "disillusionment", "periodically",
> "intelligences" (all from the first couple of pages of "War of the
> Worlds").

Ah. Hmmm. The database doesn't seem to suffer from
that - at least not systemically on the surface. The
words "unsympathetic", "disillusionment" and
"periodically" are in the database, while
"intelligences" is not (and oddly enough, neither is
"systemically" :). There are some plurals also in
there - "voles" - but not all. I'll try to find some
more reliable, large-scale data, run the experiment I
mentioned, and see whether there is any real linguistic
effect at play here or just some artifact of data
mangling.
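
For anyone who wants to try this kind of check on their own word list, here
is a rough sketch. It is only an illustration, not the experiment referred
to above, which is not spelled out in this message. It assumes the list is
treated as a lexicon (each distinct word counted once) and that lengths are
modelled as 1 + Binomial(n, p), with n taken from the longest word and p
matched to the mean length. Those modelling choices, the function names,
and the use of total variation distance as the "how close is close" measure
are all assumptions made for the example.

```python
from collections import Counter
from math import comb


def length_histogram(words):
    """Relative frequency of each word length, counting each distinct word once."""
    distinct = {w.lower() for w in words}
    counts = Counter(len(w) for w in distinct)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}


def fit_shifted_binomial(hist):
    """Method-of-moments fit of length = 1 + Binomial(n, p).

    Assumption: n is fixed by the longest observed word, and p is chosen
    so that the model mean 1 + n*p equals the observed mean length.
    """
    n = max(hist) - 1
    mean = sum(length * f for length, f in hist.items())
    p = (mean - 1) / n if n > 0 else 0.0
    return n, p


def shifted_binomial_pmf(n, p):
    """Model probability of each length 1..n+1."""
    return {k + 1: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def total_variation(hist, model):
    """Half the sum of absolute differences; 0 means a perfect match."""
    lengths = set(hist) | set(model)
    return 0.5 * sum(abs(hist.get(l, 0.0) - model.get(l, 0.0)) for l in lengths)


if __name__ == "__main__":
    # Toy word list, purely illustrative; substitute a real lexicon here.
    words = ["a", "to", "of", "the", "and", "vole", "voles", "derived"]
    hist = length_histogram(words)
    n, p = fit_shifted_binomial(hist)
    model = shifted_binomial_pmf(n, p)
    print("n =", n, "p =", round(p, 3))
    print("total variation distance =", round(total_variation(hist, model), 3))
```

A distance near 0 means the length histogram is close to the fitted
binomial; a second hump from long derived words would show up as a larger
distance and as an excess at the long-word end. Running the same check on
running text (tokens) instead of distinct words would need the `set` step
removed.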


>   > we get before we say it is really binomial?
> 
> Good question. 
> 
> For starters, we could suppose that the original
[....massive snipping...]
> Then we burn some incense, and invoke the Bayes Oracle...
>
> Anyone volunteers to carry out this analysis?  That could get the VMS
> out of the "News" section of Nature and into the "Research" section...

Maybe we should burn the incense first. I'll kick this
idea around a bit more when I have a fresher brain. 

Eric



		