Re: VMs: word database and binomial distribution
--- Jorge Stolfi <stolfi@xxxxxxxxxxxxx> wrote:
> I suspect that the "second hump" of most of my samples -- which
> incidentally are all *literary* works -- comes precisely from long
> derived words like "unsympathetic", "disillusionment", "periodically",
> "intelligences" (all from the first couple of pages of "War of the
> Worlds").
Ah. Hmmm. The database doesn't seem to suffer from
that - at least not systemically on the surface. The
words "unsympathetic", "disillusionment" and
"periodically" are in the database, while
"intelligences" is not (and oddly enough, neither is
"systemically" :). There are also some plurals in
there, such as "voles", but not all of them. I'll try
to find some more reliable, large-scale data, run the
experiment I mentioned, and see whether there is any
real linguistic effect at play here or just an
artifact of data mangling.
> > we get before we say it is really binomial?
>
> Good question.
>
> For starters, we could suppose that the original
[....massive snipping...]
> Then we burn some incense, and invoke the Bayes Oracle...
>
> Anyone volunteers to carry out this analysis? That could get the VMS
> out of the "News" section of Nature and into the "Research" section...
Maybe we should burn the incense first. I'll kick this
idea around a bit more when I have a fresher brain.
Eric