
Re: VMs: excessive frequency of doubles...



Hi Marke,

At 09:22 18/08/2004 +0100, Marke Fincher wrote:
In the figures below you can see that the actual number of doubled
words is in many cases way beyond what you should expect if the
words were created independently by a random process.

At 14:52 17/08/2004 +0100, Marke Fincher wrote:
My example was for a specific word (w) with probability (1/p).
So the probability of a double of that _particular_ word (w.w) is 1/(p*p).

Actually, I think your table of z-values might even be *understating* this issue. :-o


For a word that occurs n times in a sample of s words, the probability of any given word being that word is n/s. But the probability of the following word also being that same word is (n-1)/(s-1), because you've used up one of your n instances and one of your s samples. Since (n-1)/(s-1) is smaller than n/s whenever n < s, the expected number of doubles drops, which makes the observed excess look even more extreme. Asymptotically the two are very close, but many of the instance counts in your table are extremely low, which would surely alter many of the numbers quite a bit?
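To put rough numbers on that, here is a minimal sketch comparing the two estimates. It assumes the words of the sample are shuffled at random (so each draw removes a token from the pool); the function names and the (n, s) pairs are made up for illustration and are not taken from the table.

```python
from fractions import Fraction


def double_prob_independent(n, s):
    """P(word, word) if each position is drawn independently: (n/s)^2."""
    return Fraction(n, s) ** 2


def double_prob_without_replacement(n, s):
    """P(word, word) if the second draw comes from the remaining s-1 tokens."""
    if s < 2:
        return Fraction(0)  # a one-word sample cannot contain a double
    return Fraction(n, s) * Fraction(n - 1, s - 1)


# Hypothetical counts: a rare word, a less rare word, and a very common word.
for n, s in [(2, 1000), (5, 1000), (500, 1000)]:
    indep = double_prob_independent(n, s)
    exact = double_prob_without_replacement(n, s)
    # The ratio shows how much the independent estimate overstates the
    # expected frequency of doubles for this (n, s).
    print(n, s, float(indep), float(exact), float(exact / indep))
```

For a word seen only twice in 1000, the corrected probability is roughly half the independent one, whereas for a word seen 500 times the two are almost identical. So the correction only matters for the low-count rows, but there it matters a lot.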

Just a thought! :-)

Cheers, .....Nick Pelling.....

