
Re: VMs: excessive frequency of doubles...



Quoting Marke Fincher <markefincher@xxxxxxxxxxxxxxxxxxxxx>:

> 
> In the figures below you can see that the actual number of doubled 
> words is in many cases way beyond what you should expect if the 
> words were created independently by a random process.
> 

Uhm... yes, but this is to be expected.

It is only reasonable to assume that some events happen more frequently than 
average, just as other events will be rarer than average.

All of your events are characterized by a very low number of occurrences; most 
higher-than-expected values are due to a single occurrence of a word doubling, 
and the dramatically high values at the beginning of the table are due to the 
low overall frequencies of the words in question: the top 7 entries of your 
list are held by words which occur fewer than six times in the VM and happen 
to be doubled exactly _once_ each. How statistically significant is this?
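
To make the point concrete, here is a rough sketch of how a single doubling 
inflates the observed/expected ratio for a rare word. It assumes the "random 
process" means a random shuffle of all word tokens, in which case a word with 
c occurrences among N tokens is expected to be doubled c*(c-1)/N times. The 
total of 37000 tokens is only an assumed round figure for the VM, and the 
function names are made up for illustration; this is not the calculation 
behind the original table.

```python
def expected_doublings(count, total_tokens):
    """Expected number of adjacent identical pairs of one word
    if all tokens of the text were shuffled at random."""
    return count * (count - 1) / total_tokens


def ratio_for_single_doubling(count, total_tokens):
    """Observed/expected ratio when the word is doubled exactly once."""
    return 1 / expected_doublings(count, total_tokens)


TOTAL_TOKENS = 37000  # assumption: rough size of the VM text in words

for count in (2, 5, 50, 500):
    expected = expected_doublings(count, TOTAL_TOKENS)
    ratio = ratio_for_single_doubling(count, TOTAL_TOKENS)
    print(f"count={count:4d}  expected={expected:.5f}  ratio if doubled once={ratio:.1f}")
```

Under these assumptions, the rarer the word, the more spectacular the ratio 
produced by one single doubling, which is why the rare words end up at the 
top of such a list.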

To really get meaning out of your tables, it'd be necessary either to check 
the other end of the spectrum as well (are there words which are doubled 
_less_ often than expected?) or to limit yourself to frequent words.

Events only begin to become statistically meaningful once you have a lot of 
them. (One of my teachers used to say, "Statistics begins with 3.")
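
As a rough illustration of why the number of events matters, the sketch below 
compares two cases with the same observed/expected ratio of 2. It assumes the 
number of doublings of a word can be approximated by a Poisson distribution, 
which is a modelling assumption of this example and not something taken from 
the original figures; the numbers are invented.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for a Poisson variable with mean `expected`."""
    below = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(observed)
    )
    return 1.0 - below


# Same ratio (observed / expected = 2), very different weight of evidence.
for observed, expected in ((1, 0.5), (10, 5.0)):
    p = poisson_tail(observed, expected)
    print(f"observed={observed:2d}  expected={expected:4.1f}  P(at least that many)={p:.3f}")
```

With one event, getting "twice the expected value" by pure chance is quite 
likely under this model; with ten events, it is much less so.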

(Don't get me wrong: I also think that the VM is non-random text. It's just 
that your numbers don't really support the non-random assumption. And actually 
I'm quite surprised that the apparent word-doubling of the VM doesn't stand 
out more prominently from the statistics.)

Cheers,

   E.

