
Re: VMs: Re: Re: Word distribution



On Sunday 07 March 2004 1:15 am, Jeff wrote:

> In my own opinion I believe Nick is 100% correct here so beware! Words
> instance counts ARE unusually low.

I thought that Nick was saying that the frequency=1 words were unusually 
high... 

The nature of the data in Zipf plots (both the ranks and the frequencies 
are discrete quantities) and the log-log scale of the plot result in a 
situation in which small variations in the slope make big differences in the 
two quantities plotted at the extremes (when looked at on a linear scale).
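
As a rough illustration of that sensitivity, here is a minimal sketch that 
assumes an idealized Zipf curve, frequency = C / rank**s. The function name, 
the top count of 1000 and the two slopes are made-up values for illustration, 
not figures from the vms or from any real text.

```python
# Idealized Zipf curve: frequency(rank) = top_count / rank**slope.
# top_count and both slopes are illustrative assumptions, not measured data.
def zipf_frequency(rank, slope, top_count=1000.0):
    return top_count / rank ** slope

ranks = [1, 10, 100, 1000]
for slope in (1.0, 1.1):
    # Same curve family, slightly different slope in log-log space.
    print(slope, [round(zipf_frequency(r, slope), 2) for r in ranks])

# Ratio of the predicted counts at the low-frequency end (rank 1000).
ratio = zipf_frequency(1000, 1.0) / zipf_frequency(1000, 1.1)
print(round(ratio, 2))
```

Under these assumptions a 10% change in the slope changes the predicted count 
at rank 1000 by about a factor of two, although the two lines would look 
nearly identical on a log-log plot.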
 
So perhaps we should not be surprised if the actual count of unique words in 
the VMS is higher or lower than in other language samples. What one should 
probably compare across languages is the *order of magnitude* of the 
rank_n counts in similarly sized texts (i.e. the logarithm of the counts for 
a particular rank or a range of ranks).
Zipf's law is based on this precise detail, so even if the number of words 
occurring only once is (let's say) double what one would find in another 
language, it still would not affect the plot enough for it to be considered 
non-"Zipfian".
Remember that this is a law that concerns the inter-relation of the 
frequencies, not a single particular frequency. 
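
To put a number on the "double" example, a small sketch follows. The two 
counts are hypothetical values chosen only to differ by a factor of two. They 
are not taken from any corpus.

```python
import math

def log_shift(count_a, count_b):
    """Vertical distance between two counts on a log10 axis."""
    return abs(math.log10(count_a) - math.log10(count_b))

# Hypothetical counts of frequency=1 words in two similarly sized texts.
shift = log_shift(2000, 4000)
print(round(shift, 3))  # 0.301
```

A doubling moves the point by about 0.3 on the log axis, well under one order 
of magnitude, which is why it barely changes the look of the plot.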

Cheers

Gabriel
