Re: VMs: Re: Re: Word distribution
On Sunday 07 March 2004 1:15 am, Jeff wrote:
> In my own opinion I believe Nick is 100% correct here so beware! Words
> instance counts ARE unusually low.
I thought that Nick was saying that the frequency=1 words were unusually
high...
Because the data in a Zipf plot are discrete (both the ranks and the
frequencies are integers) and the plot is drawn on a log-log scale, small
variations in the slope make big differences in the two quantities plotted at
the extremes when they are looked at on a linear scale.
So perhaps we should not be surprised if the actual count of unique words in
the VMS is higher or lower than in other language samples. What one should
probably compare across languages is the *order of magnitude* of the
rank_n counts in similarly sized texts (i.e. the logarithm of the count for
a particular rank or a range of ranks).
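As a rough illustration of that comparison, here is a minimal Python sketch
that ranks the word counts of a text and returns the base-10 logarithm of the
count at chosen ranks. The function names and the two toy samples are made up
for the example; they are not VMS data or anyone's actual procedure.

```python
from collections import Counter
import math


def rank_counts(tokens):
    """Return word counts sorted from most frequent (rank 1) to least."""
    return sorted(Counter(tokens).values(), reverse=True)


def log10_counts_at_ranks(tokens, ranks):
    """Map each requested rank to log10 of the count found at that rank.

    Ranks beyond the vocabulary size of the text are silently skipped.
    """
    counts = rank_counts(tokens)
    return {r: math.log10(counts[r - 1]) for r in ranks if r <= len(counts)}


# Two hypothetical, similarly sized toy samples (10 tokens each).
sample_a = "the the the the of of of and and a".split()
sample_b = "x x x x x y y z w v".split()

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, log10_counts_at_ranks(sample, ranks=[1, 2, 3, 4]))
```

With real texts one would pass in the full token lists and look at whether
the logarithms at a given rank differ by a fraction of a unit or by whole
units, rather than at the raw counts.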
Zipf's law rests on precisely this detail, so even if the number of words that
occur only once is (let's say) double what one would find in another language,
it still would not change the plot enough to consider it not-"Zipfian".
Remember that this is a law that concerns the inter-relation of the
frequencies, not a single particular frequency.
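To put rough numbers on this, the sketch below assumes an idealized Zipf curve,
count = top_count / rank**slope, which is a simplification and not a fit to any
real text. The slopes 1.0 and 1.1, the top count and the rank are hypothetical
values chosen only to show the effect.

```python
import math


def zipf_count(rank, slope, top_count):
    """Idealized Zipf count at a given rank (assumed model, not fitted data)."""
    return top_count / rank ** slope


# Hypothetical values: the same top count, two slightly different slopes.
top_count = 1000.0
rank = 1000

high = zipf_count(rank, 1.0, top_count)
low = zipf_count(rank, 1.1, top_count)

# On a linear scale the two counts differ by a factor of rank**0.1 (about 2).
ratio = high / low

# On a log10 scale the same difference is only (1.1 - 1.0) * log10(rank).
log_gap = math.log10(high) - math.log10(low)

print(f"linear ratio: {ratio:.3f}, log10 gap: {log_gap:.3f}")
```

A slope change of 0.1 roughly doubles the count at the tail, yet it moves the
point by only about 0.3 units on a log axis that spans several units.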
Cheers
Gabriel