
Re: VMs: Re: Re: Word distribution



My point was that the occurrence counts of individual words were on the low
side, not how many low word counts occurred in total. There are more words
with low occurrence counts than with high ones. Sorry for any confusion.

Jeff

P.S. I am now finding evidence that might vindicate other assumptions made
by Nick.

----- Original Message -----
From: "Gabriel Landini" <G.Landini@xxxxxxxxxx>
To: <vms-list@xxxxxxxxxxx>
Sent: 07 March 2004 15:54
Subject: Re: VMs: Re: Re: Word distribution


> On Sunday 07 March 2004 1:15 am, Jeff wrote:
>
> > In my opinion Nick is 100% correct here, so beware! Word
> > instance counts ARE unusually low.
>
> I thought that Nick was saying that the frequency=1 words were unusually
> high...
>
> The nature of the data in Zipf plots (the ranks and the frequencies are
> discrete quantities), combined with plotting on a log-log scale, results
> in a situation in which small variations in the slope make big
> differences in the two quantities plotted at the extremes (when looked
> at on a linear scale).
>
> So perhaps we should not be surprised if the actual count of unique
> words in the VMs is higher or lower than in other language samples. What
> one should probably compare across languages is the *order of magnitude*
> of the rank_n counts in similarly sized texts (i.e. the logarithm of the
> counts for a particular rank or a range of ranks).
> Zipf's law is based on this precise detail, so even if the number of
> single-occurrence words is (let's say) double what one would find in
> another language, it still would not affect the plot so much as to make
> it not-"Zipfian".
> Remember that this is a law that concerns the inter-relation of the
> frequencies, not a single particular frequency.
>
> Cheers
>
> Gabriel
>
> ______________________________________________________________________
> To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
> unsubscribe vms-list
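
Gabriel's point about comparing on a log scale can be sketched roughly as
follows. This is only an illustration under stated assumptions: the toy
sentence and the idealised counts are made up (they are not VMs data), and
the function names rank_frequency, hapax_count and loglog_slope are
hypothetical helpers, not taken from any tool mentioned in this thread.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(words).values(), reverse=True)


def hapax_count(freqs):
    """Number of words that occur exactly once (the frequency=1 words)."""
    return sum(1 for f in freqs if f == 1)


def loglog_slope(freqs):
    """Least-squares slope of log10(frequency) against log10(rank).

    Needs at least two ranks, otherwise the slope is undefined.
    """
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy sample (hypothetical text, not VMs data).
sample = "the cat and the dog and the bird".split()
freqs = rank_frequency(sample)
print(freqs, hapax_count(freqs))

# Idealised Zipf counts f(r) = 1000 / r (an assumption for illustration):
# on a log-log plot these lie on a straight line of slope -1.
ideal = [1000 / r for r in range(1, 201)]
print(round(loglog_slope(ideal), 6))

# Doubling any single count moves it by only log10(2), about 0.301 of a
# decade, on a logarithmic axis.
print(round(math.log10(2), 3))
```

To compare the VMs with another text along these lines, one would feed
both word lists of similar length through rank_frequency and compare
log10 of the counts at matching ranks, rather than the raw counts.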
