
VMs: meaning of entropy



Rene Zandbergen wrote:

> When running the entropy calculations, you should
> find almost identical values for (h0,) h1 and h2,
> but a high value for h3 (which can hardly be
> calculated reliably anyway).

As entropy has been discussed so much on the list recently,
I returned to my attempt to understand its various
terms. Last time I tried (in February), I think
I managed to get an idea of what entropy is and how it
is used for text analysis. What still baffles me are
those orders Rene mentions and which Monkey calculates
up to the 120th.

Am I correct in assuming that "h1" is the "first-order"
entropy, i.e. the predictability of the next character when
the preceding one is known (and the same for words)?

Now, "h2" is the same calculated for pairs of characters,
"h3" for triplets, etc. Is that correct?

But Rene says: "Character-pair entropy is sometimes called 
second-order entropy, while the conditional single-character 
entropy is also sometimes called second-order entropy."
I do not remember this distinction being mentioned
in list discussions - so which of the two does the
Monkey terminology follow?
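
If the textbooks I have been reading are right, the two notions Rene
contrasts are tied together by the chain rule
H(X1,X2) = H(X1) + H(X2|X1), so one can get the conditional version
from the block version by subtraction (reusing block_entropy and
sample from the sketch above; again, this is only my guess at the
definitions, not necessarily Monkey's):

    h2_pair = block_entropy(sample, 2)            # entropy of character pairs
    h2_cond = h2_pair - block_entropy(sample, 1)  # conditional entropy
    # h2_cond estimates H(next char | previous char), via the chain
    # rule: H(X1, X2) = H(X1) + H(X2 | X1)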

Finally, what is the meaning of "h0"? I know it is
"the base-2 logarithm of the number of different words 
(or characters) found" (Jacques in Monkey.doc) - but what
do the calculated values say about the text?
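
My tentative reading: since log2(N) is the entropy of N equally
likely symbols, h0 is the value h1 would take if every character (or
word) in the text were used equally often. So h0 is an upper bound on
h1, and the gap between them says how unevenly the alphabet (or
vocabulary) is used. A one-liner, under that assumption:

    import math

    def h0(text):
        # log2 of the number of distinct characters seen in the text;
        # e.g. a text using all 26 Latin letters gives log2(26),
        # about 4.70 bits.
        return math.log2(len(set(text)))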

I have just located a fairly new set of text-analysis
programs by Dmitry V. Khmelev (Toronto Univ.), which
include "cross-entropy" between texts and some other
interesting concepts that might be helpful for VMS stats
(if only I could grasp the basics):

http://www.math.toronto.edu/dkhmelev/PROGS/tacu/index-eng.html
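
I have not studied his programs yet, so the following is only my
guess at what "cross-entropy between texts" might mean: estimate a
character model from one text and measure the average surprise of
another text under it. A unigram sketch (the add-one smoothing is my
own simplification; Khmelev surely uses something more refined):

    import math
    from collections import Counter

    def cross_entropy(model_text, eval_text):
        # Average bits per character of eval_text under the character
        # frequencies of model_text, with add-one smoothing so
        # characters unseen in model_text do not give infinite surprise.
        counts = Counter(model_text)
        alphabet = set(model_text) | set(eval_text)
        total = len(model_text) + len(alphabet)
        return -sum(math.log2((counts.get(ch, 0) + 1) / total)
                    for ch in eval_text) / len(eval_text)

If two texts are statistically similar, this number should come out
close to the h1 of the evaluated text; very different texts push it
higher.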

Best regards,

Rafal