Re: VMs: meaning of entropy
--- "Rafal T. Prinke" <rafalp@xxxxxxxxxx> wrote:
(about entropy)
> What still baffles me are those orders Rene
> mentions and Monkey calculates up to 120th.
Well, don't go beyond 3rd to be safe :-)
> Am I correct in assuming that "h1" is "first order"
> or otherwise predictability of the next character
> when the preceding one is known (and the same for
> words)?
Actually, h1 is the single-character entropy,
independent of context (i.e. independent of the
preceding characters). If all 26 characters are
equally frequent, this equals log2(26), about 4.70
bits. That value is, in fact, sometimes written as
h0, which is simply the theoretical upper limit of h1.
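For illustration, here is a minimal sketch of h1 and h0 in Python. The
function names are hypothetical, and this is not MONKEY's own code; it
simply counts each character independently of its neighbours and takes
the Shannon entropy in bits.

```python
import math
from collections import Counter

def entropy_of_counts(counts):
    """Shannon entropy in bits of the distribution described by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def h1(text):
    """Single-character entropy: each character counted regardless of context."""
    return entropy_of_counts(Counter(text))

# h0: the upper limit of h1 for 26 equally frequent characters
h0 = math.log2(26)
print(f"h0 = {h0:.3f} bits")  # h0 = 4.700 bits
print(f"h1('abcabc') = {h1('abcabc'):.3f} bits")
```

Any real text with unequal character frequencies will give an h1 below h0.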
What you describe above is the conditional single-
character entropy, which in MONKEY is called h2.
It can be calculated as the entropy of character
pairs minus that of characters.
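A minimal sketch of that calculation, assuming overlapping character
pairs are counted across the whole text (the helper names are
hypothetical; MONKEY's exact counting rules may differ):

```python
import math
from collections import Counter

def block_entropy(text, n):
    """Entropy in bits of the overlapping n-character sequences in text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

def h2(text):
    """Conditional single-character entropy given the preceding character,
    estimated as H(pairs) - H(characters)."""
    return block_entropy(text, 2) - block_entropy(text, 1)

text = "ab" * 1000
print(block_entropy(text, 1), h2(text))
```

For a strictly alternating text like this one, h1 is 1 bit but h2 comes
out close to zero, because each character fully determines the next.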
h3 as calculated by MONKEY is also a conditional
single-character entropy, but it assumes that the
preceding 2 characters are known. It can be
calculated as the difference between the entropy of
character triplets and that of pairs. In general,
the more preceding characters are given, the more
'constrained' the next one will be, which causes
the value to decrease as the order increases.
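The same pattern generalises: the order-n value is the entropy of
n-character blocks minus that of (n-1)-character blocks. A minimal
sketch, assuming overlapping blocks; the function names and the sample
sentence are hypothetical choices for illustration only:

```python
import math
from collections import Counter

def block_entropy(text, n):
    """Entropy in bits of overlapping n-character blocks (0 for n == 0)."""
    if n == 0:
        return 0.0
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

def conditional_entropy(text, order):
    """Entropy of the next character given the preceding (order - 1)
    characters: H(order-blocks) - H((order - 1)-blocks). order=1 is h1."""
    return block_entropy(text, order) - block_entropy(text, order - 1)

sample = "the quick brown fox jumps over the lazy dog " * 50
for k in (1, 2, 3):
    print(k, round(conditional_entropy(sample, k), 3))
```

On this sample the values fall from h1 to h2 to h3, since the longer the
known context, the fewer choices remain for the next character.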
The problem with higher orders is that reliable
statistics for longer sequences require an
extraordinarily long text.
Other practical problems exist, e.g. with respect to
word spaces. From the third order onward, the
'dependence' will also start to include character
pairs that bridge word boundaries. This may not be
the right thing.
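One possible way around the word-bridging issue, sketched here as an
assumption rather than anything MONKEY does, is to take n-grams only
inside each word, so that no n-gram spans a word space. The helper names
are hypothetical, and the difference of block entropies is only a rough
estimate here, since characters at word ends have no successor:

```python
import math
from collections import Counter

def within_word_ngrams(text, n):
    """Overlapping n-grams taken inside each word only (hypothetical helper)."""
    for word in text.split():
        for i in range(len(word) - n + 1):
            yield word[i:i + n]

def entropy(counter):
    """Shannon entropy in bits of a Counter; 0 for an empty Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def within_word_conditional(text, order):
    """Rough order-n conditional entropy using within-word n-grams only."""
    if order == 1:
        return entropy(Counter(within_word_ngrams(text, 1)))
    return (entropy(Counter(within_word_ngrams(text, order)))
            - entropy(Counter(within_word_ngrams(text, order - 1))))

print(list(within_word_ngrams("the lazy dog", 3)))
```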
> But Rene says: "Character-pair entropy is sometimes
> called second-order entropy, while the conditional
> single-character entropy is also sometimes called
> second-order entropy."
> I do not remember seeing the distinction mentioned
> in list discussions - so which is the Monkey
> terminology?
MONKEY outputs conditional values.
Hope that clarifies, Rene