Re: VMs: learning
On Saturday 16 Aug 2003 11:15 pm, Nick Pelling wrote:
> Technically speaking, Monkey appears to be outputting the mean average
> expressibility (in bits) of the input stream, which is an indirect measure
> of entropy - entropy should really be expressed on a 0.0 (perfectly
> predictable) to a 1.0 (perfectly unpredictable) range (using a percentage
> is quite acceptable). However, as you have to normalise out the size of the
> alphabet, this tends to be not so useful in practice... I'll explain.
>
> (Entropy = 0.0) ==> perfectly predictable
> (Entropy = 1.0) ==> perfectly unpredictable
If you look back in the archives, you'll see that I wondered about this many
times (expressing the entropy value as a percentage of the maximum entropy).
The problem is that this normalised unpredictability really depends on the
number of different characters (the alphabet) in the source. Sources with
large alphabets can be more unpredictable (in the sense that there are more
choices at each position) than those with smaller ones, and this is of course
reflected in the value of entropy: the maximum possible entropy for an
alphabet of N symbols is log2(N) bits per character.
So if one "normalises" the value to the maximum that can be achieved with a
particular alphabet size, the same string can get a different value depending
on the alphabet assumed. For example, compare abbabaaabbbbabab in a system
that is *known* to have 2 states with the same string in a system that has
100000 states. In the second case the source would *seem* to be more
predictable, yet the message is the same.
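To make the 2-state vs 100000-state comparison concrete, here is a minimal
sketch (my own illustration, not Monkey's code) that computes the first-order
entropy of the string and then divides it by log2 of an *assumed* alphabet
size. The function names entropy_bits and normalised_entropy are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(text: str) -> float:
    """First-order (single-character) Shannon entropy, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def normalised_entropy(text: str, alphabet_size: int) -> float:
    """Entropy divided by its maximum, log2(alphabet_size): 0.0 .. 1.0."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return entropy_bits(text) / log2(alphabet_size)


if __name__ == "__main__":
    message = "abbabaaabbbbabab"  # 7 a's, 9 b's
    print(f"{entropy_bits(message):.3f} bits/char")           # about 0.989
    print(f"{normalised_entropy(message, 2):.3f}")            # about 0.989
    print(f"{normalised_entropy(message, 100_000):.3f}")      # about 0.060
```

The raw entropy is identical in both cases; only the assumed alphabet changes,
and with it the "percentage" figure.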
Have a look at the nice posting by Jim R. about "counting commas" on the EVMT
site.
> PPS: there's one final twist which entropy programs can get subtly wrong:
> you have to remember that, for an nth-order entropy, you have undefined
> values for the first (n-1) characters. It's not wildly important, but it
> tripped me up once (many years ago), so I thought I ought to share that
> too. :-)
Not only at the beginning, but also at the end: the last n-1 characters
cannot start a complete n-gram either.
The differences are trivial if the source is long. A common workaround is to
"wrap around" the sequence, so that the end and the start join.
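A minimal sketch of the wrap-around trick, assuming "nth-order entropy" is
taken as the conditional entropy of a character given the preceding n-1
characters, estimated as H(n-grams) - H((n-1)-grams). Other programs may
define it differently, and the helper names (ngrams, block_entropy,
conditional_entropy) are hypothetical.

```python
from collections import Counter
from math import log2


def ngrams(text: str, n: int, wrap: bool) -> list[str]:
    """Overlapping n-grams of text.

    Without wrapping there are only len(text) - n + 1 windows, so characters
    near the start and end are under-counted. With wrap=True the text is
    treated as circular (the end joins the start): exactly len(text) windows.
    """
    if n < 1 or len(text) < n:
        raise ValueError("need 1 <= n <= len(text)")
    if wrap:
        padded = text + text[: n - 1]
        return [padded[i : i + n] for i in range(len(text))]
    return [text[i : i + n] for i in range(len(text) - n + 1)]


def block_entropy(text: str, n: int, wrap: bool) -> float:
    """Shannon entropy (bits) of the n-gram distribution."""
    grams = ngrams(text, n, wrap)
    total = len(grams)
    return -sum((c / total) * log2(c / total) for c in Counter(grams).values())


def conditional_entropy(text: str, n: int, wrap: bool) -> float:
    """Bits per character given the previous n-1 characters."""
    if n == 1:
        return block_entropy(text, 1, wrap)
    return block_entropy(text, n, wrap) - block_entropy(text, n - 1, wrap)


if __name__ == "__main__":
    s = "abbabaaabbbbabab"
    for wrap in (False, True):
        print(wrap, len(ngrams(s, 3, wrap)), conditional_entropy(s, 3, wrap))
```

With wrapping, the (n-1)-gram counts are exactly the marginals of the n-gram
counts, so the difference can never go negative; on a long text the wrapped
and unwrapped figures come out almost the same.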
Cheers,
Gabriel
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list