Re: VMs: Worry - information loss in transcription - pictures ...
> Your e-mail is potentially interesting, but I can't quite follow it.
>> * Entropy of EVA = 221899 x 4.0 = 887596.00 bits
>> * Entropy of simple glyphs (+ ee) = 198098 x 4.08 = 808239.84 bits
>> * Entropy of pair transcription (+ ee) = 155349 x 4.36 = 677321.64 bits
> What's the 4.0 mean? And what about the 4.08?
4.0 / 4.08 / 4.36 are the h1 values (i.e., the average number of bits per
token) for each transcription. So, multiplying that figure by the number of
token instances gives the (context-free) total size (in bits) of each
transcription. Because the transcription changes the token count, it's
important here to show the comparison in absolute terms (i.e., total number
of bits) rather than in relative terms (i.e., number of bits per token).
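For concreteness, here is a minimal Python sketch of that h1 calculation (the glyph string is a made-up toy, not a real transcription; `h1_bits` is a name I've chosen for illustration):

```python
from collections import Counter
from math import log2

def h1_bits(tokens):
    """Context-free single-token entropy: average bits per token instance."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Toy glyph sequence, purely illustrative.
tokens = list("daiindaiinqokeedy")
h1 = h1_bits(tokens)
# Multiplying h1 by the token count gives the absolute size in bits,
# which is what makes transcriptions with different token counts comparable.
total_bits = len(tokens) * h1
```

The same function applied to each transcription scheme, times each scheme's own token count, reproduces the kind of absolute-bits comparison shown above.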
> You're looking at single-character entropy, which is a bit on the low
> side for the VMs, but it's the pair entropy (or the conditional
> single-character entropy) which is really anomalous.
That's next on my list... :-)
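For reference, the conditional single-character entropy mentioned above (how many bits a character carries given the character before it) can be sketched like this; the function name and toy input are mine, not from the original mail:

```python
from collections import Counter
from math import log2

def conditional_entropy(text):
    """h2: entropy of a character given the preceding character, in bits."""
    pair_counts = Counter(zip(text, text[1:]))   # adjacent-pair frequencies
    ctx_counts = Counter(text[:-1])              # frequencies of the context char
    n = len(text) - 1
    h = 0.0
    for (a, b), c in pair_counts.items():
        p_pair = c / n               # P(a followed by b)
        p_cond = c / ctx_counts[a]   # P(b | a)
        h -= p_pair * log2(p_cond)
    return h
```

A highly predictable text (each glyph strongly constraining the next, as seems to happen in the VMs) drives this figure well below the single-character h1.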
> And isn't it strange how <o> and <y> are so common, yet so very rarely
> occur beside each other? Glyph transcription + ee + oy + yo ==>
> (oy = 0.07% and yo = 0.05%).
This is precisely the origin of the low pair entropy.
I'm comfortable with <o> acting as a kind of "shift" character (because of
or/ol/ok/ot etc.), even though that still fails to explain a large
percentage of occurrences of <o>; I'm not quite so comfortable positing
the same thing for <y>. I wouldn't say these rarities *are* the origin of
the low pair entropy so much as that they *point towards* it - but it'll
take a bit of work to figure out what that origin actually is...
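Measuring pair rarities like the oy/yo figures quoted above is straightforward; a small sketch (the helper name and sample string are hypothetical, and real percentages would of course come from a full transcription):

```python
from collections import Counter

def bigram_rate(text, bigram):
    """Percentage of adjacent glyph pairs equal to the given two-glyph bigram."""
    pairs = Counter(zip(text, text[1:]))
    total = max(len(text) - 1, 1)
    return 100.0 * pairs[tuple(bigram)] / total

# Toy usage: rate of <oy> among all adjacent pairs in a made-up string.
rate = bigram_rate("qokeedyokaiinoy", "oy")
```

Running this over each candidate pair across the whole corpus is one way to see which specific adjacency gaps are dragging the pair entropy down.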
Cheers, .....Nick Pelling.....