Re: VMs: Worry - information loss in transcription - pictures ...

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Worry - information loss in transcription - pictures ...
From: Nick Pelling <incoming@xxxxxxxxxxxxxxxxx>
Date: Mon, 01 Sep 2003 17:45:38 +0100
In-reply-to: <20030901115233.98306.qmail@web40412.mail.yahoo.com>
References: <5.2.1.1.0.20030901105038.036e4148@pop3.blueyonder.co.uk>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

Hi Rene,

your E-mail is potentially interesting, but I can't
quite follow it.

> * Entropy of EVA = 221899 x 4.0 = 887596.00 bits
> * Entropy of simple glyphs (+ ee) = 198098 x 4.08 =
> 808239.84 bits
> * Entropy of pair transcription (+ ee) = 155349 x
> 4.36 = 677321.64 bits

What's the 4.0 mean? And what about the 4.08?

4.0 / 4.08 / 4.36 are the h1 values (i.e., the average number of bits per token) for each transcription. Multiplying that figure by the number of token instances gives the (context-free) total size, in bits, of each transcription. Because each transcription changes the token count, it's important to compare them in absolute terms (i.e., total number of bits) rather than in relative terms (i.e., number of bits per token).
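
As a rough sketch of that arithmetic (in Python; the function names h1 and total_bits are made up for illustration, and the toy input is not a real transcription), the per-token entropy and the context-free total could be computed like this:

```python
import math
from collections import Counter

def h1(tokens):
    """First-order entropy: average bits per token, ignoring context."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def total_bits(tokens):
    """Context-free size of a transcription: token count times h1."""
    return len(tokens) * h1(tokens)

# Toy input only: four equally likely tokens give 2 bits per token.
toy = list("abcdabcd")
print(h1(toy), total_bits(toy))

# The totals quoted above are simply count x h1:
for count, bits_per_token in [(221899, 4.0), (198098, 4.08), (155349, 4.36)]:
    print(count, bits_per_token, round(count * bits_per_token, 2))
```

Merging glyphs into larger tokens lowers the count but tends to raise h1, which is why only the product is comparable between transcriptions.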

You're looking at single-character entropy, which
is a bit on the low side for the VMs, but it's
the pair entropy (or the conditional single-
character entropy) which is really anomalous.

That's next on my list... :-)
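
For reference, one common way to get the conditional single-character entropy is to subtract the entropy of the leading token from the entropy of adjacent pairs. The following is only a minimal sketch under that assumption (the helper names are hypothetical, and it is not necessarily the exact calculation either of us has been using):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(tokens):
    """Bits needed for a token given the one before it: H(pair) - H(first)."""
    pairs = list(zip(tokens, tokens[1:]))
    pair_entropy = entropy(Counter(pairs))
    first_entropy = entropy(Counter(first for first, _ in pairs))
    return pair_entropy - first_entropy

# A strictly alternating toy sequence is fully predictable from the
# previous token, so its conditional entropy is zero.
print(conditional_entropy(list("abababab")))
```

The lower this figure is relative to h1, the more each token is constrained by its predecessor.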

> And isn't it strange how <o> and <y> are so common,
> yet so very rarely
> occur beside each other? Glyph transcription + ee +
> oy + yo ==> (oy = 0.07%
> and yo = 0.05%).

This is precisely the origin of the low pair entropy.

I'm comfortable with <o> acting as a kind of "shift" character (because of or/ol/ok/ot etc), even though that still fails to explain a large percentage of occurrences of <o>, but I'm not quite so comfortable about positing the same thing for <y>. I wouldn't say these *are* the origin of the low pair entropy so much as that they *point towards* the origin of it - but it'll take a bit of work to figure out what that origin is...
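
One way to put a number on "common, yet rarely beside each other" is to compare how often a pair actually occurs with how often it would occur if the two glyphs were placed independently. A minimal sketch, assuming a flat list of tokens (pair_vs_expected is a made-up name, and the toy input is not VMs data):

```python
from collections import Counter

def pair_vs_expected(tokens, a, b):
    """Return (observed, expected) frequency of token a followed by token b.

    expected assumes the two tokens occur independently of each other.
    """
    n = len(tokens)
    singles = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    observed = pairs[(a, b)] / (n - 1)
    expected = (singles[a] / n) * (singles[b] / n)
    return observed, expected

# Toy input only: "y" never precedes "o" here, although both are common.
print(pair_vs_expected(list("ooyy"), "y", "o"))
```

An observed value far below the expected one is the kind of gap that drags the pair entropy down.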

Cheers, .....Nick Pelling.....

