Re: VMs: shorthands
On Thursday 29 January 2004 15:22, Nick Pelling wrote:
> I've already looked at a large number of shorthand histories (both in the
> BL and in other libraries with shorthand collections)
Assuming that the vms is not nonsense, the problem with shorthands alone
(even though the words are already short and many vms characters look like
Latin abbreviations) is that abbreviation should *increase* the entropy of the
corpus, whereas what is observed is a decrease.
Agglomerating some characters (e.g. iin -> m) is not enough on its own to bring
the entropy back up to the level of non-abbreviated languages, because the
results for the Curva and Gava alphabets (which do precisely this) still show
too low an entropy (see the bottom of the page at
http://web.bham.ac.uk/G.Landini/evmt/commas.htm ).
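To make the comparison concrete, here is a minimal sketch of how single-character and conditional (second-order) entropy can be measured before and after an agglomeration such as iin -> m. The sample string is a made-up, EVA-like toy and not a real transcription, and the function names h1 and h2 are my own; this is not the procedure behind the figures on the page linked above.

```python
from collections import Counter
from math import log2

def h1(text):
    """Single-character (first-order) entropy, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * log2(c / n) for c in counts.values())

def h2(text):
    """Conditional entropy H(next char | previous char), in bits."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent character pairs
    firsts = Counter(text[:-1])            # counts of the first member of each pair
    n = len(text) - 1
    return -sum(c / n * log2(c / firsts[a]) for (a, _), c in pairs.items())

# Hypothetical toy text (assumption): not taken from any transcription file.
sample = "qokeedy daiin chedy qokaiin shedy daiin okaiin chol daiin"
merged = sample.replace("iin", "m")        # agglomerate iin -> m

for label, t in (("original", sample), ("iin -> m", merged)):
    print(label, round(h1(t), 3), round(h2(t), 3))
```

On a string this short the numbers mean nothing; the point is only that both measures have to be recomputed on the re-coded text before comparing it with other languages.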
So where is the missing information?
I see two further possibilities:
a) It has been lost: the encoding is lossy (e.g. one character can replace
several possible strings). This would reduce the lexicon somehow, but I can't
imagine how to estimate this.
or
b) There is further information that we have not been able to account for. For
instance, same-looking characters may be position dependent (as I mentioned
the other day: starting and ending <y> may stand for different abbreviations,
<o> may be different from the <o> in <qo>, the Strong/Glen <sh> plumes, and so
on). If that were the case, the text would "appear" to have low entropy when it
does not.
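Both possibilities can be illustrated with a small sketch. For (a), a many-to-one substitution collapses distinct words into one, so the original cannot be recovered. For (b), treating a glyph as a different symbol depending on its position raises the measured entropy of exactly the same text. The word lists, the endings, the "9" sign and the function names are all hypothetical choices of mine for illustration, not a claim about how the vms was written.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """First-order entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def encode_lossy(word, endings=("us", "um", "is"), sign="9"):
    """(a) Replace any of several endings by one sign (hypothetical mapping)."""
    for e in endings:
        if word.endswith(e):
            return word[:-len(e)] + sign
    return word

def tag_by_position(word, glyph="y"):
    """(b) Split `glyph` into distinct symbols when word-initial or word-final."""
    out = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        if ch == glyph and i == 0:
            out.append(ch + "_initial")
        elif ch == glyph and i == last:
            out.append(ch + "_final")
        else:
            out.append(ch)
    return out

# (a) Three distinct plaintext words become a single encoded word.
plain_words = ["dominus", "dominum", "dominis"]
print({encode_lossy(w) for w in plain_words})

# (b) Same toy text, measured with and without position-dependent <y>.
words = "ykeedy ydaiin chedy ykaiin shedy okeey".split()   # made-up words
plain = [ch for w in words for ch in w]
tagged = [s for w in words for s in tag_by_position(w)]
print(round(entropy(plain), 3), round(entropy(tagged), 3))
```

In case (b) the tagged figure is higher because one symbol has been split into two, which is the sense in which a transcription that merges them would understate the entropy.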
Cheers,
Gabriel