RE: VMs: Image Source, Accuracy of Transcriptions
I think Glen's point is that if 'ii' is really 'u' and 'iin' is 'w' or something else, then a character frequency count in EVA would give far too high an occurrence of 'i' (which it does).
I have seen Latin texts that have faded to the point where words like "nismes" (the place) look like 'iiiiiiiiii'. Now that certainly would screw around with statistics!
The difference between Glen and me is that I am attempting(!) to get a character frequency and he is looking at sequence frequency (i.e. using Currier, which gives 'iiin' as one glyph). Both are valid as long as you understand a) what the individual glyphs are in the first case, and b) that one glyph may be a sequence of characters in the second case.
Since most 'iii' sequences end in 'n', he has a good point. It is safer to treat such a sequence as one unified unit than to break it up into possibly too many parts.
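To make the two kinds of count concrete, here is a rough Python sketch. The sample string is made up for illustration and is not taken from any real transcription file, and the rule "a run of 'i' closed by 'n' is one unit" is only my assumption about how a Currier-style grouping might be approximated in EVA.

```python
import re
from collections import Counter

# Made-up EVA-like sample, for illustration only (not real transcription data).
SAMPLE = "daiin okaiin qokaiin dain aiiin chedy"

# Assumed grouping rule: one or more 'i' followed by 'n' is a single unit;
# anything else is counted one character at a time.
UNIT_PATTERN = re.compile(r"i+n|.")


def char_frequencies(text):
    """Case (a): count every transcription character separately."""
    return Counter(text.replace(" ", ""))


def unit_frequencies(text):
    """Case (b): count 'i...n' runs as one unit, other characters singly."""
    return Counter(UNIT_PATTERN.findall(text.replace(" ", "")))


if __name__ == "__main__":
    print(char_frequencies(SAMPLE).most_common(5))
    print(unit_frequencies(SAMPLE).most_common(5))
```

In the first count 'i' shows up many times; in the second it disappears as a separate symbol and 'in', 'iin' and 'iiin' appear instead. Neither count is wrong, they just answer different questions.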
What I want to find is the happy medium. And that, my friends, is on the list of things to do...
******************************
Larry Roux
Syracuse University
lroux@xxxxxxx
*******************************
>>> jguy@xxxxxxxxxxxxxxxx 08/31/03 01:22PM >>>
31/08/2003 12:58:43 PM, "Larry Roux" <LRoux@xxxxxxx> wrote:
>I agree with you that EVA is not the best font to use for statistics
I don't know what this hoo-ha is about transcription systems.
The one criterion is: is the transcription lossy?
Answer: yes, of course.
Next question: how lossy?
Now that is the only important question.
The business of "is <in> one glyph or two?" is
irrelevant. It is like complaining about German <sch>
(or French <gn> and Italian <gn> and <gli>). To
process them as a single unit each, instead of two
or three, is trivial.
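As a sketch of how little is involved in processing such groups as single units, the following Python function does a greedy longest-match split. The function name and the unit lists are hypothetical, chosen only to mirror the <sch>, <gn> and <gli> examples above.

```python
def tokenize(text, units):
    """Split text into tokens, treating each listed multi-character
    unit as a single token and everything else as single characters."""
    # Longest units first, so "sch" is matched before any shorter unit.
    # Empty strings are dropped because they would never advance the scan.
    ordered = sorted((u for u in units if u), key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        for unit in ordered:
            if text.startswith(unit, pos):
                tokens.append(unit)
                pos += len(unit)
                break
        else:
            # No listed unit starts here: emit one character.
            tokens.append(text[pos])
            pos += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("schule", ["sch"]))
    print(tokenize("figlio", ["gn", "gli"]))
    print(tokenize("daiin", ["in", "iin"]))
```

The transcription itself stays untouched; only the list of units changes, so the same text can be counted under whichever grouping one prefers.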
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list