RE: VMs: Image Source, Accuracy of Transcriptions
Yes, Jacques is right. It matters little what anyone uses to
describe 'their' version of the character or glyph. However, if they want to
communicate it to the group - there has to be some means to quickly
indicate which character we're talking about.
Some people tend to think there is such a thing as a 'u' while others
don't. However, if we all call it 'ii' and treat it as either 'u' or 'ii' or
part of 'iir' 'iin' 'iil' 'iij' or 'iih', 'iith', 'iiph' etc... we all know
what character we're supposed to be looking at.
I am partial to i's and c's being related to one another with the same
five ligature endings - but hey -- that doesn't matter. What matters is that
when I mention the 'word' daiin everybody knows which word I'm talking
about, whether they see it as daun or daiun or duin or whatever glyph:
they still understand the character string that I'm looking at.
Statistics on the other hand are of course problematic because they depend
heavily on the character set - even Glen's are a compilation of statistics
based on 'his' view of what a glyph or character is. I still disagree with
the concept that 'c' is a glyph by itself -- Glen's set doesn't account for
things like the @ character ...er glyph either.
There are 'n's, and 'in's, and 'iin's that don't seem to show up in Glen's
glyphs. But, hey - everybody can have their own take on what is a character.
I think in Glen's set there isn't a 'pure N' - it at least has one 'i' in
front of it - although in the VMs there are plenty of versions of the same
glyph/character mixed into the ever expanding stroke set of @, an, ain,
aiin, aiiin; and these aren't the only common ones following the 'i'
either -- ar, air, aiir, aiiir also exist among the other ligature
variations to lesser degrees.
To me 'c' and 'i' are important in that they seem to be utilized to make
all the above glyph/character sets with a selected ligature to finish them
off; and the FACT that we have variations of 0 to 4 c or i strokes in a row
before a ligature seems quite interesting to me -- whether you treat two of
them as 'u' or not. I agree that statistics are certainly hard to agree upon
when we can't agree upon the character set - and counts of 'i's or 'c's in
the overall manuscript seem to some extent rather ludicrous - yes, in my
view they should be part of the character that closes them off. Gabriel has
asked the question before how someone could tell the difference between a
'ccc', 'cc + c', 'c+cc' etc... The short answer is - there is only one
character (IMHO) until one hits a closing ligature. There is no ccc as it
needs a finishing mark.... Therefore ccb, cccb, cb, or b are four different
characters to count. There is no such thing in my stroke order concepts for
a single stand-alone c or i.
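To make the counting rule concrete, here is a rough sketch of it in Python. It is only an illustration under assumptions: the closer set in CLOSERS and the function name split_characters are made up for this example, and multi-letter endings such as 'th' or 'ph' are left out.

```python
import re

# Assumption: strokes are written 'c' or 'i', and CLOSERS is a
# hypothetical set of closing ligatures; swap in your own scheme.
CLOSERS = "bnrljh"

# One character = 0 to 4 identical strokes followed by exactly one closer.
CHARACTER = re.compile(rf"(c{{0,4}}|i{{0,4}})([{CLOSERS}])")

def split_characters(text):
    """Return (characters, leftovers) for a string of strokes and closers."""
    characters, leftovers, pos = [], [], 0
    while pos < len(text):
        match = CHARACTER.match(text, pos)
        if match:
            # A stroke run plus its closer counts as a single character.
            characters.append(match.group(0))
            pos = match.end()
        else:
            # Anything that never reaches a closer is set aside, not counted.
            leftovers.append(text[pos])
            pos += 1
    return characters, leftovers
```

Under these assumptions, split_characters("ccbcccbcbb") gives the four characters ccb, cccb, cb and b with nothing left over, while split_characters("ccc") gives no characters at all, only three unclosed strokes.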
That's all old news, I know. I haven't harped about the patterns I see much
lately because I certainly can not create an alphabet out of the patterns
that would work -- because too few choices are given a higher frequency over
the others - all characters would be reduced to a few consonants.
Lots of old babbling... better just quit for the time being, eh? Despite
Glen's declaration that he knows what the answer will be - I'm still not
entirely convinced that he's on any better footing than the rest of us. Keep
hunting and sharing what you have even if we all rant from time to time.
[Just don't say you've solved it - unless you really have 8-) Sorry, to
those who think they had it solved. Nobody in their right mind would agree
with you yet.]
John.
-----Original Message-----
From: owner-vms-list@xxxxxxxxxxx [mailto:owner-vms-list@xxxxxxxxxxx]On
Behalf Of Larry Roux
Sent: Sunday, August 31, 2003 2:11 PM
To: vms-list@xxxxxxxxxxx
Subject: RE: VMs: Image Source, Accuracy of Transcriptions
I think Glen's point is that if 'ii' is really 'u' and 'iin' is 'w' or
something else then a character frequency count in EVA would give way too
high an occurrence of 'i' (which it does).
I have seen Latin texts that have faded to the point where words like
"nismes" (the place) look like 'iiiiiiiiii'. Now that certainly would screw
around with statistics!
The difference between Glen and me is that I am attempting(!) to get a
character frequency and he is looking at sequence frequency (i.e. using
Currier, which gives 'iiin' as one glyph). Both are valid as long as you
understand a) what individual glyphs are in the first case, and b) that one
glyph may be a sequence of characters in the second case.
Since most 'iii' sequences end in 'n', he has a good point. It is safer
to use one unified collection than to break it up into possibly too many
parts.
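A small sketch of the two kinds of count, side by side. The word list is made up purely for illustration (it is not taken from any transcription), and the regex is only a stand-in assumption for a Currier-style glyph table.

```python
import re
from collections import Counter

# Hypothetical sample of EVA-style words, invented for this example.
sample = ["daiin", "aiin", "dain", "daiiin", "ar"]

# Character frequency: every letter is counted as its own unit.
char_counts = Counter("".join(sample))

# Sequence frequency: a run of i's plus the letter that ends it is one unit.
# Assumption: this regex stands in for a real glyph table.
UNIT = re.compile(r"i+[a-z]|[a-z]")
seq_counts = Counter(unit for word in sample for unit in UNIT.findall(word))

print("per character:", char_counts.most_common())
print("per sequence: ", seq_counts.most_common())
```

On this sample the per-character count puts 'i' at the top, while the per-sequence count has no bare 'i' at all, only 'in', 'iin' and 'iiin'. Which of the two is "right" is exactly the open question.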
What I want to find is the happy medium. And that, my friends, is on the
list of things to do...
******************************
Larry Roux
Syracuse University
lroux@xxxxxxx
*******************************
>>> jguy@xxxxxxxxxxxxxxxx 08/31/03 01:22PM >>>
31/08/2003 12:58:43 PM, "Larry Roux" <LRoux@xxxxxxx> wrote:
>I agree with you that EVA is not the best font to use for
>statistics
I don't know what this hoo-ha is about transcription systems.
The one criterion is: is the transcription lossy?
Answer: yes, of course.
Next question: how lossy?
Now that is the only important question.
The business of "is <in> one glyph or two?" is
irrelevant. It is like complaining about German <sch>
(or French <gn> and Italian <gn> and <gli>). To
process them as a single unit each, instead of two
or three, is trivial.
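As a rough illustration of how little it takes, here is one way to do it in Python. The function name tokenize and the multigraph lists are assumptions for the example, not part of any existing transcription tool.

```python
def tokenize(word, multigraphs):
    """Split word into units, preferring the longest listed multigraph."""
    # Longest first, so 'iin' wins over 'in'; empty strings are ignored.
    ordered = sorted((m for m in multigraphs if m), key=len, reverse=True)
    units, pos = [], 0
    while pos < len(word):
        for m in ordered:
            if word.startswith(m, pos):
                units.append(m)
                pos += len(m)
                break
        else:
            # No multigraph starts here: the single letter is its own unit.
            units.append(word[pos])
            pos += 1
    return units
```

For instance, tokenize("schule", ["sch"]) treats <sch> as one unit, and tokenize("daiin", ["in", "iin"]) treats <iin> as one unit. Changing your mind about what counts as a glyph means editing the list, not the transcription.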
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list