Re: VMs: Worry - information loss in transcription - pictures ...
- To: vms-list@xxxxxxxxxxx
- Subject: Re: VMs: Worry - information loss in transcription - pictures ...
- From: Gabriel Landini <G.Landini@xxxxxxxxxx>
- Date: Mon, 1 Sep 2003 10:39:46 +0100
- In-reply-to: <006901c36e68$db9a9120$9600000a@lan>
- Organization: The University of Birmingham, UK
- References: <DJEOJIIHHOEOJPMMOPKDOEOCCEAA.glenclaston@comcast.net> <006901c36e68$db9a9120$9600000a@lan>
- Reply-to: vms-list@xxxxxxxxxxx
- Sender: owner-vms-list@xxxxxxxxxxx
- User-agent: KMail/1.5.3
On Friday 29 August 2003 21:05, PK#01 wrote:
> GC writes - in his well stated opinion :
> > I predicted that if EVA took center-stage over efforts to improve earlier
> > transcriptions, it had the potential to misguide the thinking of an
> > entire generation of VMS enthusiasts who were never exposed to earlier
> > transcriptions before coming in contact with EVA.
> >
> > And
> > forget Stolfi's interlinear - he's ripped the heart out of every other
> > researcher by transliterating their works into EVA.
>
> I dared to study the text itself, but I fear GC has a point here and it
> would be bad just to skip over it. What do the other "old timers" think of
> this comment.
You asked! ;-)
I am surprised that there is still so much misunderstanding about
transcription alphabets.
EVA was made for transcriptional purposes because there were too many things
being left out by the previous alphabets (except Frogguy, but Frogguy is [to
me] a bit more difficult to handle while transcribing). I particularly found
EVA easier to read and remember than any other alphabet, but I may be biased
:-)
For some, it seems to have gone unnoticed that both FSG and Currier
transcriptions are *non-lossy* when translated into EVA. There are no
characters or combinations in those original transcriptions that could not be
represented in EVA.
This means that the FSG and Currier transcriptions can be translated into EVA
and back without losing anything. The comment above about Stolfi's
interlinear file is, therefore, incorrect.
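To make the round-trip argument concrete, here is a minimal sketch in Python.
The table below is a toy one with a handful of illustrative pairs; it is not
the real Currier or FSG correspondence, and the function name is my own, not
anything taken from Bitrans.

```python
# Toy table: illustrative pairs only, NOT the real Currier/FSG-to-EVA tables.
TOY_TO_EVA = {"A": "a", "8": "d", "M": "iin", "O": "o", "R": "r"}

# The reverse table exists because no two toy symbols share an EVA spelling.
EVA_TO_TOY = {eva: toy for toy, eva in TOY_TO_EVA.items()}


def transliterate(text, table):
    """Greedy longest-match substitution; unknown symbols pass through."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the symbol unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


word = "8AM"
eva = transliterate(word, TOY_TO_EVA)
back = transliterate(eva, EVA_TO_TOY)
assert back == word  # nothing was lost on the way to EVA and back
```

The round trip works here only because every symbol of the older alphabet has
an EVA spelling of its own; that is the property claimed above for FSG and
Currier.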
FSG did not support Currier's 6 and 7, and Currier does not support EVA b, n,
u, v, z or '. This means that EVA sees even more characters than FSG or
Currier. I see that as a clear advantage over FSG and Currier.
If an EVA transcription (let's say Takahashi's) needs to be translated into
FSG, then FSG coding is lossy (it cannot understand some EVA characters).
Same with Currier's.
Neither Currier nor FSG transcriptions can write combinations like (ith),
d(a')iin, (ai)r, (q'o) or the semi-gallows spanning several characters. One
has to think that those instances must have been either transcribed wrongly,
ignored or read as something else.
While I agree that the <in>, <iin>, <iiin> groups *may* be single characters,
one gets things like:
f77r.25 ...oiiiin
and
f1r.4 ...okan
which make me feel far from sure whether this sequence of <i> strokes is 1, 2,
3 or 4 characters and whether <n> is really part of it.
In any case, if one wants to translate from EVA to FSG (or Currier), then
Bitrans will do it and even show what cannot be translated. Because FSG uses
uppercase letters and EVA lowercase, any lowercase letters left in the
Bitrans output are what the FSG group could not have transcribed exactly.
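The idea of spotting the leftovers by their case can be sketched as follows.
This is not Bitrans itself, only a small Python illustration; the table is
hypothetical and deliberately leaves out an entry for <b> so that something is
left over.

```python
# Hypothetical EVA -> uppercase-target table (not the real FSG table).
EVA_TO_TARGET = {"o": "O", "d": "D", "a": "A", "r": "R"}


def translate_and_flag(eva_word, table):
    """Translate character by character; return the output and the leftovers."""
    out = "".join(table.get(ch, ch) for ch in eva_word)
    # Anything still lowercase had no entry in the target alphabet.
    leftovers = [ch for ch in out if ch.islower()]
    return out, leftovers


print(translate_and_flag("odar", EVA_TO_TARGET))  # everything translated
print(translate_and_flag("odab", EVA_TO_TARGET))  # 'b' stays lowercase
```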
Some may think that there are new details that need to be represented (for
instance the atypical <r> and <s> which do not seem to be one or the other).
In that case, obviously, EVA falls short. In the interlinear file which we are
working on with Rene, these instances are marked with [r|s], meaning that it
may be one or the other, so if one comes up with a 3rd option (i.e. a new
character), then it is a matter of searching for [r|s] and correcting only
those instances. This saves an incredible amount of time.
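Searching for the marker is simple enough to sketch in a few lines of Python.
The locus labels, the sample text and the replacement symbol "x" below are all
made up for illustration; only the [r|s] notation comes from the interlinear
file.

```python
import re

# The marker itself; '[', '|' and ']' must be escaped in a regex.
UNCERTAIN_RS = re.compile(r"\[r\|s\]")

# Hypothetical sample lines, keyed by made-up locus labels.
SAMPLE = {
    "line-1": "oka[r|s].daiin",
    "line-2": "chedy.qokar",
}


def find_uncertain(lines):
    """Return (locus, offset) for every [r|s] marker."""
    hits = []
    for locus, text in lines.items():
        for match in UNCERTAIN_RS.finditer(text):
            hits.append((locus, match.start()))
    return hits


def resolve(text, replacement):
    """Replace each marker with the chosen reading (e.g. a new character)."""
    return UNCERTAIN_RS.sub(lambda _m: replacement, text)


print(find_uncertain(SAMPLE))
print(resolve(SAMPLE["line-1"], "x"))
```

Only the marked instances are touched; every plain <r> and <s> stays as it was
transcribed.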
A similar story applies to the plume variations in the <sh> group. We looked
into this long ago and concluded that such variations seem to be a continuum,
so choosing the style of plume quickly becomes a subjective exercise,
especially when reading from bad copies of the ms. Still, one could search for
those <sh> instances and correct only those.
However, if the issue is whether <iii>, <cth>, etc. should be represented as
1, 2 or 3 characters, then the solution is trivial because we can do the
analysis in whatever agglomerated scheme we want just by using Bitrans.
The Curva and Gava alphabets are examples of agglomerating schemes that merge
several groups into new characters. Anybody can modify and add their own
preferences with Bitrans. Indeed, it takes much less time to write a Bitrans
table than to transcribe a few lines of the ms.
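As a rough picture of what an agglomerating table does, here is a Python
sketch. The groups and the single symbols they map to are my own toy choices;
they are not the Curva or Gava tables, and the code is not Bitrans.

```python
# Toy agglomeration table: EVA groups -> single made-up symbols.
TOY_GROUPS = {
    "iiin": "3", "iin": "2", "in": "1",
    "cth": "T", "sh": "S", "ch": "C",
}


def agglomerate(word, table):
    """Merge listed groups into single symbols, longest group first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # Not part of any listed group: keep the EVA character.
            out.append(word[i])
            i += 1
    return "".join(out)


for w in ["daiin", "cthor"]:
    print(w, "->", agglomerate(w, TOY_GROUPS))
```

Changing one's mind about whether <iin> is 1 or 3 characters then means
editing one table entry and rerunning, not retranscribing.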
Funnily, in these agglomerated alphabets, many statistics remain similar or
even the same.
For example, the low entropy is still noticeable (for those who do not believe
it, just check the counting commas pages).
Character counts are obviously different, as there will be new "letters", and
word lengths become even shorter in these agglomerations, but the observed
Zipf's laws and *word* entropies remain completely unaffected.
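The reason word statistics survive is that an agglomeration which keeps
distinct words distinct only relabels them, so the word frequencies do not
change. A small Python sketch on a toy word list (made up here, not taken from
any transcription) shows this, using made-up merged symbols:

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


# Toy sample, for illustration only.
words = "daiin chol daiin shol chol daiin".split()

# Toy agglomeration: three groups merged into single made-up symbols.
merged = [w.replace("iin", "2").replace("ch", "C").replace("sh", "S")
          for w in words]

h_words = entropy(words)
h_merged = entropy(merged)
assert abs(h_words - h_merged) < 1e-12  # word entropy is unchanged

# Character-level figures do change: fewer characters per word.
print(sum(map(len, words)), "->", sum(map(len, merged)))
```

This holds only as long as the recoding never maps two different words onto
the same string; a scheme that did so would change the word counts too.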
Sorry for the lengthy post.
Cheers,
Gabriel