
Re: VMs: Worry - information loss in transcription - pictures ...



On Friday 29 August 2003 21:05, PK#01 wrote:
> GC writes - in his well stated opinion :
> > I predicted that if EVA took center-stage over efforts to improve earlier
> > transcriptions, it had the potential to misguide the thinking of an
> > entire generation of VMS enthusiasts who were never exposed to earlier
> > transcriptions before coming in contact with EVA.
> >
> > And
> > forget Stolfi's interlinear - he's ripped the heart out of every other
> > researcher by transliterating their works into EVA.
>
> I dared to study the text itself, but I fear GC has a point here and it
> would be bad just to skip over it. What do the other "old timers" think of
> this comment.

You asked! ;-)

I am surprised that there is still so much misunderstanding about
transcription alphabets.
EVA was made for transcription because the previous alphabets left out too
many things (except Frogguy, but Frogguy is [to me] a bit more difficult to
handle while transcribing). Personally, I found EVA easier to read and
remember than any other alphabet, but I may be biased :-)

For some, it seems to have gone unnoticed that both FSG and Currier 
transcriptions are *non-lossy* when translated into EVA. There are no 
characters or combinations in those original transcriptions that could not be 
represented in EVA.
This means that the FSG and Currier transcriptions can be translated into EVA
and back without losing anything. The comment above about Stolfi's
interlinear file is, therefore, incorrect.

FSG does not support Currier's 6 and 7, and Currier does not support EVA b, n,
u, v, z or '. This means that EVA distinguishes more characters than either
FSG or Currier. I see that as a clear advantage over both.

If an EVA transcription (let's say Takahashi's) needs to be translated into
FSG, then the translation is lossy, because FSG cannot represent some EVA
characters. The same holds for Currier.
Neither Currier nor FSG transcriptions can write combinations like (ith),
d(a')iin, (ai)r, (q'o) or the semi-gallows spanning several characters. One
has to assume that such instances were either transcribed wrongly, ignored
or read as something else.
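
The asymmetry can be sketched in a few lines of Python. The two tables below
are invented placeholders, not the real FSG, Currier or EVA correspondences;
the point is only that translating into the richer alphabet and back returns
the original text, while the other direction leaves symbols with no
counterpart.

```python
# Toy alphabets for illustration only. These are NOT the real
# FSG/Currier/EVA tables; every symbol pairing here is an assumption.
OLD_TO_NEW = {"A": "a", "B": "ch", "C": "ii"}  # every old symbol has a new spelling
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}


def translate(text, table):
    """Greedy longest-match translation.

    Returns (output, leftovers). leftovers lists the input symbols the
    table cannot represent; they are copied to the output unchanged.
    """
    keys = sorted(table, key=len, reverse=True)
    out, leftovers, i = [], [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matches here: this symbol would be lost.
            leftovers.append(text[i])
            out.append(text[i])
            i += 1
    return "".join(out), leftovers


# Old -> new -> old: nothing is lost.
new_text, lost = translate("ABCA", OLD_TO_NEW)
back, lost_back = translate(new_text, NEW_TO_OLD)
print(back, lost, lost_back)  # ABCA [] []

# New -> old with a symbol the old alphabet lacks: lossy.
print(translate("achn", NEW_TO_OLD))  # ('ABn', ['n'])
```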


While I agree that the <in>, <iin>, <iiin> groups *may* be single characters, 
one gets things like:

f77r.25 ...oiiiin
and
f1r.4 ...okan

which make me feel far from sure whether this sequence of <i> strokes is 1, 2, 
3 or 4 characters and whether <n> is really part of it.

In any case, if one wants to translate from EVA to FSG (or Currier), then
Bitrans will do it and even show what cannot be translated: FSG uses
uppercase letters and EVA lowercase, so any lowercase letters left in the
Bitrans output are what the FSG group could not have transcribed exactly.
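
Reading such output can be automated. The helper below is a hypothetical
post-processing step, not part of Bitrans, and the sample string is made up;
it simply lists the lowercase letters that survived translation.

```python
def untranslated(output):
    """Return (position, letter) pairs for lowercase letters left in output.

    Assumes the convention described above: target alphabet in uppercase,
    untranslatable source characters left in lowercase. Non-letters such
    as an apostrophe are not caught by this check.
    """
    return [(pos, ch) for pos, ch in enumerate(output) if ch.islower()]


# Made-up output line, not real Bitrans output.
print(untranslated("ABnC.ADb"))  # [(2, 'n'), (7, 'b')]
```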

Some may think that there are new details that need to be represented (for
instance the atypical <r> and <s> which do not seem to be one or the other).
In that case, obviously, EVA falls short. In the interlinear file which Rene
and I are working on, these instances are marked with [r|s], meaning that it
may be one or the other, so if one comes up with a 3rd option (i.e. a new
character), then it is a matter of searching for [r|s] and correcting only
those instances. This saves an incredible amount of time.
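
A minimal Python sketch of that search-and-correct step follows. Only the
[r|s] marker syntax comes from the description above; the sample lines, the
locus labels and the replacement symbol "R" are invented for illustration.

```python
import re

# Matches markers of the form [x|y], e.g. [r|s].
UNCERTAIN = re.compile(r"\[([^\[\]|]+)\|([^\[\]|]+)\]")


def find_uncertain(lines):
    """Yield (locus, offset, first, second) for every [first|second] marker."""
    for locus, text in lines:
        for m in UNCERTAIN.finditer(text):
            yield locus, m.start(), m.group(1), m.group(2)


def resolve(text, first, second, replacement):
    """Replace every [first|second] marker in text with the chosen reading."""
    return text.replace(f"[{first}|{second}]", replacement)


# Invented sample lines, not taken from the interlinear file.
sample = [("x.1", "dai[r|s].ol"), ("x.2", "okal")]
print(list(find_uncertain(sample)))          # [('x.1', 3, 'r', 's')]
print(resolve(sample[0][1], "r", "s", "R"))  # daiR.ol
```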

A similar story is the plume variations in the <sh> group. We looked into this
long ago and thought that such variations seem to form a continuum, and that
choosing the style of plume quickly becomes a subjective exercise, especially
when reading from bad copies of the ms. Still, one could search for the
instances of <sh> and correct only those.

However, if the issue is whether <iii>, <cth>, etc. should be represented as
1, 2 or 3 characters, then the solution is trivial because we can do the
analysis in whatever agglomerated scheme we want just by using Bitrans.
The Curva and Gava alphabets are examples of agglomerating schemes that merge
several groups into new characters. Anybody can modify them and add their own
preferences with Bitrans. Indeed, it takes much less time to write a Bitrans
table than to transcribe a few lines of the ms.
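
To give an idea of how little is involved, here is a Python sketch of an
agglomerating substitution. It does not use the Bitrans table format, and the
merged symbols are invented placeholders rather than anything taken from
Curva or Gava.

```python
import re

# Hypothetical merge table: each EVA group on the left becomes ONE symbol.
MERGES = {"iii": "3", "ii": "2", "cth": "T"}

# Longer groups are listed first so that <iii> wins over <ii>.
_PATTERN = re.compile(
    "|".join(re.escape(g) for g in sorted(MERGES, key=len, reverse=True))
)


def agglomerate(word):
    """Rewrite word, replacing each listed group with its single symbol."""
    return _PATTERN.sub(lambda m: MERGES[m.group(0)], word)


print(agglomerate("daiin"))     # da2n
print(agglomerate("cthaiiin"))  # Ta3n
```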

Funnily, in these agglomerated alphabets, many statistics remain similar or
even the same.
For example, the low entropy is still noticeable (for those who do not believe 
it, just check the counting commas pages).
Character counts are obviously different, as there will be new "letters" and 
word lengths become even shorter in these agglomerations, but the observed 
Zipf's laws and *word* entropies remain completely unaffected.
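
The reason can be checked on a toy example. In the Python sketch below the
word list and the merge rule are made up; the assumption that matters is that
the relabelling is one-to-one on words (here it is, because the symbols "2"
and "3" never occur in the input), so word frequencies are untouched while
character statistics change.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def relabel(word):
    # Hypothetical agglomeration: <iii> and <ii> each become one symbol.
    return word.replace("iii", "3").replace("ii", "2")


def mean_length(word_list):
    return sum(map(len, word_list)) / len(word_list)


# Made-up sample, not a real line of the ms.
words = "daiin ol daiin okaiiin ol chol daiin".split()
merged = [relabel(w) for w in words]

# Word-level statistics depend only on which words are equal to which.
print(math.isclose(entropy(words), entropy(merged)))                        # True
print(sorted(Counter(words).values()) == sorted(Counter(merged).values()))  # True

# Character-level statistics and word lengths do change.
print(entropy("".join(words)), entropy("".join(merged)))
print(mean_length(words), mean_length(merged))
```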

Sorry for the lengthy post.
Cheers,

Gabriel

