
Re: Benchmark transcription file

Nick Pelling wrote:

> Hi Bruce,
> At 11:00 11/08/01 -0400, Bruce Grant wrote:
> >It seems to me that it would be nice to have a sort of "benchmark
> >transcription" of the VMS extracted from the EVA transcription which
> >would have these characteristics: </snip>
> Perhaps a later iteration could merge the various interleaved lines into a
> single line, and assign a fuzzy probability to each possible reading? Just
> an idea. :-)
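
As a rough illustration of that merging idea, here is a minimal sketch. It assumes the different transcribers' readings of a line have already been aligned character by character and that every transcriber gets equal weight; neither assumption comes from the interlinear file itself, and the function name and sample readings are made up for the example.

```python
from collections import Counter

def merge_readings(readings):
    """For each position, map every proposed reading to its share of the votes.

    Assumes the readings are already aligned and equally trusted.
    """
    lengths = {len(r) for r in readings}
    if len(lengths) != 1:
        raise ValueError("readings must be aligned to the same length")
    merged = []
    for column in zip(*readings):
        counts = Counter(column)
        total = len(column)
        merged.append({ch: n / total for ch, n in counts.items()})
    return merged

# Hypothetical readings of one word by four transcribers.
readings = ["daiin", "daiin", "dairn", "daiin"]
merged = merge_readings(readings)
for position, options in enumerate(merged):
    print(position, options)
```

Positions where everyone agrees come out with a single reading at probability 1.0; only the disputed positions carry more than one option.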

I have always felt that there were two incompatible motives in creating a
computer-readable transcription file:

1.    To capture every scrap of information that might turn out to be useful in
interpreting the VMS (a sort of "variorum" edition).

2.    To put the information in a format that can be analyzed mechanically.

Unfortunately, the more options you leave open, the less feasible a mechanical
analysis becomes. (For example, if you are unwilling to make any assumptions
about what constitutes a character, you won't get very far in any calculation
that depends on counting characters.)
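
To make the point concrete, here is a small sketch of how a simple character count shifts with the assumption you pick. The sample text is only illustrative EVA-style spelling, not a quotation from any folio, and the list of letter groups treated as single characters is a hypothetical choice for the example, not a proposal.

```python
from collections import Counter

# Illustrative EVA-style words separated by "." (not quoted from a folio).
SAMPLE = "chedy.shedy.qokeedy.chol"

# Hypothetical choice of letter groups to treat as single characters.
COMPOUND_GLYPHS = ("ch", "sh")

def split_letters(text):
    """Assumption A: every transcription letter is one character."""
    return [c for c in text if c != "."]

def split_glyphs(text, compounds=COMPOUND_GLYPHS):
    """Assumption B: each listed letter group counts as one character."""
    units = []
    i = 0
    while i < len(text):
        if text[i] == ".":
            i += 1
            continue
        for group in compounds:
            if text.startswith(group, i):
                units.append(group)
                i += len(group)
                break
        else:
            # No compound group starts here, so take a single letter.
            units.append(text[i])
            i += 1
    return units

letters = Counter(split_letters(SAMPLE))
glyphs = Counter(split_glyphs(SAMPLE))
print(sum(letters.values()), letters)
print(sum(glyphs.values()), glyphs)
```

The same text yields 21 characters under assumption A and 18 under assumption B, and "h" disappears from the frequency table entirely in the second case. Any statistic built on such counts has to commit to one assumption or the other.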

That's why I think it makes sense to have a "variorum" version like the EVA
interlinear file and a "benchmark" file for analysis. (Of course, if you have
serious disagreements in interpretation, you could have more than one benchmark
file, but something less than a different one for each analysis/analyzer would be