Re: Benchmark transcription file



Hi Bruce,

That's why I think it makes sense to have both a "variorum" version, like the EVA
interlinear file, and a "benchmark" file for analysis. (Of course, if there are
serious disagreements in interpretation, you could have more than one benchmark
file, but it would be nice to end up with fewer than one per analysis or
analyzer.)

For a benchmark file, the EVA transcription style is well enough thought out that you can remap to other transcriptions from it without huge difficulty: so that's pretty much OK.


The interlinear file is more a set of interleaved interpretations than a set of interleaved transcriptions.

The only question, then, is to what degree (and at what stage) do we resolve ambiguous characters between those interpretations in order to create our benchmark file?

We could introduce a mark into the benchmark (like "~") to indicate "next character is ambiguous" - some scripts/filters/rules could then keep the flagged character as transcribed, while others could remap it to "*".
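
As a rough illustration of the marker idea, here is a minimal Python sketch of two such filters. The function names are hypothetical, and it assumes "~" never occurs as an ordinary character in the benchmark file and that "*" is the agreed wildcard; neither is an existing tool or convention of the benchmark file.

```python
AMBIG_MARK = "~"  # assumed marker meaning "next character is ambiguous"
WILDCARD = "*"    # assumed replacement for an unresolved character


def keep_ambiguous(text: str) -> str:
    """Drop the markers but keep each flagged character as transcribed."""
    return text.replace(AMBIG_MARK, "")


def mask_ambiguous(text: str) -> str:
    """Replace each marker, plus the character it flags, with the wildcard."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == AMBIG_MARK and i + 1 < len(text):
            out.append(WILDCARD)
            i += 2  # skip the marker and the character it flags
        elif text[i] == AMBIG_MARK:
            i += 1  # dangling marker at end of text: nothing to flag, drop it
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With a marked word such as "qo~kedy", the first filter would yield "qokedy" and the second "qo*edy", so each analysis can choose how strictly to treat doubtful characters without needing its own benchmark file.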

Or - if we still have access to the OCR scans separated by character - we could simply vote on the most contentious ones to form a unified consensus?
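
If the votes were collected as one reading per voter for each contested character, tallying them could look something like the sketch below. It assumes the readings are already aligned character by character (which real transcriptions often are not), and the choice to leave ties as "*" is my own assumption, as are the function names.

```python
from collections import Counter


def consensus(readings, unresolved="*"):
    """Return the majority reading, or `unresolved` on a tie or no votes."""
    counts = Counter(readings).most_common()
    if not counts:
        return unresolved
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return unresolved  # top two readings tied: leave it unresolved
    return counts[0][0]


def consensus_line(versions, unresolved="*"):
    """Vote position by position across equal-length readings of one line."""
    if len({len(v) for v in versions}) > 1:
        raise ValueError("readings must be aligned to the same length")
    return "".join(consensus(column, unresolved) for column in zip(*versions))
```

For example, three readings "daiin", "dauin" and "daiin" would give "daiin", while a two-way split on a character would leave "*" at that position for later discussion.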

Or we could simply agree which single transcription to lock to (and just get on with it)?

Cheers, .....Nick Pelling.....