Re: Benchmark transcription file
Hi Bruce,
That's why I think it makes sense to have a "variorum" version like the EVA
interlinear file and a "benchmark" file for analysis. (Of course, if you have
serious disagreements in interpretation, you could have more than one
benchmark file, but it would be nice not to need a separate one for each
analysis/analyzer).
For a benchmark file, the EVA transcription style is well enough thought
out that you can remap to other transcriptions from it without huge
difficulty: so that's pretty much OK.
The interlinear file is more a set of interleaved interpretations than a set
of interleaved transcriptions.
The only question, then, is: to what degree (and at what stage) do we
resolve ambiguous characters between those interpretations in order to
create our benchmark file?
We could introduce a mark into the benchmark (like "~") to indicate "next
character is ambiguous" - some scripts/filters/rules could then keep the
character as transcribed, while others could remap it to "*".
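As a rough illustration of how a filter might handle such a mark, here is a minimal Python sketch. It assumes "~" is used only as the ambiguity marker and never as an ordinary character, and the function name apply_ambiguity_policy and its "keep"/"star" policies are hypothetical, made up for this example:

```python
def apply_ambiguity_policy(text: str, policy: str) -> str:
    """Resolve hypothetical '~' ambiguity marks in one benchmark line.

    Assumption: '~' flags the single character that follows it as ambiguous.
    policy="keep": drop the marker and keep the transcribed character.
    policy="star": replace the marked character with '*' (unreadable).
    """
    if policy not in ("keep", "star"):
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "~":
            if i + 1 < len(text):
                # Emit either the marked character or '*', then skip both.
                out.append(text[i + 1] if policy == "keep" else "*")
                i += 2
            else:
                # A dangling '~' at end of line marks nothing; drop it.
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(apply_ambiguity_policy("dai~in", "keep"))  # daiin
print(apply_ambiguity_policy("dai~in", "star"))  # dai*n
```

The point of the design is that the benchmark file stays single, and each analysis chooses its own policy at read time.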
Or - if we still have access to the OCR scans separated by character - we
could simply vote on the most contentious ones to form a unified consensus?
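A vote like that could be mechanised along these lines. This is only a sketch under a strong assumption: the competing readings of a line have already been aligned character for character (same length). The strict-majority rule and the "*" fallback for ties are choices made for this example, not an agreed procedure, and the name consensus is hypothetical:

```python
from collections import Counter


def consensus(readings: list[str], tie: str = "*") -> str:
    """Per-position majority vote over aligned readings of one line.

    Assumes every reading has the same length (already aligned character
    by character). A position without a strict majority becomes `tie`.
    """
    if not readings:
        raise ValueError("need at least one reading")
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("readings must be aligned to the same length")
    out = []
    for column in zip(*readings):
        best, count = Counter(column).most_common(1)[0]
        # Strict majority: more than half of the transcribers agree.
        out.append(best if count * 2 > len(readings) else tie)
    return "".join(out)


print(consensus(["chol", "chor", "chal"]))  # chol
print(consensus(["chol", "chor"]))          # cho*
```

In practice, the alignment step (insertions, deletions, different word breaks) would be the hard part; the vote itself is simple once columns line up.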
Or we could simply agree which single transcription to lock to (and just
get on with it)?
Cheers, .....Nick Pelling.....