--- Nick Pelling <incoming@xxxxxxxxxxxxxxxxx> wrote:

> Also: I should say that I was thinking more in terms of HTML
> (specifically DHTML) than PDFs. It should be easy to devise a way for
> JavaScript to generate (and display) the line/word numbers
> automatically for a given lump of text.
> But for what you're trying to do, I think it would make an *awful*
> lot of sense to use XML, as John Grove suggested last summer.
> For the uninitiated, an XML version of a VMS transcription might well
> look something like this:
> <?xml version='1.0' encoding='windows-1252' standalone='yes'?>
 [ ... snipped ... ]

Two observations:

In the end, it depends on how one wants to use a
transcription file. The XML can be generated
automatically from a plain text file,
just as GC's PDF file is the product of some tool
that reads either an ASCII file
or some kind of database table.
The requirements for displaying the page
in VMs-lookalike form
are completely different from the requirements
for automated text processing.
So it is just my old-fashioned opinion that the
'meat' is the transcribed text and the file format
is 'dressing'.
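
To illustrate what I mean by generating the XML automatically, here
is a rough Python sketch. It is not anybody's actual tool. The input
layout (a locus label, a space, then the text), the element names
'transcription' and 'line', and the sample strings are all made up
for the example and are not taken from any existing file.

```python
import xml.etree.ElementTree as ET

def lines_to_xml(lines):
    """Wrap each 'locus text' line in a <line> element.

    The schema (<transcription>/<line id=...>) is hypothetical,
    chosen only for illustration.
    """
    root = ET.Element("transcription")
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # Everything before the first space is treated as the locus label.
        locus, _, text = raw.partition(" ")
        element = ET.SubElement(root, "line", id=locus)
        element.text = text.strip()
    return ET.tostring(root, encoding="unicode")

# Placeholder input, not real transcription data.
sample = ["f1r.1 abc.def.ghi", "f1r.2 jkl.mno"]
print(lines_to_xml(sample))
```

The plain text stays the master copy, and the XML (or a PDF, or an
HTML page) is just one more thing derived from it.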

The problem with the item 'word' is that it is
not uniquely defined. Since lines are not too
long, I am in favour of keeping the line as the
smallest unit (actually: the locus, as that term
is used in the interlinear files of
Gabriel and Stolfi).
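
A small Python sketch of why word numbers are shaky. It assumes,
purely for illustration, a convention in which '.' marks a definite
word break and ',' an uncertain one. The sample line is a made-up
placeholder and count_words is a hypothetical helper.

```python
import re

def count_words(line, count_uncertain_breaks):
    """Count words in one line under two readings of the separators.

    Assumed convention (illustrative only): '.' is a definite word
    break, ',' is an uncertain one.
    """
    separators = r"[.,]" if count_uncertain_breaks else r"\."
    return len([word for word in re.split(separators, line) if word])

line = "abc.de,fg.hij"  # placeholder text
print(count_words(line, False))  # 3
print(count_words(line, True))   # 4
```

The same line has three or four words depending on the reading, so a
word index is only meaningful together with the splitting rule,
whereas the line label does not change.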

Cheers, Rene

