VMs: Work on the relation penstroke -> letters?

Hello everyone,

It occurred to me lately that people appear to be taking the current VM 
transcription schemes for granted. Based on the transcription, there is a 
tremendous amount of cunning work being done on the properties, statistics 
etc. of the text.

But upon a closer look I found it very difficult to identify unambiguously 
what would be a "letter" (i.e., the smallest independent unit of information) 
in the MS, and what would just be the penstrokes which constitute it.

For example, the "iiiv"-sequence could really be four letters, and "iiv" could 
be three. But at the same time, the "i"s could be used the way we use arcs in 
writing Latin letters, so "iii" might be "m", and "ii" might be "n", and both 
sequences would then be two letters each. Or in both cases the "i"s really 
belong to the "v", and the whole sequence is just a single letter every time.
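
To make the three readings concrete, here is a minimal Python sketch. The 
function name `segment` and the reading labels are my own invention for 
illustration, not part of any transcription scheme, and "i"/"v" are simply 
the stroke labels used above.

```python
import re

def segment(seq, reading):
    """Split a transliterated stroke sequence into hypothetical 'letters'."""
    if reading == "strokes":
        # Reading 1: every penstroke is a letter of its own.
        return list(seq)
    if reading == "minims":
        # Reading 2: a run of "i" strokes forms one letter (like m or n).
        return re.findall(r"i+|[^i]", seq)
    if reading == "ligature":
        # Reading 3: a run of "i" strokes belongs to the stroke that follows.
        return re.findall(r"i*[^i]|i+", seq)
    raise ValueError(f"unknown reading: {reading}")

for seq in ("iiiv", "iiv"):
    for reading in ("strokes", "minims", "ligature"):
        parts = segment(seq, reading)
        print(f"{seq:5} {reading:9} {len(parts)} letter(s): {parts}")
```

Under these assumptions "iiiv" comes out as four, two, or one letter, and 
"iiv" as three, two, or one, depending only on the reading chosen.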

I understand that this would make a huge difference to the evaluation of the 
text. For example, I found that if you're really rigorous, you can cut down 
the number of different symbols to 10 or so -- and things like word length or 
repetitiveness would heavily depend on it.
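
As a rough illustration of that dependence, the sketch below counts the 
alphabet size and mean word length for the same strings under two 
segmentations. The word list is invented toy input, not real transcription 
data, and the names `letters` and `stats` are hypothetical helpers.

```python
import re

# Invented toy "words" in stroke notation -- NOT real VM transcription data.
words = ["iiiv", "iiv", "oiiv", "oiv", "iiio"]

def letters(word, merge_runs):
    """merge_runs=False: each stroke is a letter.
    merge_runs=True: a run of "i" strokes counts as one letter."""
    if merge_runs:
        return re.findall(r"i+|[^i]", word)
    return list(word)

def stats(words, merge_runs):
    """Return (number of distinct symbols, mean word length in letters)."""
    tokens = [letters(w, merge_runs) for w in words]
    alphabet = {t for ts in tokens for t in ts}
    mean_len = sum(len(ts) for ts in tokens) / len(tokens)
    return len(alphabet), mean_len

for merge_runs in (False, True):
    size, mean_len = stats(words, merge_runs)
    print(f"merge_runs={merge_runs}: {size} symbols, mean word length {mean_len:.1f}")
```

On this toy input the stroke-level reading gives fewer distinct symbols but 
longer words, while merging the "i" runs gives more symbols and shorter 
words; any statistic built on letters inherits that choice.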

Yet most people seem to take the current transcription schemes for granted, and 
only give a fleeting glance to this question, which I feel is very basic and 
fundamental. So, did I miss research which clearly answered that question, or 
are people simply taking the transcription for granted, since it's easier to 
tackle with the statistical apparatus we have?



P.S.: I lately checked a few MSS in Gothic/late medieval handwriting, where 
people did exactly the same -- compose letters from a set of only a few 
different strokes, which is what brought me to this idea.

