
RE: VMs: No stats no fun



Hi John,

At 08:38 09/08/2003 -0400, John Grove wrote:
You found recently that lots of words 'could' start with the pair 'dy' if
you chopped up the transcription at the appearance of 'e' or 'ee'. This
really is not new news - it's just a different way of saying that lots of
words in the VMS token-A style end with 'dy'. No matter what transcription
set you use, you will still observe patterns: in B-token pages, for example,
lots of lines begin with an 'l' character/glyph, while in the A-token
pages this doesn't happen.

You could choose to ignore the line as a separate
unit for your stats, but that doesn't make this anomaly go away.

It's a fair cop - there are indeed (as you noted) a handful of patterns and statistics which remain relevant to the kind of transcription I'm pursuing. But it is only a handful, out of a large sack of results.


Also: note that once you tokenise out <ol> and <al> pairs from the balneological section, you find freestanding <l>s appearing within the lines (such as in <qol>s). But generalisations about language differences need hard data: in the [Currier A, I believe] page f45r, for example, there are three line-start <l>s, but no free-standing <l>s to be seen (FWIW I think that the <l> on the last line is part of a split-up <ol> pair).
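
To make the counting above concrete, here is a minimal Python sketch (not the method actually used for the figures quoted). It assumes an EVA-style transcription with words separated by "." or spaces, treats every other character as a single glyph (a simplification), and splits greedily left to right. The function names, the pair list, and the sample lines are all invented for illustration; the sample lines are toy strings, not manuscript text.

```python
import re

def tokenise(word, pairs=("ol", "al")):
    """Greedy left-to-right split of a word into pair tokens and single glyphs."""
    tokens, i = [], 0
    while i < len(word):
        chunk = word[i:i + 2]
        if chunk in pairs:          # pairs must be a tuple/list, not a string
            tokens.append(chunk)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def count_l(lines, pairs=("ol", "al")):
    """Return (line-start l count, free-standing l count elsewhere in the lines)."""
    line_start = free = 0
    for line in lines:
        tokens = []
        for word in re.split(r"[.\s]+", line.strip()):
            if word:
                tokens.extend(tokenise(word, pairs))
        for pos, tok in enumerate(tokens):
            if tok == "l":
                if pos == 0:
                    line_start += 1
                else:
                    free += 1
    return line_start, free

# Toy lines (hypothetical, NOT taken from any transcription)
toy = ["lchedy.qol.daiin", "okal.shedy.l.ol"]
print(count_l(toy))                             # pairs: ol, al only
print(count_l(toy, pairs=("qo", "ol", "al")))   # assumed extra pair qo
```

The result depends on the pair inventory: with only ol and al, qol splits as q + ol and leaves no free l, whereas adding qo as a pair (an assumption here) splits it as qo + l. Run per page, the two counts give the kind of hard data needed before generalising about A/B differences.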

If the
author used some sort of pairing as you suggest - why does the alignment on
line starts in B-token pages differ so dramatically from those on A-token
pages?

Can you please explain what you mean by "alignment"? (Thanks!)


If we assume that there is no difference between A & B, and that the
statistical distinctions that have been noted to date are purely coincidence,
then we still have to develop a reason for the patterning to show itself.

I'm not claiming that the statistics generated to date are meaningless or irrelevant - any theory (whether analytical or generative) would need to explain (or forensically reproduce) them, just as much as any other statistic. What I *am* claiming is that the statistical ambiguities observed may have arisen from the process of looking for primary properties amongst (largely) secondary artefacts - where any faint signal would likely be swamped by noise.


That's not so much "coincidence" as the result of (I suspect) conscious misdirection on the original author's part. I believe we need to give him more credit... :-)

Cheers, .....Nick Pelling.....

