RE: VMs: No stats no fun
Hi John,
At 08:38 09/08/2003 -0400, John Grove wrote:
You found recently that lots of words 'could' start with the pair 'dy' if
you chopped up the transcription at the appearance of 'e' or 'ee'. This
really is not new news - it's just a different way of saying that lots of
words in the VMS token-A style end with 'dy'. No matter what transcription
set you use - you will still observe patterns like in B-token pages there
are lots of lines beginning with an 'l' character/glyph while in the A-token
pages this doesn't happen. You could choose to ignore the line as a separate
unit for your stats, but that doesn't make this anomaly go away.
It's a fair cop - there are indeed (as you noted) a handful of patterns and
statistics which remain relevant to the kind of transcription I'm pursuing.
But it is only a handful, out of a large sack of results.
Also: note that once you tokenise out <ol> and <al> pairs from the
balneological section, you find freestanding <l>s appearing within the
lines (such as in <qol>s). But generalisations about language differences
need hard data: in the [Currier A, I believe] page f45r, for example, there
are three line-start <l>s, but no free-standing <l>s to be seen (FWIW I
think that the <l> on the last line is part of a split-up <ol> pair).
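To make the kind of count I mean concrete, here is a rough Python sketch (an
illustration only, not a tool anyone on the list actually uses). It splits
each word greedily, left to right, into known pairs plus single glyphs, then
counts line-start <l>s separately from free-standing <l>s elsewhere in the
line. The names tokenise and l_stats are hypothetical, the sample lines are
made-up strings rather than real transcription data, and treating <qo> as a
pair (so that <qol> leaves a free <l>) is an assumption on top of the <ol>
and <al> pairs mentioned above.

```python
from collections import Counter

# Pairs named in the text above; any other pair passed in is an assumption.
PAIRS = ("ol", "al")


def tokenise(word, pairs=PAIRS):
    """Greedy left-to-right split: a known pair becomes one token,
    any other glyph becomes a single-character token."""
    tokens = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in pairs:
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def l_stats(lines, pairs=PAIRS):
    """Count single-glyph 'l' tokens, split by whether they open a line."""
    stats = Counter()
    for line in lines:
        for w_idx, word in enumerate(line.split()):
            for t_idx, tok in enumerate(tokenise(word, pairs)):
                if tok != "l":
                    continue
                if w_idx == 0 and t_idx == 0:
                    stats["line_start_l"] += 1
                else:
                    stats["freestanding_l"] += 1
    return stats


if __name__ == "__main__":
    # Made-up sample lines, NOT taken from any real transcription.
    sample = ["lchedy qol daiin", "okal ol shedy"]
    print(l_stats(sample, pairs=("qo", "ol", "al")))
```

Run per page (or per Currier language) on a real transcription, this would
give the hard numbers needed to compare line-start and free-standing <l>s,
though the result obviously depends on which pairs you choose to tokenise out.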
If the author used some sort of pairing as you suggest - why does the
alignment on line starts in B-token pages differ so dramatically from those
on A-token pages?
Can you please explain what you mean by "alignment"? (Thanks!)
If we assume that there is no difference between A & B, and that the
statistical distinctions that have been noted to date are purely coincidence,
then we still have to develop a reason for the patterning to show itself.
I'm not claiming that the statistics generated to date are meaningless or
irrelevant - any theory (whether analytical or generative) would need to
explain (or forensically reproduce) them, just as much as any other
statistic. What I *am* claiming is that the statistical ambiguities
observed may have arisen from the process of looking for primary properties
amongst (largely) secondary artefacts - where any faint signal would likely
be swamped by noise.
That's not so much "coincidence" as the result of (I suspect) conscious
misdirection on the original author's part. I believe we need to give him
more credit... :-)
Cheers, .....Nick Pelling.....