Re: VMs: vms sentences
At 14:54 08/02/2004 -0500, Bruce Grant wrote:
Just as you grab a math reference book to look up a certain integral or
the value of a function, it would be nice to have a sort of "data book" of
raw VM statistics that could be referred to easily.
Examples of the types of statistics that might be included are:
- letter frequencies by page / by type of page / by language
- references to all occurrences of repeated words or phrases
- statistics on occurrences of gallows characters
- a word frequency list
- a KWIC index (concordance) of the VM
- word ending frequencies
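
As a rough illustration of three of the items above (word frequency list,
word ending frequencies, KWIC index), here is a minimal Python sketch. The
sample string is invented for the example and is NOT taken from any real
transcription; the "." word separator and all function names are
assumptions made for illustration only.

```python
from collections import Counter

# Invented sample in an EVA-like alphabet (NOT a real transcription line).
# Assumption: words are separated by "." in the transcription file.
SAMPLE = "qokeedy.daiin.daiin.daiin.shol.qokeedy.chedy.shol.dar"


def words(text, sep="."):
    """Split a transcription string into word tokens, dropping empties."""
    return [w for w in text.split(sep) if w]


def word_frequencies(tokens):
    """Count how often each whole word occurs."""
    return Counter(tokens)


def ending_frequencies(tokens, n=2):
    """Count the last n characters of every word that has at least n characters."""
    return Counter(w[-n:] for w in tokens if len(w) >= n)


def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    rows = []
    for i, w in enumerate(tokens):
        if w == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((" ".join(left), w, " ".join(right)))
    return rows


tokens = words(SAMPLE)
freq = word_frequencies(tokens)
endings = ending_frequencies(tokens, n=2)
concordance = kwic(tokens, "shol", width=2)
```

Which glyph groupings count as one "character" for the ending statistics
depends on the transcription alphabet chosen, so the n=2 default here is
only a placeholder.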
Then, for example, rather than referring in general to "triple repetitions
of words" it would be easy to examine all the actual occurrences to look
for some pattern.
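
Listing every actual occurrence of a repeated word could be sketched as
follows. This is only one possible approach, assuming the text has already
been split into word tokens; the function name and the sample tokens are
invented for illustration.

```python
from itertools import groupby


def repeated_runs(tokens, min_run=3):
    """Return (start_index, word, run_length) for every run of identical
    consecutive words that is at least min_run words long."""
    runs = []
    pos = 0
    for word, group in groupby(tokens):
        length = len(list(group))
        if length >= min_run:
            runs.append((pos, word, length))
        pos += length
    return runs


# Invented token sequence (not a real transcription line).
sample_tokens = ["qokeedy", "daiin", "daiin", "daiin", "shol", "shol", "dar"]
triples = repeated_runs(sample_tokens, min_run=3)
doubles_or_more = repeated_runs(sample_tokens, min_run=2)
```

In a real data book each hit would also carry its page and line reference,
so that the occurrences can be checked against the manuscript.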
One problem with this is that there is still a good deal of uncertainty
over what constitutes both a glyph and an encoded token, so it may be
better to do this using a real-time (rather than a static) resource. So...
I built a (reasonably funky) Voynich transcription analysis page on this
basic theme (if you haven't seen it already):
Perhaps I (or someone else) might refine this to emit a load of other
statistics, but offer the option (probably via PHP) of saving particular
statistical runs onto the server, and giving you a URL to that search to
share with others (if you did this in the URL itself, as in
test.html?transcription=H&pairs=qo.ee.dy.or.ol&quires=all&... etc, you'd
probably run out of space). Just a thought. :-o
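
To make the "saved run" idea concrete, here is a minimal sketch in Python
(rather than PHP, purely for illustration). It stores a full set of search
parameters under a short key and builds a short URL from that key. The
RunStore class, the in-memory dictionary standing in for server storage,
and the "run" query parameter are all hypothetical.

```python
import hashlib
import json


class RunStore:
    """In-memory stand-in for server-side storage of saved runs (hypothetical)."""

    def __init__(self):
        self._runs = {}

    def save(self, params):
        # Canonical JSON, so identical parameters always map to the same key.
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]
        self._runs[key] = canonical
        return key

    def load(self, key):
        """Return the parameter set that was saved under this key."""
        return json.loads(self._runs[key])


store = RunStore()
params = {
    "transcription": "H",
    "pairs": ["qo", "ee", "dy", "or", "ol"],
    "quires": "all",
}
key = store.save(params)
short_url = "test.html?run=" + key  # "run" is an assumed parameter name
```

The short URL stays the same length however many options a run uses, which
is the point of keeping the parameters on the server.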
By the way, to do this it would be useful to choose one (or more)
transcription(s) of the VM as a sort of "reference text", which would be
included in the databook and used as the basis for all statistics, with
the understanding that there are legitimate differences in opinion which
would have some effect on the resulting statistics. (Without "putting a
stake in the ground" somewhere, however, it is impossible to do more than
talk in generalities.)
Bear in mind that few of the transcriptions are complete, and that opinions
on individual characters differ widely (especially on hardy perennials like
o/a/y & d/m/y etc). Also, many lines in the interlinear relate solely to
people's transcriptions of a small group of pages, and so merging between
transcriptions might introduce multiple renderings of the same patterns (on
different pages etc) - how to make these choices consistent? Overall, not
an easy task.
Then, as new theories arise, additional statistics to investigate them
could be generated and added to the "data book".
This might point to a more dynamic (and customisable) solution being
appropriate: what would the word count be if I changed all occurrences of
"oi" into "ai"? etc etc
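
A "what if" query of this kind could be sketched like so: apply the
substitution to the raw text, then recount the words. The sample string is
invented (not a real transcription), and the "." word separator and the
function names are assumptions.

```python
from collections import Counter


def apply_substitutions(text, rules):
    """Apply (old, new) replacements, in order, to the raw transcription string."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


def word_counts(text, sep="."):
    """Count each distinct word in a separator-delimited string."""
    return Counter(w for w in text.split(sep) if w)


# Invented sample (not a real transcription line).
SAMPLE = "doiin.daiin.oiin.aiin.doiin"

before = word_counts(SAMPLE)
after = word_counts(apply_substitutions(SAMPLE, [("oi", "ai")]))

print(len(before), "distinct words before,", len(after), "after")
```

The total number of words is unchanged by the substitution; only the number
of distinct words (and their individual counts) shifts.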
Cheers, .....Nick Pelling.....