RE: VMs: vms sentences
Nice, Nick. Hope I can play with it soon!
Don
-----Original Message-----
From: owner-vms-list@xxxxxxxxxxx [mailto:owner-vms-list@xxxxxxxxxxx] On
Behalf Of Nick Pelling
Sent: Sunday, February 08, 2004 1:15 PM
To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: vms sentences
Hi everyone,
At 14:54 08/02/2004 -0500, Bruce Grant wrote:
>Just as you grab a math reference book to look up a certain integral or
>the value of a function, it would be nice to have a sort of "data book" of
>raw VM statistics that could be referred to easily.
>
>Examples of the types of statistics that might be included are:
> - letter frequencies by page/by type of page/ by language
> - references to all occurrences of repeated words or phrases.
> - statistics on occurrences of gallows characters
> - a word frequency list
> - a KWIC index (concordance) of the VM
> - word ending frequencies
> etc.
>
>Then, for example, rather than referring in general to "triple repetitions
>of words" it would be easy to examine all the actual occurrences to look
>for some pattern.
One problem with this is that there is still a good deal of uncertainty
over what constitutes both a glyph and an encoded token, so it may be
better to do this using a real-time (rather than a static) resource. So...
we might consider doing much of this "live" using JavaScript. I've already
built a (reasonably funky) Voynich transcription analysis page on this
basic theme (if you haven't seen it already):
http://www.voynich.info/nickpelling/analyse.htm
Perhaps I (or someone else) might refine this to emit a load of other
statistics, but offer the option (probably via PHP) of saving particular
statistical runs onto the server, and giving you a URL to that search to
share with others (if you put all the settings into the URL itself, as in
test.html?transcription=H&pairs=qo.ee.dy.or.ol&quires=all&... etc, you'd
probably run out of space). Just a thought. :-o
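As a rough sketch of how such a URL might be read back into settings: the parameter names (transcription, pairs, quires) are taken from the example URL above, but the helper names parseRunSettings and buildRunQuery are hypothetical, and this is not how the existing analysis page works. It only uses the standard URLSearchParams class.

```javascript
// Hypothetical helper: turn a query string such as
// "transcription=H&pairs=qo.ee.dy.or.ol&quires=all" into a settings object.
// Only the three parameters visible in the example URL are handled.
function parseRunSettings(queryString) {
  const params = new URLSearchParams(queryString);
  const pairs = params.get("pairs");
  return {
    transcription: params.get("transcription"),
    // Assumption: glyph pairs are separated by "." as in the example URL.
    pairs: pairs ? pairs.split(".") : [],
    quires: params.get("quires"),
  };
}

// Hypothetical inverse: build the query string for a settings object,
// e.g. to produce a link that can be shared with others.
function buildRunQuery(settings) {
  const params = new URLSearchParams();
  params.set("transcription", settings.transcription);
  params.set("pairs", settings.pairs.join("."));
  params.set("quires", settings.quires);
  return params.toString();
}

const settings = parseRunSettings("transcription=H&pairs=qo.ee.dy.or.ol&quires=all");
console.log(settings.pairs.length, buildRunQuery(settings));
```

A server-side save (via PHP or anything else) could then store the settings object under a short identifier, so the shared URL stays short however many options a run has.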
>By the way, to do this it would be useful to choose one (or more)
>transcription(s) of the VM as a sort of "reference text", which would be
>included in the databook and used as the basis for all statistics, with
>the understanding that there are legitimate differences in opinion which
>would have some effect on the resulting statistics. (Without "putting a
>stake in the ground" somewhere, however, it is impossible to do more than
>talk in generalities.)
Bear in mind that few of the transcriptions are complete, and that opinions
on individual characters differ widely (especially on hardy perennials like
o/a/y & d/m/y etc). Also, many lines in the interlinear relate solely to
people's transcriptions of a small group of pages, and so merging between
transcriptions might introduce multiple renderings of the same patterns (on
different pages etc) - how to make these choices consistent? Overall, not
an easy task.
>Then, as new theories arise, additional statistics to investigate them
>could be generated and added to the "data book".
This might point to a more dynamic (and customisable) solution being
appropriate: what would the word count be if I changed all occurrences of
"oi" into "ai"? etc etc
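To make that concrete, here is a minimal sketch of a "what if" word count. It assumes words are separated by "." or whitespace (as in EVA-style transcriptions), the function name wordCounts is hypothetical, and the sample line is made up for illustration rather than taken from any real transcription.

```javascript
// Count word frequencies after applying a list of [from, to] substitutions.
// Assumption: words are separated by "." or whitespace.
function wordCounts(text, substitutions = []) {
  let rewritten = text;
  for (const [from, to] of substitutions) {
    // split/join replaces every occurrence of the literal string `from`.
    rewritten = rewritten.split(from).join(to);
  }
  const counts = new Map();
  for (const word of rewritten.split(/[.\s]+/)) {
    if (word === "") continue; // skip empties from leading/trailing separators
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Made-up sample line, not real transcription data.
const sample = "daiin.doiin.qokaiin.daiin";
const before = wordCounts(sample);
const after = wordCounts(sample, [["oi", "ai"]]);
console.log(before.get("daiin"), after.get("daiin"));
```

In the made-up sample, "doiin" merges into "daiin" once "oi" is read as "ai", so the count for "daiin" goes up and "doiin" disappears; the same function could be rerun with any other substitution list.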
Cheers, .....Nick Pelling.....
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list