Thanks! That is one of the largest problems with a frequency count:
what is the underlying glyph?
In this case "c" is usually the "c" in the "ch", "cPh",
etc. glyphs, and most other appearances of a "c"-like character are
listed as "e". But I think it would be constructive to try distinguishing
between "ii" (which in several cases I believe is 'u') and "iin" and some others
to see what the structure looks like then. The best I can say about the
current stats is that they are consistent within themselves. The main
idea I was working from in doing this was to see if the pages had a similar
set of statistics, and it is apparent that this is not always so. Hopefully
that might give a clue to someone who could take it and run.
I'll go back to one of the more verbose pages and try your suggestions
in various ways to see what shakes out.
Personally, I find the word ending stats particularly fascinating.
There are relatively few glyphs that end words. Too few in my
book. "y" has a huge, commanding lead. On one page
it is over 70% of the word endings (though it generally averages around
30%).
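
As a rough illustration of how word-ending figures like these could be computed for a single page, here is a minimal Python sketch. It assumes a transcription where words are separated by whitespace and each character is one glyph. The function name and the sample line are made up for illustration; this is not the program or the data behind the stats above.

```python
from collections import Counter

def word_ending_shares(page_text):
    """Return {final glyph: share of words ending in it}, most common first.

    Assumption: words are whitespace-separated and one character = one glyph.
    """
    words = page_text.split()
    endings = Counter(word[-1] for word in words)
    total = sum(endings.values())
    return {glyph: count / total for glyph, count in endings.most_common()}

# Made-up sample line, not data from any folio.
sample = "daiin chedy qokeedy shol otedy"
for glyph, share in word_ending_shares(sample).items():
    print(f"{glyph}: {share:.0%}")
# y: 60%
# n: 20%
# l: 20%
```

Running this once per page and comparing the resulting tables side by side would show where one glyph dominates the endings far more than on the other pages.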
The other interesting thing is that the distribution curve fits very nicely
for a monoalphabetic cipher, though we know that it is very unlikely that this
is the case (unless the underlying language is either unknown or invented).
If it is monoalphabetic then there is a mixture of languages underlying
the text (or the cipher changes between pages such as most of the f20s and
f76). Whoever created this thing was a genius. Or incredibly
mad. Maybe a Mad Genius (Simon Barsinister?) <grin>.
I have another 30 pages done and will post those with some of the
suggestions you have made. I also want to run some samples of various
other languages through the program to see what comes out. I would dearly
love to find textual versions of old herbals I could cut-and-paste, but all
I have found so far are page captures.
******************************
Larry Roux
Syracuse University
lroux@xxxxxxx
*******************************

>>> John@xxxxxxxxxxxx 08/23/03 03:06PM >>>
Well... there sure was a lot to read after my holidays!
However, I'll limit my response to Larry's work, which showed an interesting
coincidence in folios 26 and 31 compared to the folios surrounding them.
First, I think I may have asked this before, but what's the difference
between a standalone 'c' and a standalone 'e'?
The stats seem to show the popularity of each of these as separate... I think
Vladimir is right that common constructs like 'ch', 'cph', 'cth', etc. should
be counted separately, and any standalone 'c' or 'e' should always be treated
as 'e'. Then, I'd like to see a frequency count that includes the frequency of
'e', 'ee', 'eee', 'eeee', and 'i', 'ii', 'iii', 'iiii' as well. In the count,
I think that when any character is repeated the run should be counted as a
whole; that is, 'eee' doesn't count as 'eee' and 'ee'+'e' and 'e'+'e'+'e'...
it only counts as one occurrence of 'eee'.
John.
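
The counting rule described above can be sketched as follows. This is only one possible reading of it, assuming a lowercase transcription with one character per glyph. The compound list is limited to the three that are named ('ch', 'cph', 'cth') and would need extending to cover the "etc."; the pattern and function names are hypothetical.

```python
import re
from collections import Counter

# Alternatives are tried left to right, so the compounds win over a bare 'c'.
# A 'c' only joins an e-run when it does not start one of the compounds.
TOKEN = re.compile(
    r"(?P<compound>cph|cth|ch)"
    r"|(?P<e_run>(?:e|c(?!h|ph|th))+)"
    r"|(?P<i_run>i+)"
    r"|(?P<other>.)"
)

def count_glyphs(text):
    """Count glyph tokens, treating each maximal run of e or i as one token."""
    counts = Counter()
    for word in text.split():
        for match in TOKEN.finditer(word):
            token = match.group()
            if match.lastgroup == "e_run":
                # Standalone 'c' is treated as 'e', so 'eece' becomes 'eeee'.
                token = "e" * len(token)
            counts[token] += 1
    return counts

# Made-up sample words, not taken from any folio.
counts = count_glyphs("cheee daiin eece ech")
print(counts["ch"], counts["eee"], counts["ee"], counts["ii"], counts["eeee"])
# 2 1 0 1 1
```

Note that 'cheee' contributes one 'ch' and one 'eee', and nothing to the 'e' or 'ee' counts, which is the "counted as a whole" behaviour asked for.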