My apologies to Brett for not replying sooner. This is one
that got away. I had so many mailing list replies that some were
left unread. I am writing a program to extract the lines from the files. I
am getting statistics that support my theory already. When I initially said "I
think I've cracked the Voynich" that's all I could say at the time. I wasn't
certain. Now I am becoming more certain by the day. However, there are an
astronomical number of variations that can be used. I have devised several ways
of determining which is the correct one and will be starting program development
to prove (or disprove) my claims very shortly. If nothing else, it will produce
statistics and methods that might finally nail it down. I will present the
methods as soon as I have hard evidence. Thanks again for the offer Brett.
:-)
Hi Jeff,
if you want the text (I am thinking of the interlinear version 1.7) minus
the starting columns and comments, you could use grep if you have access to
a Unix box. I am willing to generate the text and email the file to you if
required. If you want me to do this, can you also say whether you want the
Currier or FSG text? Either will have the * for an undecipherable character
and the OR notation like [a|b], meaning the character could be interpreted
as a OR b; would you want these left in? I normally work on the basis that
if there are two choices for a character, I pick the first, on the
assumption that I will be right 50% of the time. I assume that you would
want the spacing characters removed as well.
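A rough Python equivalent of that grep-based cleanup, as a sketch only: the '#' comment marker, the angle-bracket locus column, and the [a|b] alternative notation are assumptions about the interlinear file's conventions, and the sample line is invented for illustration.

```python
import re

def clean_line(line):
    """Strip one transcription line down to bare text.

    Assumptions: comment lines start with '#', the starting column is
    an initial <...> locus tag, alternative readings look like [a|b],
    and '.' / ',' are the spacing characters to be removed.
    """
    if line.startswith("#"):
        return None  # drop comment lines entirely
    line = re.sub(r"^<[^>]*>\s*", "", line)  # drop the starting locus column
    # resolve [a|b|...] to its first alternative (right ~50% of the time)
    line = re.sub(r"\[([^]|]*)\|[^]]*\]", r"\1", line)
    line = line.replace(".", "").replace(",", "")  # remove spacing characters
    return line.strip()

sample = "<f1r.P.1;H> fachys.ykal.ar.[a|o]taiin"
print(clean_line(sample))  # -> fachysykalarataiin
```

Keeping the first alternative in [a|b] matches the 50% heuristic described above; swapping the '.' removal for a space would preserve word boundaries instead.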
Regards,
Brett
Jeff <jeff@xxxxxxxxxxxxxxxxxxxxxx> wrote:
I have a couple of questions for anyone with any knowledge of the
following subjects.
1. Has a general dictionary of word forms been created by anyone? A
sort of Oxford Voynich dictionary, for want of a better phrase.
2. Have individual sections of the manuscript had their own
dictionaries created, to use as a comparison with the other sections
for word frequency?
This is exactly what I would do. The frequency of word patterns from
each section could then be cross-matched against the other sections,
so that any topic-specific word forms could be anchored to the section
topic. If all word forms in the dictionaries are evenly distributed
across topic sections, the probability is high that the content is a
hoax and is meaningless. Certain word forms, such as the definite and
indefinite articles, would be evenly distributed across topics,
whereas nouns such as petal, leaf or stem would appear more regularly
in the herbal section.
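The cross-section comparison described above can be sketched in a few lines of Python. The section labels and word lists here are invented for illustration; real input would be the transcription split by manuscript section.

```python
from collections import Counter

# Hypothetical per-section word lists (illustrative only).
sections = {
    "herbal":       "daiin chol shol daiin chor daiin".split(),
    "astronomical": "daiin okal otal okal daiin".split(),
}

# One frequency dictionary per section, normalised to relative frequency.
freqs = {}
for name, words in sections.items():
    counts = Counter(words)
    total = sum(counts.values())
    freqs[name] = {w: c / total for w, c in counts.items()}

def skew(word):
    """How unevenly a word is distributed across sections.

    A word spread evenly (like an article) scores near 0; a word
    confined to one section (like a plant term) scores 1.0.
    """
    vals = [f.get(word, 0.0) for f in freqs.values()]
    return (max(vals) - min(vals)) / max(vals)

print(skew("daiin"))  # appears in both sections -> low skew
print(skew("chol"))   # herbal-only -> skew of 1.0
```

If most word forms scored near zero skew, that would support the even-distribution (hoax) reading; a crop of high-skew words in the herbal section would point the other way.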
Has anyone thought of constructing such a dictionary?
Also, does anyone know where the bare text can be obtained, minus the
notes etc.? I want to write a syntax parser that relies not on
determination of meaning but on word correlation and placement between
sections. I would also be willing to build the various dictionaries if
they do not already exist.
Regards,
Jeff Haley
______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list