Sukhotin algorithm for vowel recognition

Ideas relating to possible methods and systems for the translation of the Voynich text.
Post by DFS346

As illustrated below, my recent experiments with the Sukhotin algorithm (on my v211 transliteration) seem to identify a consistent set of vowels for the herbal section, and another broadly consistent set of vowels for the text-only, text-with-stars, cosmology and zodiac sections.

[Figure: v211 most probable vowels by section]
I provisionally designated the language of the herbal section as Language H (which is very similar to Currier’s Language A), and that of the text, cosmology and zodiac sections as Language T (which has several probable vowels in common with Currier’s Language B). The biology, pharmaceutical and astronomy sections seem to be in languages different from both A and B.
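For readers unfamiliar with it, Sukhotin's algorithm can be sketched in a few lines. This is a minimal version of the algorithm as it is usually described (symmetric counts of which symbols stand next to each other inside words, doubled symbols ignored, vowels chosen greedily by row sum until no positive sum remains). It is not necessarily the exact variant behind the results above, and the function name is hypothetical. Words may be plain strings or lists of glyph labels, so a transliteration that treats initial, interior and final forms of a glyph as separate glyphs can be passed in as tokens.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Classify symbols as vowels with Sukhotin's algorithm.

    `words` is an iterable of sequences (strings or lists of glyph labels).
    Returns the set of symbols classified as vowels.
    """
    # Symmetric adjacency counts: how often x and y stand side by side within
    # a word. Doubled symbols (x == y) are skipped, so the diagonal stays zero.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for x, y in zip(word, word[1:]):
            if x != y:
                adj[x][y] += 1
                adj[y][x] += 1

    # Row sums; every symbol starts out as a consonant.
    sums = {s: sum(adj[s].values()) for s in symbols}
    consonants = set(symbols)
    vowels = set()

    while consonants:
        # Consonant with the largest remaining sum; ties go to the
        # alphabetically first symbol so the result is deterministic.
        best = max(sorted(consonants), key=lambda s: sums[s])
        if sums[best] <= 0:
            break
        consonants.remove(best)
        vowels.add(best)
        # Each remaining consonant loses twice its adjacency to the new vowel.
        for c in consonants:
            sums[c] -= 2 * adj[c][best]
    return vowels

print(sukhotin_vowels(["banana", "cat", "tan"]))                # {'a'}
print(sukhotin_vowels([["k", "o_int", "k"], ["d", "o_int"]]))   # {'o_int'}
```

The second call uses made-up glyph labels ("o_int" for interior o) purely to show that multi-character glyph names work as tokens.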

Vocabularies

Different languages have different vocabularies. But European languages, even if not closely related, have some words in common. Below is a compilation of the ten most frequent words in selected medieval European languages. Five common Latin words (de, et, in, non and per) reappear as common words in other European languages, although pronounced differently.
[Figure: Top 10 words AL BO EN FR IT LA]
Source: my analysis, based on https://www.browserling.com/tools/word-frequency
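A top-ten list like this can also be produced locally. This is a minimal sketch, not the procedure behind the table: the tokenisation (lower-casing, keeping runs of letters only) is an assumption, and the helper names and toy inputs are hypothetical. The second function lists which words from one language's top ten also appear in other languages' lists; it matches identical spellings only, so cognates such as Latin de and Italian di are missed.

```python
import re
from collections import Counter

def top_words(text, n=10):
    """The n most frequent words in text, lower-cased, letters only."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words stay whole.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w, _ in Counter(words).most_common(n)]

def shared_with_others(reference, others):
    """Words in `reference` that occur in at least one of the lists in `others`.

    Only identical spellings match; cognates with different spellings are missed.
    """
    pooled = set().union(*others.values())
    return [w for w in reference if w in pooled]

print(top_words("Et in et de ET in.", 2))  # ['et', 'in']
print(shared_with_others(["de", "et", "in", "ad"],
                         {"IT": ["di", "e", "in"], "FR": ["de", "et", "la"]}))
# ['de', 'et', 'in']
```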

If Currier A and B are different languages, and if we are seeing different languages in the thematic sections, then we should expect to see some differences, and some commonalities, in the vocabulary from one language to another, and from one section to another. As I illustrated in a previous post, we do see these differences and commonalities.

Correlations and mappings

My idea of the next step is to correlate the frequencies of the glyphs in Languages H and T with the frequencies of the letters in some presumed precursor languages (for example, medieval Italian and medieval Latin). Then it should be possible to match vowel-glyphs with vowels, and consonant-glyphs with consonants, and thereby transliterate selected pages of the Voynich manuscript into the precursor languages.
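The correlation step could be prototyped along the following lines. This is a minimal sketch assuming that "correlate" means simple rank matching (most frequent vowel-glyph to most frequent vowel, and likewise for consonants); the post does not specify the matching method. The set of vowel-glyphs would come from the Sukhotin results. All function names are hypothetical, and the toy inputs are placeholders, not real v211, Italian or Latin counts.

```python
from collections import Counter

def relative_frequencies(symbols):
    """Map each symbol (a glyph label or a letter) to its share of the total."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def rank_mapping(source_freqs, target_freqs):
    """Pair symbols by frequency rank: most common with most common, and so on.

    Ties are broken alphabetically. If one side has more symbols than the
    other, the surplus is left unmapped.
    """
    src = sorted(source_freqs, key=lambda s: (-source_freqs[s], s))
    tgt = sorted(target_freqs, key=lambda t: (-target_freqs[t], t))
    return dict(zip(src, tgt))

def map_by_class(glyph_freqs, letter_freqs, vowel_glyphs, vowel_letters):
    """Rank-match vowel-glyphs to vowels and consonant-glyphs to consonants."""
    def split(freqs, vowels):
        v = {k: f for k, f in freqs.items() if k in vowels}
        c = {k: f for k, f in freqs.items() if k not in vowels}
        return v, c

    gv, gc = split(glyph_freqs, vowel_glyphs)
    lv, lc = split(letter_freqs, vowel_letters)
    mapping = rank_mapping(gv, lv)
    mapping.update(rank_mapping(gc, lc))
    return mapping

def transliterate(glyphs, mapping, unknown="?"):
    """Replace each glyph with its mapped letter, or `unknown` if unmapped."""
    return "".join(mapping.get(g, unknown) for g in glyphs)

# Toy example: glyph labels a/b/c and letters e/t/n are placeholders.
glyph_freqs = relative_frequencies(list("aaabbc"))
letter_freqs = relative_frequencies(list("eeettn"))
mapping = map_by_class(glyph_freqs, letter_freqs, {"a"}, set("aeiou"))
print(mapping)                               # {'a': 'e', 'b': 't', 'c': 'n'}
print(transliterate(list("abca"), mapping))  # etne
```

Rank matching is the crudest form of correlation; it ignores how close the frequencies are, which is exactly why the permutations listed next are needed.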

If this process yields some recognisable words in, say, Latin or Italian, we might be on the right track. If not, there are several permutations of the approach, for example:
• To use a different transliteration: that is, to redefine the glyphs. For example, in v211 I assumed that initial o, interior o and final o were different glyphs (or, more precisely, represented different precursor letters). We can recombine them.
• To swap some pairs of glyphs which have similar frequencies. For example, in the herbal section, interior o is the most common glyph, accounting for 8.9 percent of all the glyphs, followed by final 9 with 8.6 percent. That might encourage us to map interior o to the most common letter in the precursor language (E in Italian, I in Latin). But within a given language, letter frequencies differ from one document to another: Dante’s La Divina Commedia does not have exactly the same letter frequencies as the OVI corpus. So we have to allow ourselves flexibility in the glyph-to-letter mapping.
• To try other precursor languages. (I have already calculated the letter frequency distributions in selected medieval documents in Albanian, Bohemian, English, French and German.)
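The glyph-swapping permutation could be automated by listing rank-adjacent glyphs whose frequencies lie within some tolerance, then generating the alternative mapping for each such pair. A sketch under assumptions: the glyph labels ("o_int" for interior o, "9_fin" for final 9, and a made-up "x"), the 0.005 tolerance and the letters in the base mapping are all hypothetical; only the 8.9% and 8.6% shares are the herbal-section figures quoted above.

```python
def swap_candidates(glyph_freqs, tolerance):
    """Rank-adjacent glyph pairs whose relative frequencies differ by <= tolerance."""
    ranked = sorted(glyph_freqs, key=lambda g: (-glyph_freqs[g], g))
    return [
        (a, b)
        for a, b in zip(ranked, ranked[1:])
        if glyph_freqs[a] - glyph_freqs[b] <= tolerance
    ]

def apply_swap(mapping, a, b):
    """Return a copy of mapping with the letters assigned to glyphs a and b exchanged."""
    swapped = dict(mapping)
    swapped[a], swapped[b] = mapping[b], mapping[a]
    return swapped

# Herbal-section shares from the post (8.9% and 8.6%); "x" is a made-up third glyph.
freqs = {"o_int": 0.089, "9_fin": 0.086, "x": 0.050}
pairs = swap_candidates(freqs, tolerance=0.005)
print(pairs)  # [('o_int', '9_fin')]

base = {"o_int": "e", "9_fin": "i", "x": "t"}  # illustrative letters only
for a, b in pairs:
    print(apply_swap(base, a, b))  # {'o_int': 'i', '9_fin': 'e', 'x': 't'}
```

Each swapped mapping can then be tried on the same pages, keeping whichever yields more recognisable words.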

One further permutation which I am exploring is to consider that the precursor documents were written in abbreviated form. Adriano Cappelli’s Lexicon Abbreviaturarum encourages us to conjecture that the Voynich initial 9 and final 9 are abbreviation symbols (with different functions). To this end, I developed an abbreviated version of Dante’s Monarchia (1313-14), with the following substitutions:
• initial 9 for the prefixes co-, com-, con-, cum-, cun-
• final 9 for the Latin suffixes -is, -os, -us, -um, -em, -am.

This transforms, for example, the following line of Monarchia:
• from: “non tam de propria virtute confidens, quam de lumine largitoris”
• to: “non t⁹ de propria virtute ₉fidens qu⁹ de lumine largitor⁹”
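The two substitution rules can be sketched as follows. This is an illustration under assumptions, not the procedure used to build the abbreviated Monarchia: text is lower-cased and punctuation dropped (as the example line suggests), at most one prefix and one suffix are replaced per word, the longest matching prefix wins, and a substitution is skipped if it would leave no letters (so "is" stays "is"). The function and constant names are hypothetical. It does reproduce the example line.

```python
import re

INITIAL_9 = "\u2089"  # subscript nine, used for the prefixes (as in ₉fidens)
FINAL_9 = "\u2079"    # superscript nine, used for the suffixes (as in largitor⁹)

# Longest prefixes first, so "con-" is preferred over "co-".
PREFIXES = sorted(["co", "com", "con", "cum", "cun"], key=len, reverse=True)
SUFFIXES = ["is", "os", "us", "um", "em", "am"]

def abbreviate_word(word):
    """Apply at most one prefix and one suffix substitution to a lower-case word.

    Assumption: a substitution is made only if at least one letter remains.
    """
    head = ""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            head, word = INITIAL_9, word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            return head + word[: -len(s)] + FINAL_9
    return head + word

def abbreviate_line(line):
    """Lower-case the line, drop punctuation, and abbreviate each word."""
    return " ".join(abbreviate_word(w) for w in re.findall(r"[^\W\d_]+", line.lower()))

print(abbreviate_line("non tam de propria virtute confidens, quam de lumine largitoris"))
# non t⁹ de propria virtute ₉fidens qu⁹ de lumine largitor⁹
```

Counting the characters of the output (for example with collections.Counter) then gives symbol frequencies in which ⁹ and ₉ appear as symbols in their own right.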

I have calculated the symbol frequencies in the abbreviated Monarchia and will see how they match up with the Voynich glyphs. More later.
