
Dictionary Attack versus Voynich Manuscript Results -- There are TWO CIPHERS

Posted: Wed Oct 08, 2025 2:12 am
by mike@datafault.net
Hello all.

I decided to try something a bit original here and ran what's called a "dictionary brute-force attack" on the text of the manuscript. The result suggests that there are two ciphers.

Here are the details of how I did it and why I believe my result.

This attack worked like this:


----------------------

Strategy:
Using Python, a large block of text, a small snippet from that block, and a set of text-based word dictionaries (plus names, in case some of the words are names or proper nouns).

Start with a potential cipher [a -> b]: on the small snippet, replace all a's with b's and replace all other letters with ?'s.
Next, compare the dictionary of an entire language (Italian, Spanish, English, Latin, and others) against the small snippet to find out whether any dictionary words could still fit, given the correct cipher.
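
To make the masking step concrete, here is a minimal sketch of the idea, not the actual code from my script. The names mask, could_be_word and toy_dict are made up for illustration, and the tiny word list stands in for a full dictionary file. It also ignores the one-to-one rule for the '?' positions, so it is a looser check than a real solver would use.

```python
import re

def mask(word, mapping):
    """Replace mapped cipher letters with their plaintext guess, all others with '?'."""
    return "".join(mapping.get(ch, "?") for ch in word)

def could_be_word(masked, dictionary):
    """True if some dictionary word of the same length fits the masked pattern."""
    pattern = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in masked))
    return any(pattern.fullmatch(w) for w in dictionary)

# Hypothetical toy word list; a real run would load a full dictionary file.
toy_dict = {"dino", "vivaio", "anno", "che"}
partial = {"s": "d", "h": "i"}   # partial cipher: s -> d, h -> i

masked = mask("shey", partial)
print(masked, could_be_word(masked, toy_dict))
```

If no dictionary word fits the masked pattern, the partial cipher can be thrown away without trying any of its extensions.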

If words exist for a certain cipher set [a -> b], then proceed to add additional letters [a->b, b->c] and check the dictionary for any potential words that can still exist.

Continue this process until no words exist for a cipher, then backtrack by removing the last letter and trying the next letter. So:

[a->b, b->d] then [a->b, b->e] and continue onward until [z->x, x->y, etc]

Anytime a cipher is able to decode all words in a small snippet, proceed to apply it over a larger chunk of text and display the result.
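
The whole strategy can be sketched as a short backtracking search. This is a simplified illustration under my own assumptions (a plain 26-letter alphabet, a linear scan of the dictionary instead of an index, and made-up names such as solve and fits), not the code from decode.v13.py.

```python
def cipher_letters(words):
    """Cipher letters in order of first appearance."""
    seen = []
    for w in words:
        for ch in w:
            if ch not in seen:
                seen.append(ch)
    return seen

def fits(word, mapping, candidates):
    """True if some candidate word agrees with every letter mapped so far."""
    return any(
        all(c not in mapping or mapping[c] == p for c, p in zip(word, cand))
        for cand in candidates
    )

def solve(words, dictionary, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Yield every one-to-one mapping that turns all words into dictionary words."""
    by_len = {}
    for w in dictionary:
        by_len.setdefault(len(w), []).append(w)
    letters = cipher_letters(words)

    def search(i, mapping, used):
        if i == len(letters):
            yield dict(mapping)          # every cipher letter is assigned
            return
        c = letters[i]
        for p in alphabet:
            if p in used:                # keep the cipher one-to-one
                continue
            mapping[c] = p
            if all(fits(w, mapping, by_len.get(len(w), [])) for w in words):
                yield from search(i + 1, mapping, used | {p})
            del mapping[c]               # backtrack and try the next letter

    yield from search(0, {}, frozenset())

# Hypothetical toy dictionary, just to show the search finishing.
print(list(solve(["shey", "ain"], {"dino", "che", "anno"})))
```

Pruning is what makes this usable at all: as soon as one word has no possible match, every cipher that extends the current one is skipped.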

-------------------------------

What I found is that Italian is an amazingly close match, in that it was able to find complete word matches very quickly.

Unfortunately, this process would take longer than the age of the universe (give or take a few billion years) and many TB of storage / RAM to process every cipher possibility.

But what I discovered is that some lines have words that do not seem to have ANY potential matches, while the next lines have millions of potential words that they could be. It is almost as if one line is real and the next one or two are false.

If we look at the very end of the book, there are odd lines of text with light stars and dark stars, which seem to share the same ordering as the lines that have valid sentences and those that don't. At least, that holds for the ciphers I was able to test (I only did a few hundred million possible ciphers for each chunk of 2-3 words). I got a consistent set of words which either had very, very few potential words, or had a cipher buried so deeply in the alphabet structure that it was not immediately found. But even when I ignored the ordering of the letters and tried random potential letters, there seemed to be almost no matches for certain lines.
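
A rough back-of-the-envelope check of that claim, as a sketch only. It assumes a 26-letter plaintext alphabet, no pruning at all, and a rate of 200,000 ciphers per second (my assumption, in the same range as the rates in the status lines further down). It needs Python 3.8+ for math.perm.

```python
import math

RATE = 200_000                         # ciphers tested per second (assumed)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10        # approximate

def years_to_enumerate(k, alphabet_size=26, rate=RATE):
    """Years needed to visit every one-to-one assignment of k cipher letters."""
    assignments = math.perm(alphabet_size, k)   # 26 * 25 * ... for k letters
    return assignments / rate / SECONDS_PER_YEAR

for k in (9, 12, 17, 26):
    print(f"k={k}: {years_to_enumerate(k):.3g} years")
```

Under these assumptions a full 26-letter permutation is far beyond the age of the universe, while a chunk with only 9 distinct letters is within reach. Backtracking with pruning visits far fewer ciphers than this worst case, which is the only reason small chunks finish in seconds.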

--------------------------------

The script that I created to brute-force dictionary attack the manuscript:
usage: decode.v13.py [-h] --dict DICT [--index {bisect,trie}] --excerpt EXCERPT [--domain {all,dict}] [--order {none,freq}] [--randomize-domain RANDOMIZE_DOMAIN] [--random-seed RANDOM_SEED]
[--checkpoint-save CHECKPOINT_SAVE] [--checkpoint-load CHECKPOINT_LOAD] [--ckpt-every-nodes CKPT_EVERY_NODES] [--ckpt-every-sec CKPT_EVERY_SEC] [--forbid-self FORBID_SELF]
[--status-every STATUS_EVERY] [--heartbeat-sec HEARTBEAT_SEC] [--print-on-full-only PRINT_ON_FULL_ONLY] [--threshold-print THRESHOLD_PRINT] [--threshold THRESHOLD] [--max-prints MAX_PRINTS]
[--quiet QUIET] [--debug DEBUG] [--chunk-size CHUNK_SIZE] [--original ORIGINAL]

Voynich permutation solver — pure backtracking (full-decode prints).

options:
-h, --help show this help message and exit
--dict DICT Path to dictionary (one word per line).
--index {bisect,trie}
Dictionary index for prefix membership: 'trie' (radix; O(len(prefix))) or 'bisect' (sorted lists + binary search). Default: trie.
--excerpt EXCERPT Path to excerpt text file.
--domain {all,dict} 'all' = 26 letters; 'dict' = only letters present in dictionary (auto-expands if too small).
--order {none,freq} Order cipher letters by encounter order or descending frequency.
--randomize-domain RANDOMIZE_DOMAIN
1 = shuffle domain (plaintext letters) order before search. Seed via --random-seed.
--random-seed RANDOM_SEED
Optional RNG seed for reproducible shuffles & runs.
--checkpoint-save CHECKPOINT_SAVE
Path to write JSON checkpoint periodically (overwritten).
--checkpoint-load CHECKPOINT_LOAD
Path to JSON checkpoint to resume from.
--ckpt-every-nodes CKPT_EVERY_NODES
Save checkpoint every N node visits (0=disabled).
--ckpt-every-sec CKPT_EVERY_SEC
Save checkpoint every ≥ seconds (0=disabled).
--forbid-self FORBID_SELF
1 = disallow c->c assignments.
--status-every STATUS_EVERY
Progress report step.
--heartbeat-sec HEARTBEAT_SEC
Minimum seconds between status refresh.
--print-on-full-only PRINT_ON_FULL_ONLY
1 = print only full decodes (default).
--threshold-print THRESHOLD_PRINT
1 = also print partials when threshold is met.
--threshold THRESHOLD
Fraction of fully-mapped valid words to trigger threshold prints.
--max-prints MAX_PRINTS
Max prints (0 = unlimited).
--quiet QUIET 1 = suppress human-readable prints, keep JSON only.
--debug DEBUG 1 = print invariants at start.
--chunk-size CHUNK_SIZE
If > 0, run in chunked mode over non-overlapping N-word chunks.
--original ORIGINAL Path to original text; when set in chunk mode, score each cipher on this file and print best decode.
----------------------------
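
For anyone wondering what the 'bisect' index in the help text means: prefix membership on a sorted word list can be done with a binary search. This is a generic sketch of that technique using Python's standard bisect module, with a made-up function name and toy word list. It is not lifted from the script.

```python
import bisect

def has_prefix(sorted_words, prefix):
    """True if any word in the sorted list starts with the given prefix."""
    i = bisect.bisect_left(sorted_words, prefix)   # first word >= prefix
    return i < len(sorted_words) and sorted_words[i].startswith(prefix)

# Hypothetical toy word list; it must be sorted for the binary search to work.
words = sorted(["che", "dino", "vivaio"])
print(has_prefix(words, "viv"), has_prefix(words, "dz"))
```

Each lookup costs O(log n) comparisons, versus O(len(prefix)) steps for a trie.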

Result from running my script with very small limits


mike@node2:~/build/vm$ cat exerpt.txt 
dlar shar shar r ain sheain okain shey qokchy chckhy orain
ar al shear teey chcphy rain cphan adar aty shey qokam


mike@node2:~/build/vm$ python3 ./decode.v13.py  --dict "./dictionary/italian_full.txt" --excerpt "./exerpt.txt" --domain "all" --order "none" --print-on-full-only "1" --max-prints 500 --chunk-size 6 --original ./original.txt
[debug] k=17 cipher_letters=dlarshineokyqctpm
[debug] domain=26 domain_letters=abcdefghijklmnopqrstuvwxyz
[debug] dict_size=101130
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=dlarshine (k=9)
[debug] words=6 excerpt_preview='dlar shar shar r ain sheain'
[debug] starting...
[prog] nodes=388,496 backtracks=388,488 prints=488 depth=8/9 rate=136,359.9/s elapsed=2.8{'d': 'd', 'l': 't', 'a': 'r', 'r': 'a', 's': 'h', 'h': 'e', 'i': 'l', 'n': 'o'}s

[debug] dict_size=142373
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=okainsheyqcr (k=12)
[debug] words=6 excerpt_preview='okain shey qokchy chckhy orain ar'
[debug] starting...
[prog] nodes=101,731,676 backtracks=101,731,666 prints=0 depth=10/12 rate=245,122.0/s elapsed=415.0typing.Dict[str, str]s

From the output, we can see that "dlar shar shar r ain sheain" only needed about 388,000 nodes to turn up 488 potential full matches (prints=488 in the status line above).

But 'okain shey qokchy chckhy orain ar' scanned through 100,000,000 combinations and found no full matches at all. Since 'ar' is on the next line down and corresponds to a star line that is different from the one above it, I believe that is why suddenly no matches exist.


If we do batches of 5 words instead of 6 words, we find that valid full match patterns can be found very quickly for each chunk.
It's as if there are two different ciphers at play here, which could explain why decoding has been such a headache.
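
To show how the chunk size moves the boundaries, here is a small sketch that splits the excerpt into non-overlapping chunks and lists each chunk's cipher letters in encounter order. The function names are made up for illustration, but the chunks and letter strings it produces match the excerpt_preview and cipher_letters lines in the debug output.

```python
EXCERPT = """dlar shar shar r ain sheain okain shey qokchy chckhy orain
ar al shear teey chcphy rain cphan adar aty shey qokam"""

def chunks(text, size):
    """Split the text into non-overlapping chunks of `size` words."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]

def cipher_letters(words):
    """Distinct letters of a chunk, in order of first appearance."""
    seen = []
    for w in words:
        for ch in w:
            if ch not in seen:
                seen.append(ch)
    return "".join(seen)

for size in (6, 5):
    for chunk in chunks(EXCERPT, size):
        print(size, " ".join(chunk), cipher_letters(chunk), len(cipher_letters(chunk)))
```

With 6-word chunks, the second chunk pulls in 'ar' from the second line of the excerpt. With 5-word chunks, the second chunk stays entirely on the first line.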


---------

Output from batches of 5 words instead of 6 words, keeping the two lines separated


mike@node2:~/build/vm$ python3 ./decode.v13.py  --dict "./dictionary/italian_full.txt" --excerpt "./exerpt.txt" --domain "all" --order "none" --print-on-full-only "1" --max-prints 500 --chunk-size 5 --original ./original.txt
[debug] k=17 cipher_letters=dlarshineokyqctpm
[debug] domain=26 domain_letters=abcdefghijklmnopqrstuvwxyz
[debug] dict_size=28183
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=dlarshin (k=8)
[debug] words=5 excerpt_preview='dlar shar shar r ain'
[debug] starting...
[prog] nodes=0 backtracks=0 prints=0 depth=0/8 rate=0.0/s elapsed=0.1typing.Dict[str, str]s
/* found 500 matches before the status update which happens every 5 seconds */

[debug] dict_size=141728
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=sheainokyqc (k=11)
[debug] words=5 excerpt_preview='sheain okain shey qokchy chckhy'
[debug] starting...
[prog] nodes=43,208,668 backtracks=43,208,658 prints=500 depth=10/11 rate=193,790.1/s elapsed=223.0{'s': 'd', 'h': 'o', 'e': 'r', 'a': 'a', 'i': 't', 'n': 'e', 'o': 'i', 'k': 'n', 'y': 'u', 'q': 'c'}s
/* found 500 prints in 223 seconds in ~43,000,000 combinations */

[debug] dict_size=69426
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=orainlshety (k=11)
[debug] words=5 excerpt_preview='orain ar al shear teey'
[debug] starting...
[prog] nodes=0 backtracks=0 prints=0 depth=0/11 rate=0.0/s elapsed=0.3typing.Dict[str, str]s
/* found 500 prints before status update again in just 0.3 seconds */

[debug] dict_size=148789
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=chpyraindt (k=10)
[debug] words=5 excerpt_preview='chcphy rain cphan adar aty'
[debug] starting...
[prog] nodes=0 backtracks=0 prints=0 depth=0/10 rate=0.0/s elapsed=1.1typing.Dict[str, str]s
/* reached 500 full matches after 1.1 seconds */

[debug] dict_size=68781
[debug] domain_letters=abcdefghijklmnopqrstuvwxyz (|D|=26)
[debug] cipher_letters=sheyqokam (k=9)
[debug] words=2 excerpt_preview='shey qokam'
[debug] starting...
[prog] nodes=0 backtracks=0 prints=0 depth=0/9 rate=0.0/s elapsed=0.3typing.Dict[str, str]s

/* reached 500 matches after 0.3 seconds */

The result of the 5-word batch scan was this: an incomplete but promising result.

=== BEST CIPHER (applied to original.txt) ===
valid=126 invalid=412 total=538 ratio=0.2342 letters=11
mapping={'a': 'c', 'c': 'v', 'e': 'n', 'h': 'i', 'i': 'h', 'k': 'a', 'n': 'e', 'o': 'r', 'q': 't', 's': 'd', 'y': 'o'}

avi??o dino trache r?c?din?o tr?nno dinc? che r? ??r?o dinc?c?r?
diche vinn? che ranno ranno dio ?c? c? chhhe rao vic? cd rache oacec?
?che vi? ?dino v?io ?din?o r?nr? dino tr dc?o
?c?c? dino rdinnao tr? ?chhe vivaio rac? vin?o r?nn?o tr?c? c?c?c?o
?nhhe dinn? travi?o r?c? vin?o ?ache r?nn?o r?r? chhe r?o ?r? ?r? r?o
dche r? ?vin?o vin?o r?no vin?o oar?che r?n?o r?nno
?vir?vi?o ?nr?o r?no tr trache tr?nno ?rache r?n?o ?r?r? ?r?o?o
?c? o?nn?o vin?o tranno trache tr?r?o r?nn?c? r?n?o ??o ?vin?o
tranno ?vino trann?o trache ranno?achhe
vi??he vinvaio ?c? din?o trann?o di?o ?che dinn?o v?ir? ? ?nr? viv?ic?
r? chhe din? tr?nn?o ranr?dio tr?che ran?o vin?o r?vin?o r?ache c?d
tr?he c? vir?vind rache ?che vinno ranno r?che r?vi?o r?c? ?nhhe r?c?
dc? che ?no vin?che di?dino rano vin?o tr?nn?o trache din?o rann?c?
dche vinovinc? che vi?? d r?nn?o
?virn?c? r?n?c? c?c? r?nn?o r?? ?chhe rann?o trao ?c? c? ann?o di?o
?c? vin?o dinn?o r?c? c? ?vin?o div?io tr?no ?che r?c? r?c?c? c?c?
?che vino tranno ranno ?che ranno tr? vin?o
?vic?c?c? trano ? che r?n?o r?che ?r? rhhe r?che r?c? r?nnn?o vind c?o
?r?vino dinn?o tr?che vin?c? tr?c? c? c?c?o viv?io ?che r?no r?no ?che
vir? ann?o r? vinno ?chhe o vin?o r?c? rac? rache r?c? o?n?o tr?o ????
dch?r? dinno trache vic? tr? vi? ??che rache divaio ??c? r?viv?i?o ??o
?r? din?o dinavio trache vin?o r?c? rac?che div?io r?nno ?c? vin?o ??
?che vinn?nno ?ac? din?o trac? din?o tr?nn?o vind che che c?o dc?r ??
tran?o rache viv?io r?o din?o tranno vic?anno rano an?o vino ?c?
vir? dinao din?o tranno trann?o divaio trache r?c? vind che che c? r?
o?vino trachhe vivair? dinvir? tr?no r? vinn?o r?che ran?o tr?c?
?chhe vino trano ?din?o r?che vivaino ?ache vio ?din?o ?din?o trao ?r?
vinr? ?vino ?anno dinc? ?dic?dio tr?c?dio v?in?o ?ao vin?o r?nn?o ?vin?
v?ice vinno ?annc? ?dino vi?? ?ache vinc? chhe vi? ? ann?o ?c?chhe r?o
dc?chhe dino trache viv?io rac? ch? c??chhe rac?o
?vic??c?c? c? vaic? ?che c?r??vio ??vino di?o vind c? r?vinace ???
r?anno ?che dino tr? chhe dino r? ?vin?o ?dino trann?o vi?che r?o
dr?chhe oanno c?che dinnao trache dinno tr? vin?? c? ? c?din?
trache c? ?chhe dina rache o?dino tr?vino rachhe divaio trac?
din?o tranno trache tranno ?vino r?ano ?chhe v?ic? divaio tr?c?
tranno ?che dino ranno ?ache ? ?che vino dinvaio tv?iio tra?che
?ch?che dinna?o rhhe vinno ?anno r?anno ?vino trao ?din?o vinc? dic?
?chhe tranno ?dino trachhe viac? dino rachhe vin?o trann?o ?chhe dio
trache vino r?? che dino trache r? anno annno ?anc? r? c? ?r?
?dinno dino trano dino trache divaio vin?o c? vin?o ?vino?vio
??c? dic? dic? ? che dinche rache dino travio vivaio r?che
tr trache dinvaio trache dinache diache din?o dino trace vic?
rd??? c? c? dinc? ?nno viv?io ?che v?ice c?c? c?o dino trac?
dtr?nn?o trache divaio r? ?vinr? viao ?chhe vino tr? cac?
r?che dino trac? r?nno vio
rtrachhe c? dino trac? rac?c? ranodiv?iio r?nno rrac? r?o?o
cdche diao tr?che vivaino trano ?anvio ranno rac? vin?ac?o
doac? che ?ranno ?chevino trac? vino ?che o r?ce r?che r?o
dcdc? dino trano ranr?ce vino tr? r? vinno tr? c?c? r? ?ace
dc?c? vic? viv?io vivaio tr? che c?o
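
For anyone who wants to check the decode by hand, here is a small sketch that applies the BEST CIPHER mapping above to the first line of the excerpt, with '?' for unmapped letters. The scoring function is my assumption of how a valid/invalid count could work (a word counts as valid only if it is fully mapped and in the dictionary). The toy dictionary is made up for the example.

```python
MAPPING = {'a': 'c', 'c': 'v', 'e': 'n', 'h': 'i', 'i': 'h', 'k': 'a',
           'n': 'e', 'o': 'r', 'q': 't', 's': 'd', 'y': 'o'}

def decode(text, mapping):
    """Apply the cipher word by word; unmapped letters become '?'."""
    return " ".join("".join(mapping.get(ch, "?") for ch in w) for w in text.split())

def score(decoded, dictionary):
    """Return (valid, invalid, ratio) for fully mapped dictionary words."""
    words = decoded.split()
    valid = sum(1 for w in words if "?" not in w and w in dictionary)
    ratio = valid / len(words) if words else 0.0
    return valid, len(words) - valid, ratio

line = "dlar shar shar r ain sheain okain shey qokchy chckhy orain"
decoded = decode(line, MAPPING)
print(decoded)

toy_dict = {"che", "vivaio"}     # hypothetical stand-in for italian_full.txt
print(score(decoded, toy_dict))
print(round(126 / 538, 4))       # the ratio reported above
```

The decoded line is the same as the row starting "??c? dic? dic?" in the output above, so the mapping and the printed text are consistent with each other.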

---------------------------

The attempt to scan batches of 6 words went on for hours and hours, scanning tens of billions of ciphers with no matches. It stopped when my computer ran out of RAM and killed the process (and the terminal window).

-----------------------

Anyone with a supercomputer or some other very powerful machine could possibly use this same strategy to find the true 100% match translation, using chunks of 10 or 20 words. Another option is to selectively choose specific lines so that they all share the same star type from the back of the book, which could potentially reveal one of the ciphers in full with chunks of 10 to 20 words.

My computer isn't up to the challenge, but I got things moving towards actually decoding this thing once and for all, in a way that should work if it's a 1-to-1 cipher and a known language.

Let me know if anybody wants this script to try it out.