
Re: D!

Jacques Guy wrote:
> Now, if the assessment time could be brought down to
> one tenth of a second, that would only be 6 years
> and a bit. Any  volunteers? No? All right, let's
> make it *only*  8 hours a day, but you'll have to be
> at it for about 10 years, though.

Ah wait, I have a better idea. The easy way to decide if a text is in a known language is
to match text chunks against a word-list of that language. Random text would only give a few
hits, but text in that language would generate >90% hits (we'll assume some unincluded

Now, write a massively parallel Net client application similar to Seti@Home or the
Internet Prime Search (see http://setiathome.ssl.berkeley.edu/ and
http://www.mersenne.org/prime.htm). Each client has a subset of dictionaries and is
issued different substitution-cipher copies of Voynich text by the central controlling
server. The search runs 24/7 automatically on millions of machines.

# clients: 1,000,000 (conservative, Seti@Home now has >2.5M users, but not all run 24/7)
Time to test one language against one data chunk: 10 milliseconds
Total languages to test: 1,000 
Tests/second/client: 100
Tests/second: 10^8
Tests/year: 3.15 * 10^15 (>3 quadrillion!)

I'll let everyone do the rest of the math. Even if you vastly vary
the time to test one chunk (say, 1 line of Voynich text), or vary
the number of languages, you still get quite a lot of tests.

You might make it even faster if you reduced all the languages to
number-pattern dictionaries, like cryppies use (they include words
and common phrases of under 10 different letters; each distinct
letter is numbered in order of first appearance: "apple" is 12234
while "aardvark" is 11234125). The Bletchley Park gang got great
utility out of the fact that "Heil Hitler" (1234135426) is
apparently a unique numbers string in German. 
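
The numbering scheme is easy to sketch in Python. This is only a
minimal illustration: word_pattern() is a name I made up, and skipping
spaces and punctuation is my assumption about how phrases are handled.

```python
def word_pattern(text):
    """Number each distinct letter in order of first appearance.

    Non-letters (spaces, punctuation) are skipped, which is an
    assumption about how multi-word phrases are treated.
    """
    numbering = {}  # letter -> digit assigned at first appearance
    digits = []
    for ch in text.lower():
        if not ch.isalpha():
            continue
        if ch not in numbering:
            if len(numbering) == 9:
                # Single digits only work for under 10 different letters.
                raise ValueError("more than 9 different letters")
            numbering[ch] = str(len(numbering) + 1)
        digits.append(numbering[ch])
    return "".join(digits)


print(word_pattern("apple"))        # 12234
print(word_pattern("aardvark"))     # 11234125
print(word_pattern("Heil Hitler"))  # 1234135426
```

Because the pattern does not change under a simple substitution
cipher, a client could compare ciphertext patterns against the
dictionary patterns directly, without trying each substitution.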

Remember, all the client tests have to do is discard non-hits. If
it _might_ be a language you pass it back to the server for more
detailed automated, and then human, review.
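
A client-side filter along these lines might look like the Python
sketch below. The function names, the whitespace tokenizing, and the
toy word-list are all assumptions for illustration; the 0.9 default
just mirrors the ">90% hits" figure mentioned earlier, and a real
client would probably want a looser cutoff so borderline chunks still
go back for review.

```python
def hit_rate(chunk, word_list):
    """Fraction of whitespace-separated words found in word_list."""
    words = chunk.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in word_list)
    return hits / len(words)


def is_candidate(chunk, word_list, threshold=0.9):
    """True if the chunk should be passed back to the server."""
    return hit_rate(chunk, word_list) >= threshold


# Toy word-list, purely illustrative.
english = {"the", "cat", "sat", "on", "mat"}

print(is_candidate("the cat sat on the mat", english))  # True
print(is_candidate("qzx vbn the", english))             # False
```

Using a set for the word-list keeps each lookup at roughly constant
time, which matters if the per-chunk budget is around 10 milliseconds.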

The Great Internet Mersenne Prime Search now reports they have
averaged over 28 million Pentium-90 CPU-years equivalent, and expect
to find 5 new prime numbers in the current search--they have
discovered 4 since 1996.

Hmm, this might actually be worth trying. You could get enough
tests to try several different transcription schemes without
severely affecting the testing time.  


==================================================== Adams Douglas,
San Diego, CA   Adams@xxxxxxxxxxx http://Adams.Douglas.net/ PGP
Public Keys: http://Adams.Douglas.net/pgpkey.txt <adamsd@xxxxxxxxxxxxx>
084E B706 E8D5 4C2E 1A43  ECE2 6B96 8018 6238 197A UTM:11S0487200
3623500 MGRS-2:11SMS872235 (100-meter)

           "Reserve your right to think, for even to 
            think wrongly is better than to not 
            think at all."
                   - Hypatia of Alexandria (c 400 CE)