
VMs: Gordon Rugg's experiment



Hi,

I did some analysis of the texts that are available at Gordon Rugg's
site, and of other material which he kindly sent me. I am still
writing a detailed report, but my main objection is that the
table-and-grille method is too good. For one thing, it allows complete
control over the first 500 or so words; so one could use it to produce
a literal copy of --say-- Hamlet's monologue. Therefore, to uncover
the method's limitations one would have to analyze a much larger
sample.

A more fundamental problem is that the pseudo-Voynichese text produced
with the table-and-grille method apparently resembles the VMs text
only visually, not quantitatively. In particular, the word frequencies
are not the same. But then one can generate "monkey English" (say, by
an order-3 Markov chain) that, to someone who doesn't know the language,
will look similar to real English -- to the same extent. Obviously no
one would take such an experiment as "evidence" that the English language
is meaningless gibberish.
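
For readers who have not seen "monkey English", here is a minimal sketch
of an order-3 character Markov chain. It is only an illustration, not the
program used for any analysis in this message; the function names, the
training snippet and the random seed are all assumptions.

```python
import random
from collections import defaultdict

def build_model(text, order=3):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length, rng):
    """Extend `seed` one character at a time, sampling observed successors."""
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break  # dead end: this context only occurs at the end of the sample
        out += rng.choice(followers)
    return out

# Toy training text (an assumption; a real test needs a far larger sample).
sample = ("to be or not to be that is the question whether tis nobler in the "
          "mind to suffer the slings and arrows of outrageous fortune ")
model = build_model(sample, order=3)
monkey = generate(model, sample[:3], 80, random.Random(0))
print(monkey)
```

Every four-letter window of the output also occurs in the training text,
which is why the result looks locally plausible while meaning nothing.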

This argument is even more convincing if we use a monosyllabic
language like Chinese or Vietnamese. (Their "dense" lexicons and
three-segment word structure make such languages excellent "monkey
food".) One of these two samples below is real Vietnamese (in the VIQR
encoding), the other is pseudo-Vietnamese generated by the
table-and-grille method: Can you tell which is which?

  ngu+o+`i la^.p no+i o+? nga`i giu+~a hai vai ngu+o+`i ngu+o+`i chu'c
  ve^` gio^ se'p ra(`ng xu+' ngu+o+`i ddu+o+.c ddu+'c gie^ ho^ va ban
  phu+o+'c tu+` tro+`i nga`i gia'ng xuo^'ng cho ngu+o+`i a^n tu+' ra^'t
  ba'u la` su+o+ng mo'c nhu+~ng suo^'i cu?a vu+.c tha(?m co' nu+o+'c sa^u
  nhu+~ng hue^ lo+.i qui' nhu+'t cu?a ma(.t tro+`i hoa qua? cu+.c ba'u
  cu?a ma(.t tra(ng nhu+~ng va^.t nhu+'t ha.ng cu?a nu'i xu+a ca'c ba'u

  ngu+o+`i dda~ ddu+o+.c tie^n tri y sa y a no'i dde^'n ra(`ng tie^'ng
  cu?a ngu+o+`i ho^ trong sa ma.c ha~y do.n ddu+o+`ng chu'a ha~y ba.t
  lo^'i ngu+o+`i ddi o^ng gio^ an na`y co' a'o lo^ng la.c dda` ngang
  lu+ng thi` tha('t xie^m ba(`ng da thu' va^.t co`n thu+'c a(n cu?a
  o^ng la` cha^u cha^'u va` ma^.t ong da.i ba^'y gio+` gie^ ru sa lem
  va` ca? xu+' giu dde^ va` kha('p vu`ng gia'p ca^.n so^ng gio^ ddanh

(It is easy, if you know what to look for, or if you know any Vietnamese,
or if you ask Google. But of course none of that applies to the
VMs...)

Finally, if we are to believe that the VMs author intentionally
adjusted the table and grille to make his pseudo-text look like
natural language, Zipf plot and all, then why would he try to make it
so unlike any European or Semitic language -- but so similar to an
East Asian one (complete with doubled and tripled words)?
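
The doubled and tripled words can be counted mechanically. The following
is a rough sketch, assuming whitespace-separated words; the function name
`repeat_runs` is hypothetical.

```python
from collections import Counter
from itertools import groupby

def repeat_runs(text):
    """Count runs of identical consecutive words, keyed by run length (>= 2)."""
    runs = Counter()
    for _, group in groupby(text.split()):
        n = sum(1 for _ in group)
        if n >= 2:
            runs[n] += 1
    return runs

# A fragment of the first sample above, which contains one doubled word.
fragment = "hai vai ngu+o+`i ngu+o+`i chu'c"
doubles = repeat_runs(fragment)
print(dict(doubles))
```

Key 2 counts doubled words and key 3 counts tripled ones, so the same
function can be run on any transcription split into words.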

So, my conclusion is that Gordon Rugg's table-and-grille is a nice
idea, but it only shows that someone like Kelley could have
run a "mechanical monkey" efficiently. Until one can find a
table/grille set that reasonably matches the *frequencies* of the VMs
words (and word pairs), the experiment does not significantly increase
the probability of the VMs being a hoax.
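
One possible way to make "matches the frequencies" concrete is sketched
below. The choice of total variation distance is my assumption for this
illustration (any reasonable distance between distributions would do),
and the helper names and toy strings are hypothetical, not VMs data.

```python
from collections import Counter

def rel_freq(items):
    """Relative frequency of each distinct item."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def tv_distance(p, q):
    """Total variation distance: 0 = identical distributions, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def compare(text_a, text_b):
    """Return (word distance, word-pair distance) for two texts."""
    wa, wb = text_a.split(), text_b.split()
    words = tv_distance(rel_freq(wa), rel_freq(wb))
    pairs = tv_distance(rel_freq(zip(wa, wa[1:])), rel_freq(zip(wb, wb[1:])))
    return words, pairs

# Toy strings only: identical texts give 0, texts with no shared words give 1.
same = compare("a b a b", "a b a b")
disjoint = compare("a a", "b b")
print(same, disjoint)
```

A generated text would have to score close to 0 against the VMs on both
numbers, over a large sample, before the match could be called quantitative.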

All the best,

--stolfi