RE: VMs: RE: provable?

I ran your scenario.  For the "master block" I took the first 2000
characters of Wycliffe, and I created a 1 MB file from it with a "window" of
8-12 characters.  The resulting text looked like this:

o the dament fro themorwetid wae bifore to d seiy thanes and tycounde d and
the dand makynnd makynge n erthe and clepide idel ande euentid and tymeis
and depathe dai d the fourt daies and the morwetthe secoundirmament of kynde
And ies and yeer shyne tho iatris tha firmament a maad and fro watris is
kynde w of watris  and derkn seed an seide Liyt Also God e niyt and rmament
and  o daie  and shymade sterppere anirmament of ament heuene and a td God
seiknessis  seide Liwas good an tho schu nd God meuene And thce and alden
deparit was don e the waaies and yy his kyndd voide ane erbe ans doon so
AGod seiy teed by hi be bifore t and the leson so And  And God sthe
firmamenand appir heuene othe God ide The watrs that wde dai Fid and the
rmament of hand schuldens kynde whoiyt be mament and nd morwetidd seiy the
lynge seedis of wao the niytt weren ide Liyt be d and morwetyt and derkne to
the daatris fro waris fro wauyte forth gthe the ert it was diy that ynge
forth ge watris tho sch ris the se erthe Fo clepide theiy that it od seide
id was maad watris the  borun onthe and  of watrisf heuene twas good a

Statistics: 24422 unique words, of which 64% occur only once
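
For anyone who wants to repeat the experiment, here is a minimal sketch of the procedure as described above. It is not the program actually used. It assumes uniformly random window positions and widths, and that a "word" is any whitespace-separated token. The names generate and word_stats, and the stand-in master string, are made up for illustration; in practice the master would be the first 2000 characters of the source text.

```python
import random
from collections import Counter

def generate(master, target_len, min_win=8, max_win=12, seed=None):
    """Build a text of target_len characters by concatenating randomly
    chosen windows (min_win..max_win characters wide) of the master block.
    Assumption: window start and width are both drawn uniformly."""
    if len(master) < max_win:
        raise ValueError("master block is shorter than the widest window")
    rng = random.Random(seed)
    pieces = []
    total = 0
    while total < target_len:
        width = rng.randint(min_win, max_win)              # inclusive bounds
        start = rng.randrange(0, len(master) - width + 1)  # window fits inside master
        pieces.append(master[start:start + width])
        total += width
    return "".join(pieces)[:target_len]

def word_stats(text):
    """Return (total words, unique words, words occurring exactly once)."""
    counts = Counter(text.split())
    total = sum(counts.values())
    unique = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    return total, unique, once

if __name__ == "__main__":
    # Stand-in master block, NOT the real Wycliffe text.
    master = "in the bigynnyng god made of nouyt heuene and erthe " * 40
    text = generate(master[:2000], 1_000_000, seed=0)
    total, unique, once = word_stats(text)
    print(total, unique, once)
```

Setting min_win=3 and max_win=5 and passing the whole source text as the master should correspond to the earlier run quoted below.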

My guess (though I haven't had a chance to demonstrate it) is that a text
produced in this way will have the same relative distribution of letter
frequencies for the first letter of each word as it has for every letter of
each word.  If we are asking whether the VM was produced in this way, then
that is what I would test next.
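
A rough sketch of how that test could be set up, again in Python. It tabulates, for each letter, its share among word-initial letters and its share among all letters, and then sums the differences. Using the total variation distance as the summary number is my own choice for illustration and not something proposed in this thread; the function names are hypothetical as well.

```python
from collections import Counter

def letter_distributions(text):
    """Map each letter to (share among first letters, share among all letters).
    Words are whitespace-separated tokens; no case folding is applied."""
    words = text.split()
    if not words:
        return {}
    first = Counter(w[0] for w in words)
    overall = Counter(ch for w in words for ch in w)
    n_first = sum(first.values())
    n_all = sum(overall.values())
    return {ch: (first[ch] / n_first, overall[ch] / n_all)
            for ch in sorted(overall)}

def total_variation(dist):
    """Half the sum of absolute differences between the two distributions:
    0.0 means identical, 1.0 means no overlap at all."""
    return 0.5 * sum(abs(p - q) for p, q in dist.values())

if __name__ == "__main__":
    sample = "and the dand makynnd makynge n erthe and clepide idel ande"
    dist = letter_distributions(sample)
    print(round(total_variation(dist), 3))
```

If the guess holds, the distance should come out small for the generated file and noticeably larger for the unscrambled source text, where word-initial letters have their own distribution.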

I have put the first 150K of my resulting file online at


Brian Tawney

Brian Tawney wrote:
> RE: whopping great chains (=63!!!)
>
> I ran a simulation.  I took the Wycliffe
> bible (Middle English) and wrote a program to hop around randomly picking
> sections of text 3-5 characters long, spaces and letters included, and
> built a new text out of the bits that was about 1 megabyte in size.  Then I
> counted the words and unique words in the text.  The result was that there
> were 78075 total words, of which 92% occurred only once.
> For anyone with a morbid interest, the text looks like this:
> bleir  tf swen tos cheeesenace ois poueod ofand sarijnneignacpreysperkiee
> ...

Yes, I've been there as well after one too many Guinness'...

Seriously, this is looking very interesting, but there is one big difference
to the Fincher hypothesis: Marke started out from the premise of having a
very small set of master sequences (I guess not more than 2000 characters --
as much as you can put down on a sheet of paper), and I also think for
practical considerations the window should be broader -- something like 8 to
12 characters.

Could you re-run your simulation with those parameters and let us see the
results?

I'd very much like to get a chunk of your resulting text (like, 150k?), so I
could let my little programs run over it and see how their properties
compare to what I found for the VM. Pretty please with a cherry on top?



