Brian Tawney wrote:
RE: whopping great chains (=63!!!)

I ran a simulation.  I took the Wycliffe
bible (Middle English) and wrote a program to hop around randomly picking up
sections of text 3-5 characters long, spaces and letters included, and build
a new text out of the bits that was about 1 megabyte in size.  Then I
counted the words and unique words in the text.  The result was that there
were 78075 total words, of which 92% occurred only once.

For anyone with a morbid interest, the text looks like this:

bleir tf swen tos cheeesenace ois poueod ofand sarijnneignacpreysperkiee
> ...

Yes, I've been there as well after one too many Guinnesses...

Seriously, this is looking very interesting, but there is one big difference from the Fincher hypothesis: Marke started out from the premise of a very small set of master sequences (I guess not more than 2000 characters -- as much as you can put down on a sheet of paper). I also think that, for practical reasons, the window should be broader -- something like 8 to 12 characters.

Could you re-run your simulation with those parameters and let us see the results?
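
To make sure we are talking about the same thing, here is a rough sketch of how I understand the procedure. It is only my guess at it, not Brian's actual program: the function names, the `seed` parameter and the stand-in source string are all made up for illustration. The window limits are parameters, so both the 3-5 run and an 8-12 run over a short master text come out of the same code.

```python
import random


def random_chunk_text(source, target_len, min_win=3, max_win=5, seed=None):
    """Concatenate randomly placed windows of `source` (each min_win..max_win
    characters long, spaces included) until target_len characters are reached.
    Hypothetical helper; the original program may well differ."""
    if len(source) < max_win:
        raise ValueError("source is shorter than the largest window")
    rng = random.Random(seed)
    pieces = []
    total = 0
    while total < target_len:
        width = rng.randint(min_win, max_win)            # inclusive on both ends
        start = rng.randrange(len(source) - width + 1)   # window stays inside source
        pieces.append(source[start:start + width])
        total += width
    return "".join(pieces)[:target_len]


def word_stats(text):
    """Return (total words, unique words, words that occur exactly once)."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    total = sum(counts.values())
    once = sum(1 for n in counts.values() if n == 1)
    return total, len(counts), once


# Stand-in source only; replace with the real text (or a ~2000 character
# master sequence) before drawing any conclusions.
sample = "the quick brown fox jumps over the lazy dog and runs away again " * 40

short_windows = random_chunk_text(sample, 5000, 3, 5, seed=1)
wide_windows = random_chunk_text(sample, 5000, 8, 12, seed=1)
print(word_stats(short_windows))
print(word_stats(wide_windows))
```

Note that `word_stats` returns raw counts rather than a percentage, since "92% occurred only once" could be read against either the total words or the unique words.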

I'd very much like to get a chunk of your resulting text (say, 150k?), so I can run my little programs over it and see how its properties compare to what I found for the VM. Pretty please with a cherry on top?



