
Re: LSC sums for monkey texts

Gabriel Landini wrote:

> On 15 Jan 00, at 8:47, Mark Perakh wrote:
> > Therefore I submit that the behavior of monkey texts can be
> > reasonably foreseen from the data for permuted texts.
> I am not sure that I follow. It is easier to generate a n-order
> character monkey text because you store only the probabilities
> and then you only generate the "next" character.

Thanks for the comments. I did not discuss it in terms of being easier
or harder to generate. The computer does the job in both cases and the
program exists, so the question of ease seems moot. I am not suggesting
using permutations instead of the monkey program. My comments related
only to the question of whether we can expect LSC to distinguish between
meaningful and monkey texts. I believe the behavior of monkey texts from
the standpoint of LSC should be quite similar to that of permuted texts;
therefore LSC is expected to work for monkeys as well as for
permutations. I do not think LSC will distinguish between permuted and
monkey texts. This assumes, of course, that the texts are long enough
that the actual frequencies of letter occurrences are quite close to
their probabilities.
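
To illustrate the last point, here is a minimal sketch in Python. It
assumes a zero-order letter monkey (each letter drawn independently with
the source frequencies); the function names and the sample string are
made up for illustration and are not taken from any existing program.
A permutation keeps the letter counts exactly, while the monkey only
approaches them as the text gets longer.

```python
import random
from collections import Counter

def permuted_text(text, rng):
    """Shuffle the letters of text; letter counts are preserved exactly."""
    letters = list(text)
    rng.shuffle(letters)
    return "".join(letters)

def letter_monkey(text, length, rng):
    """Draw `length` letters independently, weighted by their counts in text."""
    counts = Counter(text)
    letters = sorted(counts)
    weights = [counts[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))

def max_frequency_gap(a, b):
    """Largest absolute difference in relative letter frequency between a and b."""
    ca, cb = Counter(a), Counter(b)
    return max(abs(ca[c] / len(a) - cb[c] / len(b)) for c in set(ca) | set(cb))

rng = random.Random(0)
# Hypothetical sample text, repeated so that it is reasonably long.
source = "in the beginning god created the heaven and the earth " * 200
perm = permuted_text(source, rng)
monkey = letter_monkey(source, len(source), rng)
```

Comparing max_frequency_gap(source, perm) with
max_frequency_gap(source, monkey) for texts of different lengths shows
how quickly the monkey's frequencies approach those of the source.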

> How do we
> generate permuted n-plets and assure that the probabilities of the
> "plets" appearing at the boundaries of the newly permuted "plets"
> are also falling within the observed probabilities of the original
> language?

They may not. This hardly matters for the question of whether LSC
will distinguish between n-plet monkey and meaningful text.

> >
> It may be, but if you consider n-order WORD monkey texts, you
> lost all original meaning while the new text still it is readable (all
> sequences of 3 words in the new text *exist* in the original text)
> and therefore some grammar remains. That is why is "readable"; in
> order 1-word monkeys are just the words in random order and
> therefore grammar is lost.
> I still suspect that LSC would not differentiate between, an
> order 3 *word*-Monkey and a real text, but of course I haven't
> tested it.

In my paper #5 there are LSC data for texts randomized in various ways.
One of them was to permute entire verses (in Genesis) without
permuting words or letters within the verses. Each verse contained
considerably more than 3 words. The LSC sum for such permuted text is
quite clearly different from that of the non-permuted meaningful text.
The difference appears at relatively large chunk sizes, while at small n
the sum behaves very much like that of a meaningful text. This was to be
expected: at small n, when the chunk size is smaller than the verse size,
each verse preserves its meaning, so from the standpoint of LSC it is
just another meaningful text. When n is larger than the average verse
size, shuffling the verses kills the long-range order inherent in the
meaningful contents, and the LSC immediately reveals that. An n-order
word monkey is not different in principle from a verse-shuffled text
from the standpoint of LSC. Therefore I expect that LSC will show the
difference between an n-order word monkey and a meaningful text at chunk
sizes exceeding the order of the monkey (which for orders such as 3 and 4
is a rather small chunk size n). As for letter monkeys, this is even
more reasonable to expect, since the difference would be revealed already
at rather small n. Of course I can be wrong, so let us wait until Rene
obtains the data.
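
For anyone who wants to try the verse-shuffling comparison, here is a
rough sketch in Python. It assumes that the LSC sum for chunk size n is
the sum, over all letters, of the squared differences between the counts
of that letter in adjacent chunks of n letters. That definition is not
spelled out in this message, so treat it as an assumption; the function
names are hypothetical as well.

```python
import random
from collections import Counter

def lsc_sum(text, n):
    """Assumed LSC sum: squared differences of per-letter counts between
    adjacent chunks of n letters, summed over letters and chunk pairs.
    A trailing partial chunk is dropped."""
    chunks = [text[i:i + n] for i in range(0, len(text) - n + 1, n)]
    counts = [Counter(chunk) for chunk in chunks]
    alphabet = set(text)
    return sum(
        (a[ch] - b[ch]) ** 2
        for a, b in zip(counts, counts[1:])
        for ch in alphabet
    )

def shuffle_verses(verses, rng):
    """Return the verses in random order; each verse itself is untouched."""
    shuffled = list(verses)
    rng.shuffle(shuffled)
    return shuffled

# Tiny worked cases: "aa" vs "bb" differ by 2 in each letter, "ab" vs "ab" by 0.
print(lsc_sum("aabb", 2))  # 8
print(lsc_sum("abab", 2))  # 0
```

Given a list of verses, one would compute lsc_sum("".join(verses), n) and
lsc_sum("".join(shuffle_verses(verses, rng)), n) over a range of n and
look at where the two curves start to diverge.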
Cheers, Mark