Re: LSC sums for monkey texts

First, I submit that, from the LSC viewpoint, there is no difference in
principle between monkey texts on the one hand and permuted texts on the
other. A first-order monkey behaves practically like a letter-permuted
text, and an nth-order monkey behaves like a text permuted in n-tuple
chunks, to all intents and purposes. In both cases the stock of letters
available is limited to that in the original meaningful text (what we call
the identity permutation).  Therefore the statistics in both cases are
those of sampling without replacement.  For LSC this makes no quantitative
difference, because the expected LSC sum differs between <with replacement>
(that is, the multinomial distribution) and <without replacement> (that is,
the hypergeometric distribution) only by a factor of L/(L-1), where L is
the length of the text in letters.  If L>>1 the difference is negligible.
There is no reason to believe that the measured sums will differ to a much
larger extent. Therefore I submit that the behavior of monkey texts can
reasonably be foreseen from the data for permuted texts.
These data showed that LSC distinguishes quite well between the original
meaningful text and its permutations (letter, word, and verse permutations
alike). The preliminary results by Rene seem to confirm that expectation.
As to Rotokas, I have no idea about it, but why not just try LSC on it?
Rene now has a program which, as we have verified, measures LSC sums well.

Best to all,
Mark
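
The L/(L-1) factor mentioned above can be checked exactly on a small case.
The sketch below is only an illustration, and it rests on an assumption not
spelled out in this message: that the LSC sum is built from squared
differences between the counts of a letter in adjacent chunks of n letters.
The function names and the toy values of L, M (occurrences of the letter)
and n are invented for the example.

```python
from fractions import Fraction
from math import comb


def sq_diff_without_replacement(L, M, n):
    """Exact E[(X1 - X2)^2] for one letter occurring M times in a text of
    L letters, where X1 and X2 are its counts in two disjoint chunks of n
    letters of a randomly permuted text (hypergeometric case).
    Requires 2 * n <= L."""
    denom = comb(L, n) * comb(L - n, n)
    total = Fraction(0)
    for a in range(min(M, n) + 1):
        if n - a > L - M:
            continue  # not enough other letters to fill chunk 1
        ways_a = comb(M, a) * comb(L - M, n - a)
        others_left = L - M - (n - a)  # other letters left for chunk 2
        for b in range(min(M - a, n) + 1):
            # comb returns 0 when chunk 2 cannot be filled
            ways_b = comb(M - a, b) * comb(others_left, n - b)
            total += Fraction(ways_a * ways_b, denom) * (a - b) ** 2
    return total


def sq_diff_with_replacement(L, M, n):
    """Same expectation when every letter is drawn independently with
    probability p = M / L (multinomial case): 2 * n * p * (1 - p)."""
    p = Fraction(M, L)
    return 2 * n * p * (1 - p)


L, M, n = 20, 6, 4
ratio = sq_diff_without_replacement(L, M, n) / sq_diff_with_replacement(L, M, n)
print(ratio, Fraction(L, L - 1))
```

Under that assumption the ratio is the same for every letter and every pair
of chunks, so it carries over unchanged to the sum over all letters and all
adjacent chunk pairs.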

Gabriel Landini wrote:

> On 15 Jan 00, at 6:03, Jacques Guy wrote:
> > I vaguely suspect that
> > LSC sums would distinguish between real Rotokas and
> > second-order Monkey Rotokas. Third-order and beyond,
> > I am not so sure.
> > What do you think?
> I think that the LSC depends heavily on the construction of words,
> but I also think that word construction (because of Zipf's law)
> depends heavily on a sub-set of the word pool.
> Long-range correlations in codes were discussed for DNA a couple
> of years ago in very prestigious journals like Nature and Science,
> but to date I do not think that anybody has had a convincing theory or
> explanation of the meaning and validity of the results.
> If you think about it, what really is the relation (in any terms)
> between a piece of text and another that is many characters away?
> What is the large scale structure of a text?  That would mean that
> there are events at small scales and also at larger scales.
> I can imagine that up to the sentence level or so there may be
> patterns or correlations (what we call grammar?), but beyond that, I
> am not sure.
> Think of a dictionary: there may not be any structure beyond 1
> sentence or definition (still, Roget's Thesaurus conforms to Zipf's law
> for the more frequent words). Consequently I see no reason why there
> should be any large scale structures in texts. (I may be very wrong).
> I suggested the other day that higher-order Monkeys generate LSC
> which are closer and closer to that of the language the Monkeys
> are based on. If I understand correctly, Rene's analysis seems to
> confirm that?
> I guess that the LSC could not differentiate between, let's say, an
> "order 3 word-Monkey" and a real text. (Word Monkeys generate a
> language based on the probability of words, rather than
> characters). Note that 3rd order word-Monkeys usually generate
> readable (though meaningless and, most of the time, hilarious) text.
> Perhaps this is worth looking into.
> Cheers,
> Gabriel
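
For anyone who wants to try the character and word monkeys discussed in this
thread, here is a minimal sketch of a generator. It assumes the convention
that an order-1 monkey uses plain frequencies and an order-n monkey picks
each item according to the n-1 items before it, as observed in the source
text. The function name, the restart rule for dead ends and the fixed seed
are choices made for this example, not anything taken from Rene's program.

```python
import random
from collections import defaultdict


def monkey_text(tokens, order, length, seed=0):
    """Generate `length` tokens imitating `tokens` (a string for a
    character monkey, a list of words for a word monkey).
    Requires len(tokens) > order - 1."""
    rng = random.Random(seed)
    k = order - 1  # number of preceding tokens used as context
    if k == 0:
        # Order 1: draw tokens by their frequency in the source.
        return [rng.choice(tokens) for _ in range(length)]
    # Map every context of k tokens to the tokens seen right after it.
    followers = defaultdict(list)
    for i in range(len(tokens) - k):
        followers[tuple(tokens[i:i + k])].append(tokens[i + k])
    start = rng.randrange(len(tokens) - k)
    out = list(tokens[start:start + k])
    while len(out) < length:
        choices = followers.get(tuple(out[-k:]))
        if not choices:
            # Dead end (context seen only at the very end of the source):
            # restart from a random context that does have a follower.
            start = rng.randrange(len(tokens) - k)
            out.extend(tokens[start:start + k])
            continue
        out.append(rng.choice(choices))
    return out[:length]


source = "the cat sat on the mat and the dog sat on the log"
print("".join(monkey_text(source, 2, 40)))            # order-2 character monkey
print(" ".join(monkey_text(source.split(), 3, 12)))   # order-3 word monkey
```

Feeding the output of such a generator to an LSC program, for increasing
order, is one way to test the guess above about where monkey texts stop
being distinguishable from the real text.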