Re: LSC and the VMS
> Mark's LSC tests applied to the VMS give results typical of natural
> languages, and quite different from those of monkey text.
> This is very good news, at least for those of us who still believe
> that there is a text in there to be read. As for myself, I have
> remarked several times in the past that the distribution of words in
> the VMS seemed to be far from uniform; it is nice to see that vague
> feeling turned into a quantitative measurement.
> Unfortunately, even this powerful test still leaves some room
> for doubt.
> For one thing, while the LSC can unmask ordinary monkeys, it too can
> be fooled with relative ease, once one realizes how it works....
This is my feeling exactly (although I am not yet sure about _how_
easy it would be). In order to have real, strong evidence that the
VMs contains meaningful text, we need to know how one can create a
'meaningless' text that still exhibits the same properties as meaningful
text. More to the point: we need to find a mechanism that could have been
applied 400-500 years ago.
Jacques already pointed out that we don't actually know how to define
meaningful and meaningless. This may well prove to be a serious problem.
When trying to generate meaningless texts which the LSC would classify
as meaningful, or vice versa, we're likely to end up in the no-man's
land between the two....
Take a meaningful text and start removing words (every 10th, every 2nd,
at random...). When does the text stop being meaningful?
How does the LSC curve behave?
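The word-removal experiment could be set up along the following lines.
This is only a minimal Python sketch: the function names drop_every_nth
and drop_at_random are made up for illustration, the sample sentence is
a placeholder, and the LSC computation itself is not shown. Each degraded
word list would simply be fed to whatever LSC implementation is at hand.

```python
import random


def drop_every_nth(words, n):
    """Return the words with positions n, 2n, 3n, ... (1-based) removed."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [w for i, w in enumerate(words, start=1) if i % n != 0]


def drop_at_random(words, fraction, seed=0):
    """Return the words with each one removed independently with
    probability `fraction`. The seed is fixed so runs are repeatable."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    rng = random.Random(seed)
    return [w for w in words if rng.random() >= fraction]


# Placeholder text; a real test would use a long meaningful text.
sample = "the quick brown fox jumps over the lazy dog again".split()

every_10th_removed = drop_every_nth(sample, 10)
every_2nd_removed = drop_every_nth(sample, 2)
randomly_thinned = drop_at_random(sample, 0.5, seed=1)
```

Running the LSC on a series of such texts, with the removal rate
increasing step by step, would show whether the curve degrades gradually
or changes character at some point.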
I'll take a bit more time replying to Stolfi's earlier post.
The important point is that the LSC test identifies texts as meaningful if for
medium-size chunks the correlation between letter frequencies is *higher*
than for random texts, while for longer chunks this correlation is actually