
Re: meaning-less/full


Rene wrote:

What are the doubtful cases?

- Think of Jacques' telegram-style recipes. This is more towards the
meaningful side, but it could have someone very puzzled.
- Or a text in which every third word has been struck out. It is un-
grammatical but depending on the complexity of the original text, it
may be right in the middle between meaningful and meaningless.

Rene and Jorge: You may have noticed that in our work with Brendan we systematically measured LSC in meaningful texts from which either all vowels or all consonants had been removed. In all the cases explored, the PMP shifted to lower n and DOM decreased, but the LSC curve still looked typical of meaningful texts, even though this is an extreme case of your example with every third word struck out. I consider such vowelless or consonantless texts a form of abbreviation, so in my classification they are meaningful, although it may be hard to make sense of them, especially the consonantless ones. In another example, I took a meaningful text and converted it into what the debunkers of the alleged Bible code call a skip-text (a short example was described briefly in paper 2 on LSC).
Subsequently one of the debunkers, Randy Ingermanson, analyzed skip-texts quite thoroughly using statistics different from LSC (partially based on chi-square). Despite the fact that skip-texts preserve the entropy of the original texts (as they are obtained by rearranging the letters or n-grams of the original text according to a regular, non-random, reversible procedure), Randy found that they largely display the statistics of meaningless texts. This fits the fact that we cannot make sense of them unless we know that they are skip-texts and what the value of the skip is. Randy's results are described in his book "Who wrote the Bible code."  That book has a very good statistical appendix on Randy's web page. On the other hand, I once asked my son to encode the Song of Hiawatha without telling me how he encoded it. His encoding obviously involved a high rate of compression. The LSC of the encoded texts was like that of gibberish. Skip-texts are also, in effect, texts encoded in a simple way, and they preserve the length of the text exactly. My son's encoding was rather complex and shortened the texts. I did not know what to make of it, so I did not pursue the study of encoded texts by LSC. It can be surmised that such a study could shed some light on a mathematically definable distinction between meaningful texts and gibberish.
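
To make the skip-text point concrete, here is a minimal Python sketch. It assumes one common reading of "skip-text": take every skip-th letter starting at offset 0, then at offset 1, and so on. The function names skip_text and unskip_text are made up for this illustration, and this is not necessarily the exact procedure used in paper 2 or by Randy. It only shows that such a rearrangement is reversible and leaves the length and the letter counts (hence the single-letter entropy) unchanged; it does not compute LSC.

```python
from collections import Counter


def skip_text(text, skip):
    """Rearrange text by reading every skip-th letter from each offset in turn."""
    return "".join(text[offset::skip] for offset in range(skip))


def unskip_text(scrambled, skip):
    """Invert skip_text when the skip value is known."""
    n = len(scrambled)
    restored = [""] * n
    pos = 0
    for offset in range(skip):
        # Put the letters back at the positions they were read from.
        for i in range(offset, n, skip):
            restored[i] = scrambled[pos]
            pos += 1
    return "".join(restored)


original = "bytheshoresofgitchegumeebytheshiningbigseawater"
scrambled = skip_text(original, 5)

print(scrambled)
print(len(scrambled) == len(original))          # same length
print(Counter(scrambled) == Counter(original))  # same letter counts
print(unskip_text(scrambled, 5) == original)    # reversible given the skip
```

Without the skip value the rearranged string reads as gibberish, although nothing has been added to or removed from the original letters.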
Another story is what we did with Brendan. He would email me LSC data for various texts unknown to me, which he created by all kinds of reshuffling alphabets, etc., and my task was to guess, first, whether the text was meaningful or not, and if not, how it was made up. To the first question I answered correctly in all cases, and to the second with a reasonable rate of success. This served for us as proof of the objectivity of LSC data as well as of our reasonable understanding of it. Therefore, while I agree with you that a mathematical definition of meaningfulness is a bird that is hard to catch, I still have a feeling that some reasonable criterion could be formulated on the basis of empirically accumulated data. We could probably define a criterion of meaningfulness as some number combining certain experimental quantitative characteristics, a number that would be within certain limits for all the studied texts we recognize as meaningful and beyond those limits for all the studied texts we recognize as gibberish. Such a criterion would be imperfect but still useful. Of course, I am saying all that because of my background and experience as a physicist rather than as a mathematician or a linguist. I apologize for the disordered discussion; these are just raw ideas, which all of you can dismiss if you feel so.
- Or take a text in which every character has been replaced by the
next one in the alphabet. Totally meaningless.
No, it is not (see below).
Yet the LSC defines it
as fully meaningful.
Yes, because it is indeed meaningful, just using an alphabet where symbol B means sound A etc. In this case LSC truthfully reports what the text actually is.
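
A small Python sketch of why a letter-by-letter replacement changes nothing statistically. It assumes a 26-letter lowercase Latin alphabet with wraparound (z becomes a), and the names shift_by_one and count_profile are invented for this illustration. It does not compute LSC; it only checks that the shifted text has the same set of letter counts as the original, merely attached to different symbols.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def shift_by_one(text):
    """Replace every lowercase letter by the next one in the alphabet (z wraps to a)."""
    table = str.maketrans(ALPHABET, ALPHABET[1:] + ALPHABET[:1])
    return text.translate(table)


def count_profile(text):
    """Letter counts sorted from most to least frequent, ignoring which letter is which."""
    counts = Counter(ch for ch in text if ch in ALPHABET)
    return sorted(counts.values(), reverse=True)


sample = "by the shores of gitche gumee by the shining big sea water"
shifted = shift_by_one(sample)

print(shifted)
print(count_profile(sample) == count_profile(shifted))
```

Any statistic that depends only on how the symbols are distributed along the text, and not on which symbol carries which label, gives the same result for both versions.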
Is 'bogorodice djevo raduysia' Russian or the result of a Russian character
monkey? Before last Xmas I wouldn't have known but the LSC could have told
me (given more text, of course).
Of course LSC would tell you. Meaningfulness, in my view, is not about whether or not a reader can understand a text but about whether or not the writer wrote something meaningful from the writer's viewpoint. The question of whether or not we can understand a text has nothing to do with its being meaningful. LSC (and I am not at all inclined to vouch for its versatility or absolute reliability) discerns meaningfulness regardless of the language, which may be utterly unknown to us.
And this gets us to the question of the
VMs. That is as readable to me as Russian. In fact, it is more readable
to me than Arabic. The LSC classifies it as meaningful, and all the
experiments Mark has done help to reinforce the conclusion.
But could it be in the grey area above?
It certainly can be in a grey or in a striped or in a dappled area. That is why I suggest that LSC is just one of many possible tools and the more tools are used, the more we can hope to know if it is grey, or black-and-white.

The point: we're not sure what we're measuring. And that isn't the
first time in the history of the VMs, to put it mildly.
Still, as an engineer, I feel that it shouldn't stop us from experimenting.

Yes, because it is fun.  Best!  Mark