> What are the doubtful cases?
> - Think of Jacques' telegram-style recipes. This is more towards the
> meaningful side, but it could have someone very puzzled.
> - Or a text in which every third word has been struck out. It is
> ungrammatical but depending on the complexity of the original text, it
> may be right in the middle between meaningful and meaningless.

Rene and Jorge: You may have noticed that in our work with Brendan
we had systematically measured LSC in meaningful texts from which either
all vowels or all consonants were removed. In all cases explored,
the PMP shifted to lower n and DOM decreased but the LSC curve looked typical
of meaningful texts even though it was an extreme case of your example
with every third word struck out. I consider such vowelless or consonantless
texts a form of abbreviation so in my classification such texts are
meaningful, although it may be hard to make sense of them, especially the
ones. In another example, I took a meaningful text and converted it
into what, among the debunkers of the alleged Bible code, is called a skip-text (a
short example of such was described briefly in paper 2 on LSC).
Subsequently one of the debunkers, Randy Ingermanson, analyzed skip
texts quite thoroughly using a different statistic from LSC (partially
based on chi-square). Despite the fact that skip texts preserve
the entropy of the original texts (as they are obtained by rearranging
the letters or n-grams of the original text according to a regular non-random
reversible procedure) Randy found that they largely display statistics
of meaningless texts, which fits the fact that we cannot make sense of
them if we don't know that they are skip-texts and what the value of
the skip is. Randy's results are described in his book "Who wrote the Bible
code." That book has a very good statistical appendix on Randy's
web page. On the other hand, once I asked my son to encode the Song of
Hiawatha without telling me how he encoded it. His encoding obviously
involved a high rate of compression. LSC of the encoded texts
was like that of gibberish. Skip-texts are also actually texts encoded
in a simple way, and they preserve exactly the length of the text.
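
The skip-text construction discussed above can be sketched in a few lines. This is only a minimal illustration, not Randy's procedure or the one used in the LSC papers: it assumes a skip-text is built by reading every `skip`-th character of the text, starting in turn from offsets 0, 1, ..., skip-1. The function names `to_skip_text` and `from_skip_text` are hypothetical. The sketch shows just the two properties mentioned here: the rearrangement can be undone once the skip is known, and it keeps the length and the letter counts of the original.

```python
from collections import Counter


def to_skip_text(text: str, skip: int) -> str:
    """Read every `skip`-th character, starting at offsets 0, 1, ..., skip-1.

    Assumed construction, for illustration only.
    """
    return "".join(text[offset::skip] for offset in range(skip))


def from_skip_text(skipped: str, skip: int) -> str:
    """Undo to_skip_text, given the same skip value."""
    n = len(skipped)
    out = [""] * n
    pos = 0
    for offset in range(skip):
        # Number of original positions offset, offset+skip, ... below n.
        count = len(range(offset, n, skip))
        out[offset::skip] = skipped[pos:pos + count]
        pos += count
    return "".join(out)


original = "abcdefg"
skipped = to_skip_text(original, 3)

# Same length and same letter counts, but a different order.
same_length = len(skipped) == len(original)
same_counts = Counter(skipped) == Counter(original)
restored = from_skip_text(skipped, 3)
```

Without the skip value the rearranged string gives no obvious hint of how to read it, which is the situation described above.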
My son's encoding was rather complex and shortened the texts. I did
not know what to make of it, so I did not pursue the study of encoded texts
by LSC. It can be surmised that such a study could shed some light
on the mathematically definable distinction between meaningful texts and
gibberish.
Another story is what we did with Brendan. He would email me LSC
data for various texts unknown to me, which he created by all kinds of
reshuffling alphabets, etc., and
my task was to guess, first whether the text was meaningful or not,
and if not, how it was made up. To the first question I answered
correctly in all cases, and to the second with a reasonable rate of success.
This served for us as a proof of the objectivity of LSC data as well as our
reasonable understanding of it. Therefore, while I agree with you
that a math definition of meaningfulness is a bird which is hard to catch,
I still have a feeling that some reasonable criterion could be formulated
on the basis of the empirical accumulation of data. We probably could
define a criterion of meaningfulness as some number combining certain
experimental quantitative characteristics, which number would be within
certain limits for all the studied texts we recognize as meaningful and
beyond those limits for all the studied texts we recognize as gibberish.
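
The kind of empirical criterion proposed here can be sketched as follows. This is a toy illustration under loud assumptions: each text is reduced to a single hypothetical score (how such a score would be combined from LSC quantities is left open, as it is in the text), and the numbers in the example are invented placeholders, not measurements. The names `fit_limits` and `classify` are hypothetical.

```python
def fit_limits(meaningful_scores, gibberish_scores):
    """Return (low, high) spanning all scores of texts known to be meaningful.

    Returns None if any known-gibberish score falls inside that interval,
    i.e. if the chosen score does not separate the two groups.
    """
    low, high = min(meaningful_scores), max(meaningful_scores)
    if any(low <= g <= high for g in gibberish_scores):
        return None
    return low, high


def classify(score, limits):
    """Label a new score by whether it lies within the fitted limits."""
    low, high = limits
    return "meaningful" if low <= score <= high else "gibberish"


# Invented placeholder scores, for illustration only.
limits = fit_limits([0.8, 1.1, 0.95], [0.1, 0.2, 1.9])
label_inside = classify(1.0, limits)
label_outside = classify(0.3, limits)
```

Such limits are only as good as the set of texts studied so far: a new text falling outside them calls either for widening the limits or for a better score.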
Such a criterion would be imperfect but still useful. Of course
I am saying all that because of my background and experience
as a physicist rather than a mathematician or a linguist. I apologize
for the disordered discussion; these are just raw ideas which all of you can
dismiss if you feel so inclined.

> - Or take a text in which every character has been replaced by the
> next one in the alphabet. Totally meaningless.

No, it is not (see below).

> Yet the LSC defines it as fully meaningful.

Yes, because it is indeed meaningful, just using an alphabet where symbol
B means sound A etc. In this case LSC truthfully reports what the text
is.
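
The "next letter in the alphabet" case can be illustrated with a short sketch. It assumes a 26-letter lowercase Latin alphabet with z wrapping around to a; the function name `shift_next` is hypothetical. The point is only that the substitution relabels the symbols and nothing else: the profile of letter counts, and where repeats fall along the text, stay exactly the same, so any statistic that ignores which label a symbol carries should not change.

```python
import string
from collections import Counter


def shift_next(text: str) -> str:
    """Replace each letter a-z by the next one in the alphabet (z wraps to a).

    Characters outside a-z (spaces, punctuation) are left unchanged.
    """
    lower = string.ascii_lowercase
    table = str.maketrans(lower, lower[1:] + lower[0])
    return text.lower().translate(table)


def count_profile(text: str):
    """Sorted letter counts, ignoring which letter each count belongs to."""
    letters = [c for c in text if c in string.ascii_lowercase]
    return sorted(Counter(letters).values())


plain = "by the shores of gitche gumee"
shifted = shift_next(plain)

# The relabelled text has the same count profile as the original.
profiles_match = count_profile(plain) == count_profile(shifted)
```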

> Is 'bogorodice djevo raduysia' Russian or the result
> of a Russian character monkey? Before last Xmas I wouldn't have known
> but the LSC could have told me (given more text, of course).

Of course LSC would tell you. Meaningfulness, in my view, is not about
whether or not a reader can understand a text but about whether or not the
writer wrote something meaningful from the writer's viewpoint. The
question whether or not we can understand a text has nothing to do with
its being meaningful. LSC (and I am not at all prone to vouch for its
versatility or absolute reliability) discerns meaningfulness regardless of
language, which can be utterly unknown to us.

> And this gets us to the question of the
> VMs. That is as readable to me as Russian. In fact, it is more readable
> to me than Arabic. The LSC classifies it as meaningful, and all the
> experiments Mark has done help to reinforce the conclusion.
> But could it be in the grey area above?

It certainly can be in a grey or in a striped or in a dappled area. That
is why I suggest that LSC is just one of many possible tools and the more
tools are used, the more we can hope to know if it is grey, or black-and-white.

> The point: we're not sure what we're measuring. And that isn't the
> first time in the history of the VMs, to put it mildly.
> Still, as an engineer, I feel that it shouldn't stop us from experimenting.

Yes, because it is fun. Best! Mark