Re: Another explanation for dain daiin...
>Codebooks aside, this may (if true!) be the earliest recorded instance of
>data-compression.
Not unless the VMS is older than we think. There is a squiggle in
Chinese which means "repeat the previous character". There is one
in Japanese which means "repeat the previous syllable" and another
which means "repeat the previous word". And the modern "simplified
characters" are nothing but an adaptation of the cursive script
(aka caoshu, or ts'ao-shu). Come to think of it, Ancient Egyptian
hieratic and demotic are likewise "data-compressed" hieroglyphs.
And this brings to my mind the Tironian notes, which were the
granddaddy of shorthand. Closer to us, you often see a "2" in
Indonesian and in Malay, which just means "repeat the previous
word." It comes in handy when you have to write "mahasiswa-mahasiswa,
mahasiswi-mahasiswi" (mahasiswa is a university student, male, the
other one is ditto, female; it's borrowed from Sanskrit, of course).
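The Malay/Indonesian "2" convention is simple enough to sketch as a decoder.
What follows is only an illustration: the function name expand_repeat_marks
is made up, and it assumes the "2" is written directly after the word with
nothing attached after it, which real texts do not always respect.

```python
import re

def expand_repeat_marks(text):
    """Expand a trailing "2" into a hyphenated repeat of the word before it.

    Assumption: the mark is a bare "2" glued to the end of a word made of
    letters only; forms with affixes after the "2" are left untouched.
    """
    # \1 is the word that precedes the "2"; write it out twice with a hyphen.
    return re.sub(r"\b([^\W\d_]+)2\b", r"\1-\1", text)

print(expand_repeat_marks("mahasiswa2, mahasiswi2"))
```

Going the other way (replacing "word-word" by "word2") is the compression
step; the decoder above is just its inverse.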
Now:
>In a text stream (as opposed to a data stream), "copy(-1)" would be very
>very rare [ insert self-referential joke here :-) ], but "copy(-2)" and
>"copy(-3)" would be very common indeed.
Hmmm... this has to be tested on real texts. Not difficult at all, really,
and I might very well yield to the temptation. But first I must write up
that article I have in mind about the "Apai" recitation. Here is a sample:
46 .... .... ..... .... .... ...e oho te nauai e rai te
47 nauau nauai kino noho ava-ava tauake te kete irnuga te niu
48 ei ia hoa ko ni ni ei ia hoa o Rionou tona koake matone uake
49 te nauai e oho te nauai e rai te nauai nauai nauai kino nohi
50 ava ava taua kate kete iringa te niu
It looks like gibberish, doesn't it? The transliteration is tainted.
(The printers have confused handwritten u's and n's, for instance.)
You have almost exactly the same text repeated twice. I think
I have cracked it (the repeated text), but this is almost beside the
point. What is relevant is how I tried to crack it.
Before signing off, a prophecy: copy(-1), copy(-2) and copy(-3)
will turn out to be equally common (in a system that uses them),
except that, in many languages (Malay for instance), copy(-1) will
be the most common by far. And in many languages, copy(-2) will be second
most common. E.g. "Te Pito te Henua", the name of Easter Island
which, according to some, means "The Navel of the World" and
according to others "The Navel and the Womb". Fortunately, this
is easily testable and I am sorely tempted... but it is 20 past midnight.
Later today, perhaps...
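
A minimal sketch of such a test, in Python. Everything in it is an
assumption made for illustration and not something specified in this thread:
the function name copy_offset_counts, the crude tokenizer (lower-cased runs
of a-z and apostrophes, so "ava-ava" counts as two words), and the rule
that each word is credited to its nearest earlier match only.

```python
import re
from collections import Counter

def copy_offset_counts(text, max_offset=3):
    """Count how often a word repeats the word k positions back.

    Returns (counts, total_words), where counts[k] is the number of words
    whose nearest earlier identical word lies exactly k words back,
    for 1 <= k <= max_offset.
    """
    # Assumed tokenizer: lower-case, keep runs of a-z and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        for k in range(1, max_offset + 1):
            if i - k >= 0 and words[i - k] == word:
                counts[k] += 1
                break  # credit the nearest match only
    return counts, len(words)

counts, total = copy_offset_counts("Te Pito te Henua")
for k in sorted(counts):
    print(f"copy(-{k}): {counts[k]} of {total} words")
```

Run on a long text, the relative sizes of counts[1], counts[2] and
counts[3] are what the prophecy above is about. Raising max_offset shows
whether longer-range copies matter as well.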