Re: low entropy text
On 4 Sep 2000, at 13:50, Rene Zandbergen wrote:
> Which brings me to the other thread: what we need is a word game
> which both reduces entropy and word length, still keeping the
> vocabulary size reasonable. That last feat may of course be
> assisted by introducing spelling variations.
Unfortunately that alone would not do.
The daiin dialect does exactly that, even pushing entropy lower than
the vms, yet I am not convinced that it is the right type of coding that
would produce a vms-like text.
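For anyone who wants to compare a candidate word game against the vms on the three quantities Rene mentions, here is a minimal sketch. It assumes a transcription with one character per glyph and whitespace between words, and it measures plain single-character entropy, which may not be the entropy figure meant in this thread (a conditional entropy would need pair counts instead). The function name text_stats is mine, for illustration only.

```python
import math
from collections import Counter

def text_stats(text):
    """Return (character entropy in bits, mean word length, vocabulary size).

    Assumption: words are separated by whitespace and spaces are not
    counted as characters.
    """
    words = text.split()
    if not words:
        return 0.0, 0.0, 0
    counts = Counter("".join(words))
    total = sum(counts.values())
    # First-order Shannon entropy over single characters.
    entropy = -sum(n / total * math.log2(n / total) for n in counts.values())
    mean_len = sum(len(w) for w in words) / len(words)
    return entropy, mean_len, len(set(words))
```

Running it on the original text and on the encoded text shows at a glance whether a given scheme lowers entropy and word length without shrinking the vocabulary too far.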
I find it very puzzling that the vms word structure seems so tightly
constrained (Stolfi's OKOKOKO structure).
There are a few other things needing explanation as well.
Why on earth are <daiin> and <aiin> so common, and sometimes
repeated or run together?
Why are <m>, <g> and <j> so common at the end of lines, and <q>
at the start of words?
Why are there sequences with characters that never appear anywhere
else in the text, and why do some relatively common letters not appear
in those sequences?
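The positional questions above are easy to quantify before trying to explain them. A rough sketch, assuming a transcription with one character per glyph, one manuscript line per input string and whitespace between words (positional_counts is a made-up name, not part of any existing tool):

```python
from collections import Counter

def positional_counts(lines):
    """Count which glyphs end lines and which glyphs start words.

    Assumption: each string is one manuscript line, words are separated
    by whitespace, and every glyph is a single character.
    """
    line_final = Counter()
    word_initial = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        line_final[words[-1][-1]] += 1
        for word in words:
            word_initial[word[0]] += 1
    return line_final, word_initial
```

Comparing these counts with each glyph's overall frequency would show how far the line-final and word-initial preferences exceed what chance alone predicts.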
I still think that we need to attack the labels.
My candidate word is <kydain> in folio 2r for the plant name.
It appears twice on that page, as <kydainy> in the first line and as
<kydain> in the 2nd paragraph. It does not appear anywhere else in
the ms. I think that Stolfi did some logical filtering to find which
words occur only in a single folio and nowhere else.
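I do not know how Stolfi's filtering was actually done, but the idea can be sketched as follows, assuming the transcription is available as a mapping from folio name to its text with whitespace between words. The function name single_folio_words and the input layout are my own assumptions.

```python
from collections import defaultdict

def single_folio_words(folios):
    """Map each folio to the words that occur in it and in no other folio.

    Assumption: `folios` is a dict of folio name -> transcribed text,
    with words separated by whitespace.
    """
    seen_in = defaultdict(set)
    for folio, text in folios.items():
        for word in text.split():
            seen_in[word].add(folio)
    unique = defaultdict(set)
    for word, where in seen_in.items():
        if len(where) == 1:
            unique[next(iter(where))].add(word)
    return dict(unique)
```

Note that this compares exact strings only, so two spellings that differ by a final letter count as two separate words. Pairing up such variants would need an extra step, for instance checking whether one candidate is a prefix of another.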
So let's assume that the "proper" Latin name of the plant is <kydainy>,
with, let's say, <y> standing for "us" (given that it is already used for
marking the quire number), and that the second <kydain> is the "normal
name" in whatever tongue the author uses.
At least in Spanish, plants sometimes have names derived from
Latin that differ only in the ending, like gladiolus (Latin)
and gladiolo (Spanish).
Gabriel