
Entropy VS Word Length



I know this doesn't hold true for all languages, but this goes
back to Occam's Razor:

Supposing Voynich is like many of the languages that make
logical candidates because of sheer numbers (i.e., not tonal or
vowel-less), shouldn't it, plotted against them on a graph, show
a similar mathematical relationship between entropy and word
length?  My idea is this: the more predictable the sequence of
letters, the less value each letter has as a data bit, so one
would expect a highly predictable language to have a longer
average word length than an unpredictable one.  Take the extreme
example of 'qu' in English, where the 'u' adds to the
predictability, making every 'qu' worth roughly one 'bit' rather
than two while increasing the word length of all the 'q-words'.
Voynich doesn't seem to follow this pattern, which leaves three
possibilities for which assumption is incorrect, and for two of
them we can potentially generate useful values.
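
To make the comparison concrete, here is a rough Python sketch
(mine, not part of any established tool) of the quantities
involved: single-character entropy, the conditional entropy of
the next character given the current one, and mean word length.
The sample string is just a made-up placeholder.

from collections import Counter
from math import log2

def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of a frequency distribution."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def char_entropies(text: str) -> tuple[float, float]:
    """Return (single-character entropy h1, conditional next-character entropy h2)."""
    chars = [c for c in text if not c.isspace()]
    h1 = entropy(Counter(chars))
    # H(next | current) = H(character pair) - H(single character)
    h2 = entropy(Counter(zip(chars, chars[1:]))) - h1
    return h1, h2

def mean_word_length(text: str) -> float:
    """Average length of whitespace-delimited tokens."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

sample = "the quick queen quietly questioned the quaint quartet"
h1, h2 = char_entropies(sample)
print(f"h1 = {h1:.3f} bits/char, h2 = {h2:.3f} bits/char, "
      f"mean word length = {mean_word_length(sample):.2f}")

The more predictable the text, the further h2 falls below h1,
and the idea above is that the mean word length should rise to
compensate.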

1) If the word length as it stands now is wrong, then, given the
entropy, we should be able to generate a reasonable range for the
real word length and check which character, used as the word-break
character, gives us a value in that range (a rough sketch of that
test follows this list).
2) Assuming the entropy is wrong because the cipher generates a
different entropy than the underlying text: given the word length,
we might be able to generate a range for the entropy and play with
ciphers to see which kind alters the entropy in a similar way
(perhaps again as a function); a toy cipher comparison is sketched
further below.
3) Voynich is in a really screwy language with some feature that
gives its letters, or some of its letters, the power to convey
more data than a conventional letter (like a tone mark).
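
Here is a hypothetical sketch of the test in possibility 1: treat
each candidate character in turn as the word break and see which
one gives a mean token length inside a target range predicted from
the entropy.  The transcription line, the candidate characters,
and the target range below are all made-up placeholders, not real
values.

def mean_token_length(text: str, break_char: str) -> float:
    # Remove existing spaces, then split on the candidate break character.
    tokens = [t for t in text.replace(" ", "").split(break_char) if t]
    return sum(len(t) for t in tokens) / len(tokens)

transcription = "qokeedy.shedy.qokain.chedy.qokeedy.dal"   # placeholder line
candidates = [".", "y", "o", "d"]                          # assumed break characters
target_range = (4.0, 7.0)                                  # assumed plausible range

for c in candidates:
    m = mean_token_length(transcription, c)
    flag = "plausible" if target_range[0] <= m <= target_range[1] else "outside range"
    print(f"break on {c!r}: mean token length {m:.2f} ({flag})")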

Actually there is a fourth possibility: that both the word length
and the entropy are wrong.
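
For possibility 2, here is a toy illustration (entirely invented,
not a claim about how the Voynich text was produced): a simple
verbose cipher that turns every plaintext letter into a fixed
consonant-vowel pair.  Running the same statistics before and
after shows the kind of shift to look for: conditional entropy
drops while word length grows.

from collections import Counter
from math import log2

def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def stats(text: str) -> tuple[float, float, float]:
    """(single-char entropy, conditional next-char entropy, mean word length)."""
    chars = [c for c in text if not c.isspace()]
    h1 = entropy(Counter(chars))
    h2 = entropy(Counter(zip(chars, chars[1:]))) - h1
    words = text.split()
    return h1, h2, sum(len(w) for w in words) / len(words)

# Toy verbose table: each letter -> a consonant+vowel digraph (purely hypothetical).
PAIRS = [c + v for c in "bcdfghjklmnpqrstvwxyz" for v in "aeiou"]
TABLE = dict(zip("abcdefghijklmnopqrstuvwxyz", PAIRS))

def encipher(text: str) -> str:
    return "".join(TABLE.get(c, c) for c in text.lower())

plain = "take an extreme example of qu in english where it adds predictability"
for label, t in (("plain ", plain), ("cipher", encipher(plain))):
    h1, h2, wl = stats(t)
    print(f"{label}: h1={h1:.2f}  h2={h2:.2f}  mean word length={wl:.2f}")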

A good extreme data point on the graph might be generated with a
language like Hawaiian or Nahuatl.

Regards,
Brian