Re: No numbers in the VMS
Hi Nick,
>
> >A simple yet very hard to decipher device is the following.
> >You take a text in a plain language, e.g. latin. Then you write it using
> >the hebrew alphabet. Finally you represent each letter of the hebrew
> >alphabet with some symbol of your choice.
>
> There have been many variations on this general theme suggested over the
> years: it's possible to mount a fairly convincing rebuttal on the grounds
> that the statistical entropy of the text is too low - bear in mind that
> losing vowels would tend to make the text more dense (rather than less)...
> hence higher entropy, not lower.
>
I understand that discussing issues which were settled long ago with each
newcomer must be more frustrating than rewarding, but please be patient.
If this theme has been discarded only on entropy considerations, I believe
that it's been dismissed too hastily.
Even with my less than amateur knowledge of the field, I understand pretty
well that removing vowels is bound to increase a text's entropy.
I also understand pretty well that a vowel-less alphabet such as Hebrew (the
same holds true for Arabic AFAIK) leads intrinsically to a higher text entropy
than a Latin-based Western alphabet.
The lack of redundancy produces misunderstandings such as the well-known
Gospel translation error where cable (GOMEL) was translated as camel
(GAMAL), resulting in an amusing but incorrect metaphor.
But, and here is my point, if you're using the same alphabet to write a
different language, matters change radically, because you're forced to render
sounds and word structures which are outside the realm of the language the
alphabet was created for. If you try to write an Italian text for a German
reader, you'll stuff it with groups of letters meant to render sounds and
structures which do not exist in German. You'll end up with a lot of "tsch"
to render the soft sound of "c" in Italian, "dzh" to render the soft sound of
"g", and so on. On intuitive grounds I'd bet that the entropy of such a text
is significantly lower than that of a plain German text.
The same holds true, but to a higher degree, when you attempt to use the
Hebrew alphabet to render a Western language such as Latin. All words
beginning with a vowel A-E-I (quite rare in Hebrew, and quite common in Latin)
must begin with the place-holder "aleph", while words beginning with O-U (also
rare in Hebrew) will begin with either "waw" or "aleph-waw", depending on
your choice. You may drop some vowels in the middle of the word, but not too
many (because otherwise the reader is led to read the word following the
Hebrew structure), and in most cases, to have the word read correctly, you'll
use either "aleph" or "yod" for "i", "waw" for "o" or "u", and again "aleph"
for A-E. The presence of a vowel at the end of the word is marked by a "he"
place-holder, to make it clear that the word doesn't end with the preceding
consonant. If the ending is "i" it becomes "yod-he", "o" or "u" become
"waw-he", while "a" or "e" become just "he".
In conclusion, a vowel at the beginning or at the end of the word is never
dropped, but is replaced by a symbol from a limited set, while a vowel in
the middle of a word can be dropped only if it's an "a" or an "e" between two
consonants; "i", "o" and "u" are never dropped.
You end up with a text which isn't vowel-less, but which has replaced five
vowels with four symbols and/or symbol pairs depending on the position of
the vowel in the word. Maybe you drop 20% of the vowels, but you replace five
symbols with one or two at the beginning of the word, with one symbol or one
of two pairs at the end of the word, and with three in the middle of the word.
The net result, it seems to me, should be not an increase of entropy, but a
decrease.
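
To make the rules above concrete, here is a minimal Python sketch of them.
It is only an illustration under several assumptions of mine: the one-letter
codes (A = aleph, Y = yod, W = waw, H = he) are hypothetical, consonants are
simply passed through in upper case, "i" in the middle of a word always
becomes "yod", an initial "o" or "u" always becomes plain "waw", and an "a"
or "e" between two consonants is always dropped (the text only says it may be).

```python
VOWELS = set("aeiou")

def transliterate(word):
    """Rewrite one Latin word using the place-holder rules described above.

    Hypothetical codes: A = aleph, Y = yod, W = waw, H = he.
    """
    word = word.lower()
    last = len(word) - 1
    out = []
    for i, ch in enumerate(word):
        if ch not in VOWELS:
            out.append(ch.upper())          # consonants pass through unchanged
        elif i == 0:
            # Initial vowel: "waw" for o/u, "aleph" for a/e/i.
            out.append("W" if ch in "ou" else "A")
        elif i == last:
            # Final vowel: always marked with "he", alone or in a pair.
            out.append({"i": "YH", "o": "WH", "u": "WH"}.get(ch, "H"))
        elif ch == "i":
            out.append("Y")                 # medial i is never dropped
        elif ch in "ou":
            out.append("W")                 # medial o/u are never dropped
        elif word[i - 1] not in VOWELS and word[i + 1] not in VOWELS:
            continue                        # medial a/e between consonants: dropped
        else:
            out.append("A")                 # medial a/e next to another vowel
    return "".join(out)

def transliterate_text(text):
    """Apply the word rule to each whitespace-separated word."""
    return " ".join(transliterate(w) for w in text.split())
```

With these choices "terra" becomes TRRH, "amor" becomes AMWR and "domini"
becomes DWMYNYH. Punctuation, the Latin u/v and i/j ambiguity, and the choice
of which Hebrew consonant stands for which Latin one are all left out.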
As empirical evidence: my knowledge of Hebrew is quite scant, and my reading of
a Hebrew text is more deciphering than reading, but nonetheless I can spot
at a glance the presence of a foreign word in the text, because of
those typical patterns which are extraneous to the language.
Moreover, if you're rendering a language with declensions (such as Latin
or Greek), which has a limited number of word endings, you add another element
of regularity, which helps to decrease total entropy.
All that said, my question is: has anybody attempted to measure the actual
entropy (and the other relevant statistical properties) of a comparison text
built according to a set of rules such as these? This could help either to
rule out such a scheme completely or to give it some ground.
If not, I volunteer for the part of the work I can undertake.
The only medieval text I have readily available in electronic form is the
"Tractatum spere" of Bartholomeus Parmensis, written in 1397, which was a
textbook at the University of Bologna for the teaching of astronomy and
astrology, and which appears to me to be suitable.
I could rewrite some of it using the Hebrew alphabet, trying to follow a
set of established rules. As I'm not familiar enough with statistical text
analysis programs, I'd leave the analysis to someone more deeply versed in the
field. I'd only need to know which set of symbols to use to represent the
Hebrew alphabet, to be compatible with the existing programs. I assume that a
one-to-one mapping would be best, but I'd like confirmation.
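
For reference, the basic measurement itself is simple to sketch. The
following Python fragment is not one of the existing analysis programs, just
a minimal illustration of single-character entropy and of the conditional
entropy of a character given the one before it; the function names `h1` and
`h2` are my own labels, and the input is assumed to be a string of at least
two symbols with one character per letter (hence the one-to-one mapping).

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def h1(text):
    """Entropy of single characters."""
    return entropy(Counter(text))

def h2(text):
    """Conditional entropy of a character given the preceding character.

    Computed as H(pair) - H(first character of the pair).
    """
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)
```

Running `h1` and `h2` on the same passage before and after the rewriting
would show whether the entropy really goes down. Whether spaces are counted
as symbols is a choice that has to be made the same way for both texts.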
[...]
> >Coming finally to numbers, numbers in hebrew are traditionally written
> >using the numerical value of the letters, but combining them in such a
> >way as to produce a word which can be pronounced. So, for instance the
> >number 15 is usually written as TW (I'm use a standard latin alphabet
> >transliteration), which can be pronounced "too", and results from the
> >numerical value 9 of T and 6 of W. T+W = 9+6 = 15. The numerical value
> >of words is the base for all the Cabalistic lore. This makes almost
> >impossible to detect numbers in a hebrew text using statistical
> >analysis, because numbers can't be told apart from words.
>
> .....like this! It's completely possible that (for example) ot- words could
> be followed by this kind of number-scheme. Once we get a better grasp of
> the VMS morphology, we ought to experiment with this & see how far it goes.
>
> Can you point us to a place where this (Sephiroth?) is described?
>
Sorry, my knowledge is rather anecdotal. All the references I've ever found
take the general rules and the established usage for granted, and just deal
with a particular case. For the first ten letters, the numerical value of a
letter is just its position in the alphabet (aleph = 1 up to yod = 10); after
that the values go up by tens and then by hundreds.
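
As a worked example of the T+W case quoted above, here is a tiny Python
sketch. It is deliberately restricted to the first ten letters, where a
letter's value and its position in the alphabet coincide; the transliterated
letter names and the function name `number_value` are my own choices.

```python
# Transliterated names of the first ten Hebrew letters, in alphabet order.
# Within this range a letter's value equals its position (aleph = 1 ... yod = 10).
FIRST_TEN = ["aleph", "bet", "gimel", "dalet", "he",
             "waw", "zayin", "het", "tet", "yod"]
VALUE = {name: pos for pos, name in enumerate(FIRST_TEN, start=1)}

def number_value(letters):
    """Sum the values of a sequence of letter names (additive notation)."""
    return sum(VALUE[name] for name in letters)

fifteen = number_value(["tet", "waw"])  # 9 + 6
```

Since any string of letters has such a sum, nothing in the spelling itself
marks "tet-waw" as a number rather than a word.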
Thank you for your patience
Giuliano Colla
P.S. All of the above would be quite amusing for someone who has been in the
grave for a number of centuries, if the VMS is nothing but a hoax...
---
Ing. Giuliano Colla
Direttore Tecnico
Copeca srl
Via del Fonditore 3/E
Bologna (Zona Industriale Roveri)