Re: Brute Force (information levels/'though')



Deviation from normal Zipf distribution of short words in the
Voynich MS:

While on the subject of information levels, working from a written
alphabet may introduce errors into an analysis of a natural
language.  How much information is apparently contained in the
English word 'thatch' when written, as opposed to when spoken?
Shouldn't the Zipf distributions mostly reflect the
spoken form rather than the written, especially at a time when
most people were illiterate?  Imagine an alphabet like Cherokee
(this may be a bad example, but follow me), which was created
after the language had absorbed all of its phonological influences
(up to then) and was designed to cover its own bases, versus
English, which had to do things like create a 'th' combination
to accommodate the voiced and unvoiced 'th' sounds.
When you use a Roman alphabet to write a language with many
sounds it cannot express directly, you get long words that are
short on phonemes, which are your real data points for Zipf.  Add
odd spellings, for whatever reason, and you get "though", a word
that should be much shorter according to Zipf.
It actually is very short: it is only two phonemes, even THOUGH
it now takes six letters to write.  Of course we can't break
Voynich into phonemes with any certainty, but we might be able
to use deviations from normal Zipf distributions to tell us what
kind of alphabet or language it is, i.e. a language with heavy
outside influences and an old alphabet, or an alphabet invented
at the time of the manuscript for a particular language.
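
To make the letters-versus-phonemes gap concrete, here is a rough
sketch in Python.  The transcriptions are hand-made assumptions
(ARPAbet-style symbols, not taken from any real lexicon), and
length_gap is a made-up helper name, so treat this as an
illustration of the idea rather than a method:

```python
# Hand-made, illustrative transcriptions (an assumption, not a real lexicon).
PHONEMES = {
    "though": ["DH", "OW"],        # six letters, two phonemes
    "thatch": ["TH", "AE", "CH"],  # six letters, three phonemes
    "cat":    ["K", "AE", "T"],    # spelling and sound agree in length
}

def length_gap(word):
    """Return (letters, phonemes, letters - phonemes) for a known word."""
    letters = len(word)
    phonemes = len(PHONEMES[word])
    return letters, phonemes, letters - phonemes

for word in PHONEMES:
    print(word, length_gap(word))
```

A script whose words show a large gap would look "too long" for
their frequency when measured in letters.  For Voynich we only have
the letter count, so only the deviation itself is observable.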
Regards,
Brian