Re: low entropy text
Rene Zandbergen wrote:
> Claus Anders wrote:
> > today I tried to compute the entropy of a Rhaeto-Romance example:
> > I got (with monkey):
> > h0: 4.32
> > h1: 3.93
> > h2: 2.69
> > Nearly as low as the VMS, and even h1-h2 is in the same range.
> > Maybe the numbers are due to the low character count of my example.
> Actually, h1 is quite normal for a 20-character alphabet (as implied
> by the h0). h2 is right in between Latin and Voynichese, and the
> relatively low value could indeed be due to the shortness of the text
> (the higher the order, the more the estimated entropy is reduced by
> this). You can see some of that happening in the graphs of the
> web article I mentioned yesterday.
Twenty characters sounds low for a Romance language.
Could it be due to a lossy spelling?
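
For anyone who wants to check figures like these on their own sample, here is a
minimal sketch of the three quantities. It assumes the usual reading of the
labels: h0 is log2 of the number of distinct characters seen (log2(20) is about
4.32, which is where the 20-character estimate comes from), h1 is the
single-character entropy, and h2 is the conditional entropy of a character
given the one before it. Whether monkey computes exactly these is an
assumption on my part, and the function names are made up for illustration.

```python
import math
from collections import Counter


def shannon(counts):
    """Plug-in Shannon entropy (bits) of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropies(text):
    """Return (h0, h1, h2) for a string of at least two characters.

    h0: log2 of the number of distinct characters observed
    h1: first-order (single-character) entropy
    h2: conditional entropy of a character given its predecessor,
        computed as H(pairs) - H(first characters of the pairs)
    """
    if len(text) < 2:
        raise ValueError("need at least two characters")
    singles = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # marginal of the first element of each pair
    h0 = math.log2(len(singles))
    h1 = shannon(singles)
    h2 = shannon(pairs) - shannon(firsts)
    return h0, h1, h2


if __name__ == "__main__":
    sample = "la chasa da mia mamma ei pintga e la via ei liunga"  # placeholder text
    print("h0=%.2f h1=%.2f h2=%.2f" % entropies(sample))
```

Note that this is the naive frequency-count estimate. On a short sample most
character pairs are seen only once or not at all, so the pair statistics look
more predictable than they really are and h2 comes out too low, which is the
effect Rene describes above.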
> Which brings me to the other thread: what we need is a word game
> which both reduces entropy and word length, still keeping the
> vocabulary size reasonable. That last feat may of course be
> assisted by introducing spelling variations.
An EKT word game could do this. (If you're not
familiar with that, see:
You could reduce word length by making the "word"
breaks syllable breaks. EKT allows for variant
> Cheers, Rene