
VMs: Stolfi's Binomial word lengths revisited

To: "Vms-List@Voynich. Net" <vms-list@xxxxxxxxxxx>
Subject: VMs: Stolfi's Binomial word lengths revisited
From: Giddy Landan <giddy@xxxxxxxxxxxxxxxx>
Date: Wed, 19 Nov 2003 21:52:39 +0200

Hi all,

I just had another look at Stolfi's word length analysis,
http://www.dcc.unicamp.br/~stolfi/voynich/00-12-21-word-length-distr/
which finds that word lengths are distributed as binom(9,0.5)
shifted by 1. The visual agreement between the observed word-length
frequencies and the frequencies expected from the binomial distribution
is so good that I was convinced the fit would hold up under a
statistical test. To my surprise, it doesn't:

Testing the null hypothesis 
H0: word lengths are drawn from binom(9,0.5) shifted by 1
vs.
H1: not so
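
For reference, a minimal Python sketch of the probabilities that H0 assigns to each word length, i.e. length = 1 + Binomial(9, 0.5). The function name `shifted_binom_pmf` and its parameters are illustrative choices, not something taken from Stolfi's page.

```python
from math import comb

def shifted_binom_pmf(length, n=9, p=0.5, shift=1):
    """P(word length == length) when length = shift + Binomial(n, p)."""
    k = length - shift          # number of "successes" behind this length
    if k < 0 or k > n:
        return 0.0              # lengths outside 1..10 are impossible under H0
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probabilities for lengths 1..12; lengths 11 and 12 get probability 0.
for length in range(1, 13):
    print(length, shifted_binom_pmf(length))
```

Multiplying these probabilities by the sample size gives the expected counts that enter the tests.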

I got:

G-statistic = 17.26
df = 9
p-value = 0.045 ( = 1 - chi2cdf(17.26,9) )

and

chi-square = 50.37
df = 9
p-value < 10^-7  ( = 1 - chi2cdf(50.37,9) )

so the null hypothesis is flatly rejected by the chi-square test
and rejected at the 5% significance level by the G-test.
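
The two statistics and the quoted p-values can be re-derived along the following lines. This is only a sketch in plain Python: the function names are illustrative, the observed counts are not reproduced in this post and so are not included, and `chi2_sf` is a hand-rolled stand-in for `1 - chi2cdf(x, df)` that is valid for integer degrees of freedom only.

```python
from math import erfc, exp, gamma, log, sqrt

def g_statistic(observed, expected):
    """Likelihood-ratio statistic G = 2 * sum(O * ln(O / E)); empty cells add 0."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

def pearson_chi2(observed, expected):
    """Pearson statistic X^2 = sum((O - E)^2 / E); every E must be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper tail 1 - chi2cdf(x, df) for integer df >= 1, built from the
    recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-y)          # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, erfc(sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q

# p-values for the two statistics quoted above, under 9 and 10 degrees of freedom
for stat in (17.26, 50.37):
    for df in (9, 10):
        print(f"statistic={stat}  df={df}  p={chi2_sf(stat, df):.3g}")
```

For G = 17.26 this gives a p-value of about 0.045 with 9 degrees of freedom and about 0.069 with 10, so the rejection at the 5% level depends on using 9.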


Even worse. The observed lengths include some 11- and 12-letter words,
lengths that cannot be produced by the hypothesized binomial
distribution, but each such word occurs only once in the word
sample. These could easily be errors, so I tried filtering out all
words with only one occurrence in the sample.
This resulted in:

G-statistic = 319.51
chi-square = 317.42
df = 10
p-value < 10^-16 
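
The filtering step could be sketched as below. The token list is a made-up toy sample, not Voynich transcription data, and `length_histogram` with its `drop_hapax` and `by_type` flags is a hypothetical helper. Whether lengths are tallied per occurrence or per distinct word is left open here, so the sketch supports both.

```python
from collections import Counter

def length_histogram(tokens, drop_hapax=False, by_type=False):
    """Tally word lengths, optionally dropping words that occur only once.

    by_type=False counts every occurrence; by_type=True counts each
    distinct word once.
    """
    freq = Counter(tokens)
    hist = Counter()
    for word, count in freq.items():
        if drop_hapax and count == 1:
            continue                      # skip words seen a single time
        hist[len(word)] += 1 if by_type else count
    return dict(sorted(hist.items()))

# Hypothetical toy sample, for illustration only.
tokens = ["daiin", "daiin", "ol", "ol", "ol", "chedy",
          "qokeedy", "qokeedy", "oteodalchedy"]

print(length_histogram(tokens))                   # includes the 12-letter word
print(length_histogram(tokens, drop_hapax=True))  # single-occurrence words removed
```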

One last point. Stolfi gives an example code that produces the binomial
word length distribution. It should be noted that the code is limited to
2^9=512 distinct values, whereas the VMS sample vocabulary contains 6525
words.
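
To make the 512 limit concrete, here is a toy scheme driven by 9 independent binary choices. It is an assumed stand-in, not Stolfi's actual example code: the letters in `SLOT_LETTERS`, the fixed first letter "x", and the `encode` function are all invented for illustration.

```python
from collections import Counter
from itertools import product
from math import comb

SLOT_LETTERS = "abcdefghi"   # hypothetical: one optional letter per binary choice

def encode(bits):
    """Fixed first letter plus one letter for every bit that is set."""
    return "x" + "".join(ch for ch, bit in zip(SLOT_LETTERS, bits) if bit)

# Enumerate every possible word of the toy scheme.
codewords = {encode(bits) for bits in product((0, 1), repeat=9)}
length_counts = Counter(len(w) for w in codewords)

print(len(codewords))                     # number of distinct words available
for length in range(1, 11):
    # words of this length vs. the binomial coefficient C(9, length - 1)
    print(length, length_counts[length], comb(9, length - 1))
```

Drawing the 9 bits uniformly at random yields lengths distributed as binom(9,0.5) shifted by 1, but the scheme can never emit more than 512 different words.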

I confess that I find these findings shocking, and I would rather believe
my eyes. Could anyone point out my mistake?

Giddy
