RE: Fw: Character n anomaly

To: voynich@xxxxxxxx
Subject: RE: Fw: Character n anomaly
From: Jorge Stolfi <stolfi@xxxxxxxxxxxxx>
Date: Mon, 30 Jul 2001 23:17:27 -0300 (EST)
In-reply-to: <5.1.0.14.0.20010730233253.0264d090@mail.globalnet.co.uk>
References: <5.1.0.14.0.20010730125613.0252aec0@mail.globalnet.co.uk> <IHECKCNAEFMGOEKMIABPOEJHCDAA.giddy@netvision.net.il> <5.1.0.14.0.20010730233253.0264d090@mail.globalnet.co.uk>
Reply-to: stolfi@xxxxxxxxxxxxx

[Note: I prefer to use the standard parsing nomenclature, where a
`token' is an occurrence of a `word'. So, for me, the sentence 
"the man can open the can" contains 6 tokens but only 4 words.]

    > [Nick Pelling:] If I was going to fake [the ultra-regular word
    > length] distribution (but instead peaking at, say, 10), I'd take
    > a pack of modern cards, throw out all the court cards, and,
    > every time I turned over an ace, insert a space. Once in a
    > while, I'd have to shuffle the deck: but basically that would be
    > it.
    >
    > But with average length 6, the easiest way would be to roll a
    > normal 6-sided dice: if it's a six, insert a space. How far off
    > is that from the observed distribution?

I am afraid that it won't do. With your method, the probability of a
random text token having k letters would be roughly p*(1-p)**(k-1)
where p is the probability of inserting a space (1/10 or 1/6 in your
examples).

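This is a geometric (exponentially decaying) distribution: the most
likely token length is 1, and each extra letter makes a token less
likely by a factor of (1-p). That is quite different from the humped
and tail-less distribution we observe in the VMS.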
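
The formula above can be checked with a small simulation. This is only
a sketch under the assumption that a space follows each letter
independently with probability p (the dice version of the method); the
function names are mine, and nothing here uses actual VMS data.

```python
import random
from collections import Counter

def geometric_pmf(k, p):
    """Probability that a token has exactly k letters when a space
    follows each letter independently with probability p."""
    return p * (1 - p) ** (k - 1)

def simulate_token_lengths(p, n_tokens, seed=0):
    """Draw n_tokens token lengths by 'rolling the die' after each letter."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_tokens):
        k = 1
        while rng.random() >= p:  # no space yet: the token grows by one letter
            k += 1
        lengths.append(k)
    return Counter(lengths)

p = 1 / 6            # the 6-sided die from the quoted proposal
n_tokens = 100_000
counts = simulate_token_lengths(p, n_tokens)

# Columns: token length, predicted frequency, simulated frequency.
for k in range(1, 13):
    print(k, round(geometric_pmf(k, p), 4), round(counts[k] / n_tokens, 4))
```

The predicted column falls steadily from k = 1 onwards, so there is no
hump anywhere, whatever value of p is chosen.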

As for the distribution of *word* lengths: I haven't done the math,
but I believe that, if spaces were inserted at random, we should see
many more distinct words than we actually see. For instance, every
sequence of 1 or 2 letters should occur as a word somewhere in the
VMS, which is clearly not the case.
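
One way to quantify that last point is sketched below. It assumes the
check is restricted to the letter sequences that actually occur in the
unspaced text (my simplification), and `short_word_coverage` is a
hypothetical helper name. The sample is the toy sentence from the note
at the top, not a VMS transcription.

```python
def ngrams(stream, n):
    """All distinct n-letter sequences in a string without spaces."""
    return {stream[i:i + n] for i in range(len(stream) - n + 1)}

def short_word_coverage(text, n):
    """Fraction of the distinct n-letter sequences of the unspaced text
    that also occur as standalone words."""
    stream = "".join(text.split())
    seqs = ngrams(stream, n)
    words = {w for w in text.split() if len(w) == n}
    return len(words & seqs) / len(seqs) if seqs else 0.0

sample = "the man can open the can"
print(short_word_coverage(sample, 1))  # no 1-letter words in the sample
print(short_word_coverage(sample, 2))  # no 2-letter words either
```

If spaces were inserted at random in a long text, both fractions should
approach 1; values far below 1 would count against the theory.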

One way to test "Nullspace" theories is to remove all spaces from the
VMS text, then re-insert them according to the proposed method. If the
theory is correct, the resulting text should have the same word
statistics and structure as the original. The above space-insertion
methods would definitely fail this test.
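
The test just described could be sketched as follows. This is an
illustration only: it assumes the random-insertion method with a single
probability p, chosen here so that the average token length is
preserved, and it runs on the toy sentence from the note at the top
rather than on a VMS transcription. All function names are mine.

```python
import random
from collections import Counter

def respace(text, p, seed=0):
    """Remove all spaces, then insert a space after each letter
    independently with probability p."""
    rng = random.Random(seed)
    letters = "".join(text.split())
    out = []
    for ch in letters:
        out.append(ch)
        if rng.random() < p:
            out.append(" ")
    return "".join(out)

def length_histogram(text):
    """Number of tokens of each length."""
    return Counter(len(tok) for tok in text.split())

def distinct_words(text):
    """Number of different words (as opposed to tokens)."""
    return len(set(text.split()))

original = "the man can open the can " * 200
n_letters = len("".join(original.split()))
p = len(original.split()) / n_letters   # keeps the mean token length

respaced = respace(original, p)

print("token lengths, original:", sorted(length_histogram(original).items()))
print("token lengths, respaced:", sorted(length_histogram(respaced).items()))
print("distinct words:", distinct_words(original), "vs", distinct_words(respaced))
```

A proposed method passes only if both the length histogram and the
vocabulary of the respaced text resemble those of the original.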

In fact, the symmetrical distribution of word lengths is only a small
part of the picture. That feature is clearly connected with the very
rigid internal structure of the VMS words --- which seems to be
utterly incompatible with the theory that spaces are inserted at
random.

All the best,

--stolfi
