Re: Counting the Gallow Bits

I did this, but the resulting numbers were quite similar, differing only in
the first decimal place.
I had anticipated this behaviour, because in lines shorter than 7 tokens
the distribution is a good mixture of 01 sequences (in fact all numbers
from 0x00 to 0xff occur, though some are of course more frequent). In
longer lines, however, the tendency of 0s or 1s to cluster becomes significant.

Today I had the idea to make a bitmap of the VMS:
1. For every char in the VMS I compute the frequency and assign a colour to the
char depending on that frequency: from red (low frequency) to blue (high
frequency). Then map every page to this colouring scheme using e.g. a 10
pages by 10 pages grid. For the whole VMS I will get around 2 bitmaps. Maybe
the image could reveal some structure (Currier A and B or something like
that).
2. Then do the same for char pairs or triplets.
3. Finally, the same for tokens.
Do you think this is futile?
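
To make step 1 concrete, here is a rough sketch in Python. It is only an illustration under several assumptions of mine: each page is available as a plain-text transcription string, whitespace is ignored, the red-to-blue scale is linear in the raw character count, and every character becomes one pixel. The function names char_colours, page_to_rows and write_ppm are made up for this example.

```python
from collections import Counter


def char_colours(pages):
    """Map each character to an RGB colour: red = rarest, blue = most frequent."""
    counts = Counter(ch for page in pages for ch in page if not ch.isspace())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero if all counts are equal
    colours = {}
    for ch, n in counts.items():
        t = (n - lo) / span  # 0.0 for the rarest char, 1.0 for the most frequent
        colours[ch] = (round(255 * (1 - t)), 0, round(255 * t))
    return colours


def page_to_rows(page, colours, width, background=(255, 255, 255)):
    """Turn one page into rows of pixels, one pixel per character.

    Lines longer than `width` are cut off, shorter ones are padded
    with the background colour (an assumption made for simplicity).
    """
    rows = []
    for line in page.splitlines():
        pixels = [colours[ch] for ch in line if not ch.isspace()][:width]
        pixels += [background] * (width - len(pixels))
        rows.append(pixels)
    return rows


def write_ppm(path, rows):
    """Write pixel rows as a plain-text PPM (P3) image file."""
    with open(path, "w") as f:
        f.write(f"P3\n{len(rows[0])} {len(rows)}\n255\n")
        for row in rows:
            f.write(" ".join(f"{r} {g} {b}" for r, g, b in row) + "\n")
```

With the pages in a list, colours = char_colours(pages) builds the colour table once for the whole text, and write_ppm("page1.ppm", page_to_rows(pages[0], colours, 60)) writes one page as an image. For the 10 by 10 grid, the pages would additionally have to be padded to a common height and placed side by side. For steps 2 and 3 the same scheme should carry over by counting char pairs, triplets or tokens instead of single chars.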

On 12 Sep 2000, at 20:18, Claus Anders wrote:
> Yes, I counted the labels too, but there are even long lines (11 tokens
> or fewer) with 100% 1 coverage.

Well, those statistics will mix the rules of word construction with
those of grammar.
I would count only those in text lines but not labels since a sequence
of labels may not have any grammatical structure.