I did this, but the resulting numbers were quite similar, differing only in the first decimal place. I had anticipated this behaviour, because in lines shorter than 7 tokens the distribution is a good mixture of 0/1 sequences (in fact every value from 0x00 to 0xff occurs, though some are of course more frequent). In longer lines, however, the tendency of 0s or 1s to cluster becomes significant.

Today I had the idea of making a bitmap of the VMS:

1. For every char in the VMS I compute the frequency and assign a colour to the char depending on that frequency: from red (low frequency) to blue (high frequency). Then I map every page to this colouring scheme, using e.g. a 10 pages by 10 pages grid. For the whole VMS I will get around 2 bitmaps. Maybe the image could reveal some structure (Currier A and B or something like that).
2. Then do the same for char pairs or triplets.
3. Finally, do the same for tokens.

Do you think this is futile?

Cheers
Claus

-----Original Message-----
From: Gabriel Landini [mailto:G.Landini@xxxxxxxxxx]
Sent: Wednesday, 13 September 2000 09:58
To: Claus Anders
Subject: Re: Counting the Gallow Bits

On 12 Sep 2000, at 20:18, Claus Anders wrote:

> yes I counted the labels too, but there are even long lines (11 tokens
> or less) with 100% 1 coverage.

Well, those statistics will mix the rules of word construction with those of grammar. I would count only the tokens in text lines and leave out the labels, since a sequence of labels may not have any grammatical structure.

Cheers,
Gabriel
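For illustration, the frequency colouring proposed in step 1 of the message above could be sketched roughly as follows. This is only a minimal sketch under assumptions not stated in the message: the transcription uses one character per glyph, the colour scale is a linear blend from pure red (rarest) to pure blue (most frequent), and the names char_colors and page_to_pixels are made up here. Tiling the pages into a 10 by 10 grid and writing an actual image file are left out.

```python
from collections import Counter


def char_colors(text):
    """Map each non-space character to an RGB colour.

    Assumption: linear scale, red = rarest character, blue = most frequent.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    colors = {}
    for ch, n in counts.items():
        t = (n - lo) / span  # 0.0 = rarest, 1.0 = most frequent
        colors[ch] = (round(255 * (1 - t)), 0, round(255 * t))
    return colors


def page_to_pixels(page_text, colors, background=(255, 255, 255)):
    """Turn one page of transcription into rows of RGB pixels.

    One character becomes one pixel; spaces and short lines are filled
    with the background colour so that all rows have the same width.
    """
    lines = page_text.splitlines()
    width = max((len(line) for line in lines), default=0)
    rows = []
    for line in lines:
        row = [colors.get(ch, background) for ch in line]
        row += [background] * (width - len(line))
        rows.append(row)
    return rows
```

The colours would be computed once over the whole transcription and then applied page by page, so that the same character has the same colour everywhere. For steps 2 and 3 the same idea should carry over by counting character pairs, triplets or tokens instead of single characters.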

