
Re: VMs: Number crunching the Fincher window



On Mon, 13 Sep 2004, Elmar Vogt wrote:
> Here's what I got -- number of different sequences for different sequence
> lengths:
>
> Length   VM        German
> 4        4389        9435
> 5        8773       14949
> 6       14087       19623
> 7       19432       23443
> 8       23934       26264
> 9       27263       28237
> 10      29527       29609
> 11      30954       30543
> 12      31783       31187
> 13      32249       31651
> 14      32491       31964
> 15      32612       32190
> 16      32674       32346
>
> ...
> We seem to see that natural languages have a larger variety of short
> sequences. At the same time, for longer sequences, the VM gets more varied,
> until at a sequence length of 16, there were only 90 instances of phrases of
> 16 or more characters, which got repeated. (In German, we still had some 400
> duplicates.)

Pardon my denseness, but I don't see how we got from the preceding table
to the numbers in the text?
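
For reference, counts like those in the quoted table could be produced roughly as follows. This is only a minimal sketch, assuming the text is a plain character string scanned with a sliding window of fixed length; the function names are hypothetical and this is not necessarily the procedure used for the table. The second function shows one possible reading of "repeated" (windows that duplicate an earlier one). The post does not say how the 90 and 400 figures were obtained, so this does not answer the question above.

```python
def distinct_sequences(text, length):
    """Count the different character sequences of exactly `length` characters."""
    # Slide a window of `length` characters over the text and collect
    # each window in a set, so duplicates are counted only once.
    windows = {text[i:i + length] for i in range(len(text) - length + 1)}
    return len(windows)


def repeated_windows(text, length):
    """Count windows that duplicate an earlier window (assumed definition)."""
    total = max(len(text) - length + 1, 0)  # number of window positions
    return total - distinct_sequences(text, length)


if __name__ == "__main__":
    sample = "abcabc"  # toy input, not VM or German data
    for n in range(2, 5):
        print(n, distinct_sequences(sample, n), repeated_windows(sample, n))
```

Run on a transcription file, the first function gives one row of a "Length / count" table per window length.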
