Re: AW: VMs: Character repetition

Thanks, this clarifies matters for me!

John E. Koontz
> My comment was: I find it remarkable that removing the spaces and doing
> a spectral analysis of the space-less stream one still can see the modal
> token length and verse length in (for instance) Chaucer. The same token
> length-related peak (this time at 5.9) appears in the space-less vms
> which corresponds to the modal token length when considering spaces as
> the delimiters (5 or 6 depending how one measures it). If one scrambles
> the characters (still same char. distribution) this peak disappears, so
> obviously it has to do with the word construction rules (the peak does
> not disappear by

Would it perhaps be safer to say that the modal token length has to
do with the token construction rules, in case the tokens weren't
words but some other unit?  I wouldn't expect a list of numbers, for
example, to behave this way under scrambling, but a list of
"letter encodings", "numbers + grammatical endings", or "syllable
encodings" might.

I can see how this argument suggests that token (probably word) spacing
reflects the underlying divisions of the text and is not arbitrary.
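
For anyone who wants to try the scrambling test on a transcription, a
rough Python sketch follows. It is not the spectral analysis described
in the quoted message. It swaps in a simpler relative: the rate at which
a character equals the character a given number of positions later in
the space-less stream, compared before and after shuffling. The function
names and the toy token generator are illustrative assumptions, not
anything from the thread, and the toy tokens all have length 5, so the
peak is far sharper than it would be in real text with variable token
lengths.

```python
import random
from collections import Counter


def match_rate(stream, lag):
    """Fraction of positions i where stream[i] == stream[i + lag]."""
    n = len(stream) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the stream")
    return sum(stream[i] == stream[i + lag] for i in range(n)) / n


def chance_rate(stream):
    """Match rate expected if characters were placed independently.

    Depends only on the character distribution, so scrambling
    leaves it unchanged.
    """
    total = len(stream)
    return sum((count / total) ** 2 for count in Counter(stream).values())


def lag_profile(stream, max_lag):
    """Match rate for every lag from 1 to max_lag."""
    return {lag: match_rate(stream, lag) for lag in range(1, max_lag + 1)}


def scrambled(stream, seed=0):
    """Same characters in random order (same character distribution)."""
    chars = list(stream)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


# Hypothetical toy "text": every token is q + three random letters + y,
# a stand-in for token construction rules (fixed start and end glyphs).
rng = random.Random(1)
tokens = [
    "q" + "".join(rng.choice("abcdefgh") for _ in range(3)) + "y"
    for _ in range(400)
]
spaced = " ".join(tokens)
stream = spaced.replace(" ", "")  # remove the spaces, as in the quoted test

profile = lag_profile(stream, 8)
shuffled_profile = lag_profile(scrambled(stream), 8)

print("chance level:", round(chance_rate(stream), 3))
for lag in profile:
    print(lag, round(profile[lag], 3), round(shuffled_profile[lag], 3))
```

In the unscrambled stream the match rate stands well above the chance
level at the lag equal to the token length, even though the spaces are
gone; after scrambling it falls back to roughly the chance level at every
lag. That is the same qualitative behaviour as the peak that disappears
under scrambling, and it depends only on the tokens having internal
structure, not on their being words.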
