Re: AW: VMs: Character repetition

Hi John,

On Friday 17 September 2004 21:54, Koontz John E wrote:
> So, whatever else they are, the tokens or words of the VMs behave
> statistically like other cases of higher-level coding units and presumably
> represent some kind of significant unit.

Yes, that is what I think because I can see the same effect in other 

> Or maybe what you're actually saying, since you say it rather carefully,
> is that word spacing produces units similar in length to the units
> implied by spectral analysis, and simplest assumption is that the two sets
> of entities are one and the same?

Short: yes.
Long: There are several features in those correlation plots. I mostly 
concentrated on the short-length correlations. What I am saying :-) is that 
the analysis still recognises fluctuations in symbol occurrences that peak at 
the same length as the tokens (i.e. their mode) in various languages. Of 
course this could be a coincidence. The only way I can see to test this is to 
create surrogate data, and that is precisely what I did: character and token 
scrambling. The first one destroys the effect; the second doesn't. I 
therefore suggested that this peak has to do with word construction plus the 
relative frequencies of words, and not with sentence construction (i.e. the 
position of the tokens does not seem to affect it).
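
For anyone who wants to try the two surrogates, here is a minimal sketch in Python. It is only an illustration, not the code behind the plots: the function names, the fixed seed and the sample sentence are made up, and I am assuming that character scrambling is done on the text after the spaces are removed, while token scrambling permutes whole words before the spaces are removed.

```python
import random

def strip_spaces(text):
    """Remove all whitespace, giving the space-less text used for the analysis."""
    return "".join(text.split())

def scramble_characters(text, seed=0):
    """Surrogate 1: shuffle individual characters of the space-less text.
    This destroys word construction entirely (assumed procedure)."""
    rng = random.Random(seed)
    chars = list(strip_spaces(text))
    rng.shuffle(chars)
    return "".join(chars)

def scramble_tokens(text, seed=0):
    """Surrogate 2: shuffle whole tokens, then drop the spaces.
    Words and their relative frequencies survive; their order does not."""
    rng = random.Random(seed)
    tokens = text.split()
    rng.shuffle(tokens)
    return "".join(tokens)

# Hypothetical sample, just to show the calls.
sample = "the quick brown fox jumps over the lazy dog"
char_surrogate = scramble_characters(sample)
token_surrogate = scramble_tokens(sample)
```

Both surrogates keep exactly the same character counts as the original, so any difference in the correlation plots has to come from how the characters are arranged.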

Note that Chaucer's modal *verse* length can also be seen in the correlation 
plots (all this in space-less texts!). This 2nd peak and the long-range 
correlation slope disappear after both word and character scrambling. Of 
course, this makes sense too: moving tokens around in the text breaks the

If one uses a polyalphabetic substitution with more than 2 alphabets, then 
these correlations quickly disappear as well -- as does Zipf's law. This 
could be seen as another bit of hammering against Strong's "solution" because 
those features exist in the VMs and become increasingly unlikely in 
polyalphabetic substitutions.
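
A rough way to see why the word statistics suffer is sketched below. It assumes a Vigenere-style scheme (one fixed shift per alphabet, cycled over the letters), which is only one possible kind of polyalphabetic substitution and is not meant to reproduce Strong's system. The names, the shift values and the toy sentence are all hypothetical. Spaces are kept here only so that token types can be counted.

```python
import string

ALPHABET = string.ascii_lowercase

def polyalphabetic_encrypt(text, shifts):
    """Encrypt lowercase letters with a cycle of Caesar shifts.
    len(shifts) is the number of alphabets; other characters pass through."""
    out = []
    i = 0  # counts letters only, so spaces do not advance the key
    for ch in text:
        if ch in ALPHABET:
            shift = shifts[i % len(shifts)]
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)

def token_type_count(text):
    """Number of distinct tokens (word types) in a spaced text."""
    return len(set(text.split()))

# Toy text: 8 tokens but only 4 distinct word types.
sample = "the cat and the dog and the cat"
mono = polyalphabetic_encrypt(sample, [3])           # 1 alphabet
poly = polyalphabetic_encrypt(sample, [1, 2, 3, 4])  # 4 alphabets

plain_types = token_type_count(sample)
mono_types = token_type_count(mono)
poly_types = token_type_count(poly)
```

With one alphabet every word still maps to a single ciphertext word, so the rank-frequency picture is unchanged. With several alphabets the same plaintext word is encrypted differently depending on where it falls in the key cycle, so frequent words get split into several rarer types.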



