
Re: VMs: wordlength persistence



Here is a less trivial example of two languages say A and B.
A has wordlengths (frequencies): 3 (1) 4 (2) 5 (3) 6 (2) 7 (1)
B has wordlengths (frequencies): 4 (1) 5 (2) 6 (3) 7 (2) 8 (1)
So A has average wordlength 5.0 and B has 6.0
...
I see what you're doing, but what are you up to? I mean, in which direction are
you leading us?



I should have been clearer here.

Language "A" contains words of length 3, 4, 5, 6 and 7 letters, with frequencies proportional to 1, 2, 3, 2 and 1. So a 900 word document in language "A" would count 100 words of length 3, 200 words of length 4, 300 words of length 5, etcetera. The average wordlength is then 5.

Similarly for language "B", with words of length 4 to 8 letters and average wordlength 6.
Now in a document consisting of, e.g., 900 words drawn randomly from "A" followed by 900 words drawn randomly from "B", one would find the "next word length" distribution as given previously:


word length: average next word length
3: 5.00
4: 5.33
5: 5.40
6: 5.60
7: 5.67
8: 6.00
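
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation. It assumes the two blocks are equally long (so the relative frequencies can be used directly), that words within a block are in random order, and that the single transition at the A/B boundary can be ignored. The names `mean_length` and `expected_next_length` are only illustrative.

```python
# Word length -> relative frequency, as given for languages "A" and "B".
A = {3: 1, 4: 2, 5: 3, 6: 2, 7: 1}
B = {4: 1, 5: 2, 6: 3, 7: 2, 8: 1}


def mean_length(freqs):
    """Average word length of one language."""
    total = sum(freqs.values())
    return sum(length * n for length, n in freqs.items()) / total


def expected_next_length(length, languages):
    """Average length of the word following a word of the given length.

    A word of this length sits in language X with probability proportional
    to X's count for that length; its successor is then a random word of X,
    so its expected length is X's mean. Assumes equally long blocks and
    ignores the one boundary transition between blocks.
    """
    weighted_sum = 0.0
    count = 0
    for freqs in languages:
        n = freqs.get(length, 0)
        weighted_sum += n * mean_length(freqs)
        count += n
    return weighted_sum / count


table = {length: round(expected_next_length(length, [A, B]), 2)
         for length in range(3, 9)}

for length, nxt in table.items():
    print(f"{length}: {nxt:.2f}")
```

For example, words of length 4 occur 200 times in the "A" block and 100 times in the "B" block, so the average next word length is (200 x 5 + 100 x 6) / 300 = 5.33, which is the second row of the table.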

Please check my reasoning for this again and see if you agree...
I hoped to demonstrate that two languages with different average wordlengths, as in the VMs, are sufficient to produce this "word length connection".


Ger




