
Tokens in the VMS and their neighbours



Hello,
today I tried to measure the correlation between individual tokens and the
tokens that immediately follow them (with line wraparound). I was surprised
that any given pair of adjacent tokens (using "." as the delimiter) recurs so
rarely. Here are the top 30 (out of about 28000 combinations):

Count	1st token	Next token
-----------------------------------------
38	chol	daiin	
32	or	aiin	
27	shedy	qokaiin	
25	shey	qokaiin	
24	daiin	daiin	
22	qol	chedy	
22	chol	chol	
22	chedy	qokaiin	
20	qokaiin	chedy	
20	ol	shedy	
20	ol	chedy	
20	daiin	chey	
19	shedy	qokedy	
19	chedy	qokeey	
18	shedy	qokeedy	
18	daiin	chedy	
18	ar	aiin	
17	qokaiin	shedy	
17	ar	al	
15	qokeedy	qokeedy	
15	qokal	chedy	
15	ol	daiin	
15	chedy	qol	
14	shedy	qokal	
14	qokedy	qokeedy	
14	chey	qokaiin	
14	chedy	qokedy	
13	shedy	qokeey	
13	qokedy	qokedy	
13	qokal	shedy	
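
For anyone who wants to repeat the count, here is a minimal sketch in Python.
It assumes the transcription is available as a list of text lines with "." as
the token delimiter, and it reads "line wraparound" as meaning that the last
token of one line is paired with the first token of the next. The function
name count_pairs and the sample lines are made up for illustration; they are
not taken from a real transcription file.

```python
from collections import Counter

def count_pairs(lines, delimiter="."):
    """Count every (token, next token) pair across all lines.

    All tokens are joined into one sequence first, so the last token of a
    line is paired with the first token of the following line (wraparound).
    """
    tokens = []
    for line in lines:
        # Skip empty strings produced by leading or trailing delimiters.
        tokens.extend(t for t in line.strip().split(delimiter) if t)
    return Counter(zip(tokens, tokens[1:]))

# Made-up sample lines, for illustration only.
sample = ["chol.daiin.shedy", "qokaiin.chol.daiin"]
pairs = count_pairs(sample)

# Print the most frequent pairs in the same layout as the table above.
for (first, second), n in pairs.most_common(3):
    print(n, first, second, sep="\t")
```

The number of distinct combinations is then len(pairs), and
pairs.most_common(30) gives a top-30 list.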

Does such low redundancy (=> high entropy) occur in other natural languages
too?
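
One way to put a number on this for comparison with other texts is the
conditional entropy of the next token given the current one, H(next | current)
= H(pair) - H(current). The sketch below assumes the pair counts are held in a
dictionary keyed by (token, next token); the function names and the toy counts
are hypothetical. With sparse counts this plug-in estimate tends to come out
too low, so it is best compared only between texts of similar length.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def conditional_entropy(pair_counts):
    """H(next | current) = H(pair) - H(current), in bits."""
    first_counts = Counter()
    for (first, _second), n in pair_counts.items():
        first_counts[first] += n
    return entropy(pair_counts) - entropy(first_counts)

# Toy counts: after "a" the next token is a coin flip between "b" and "c".
unpredictable = {("a", "b"): 1, ("a", "c"): 1}
# Toy counts: the next token is fully determined by the current one.
predictable = {("a", "b"): 2, ("c", "d"): 2}

h_unpredictable = conditional_entropy(unpredictable)
h_predictable = conditional_entropy(predictable)
```

A value near zero means the next token is almost fully determined by the
current one (high redundancy); larger values mean the neighbour is hard to
predict.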

Best to all
Claus