Tokens in the VMS and their neighbours
Hello,
Today I tried to find the correlation between individual tokens and their
next neighbouring token (with line wraparound). I was surprised that any
given token pair (using "." as the delimiter) recurs so rarely. Here are
the top 30 (out of about 28000 combinations):
# of occ.   1st token   next token
-----------------------------------------
   38       chol        daiin
   32       or          aiin
   27       shedy       qokaiin
   25       shey        qokaiin
   24       daiin       daiin
   22       qol         chedy
   22       chol        chol
   22       chedy       qokaiin
   20       qokaiin     chedy
   20       ol          shedy
   20       ol          chedy
   20       daiin       chey
   19       shedy       qokedy
   19       chedy       qokeey
   18       shedy       qokeedy
   18       daiin       chedy
   18       ar          aiin
   17       qokaiin     shedy
   17       ar          al
   15       qokeedy     qokeedy
   15       qokal       chedy
   15       ol          daiin
   15       chedy       qol
   14       shedy       qokal
   14       qokedy      qokeedy
   14       chey        qokaiin
   14       chedy       qokedy
   13       shedy       qokeey
   13       qokedy      qokedy
   13       qokal       shedy
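
For anyone who wants to reproduce this kind of count, here is a minimal
Python sketch. It is not necessarily the procedure used for the table
above. It assumes the transcription is available as a list of text lines
with "." between tokens, and it reads "line wraparound" as pairing the last
token of a line with the first token of the next line. The function name
count_pairs and the two sample lines are made up for illustration only.

```python
from collections import Counter

def count_pairs(lines, delimiter="."):
    """Count (token, next token) pairs over a list of transcription lines."""
    tokens = []
    for line in lines:
        # Split on the delimiter and drop empty strings from stray delimiters.
        tokens.extend(t for t in line.strip().split(delimiter) if t)
    # The tokens form one continuous stream, so the last token of a line
    # is paired with the first token of the following line (wraparound).
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical sample lines, not taken from any real transcription file.
sample = ["chol.daiin.chol", "daiin.shedy.qokaiin"]
pairs = count_pairs(sample)

for (first, second), n in pairs.most_common(3):
    print(f"{n:4d}  {first:<10} {second}")
```

len(pairs) gives the number of distinct combinations, and
pairs.most_common(30) gives a top-30 list like the one above.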
Does such low redundancy (=> high entropy) occur in other natural languages
too?
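
One way to put a number on this, so that different texts can be compared,
is the conditional entropy of the next token given the current one,
H(next | first) = H(first, next) - H(first). The sketch below is only an
illustration under the assumption that pair counts are held in a Counter
keyed by (first, next) tuples. The helper names and the toy counts are
hypothetical and are not VMS data.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conditional_entropy(pair_counts):
    """H(next | first) = H(first, next) - H(first), in bits."""
    first_counts = Counter()
    for (first, _next), n in pair_counts.items():
        first_counts[first] += n
    return entropy(pair_counts) - entropy(first_counts)

# Toy counts: after "a" the next token is "b" or "c" equally often,
# after "b" it is always "a".
toy = Counter({("a", "b"): 2, ("a", "c"): 2, ("b", "a"): 4})
h = conditional_entropy(toy)
print(f"H(next | first) = {h:.3f} bits")
```

A value close to the plain token entropy would mean the current token says
little about the next one. Note that on a small corpus such estimates are
biased, so comparisons only make sense between texts of similar length.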
Best to all
Claus