VMs: Char repetition redux

To: <vms-list@xxxxxxxxxxx>
Subject: VMs: Char repetition redux
From: "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx>
Date: Thu, 23 Sep 2004 12:48:00 +0200
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

Hi all,
I replaced every ii with I and every ee with E, and now get the following peak frequencies for character distance:

Rel. freq.  Dist.  Char. pair

0.631167    7      e-e
0.660071    6      e-e

0.0446923   6      i-i
0.0478532   7      i-i

0.0775552   7      I-I
0.121701    6      I-I

0.209564    7      E-E
0.21246     6      E-E

1.61051     7      o-o
1.70724     6      o-o

IMHO this pattern suggests the following:
- ii and ee are single characters (they now show the same pattern as the other characters)
- 6.5 is the average token length (= the number of characters between two repetitions of the same character)
- characters tend to occur in the same position within a token.

(The relative frequency is the number of occurrences of a pair at the given distance divided by the number of all occurrences of that distance.)
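A minimal Python sketch of the relative frequency defined above, under stated assumptions: "distance" is taken to be the index difference d between two positions, the input is one transcription string (the post does not say whether word separators were kept or stripped), and "all occurrences of that distance" is read as every position pair at distance d, i.e. len(text) - d pairs. The function name `pair_rel_freq` is hypothetical. It returns plain fractions; the post does not state whether the table above is scaled (for example to percent).

```python
from collections import Counter
from typing import Dict, Tuple


def pair_rel_freq(text: str, max_dist: int = 10) -> Dict[Tuple[str, int], float]:
    """Relative frequency of same-character pairs at each distance d.

    Assumption (not confirmed by the post): the denominator is the number
    of all position pairs (i, i + d) in the text, which is len(text) - d.
    """
    result: Dict[Tuple[str, int], float] = {}
    n = len(text)
    for d in range(1, max_dist + 1):
        total = n - d  # number of position pairs at distance d
        if total <= 0:
            break
        # Count, per character, how often it recurs exactly d positions later.
        same = Counter(text[i] for i in range(total) if text[i] == text[i + d])
        for ch, count in same.items():
            result[(ch, d)] = count / total
    return result
```

For example, `pair_rel_freq("abab")` gives `{('a', 2): 0.5, ('b', 2): 0.5}`: at distance 2 there are two position pairs, and each one repeats a character.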

Replacing each ii/ee pair with a single character changes the proportions of the token structure. Now no character pair is an exception: all of them peak at a distance of 6 or 7.
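To repeat the before/after comparison on a transcription, something like the following sketch could be used. It relies on Python's left-to-right, non-overlapping `str.replace`, so an odd run such as "iii" becomes "Ii"; the post does not say how such runs were handled, so treat that as an assumption. `collapse_digraphs` and `peak_distance` are hypothetical names. The peak is taken over the same relative frequency as above (hits divided by all position pairs at distance d), which is also an assumption about the original method.

```python
from typing import Optional


def collapse_digraphs(text: str) -> str:
    # Treat the doubled glyphs as single characters: every "ii" becomes "I"
    # and every "ee" becomes "E" (left to right, non-overlapping).
    return text.replace("ii", "I").replace("ee", "E")


def peak_distance(text: str, ch: str, max_dist: int = 12) -> Optional[int]:
    """Distance d in 1..max_dist at which ch recurs with the highest
    relative frequency; None if ch never recurs within max_dist."""
    best_d: Optional[int] = None
    best_f = 0.0
    for d in range(1, max_dist + 1):
        total = len(text) - d  # position pairs at distance d
        if total <= 0:
            break
        hits = sum(1 for i in range(total) if text[i] == ch and text[i + d] == ch)
        freq = hits / total
        if freq > best_f:  # strict comparison: on ties the smaller d wins
            best_d, best_f = d, freq
    return best_d
```

Comparing `peak_distance(text, "i")` with `peak_distance(collapse_digraphs(text), "I")` on the same transcription would show whether the collapsed glyph falls in line with the other characters.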

Claus
