AW: VMs: Character repetition

To: <vms-list@xxxxxxxxxxx>
Subject: AW: VMs: Character repetition
From: "Anders, Claus" <Claus.Anders@xxxxxxxxxxxxx>
Date: Wed, 15 Sep 2004 16:27:27 +0200
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx
Thread-index: AcSbK38EKNWJ3YYlQ/OiA8lzMHJ89AAAy9zg
Thread-topic: VMs: Character repetition

OK, here is my revised version, using relative values:

4.62723 6 .-.
2.31618 1 e-e
2.0771 1 i-i
1.67688 7 o-o
1.46827 7 y-y
1.06746 6 h-h
0.760113 7 d-d
0.720149 7 a-a
0.67459 7 k-k
0.60622 6 c-c
0.304525 7 t-t
0.301424 6 l-l
0.290138 7 q-q
0.216604 7 n-n
0.212577 3 r-r
0.188137 6 s-s
0.0423617 7 p-p
0.0101448 5 m-m
0.00559494 7 f-f

1st column: occurrences of the pair at this distance, as a percentage of all character pairs counted at the same distance (pairs are only formed within a line)
2nd column: distance in character positions from the earlier character to the current one (1 = adjacent)
3rd column: earlier character and current character.

The method is:
Take every line. Look at the characters one by one, going from left to right.
For each character, compute the distance (within the line) to the most recent occurrence of every character seen earlier in the line, and increase the count for that pair and distance (there is a table indexed by pair and distance).
After going through the whole VMs, divide the count of each pair/distance entry by the total number of pairs counted at that distance.
This is the awk script:

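BEGIN { code = "-=.abcdefghijklmnopqrstuvwxyz" }	# characters that are counted
{
	# Forget the last-seen positions from the previous line.
	for (i in lp)
		delete lp[i]
	# Strip inline {...} comments; [^}]* keeps the text between two comments.
	gsub(/\{[^}]*\}/, "")
	pos = 0
	# The transcribed text is taken from field 2.
	for (i = 1; i <= length($2); i++)
	{
		c = substr($2, i, 1)
		if (index(code, c) < 1)
			continue	# skip characters outside the code alphabet
		pos++
		# lp[j] is the last position of character j in this line.
		for (j in lp)
		{
			fp = pos - lp[j]	# distance; 1 means adjacent
			pair = fp " " j "-" c
			count[pair]++
			countfp[fp]++	# all pairs seen at this distance
		}
		lp[c] = pos
	}
}

END {
	# Print each pair/distance count as a percentage of all pairs at that distance.
	for (i in count)
	{
		split(i, aa, " ")
		print count[i] * 100 / countfp[aa[1]] " " i
	}
}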
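
For readers who want to check the normalization by hand, here is a minimal Python sketch of the same counting scheme. It is a restatement under assumptions, not the script used for the table: the function name and the toy input are made up for illustration, the text is passed in directly instead of being read from field 2, and {...} comments are not stripped.

```python
from collections import defaultdict

CODE = "-=.abcdefghijklmnopqrstuvwxyz"  # same alphabet as the awk script


def pair_distance_percentages(lines):
    """Return {(distance, earlier, current): percent} over all lines."""
    count = defaultdict(int)     # (distance, earlier, current) -> occurrences
    count_fp = defaultdict(int)  # distance -> all pairs counted at that distance
    for text in lines:
        last_pos = {}            # character -> its last position in this line
        pos = 0
        for c in text:
            if c not in CODE:
                continue         # skip characters outside the alphabet
            pos += 1
            for earlier, p in last_pos.items():
                d = pos - p      # 1 means adjacent
                count[(d, earlier, c)] += 1
                count_fp[d] += 1
            last_pos[c] = pos
    return {k: 100.0 * n / count_fp[k[0]] for k, n in count.items()}


# Toy input, not real VMs text.
result = pair_distance_percentages(["abab"])
for (d, earlier, current), pct in sorted(result.items()):
    print(pct, d, earlier + "-" + current)
```

On the toy line "abab" there are three pairs at distance 1 (a-b twice, b-a once) and two at distance 2 (a-a and b-b), so the percentages within each distance add up to 100.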

After that, I have a list of all character-pair distances with their relative frequencies. Now I can answer questions such as: what is the probability that character 'x1' is followed by 'x2', or which characters are most likely to be found at positions 1, 2, 3, 4, 5, ... after a space? The double-character table above is just a by-product.
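
Both questions can also be asked of the text directly. The following Python sketch shows one way to do it, under loud assumptions: "." is taken to be the word separator, the start of a line is treated like a space, and the function names and the toy input are hypothetical, not part of the awk script.

```python
from collections import Counter, defaultdict


def follow_probability(lines, x1, x2):
    """Estimate P(next character is x2 | current character is x1)."""
    seen = followed = 0
    for text in lines:
        for a, b in zip(text, text[1:]):
            if a == x1:
                seen += 1
                followed += (b == x2)
    return followed / seen if seen else 0.0


def chars_by_position(lines, sep=".", max_pos=5):
    """Tally the characters at positions 1..max_pos after a word break."""
    tallies = defaultdict(Counter)  # position -> Counter of characters
    for text in lines:
        for word in text.split(sep):  # assumption: sep marks a space
            for pos, c in enumerate(word[:max_pos], start=1):
                tallies[pos][c] += 1
    return tallies


# Toy input, not real VMs text.
lines = ["abc.abd.xb"]
p_bc = follow_probability(lines, "b", "c")
tallies = chars_by_position(lines)
print(p_bc, tallies[1].most_common(1))
```

Note that follow_probability only counts an 'x1' that has a successor in the same line, so a character at the end of a line does not enter the estimate.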

Claus
-----Original Message-----
From: Lukas Palatinus [mailto:palat@xxxxxx]
Sent: Wednesday, 15 September 2004 15:42
To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: Character repetition


Hi Claus,

> I found the following table quite interesting:
> Occurrences distance char-pair 
> 6862 6 .-. 
> 5028 1 e-e 
> 4509 1 i-i 
> 2145 3 o-o 
> 1837 7 y-y 
> 1583 6 h-h 
...

Please, can you give a more detailed description of what your 
table means and how it was generated?

Are these the most frequent distances between the specified pairs of 
letters in the table? What about the distribution of the distances?

The table looks very interesting, but I think more explanation is 
needed for it to be of real use for others.

Lukas

______________________________________________________________________
To unsubscribe, send mail to majordomo@xxxxxxxxxxx with a body saying:
unsubscribe vms-list
