
Re: VMs: entropy program

To: vms-list@xxxxxxxxxxx
Subject: Re: VMs: entropy program
From: "Rafal T. Prinke" <rafalp@xxxxxxxxxx>
Date: Tue, 19 Aug 2003 00:42:45 +0200
References: <20030815065031.44644.qmail@web40403.mail.yahoo.com> <3F3E7ECC.9FA9EF20@amu.edu.pl> <200308171030.08362.G.Landini@bham.ac.uk>
Reply-to: vms-list@xxxxxxxxxxx
Sender: owner-vms-list@xxxxxxxxxxx

Gabriel Landini wrote:

> >   http://ouray.cudenver.edu/~wcjackso/DCProject/project.html

> I did not look in detail, but on that site the "gram" is a string separated
> by delimiters (i.e. a token). I think in Monkey this is word entropy.
> The reported low h2 in the VMs relates to the *character* entropy, not word
> entropy.
> 
> This may explain the differences you found.

It does both character and word entropy - as does Monkey.
The only thing that cannot be set in JEC is the alphabet
(or characters to be ignored). It does have a list of
"white space characters" but does not seem to ignore them.

I have made a small "random text" file and fed it to both
programs. The results are slightly different, probably due
to the treatment of white space characters. The file
has 5,734 bytes and:

Monkey reports: 5,639 chars, 11 different chars, h1=3.027

JEC reports:    5,734 chars, 13 different chars, h1=3.169

So it seems that Monkey filters EOL off (I used "Space ON")
while JEC counts it as two different characters CR and LF
(it doesn't say that, but it is pretty obvious).
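
The effect can be reproduced with a minimal sketch, assuming h1 is the
first-order Shannon entropy in bits per character. The function name
char_entropy and its "ignore" parameter are my own illustration and are
not taken from Monkey or JEC, whose internals I have not seen.

```python
import math
from collections import Counter

def char_entropy(text, ignore=""):
    """Return (chars counted, different chars, h1 in bits) after dropping `ignore`."""
    kept = [c for c in text if c not in ignore]
    counts = Counter(kept)
    n = len(kept)
    # Shannon entropy over single-character frequencies
    h1 = -sum((k / n) * math.log2(k / n) for k in counts.values()) if n else 0.0
    return n, len(counts), h1

sample = "ab ab\r\nba ba\r\n"            # tiny stand-in text with DOS line ends
print(char_entropy(sample))              # CR and LF counted as two extra characters
print(char_entropy(sample, "\r\n"))      # EOL filtered off: fewer chars, lower h1
```

On this toy sample, counting CR and LF adds two to the number of
different characters and raises h1, which is the same direction as the
difference between the two programs above.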

It is interesting, as Unix/Linux uses only one character
(LF) for line ends (and so does Mac, but CR), while
DOS/Windows uses two characters, CR/LF. So if they are not
filtered off, the entropy calculated on different systems
will differ slightly (unless the same physical file is used).
On the other hand, if they are filtered off, then there
is no space between the last word of one line and the first
word of the next, thus influencing the word entropy calculation.
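
Both effects can be shown on a toy text. This is only a sketch under my
own assumptions: the helper name "entropy" is hypothetical, and words
are taken to be whatever Python's str.split() returns, which may not
match how either program tokenizes.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy in bits over any sequence (characters or words)."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

unix_text = "da ka\nka da\n"                 # LF line ends
dos_text = unix_text.replace("\n", "\r\n")   # same text with CR/LF line ends

# Character entropy differs when the EOL characters are counted
print(entropy(unix_text), entropy(dos_text))

# Dropping EOL joins the last and first words of adjacent lines
stripped = unix_text.replace("\n", "")
spaced = unix_text.replace("\n", " ")
print(stripped.split())   # ['da', 'kaka', 'da']
print(spaced.split())     # ['da', 'ka', 'ka', 'da']
print(entropy(stripped.split()), entropy(spaced.split()))
```

Replacing each line end with a single space before counting, rather
than deleting it, would avoid both problems, at the price of treating
a line break as an ordinary word separator.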

I have now removed all EOL chars and spaces from my "random
text" and the results from both programs were closer.
But this time Monkey read only 255 characters of the 4,888
(why?).

Best regards,

Rafal