
VMs: Re: Directions



> There is a lot of hidden info in there.

I don't have grep. But this simple Perl script looks for a keyword in the
mailing list archive and outputs every e-mail that contains it (the match
is case-insensitive). To use it, you have to unzip all the mailing list
files and put them in one directory; the script reads every *.txt file
there. Then you run the following command in this directory (in Windows
with ActivePerl):

perl extract.pl keyword > keyword_file.txt

In Unix I suppose it would be something like this (after making the script
executable with chmod +x extract.pl):

./extract.pl keyword > keyword_file

It will probably get mangled in the mail, so I have also posted it on my
website. Don't laugh at the crude programming - it works ... at least on my
computer.

www.euronet.nl/users/kazil/extract.pl

###
#!/usr/bin/perl -w

$debug_1 = 0 ; # print filelist
$debug_2 = 1 ; # print openfile message
$debug_3 = 0 ; # open one file only - the first one
$debug_4 = 0 ; # print the keyword
$debug_5 = 0 ; # print the header start and stop strings
$debug_6 = 0 ; # print the string

$keyword = $ARGV[0];
if ($debug_4) { print "Keyword: $keyword \n" ; }

@files = <*.txt> ;
$length = @files ;

if ($debug_1) {
 for ($i=0 ; $i < $length ; ++$i ) {
  print "$files[$i]\n";
 }
 print "Text_files: $length\n";
}

if ($debug_3) { $length = 1 ; }

for ($j=0 ; $j < $length ; ++$j ) {

 if ($debug_2) { print "*** Opening: $files[$j]\n"; }

 open  (INPUT, $files[$j]) || die "can't open $files[$j]: $!" ;

 read_file () ;

 close (INPUT )            || die "can't close $files[$j]: $!" ;

}

sub read_file {

 $mail_header = 1;
 $message = "";
 $match = 0 ;

 while ($line = <INPUT>) {

  if ($debug_6) { print $line ; }

  if ( $mail_header == 1 ) {
   if ( $line =~ /^Subject\: / || $line =~ /^Date\: /
     || $line =~ /^From\: / ) {
    $message = "$message$line" ;
    next ;
   }
  }

  # Status: line ends header
  if ( $mail_header == 1 && $line =~ /^Status\: / ) {
   $mail_header = 0 ;
   if ($debug_5) { print "Header 10  : $line" ; }
   next ;
  }

  if ( $mail_header == 0 && $line !~ /^From / && $line !~ /\d\d\:\d\d/ ) {
   $message = "$message$line" ;
   if ($line =~ /$keyword/i ) { $match = 1 ; }
  }

  # start header
  if ( $mail_header == 0 && $line =~ /^From / && $line =~ /\d\d\:\d\d/ ) {
   $mail_header = 1 ;
   if ( $match ) { print "$message\n" ; }
   $message = "";
   $match = 0;
   if ($debug_5) { print "Header  01 : $line" ; }
  }
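 }

 # The last message in a file has no "From " line after it, so print it here.
 if ( $match ) { print "$message\n" ; }
}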
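
For comparison, here is a more compact sketch of the same idea. It is not
the script above. The helper name matching_messages is made up for
illustration. It assumes that every message in an archive file starts with
a line beginning with "From " and that no body line starts that way. Unlike
extract.pl, it prints each matching message with all of its headers, and it
treats the keyword as literal text rather than as a pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of extract.pl: return every message in
# $text (the contents of one archive file) that contains $keyword.
sub matching_messages {
    my ($text, $keyword) = @_;

    # Split in front of each line that starts with "From ".
    # Assumption: body lines never start with "From ".
    my @messages = grep { length } split /^(?=From )/m, $text;

    # \Q...\E quotes the keyword, so "." or "(" are matched literally.
    return grep { /\Q$keyword\E/i } @messages;
}

# Search all *.txt files only when a keyword is given on the command line.
if (@ARGV) {
    my $keyword = shift @ARGV;
    for my $file (glob '*.txt') {
        open my $in, '<', $file or die "can't open $file: $!";
        local $/;                          # read the whole file at once
        my $text = <$in>;
        close $in;
        $text = '' unless defined $text;   # empty file
        print "$_\n" for matching_messages($text, $keyword);
    }
}
```

It would be run the same way as the original, for example
perl sketch.pl keyword > keyword_file.txt (sketch.pl being whatever name
the file is saved under).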