NAME

  Mail::SpamCannibal::ParseMessage - parse mail headers

SYNOPSIS

  use Mail::SpamCannibal::ParseMessage qw(
        limitread
        dispose_of
        headers
        rfheaders
        skiphead 
        get_MTAs
        firstremote
        array2string
        string2array
  );

  $chars = limitread(*H,\@lines,$limit);
  $rv = dispose_of(*H,$limit);
  $hdrs = headers(\@lines,\@headers);
  $hdrs = rfheaders(\@lines,\@headers);
  $lines = skiphead(\@lines);
  $mtas = get_MTAs(\@headers,\@mtas);
  $from = firstremote(\@MTAs,\@myhosts,$noprivate);
  $string = array2string(\@array,$begin,$end);
  $count = string2array($string,\@array);

DESCRIPTION

Mail::SpamCannibal::ParseMessage provides utilities for parsing mail headers, and email messages that carry mail headers as their message content, in order to find the originating Mail Transfer Agent (MTA).

  use Mail::SpamCannibal::ParseMessage qw(
        limitread
        dispose_of
        headers   
        skiphead  
        get_MTAs  
        firstremote
        array2string
        string2array
  );

  # example of reading mail message from STDIN

  # read up to 10000 characters
  my @lines;
  exit unless limitread(*STDIN,\@lines,10000);

  # release the daemon feeding this script
  dispose_of(*STDIN);

  # optional, if message content is headers
  # skip the real headers on this message
  exit unless skiphead(\@lines);

  # linearize headers, convert multi-line headers
  # to single line, removing extra white space
  my @headers;
  exit unless headers(\@lines,\@headers);

  # get the list of MTAs from the headers
  my @mtas;
  exit unless get_MTAs(\@headers,\@mtas);

  # extract the first remote MTA from the 
  # resulting MTA object
  my @myhosts = qw(
        mail1.mydomain.com
        mail2.mydomain.com
  );
  my $remoteIP = firstremote(\@mtas,\@myhosts);

SUBROUTINE DESCRIPTIONS

  • $chars = limitread(*H,\@lines,$limit);

    Read $limit characters (or to end of file) from stream *H and place the lines in an array.

    This is useful for reading an input stream that could overflow internal buffers if it is not in the expected format.

      input:        *H,     # stream handle
                    array pointer,
                    limit   # max characters [1000 default]
    
      returns:      number of characters read
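
    The bounded read described above can be pictured with the following minimal sketch. It is not the module's source: read_limited is a hypothetical stand-in, and both stopping at the first line boundary at or past the limit and keeping the line endings are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: read whole lines from $fh
# until at least $limit characters have been consumed or EOF is reached.
sub read_limited {
    my ($fh, $lines, $limit) = @_;
    $limit = 1000 unless defined $limit;    # default quoted in the text above
    my $chars = 0;
    while ($chars < $limit) {
        my $line = readline($fh);
        last unless defined $line;          # end of file
        $chars += length $line;
        push @$lines, $line;                # assumption: line endings are kept
    }
    return $chars;
}

# Usage with an in-memory handle standing in for a mail stream.
my $message = "Subject: test\n\nbody line\n" x 100;
open(my $fh, '<', \$message) or die "cannot open in-memory handle: $!";
my @lines;
my $chars = read_limited($fh, \@lines, 200);
close $fh;
print "read $chars characters in ", scalar(@lines), " lines\n";
```

    Because the loop checks the running total before each read, a hostile or malformed stream cannot grow the array without bound.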
  • $rv = dispose_of(*H,$limit);

      Empty the stream *H
      by reading until EOF, then return.
    
      input:        *H              # stream handle
                    limit           # max buffer size
                                    # default 1000
    
      return:       positive integer if anything read
                    else zero
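
    A minimal sketch of draining a stream in fixed-size chunks, assuming the limit is used as the per-read buffer size. drain_stream is a hypothetical name, and returning the exact character count (rather than just some positive integer) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: read and discard everything
# left on $fh, at most $limit characters per read.
sub drain_stream {
    my ($fh, $limit) = @_;
    $limit = 1000 unless defined $limit;    # default quoted in the text above
    my ($buffer, $total) = ('', 0);
    while (my $got = read($fh, $buffer, $limit)) {
        $total += $got;                     # count it, then let it be overwritten
    }
    return $total;                          # zero if nothing was read
}

# Usage with an in-memory handle standing in for the feeding daemon.
my $pending = "leftover input\n" x 500;
open(my $in, '<', \$pending) or die "cannot open in-memory handle: $!";
my $rv = drain_stream($in);
print "drained $rv characters\n";
```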
  • $hdrs = headers(\@lines,\@headers);

      Reads lines from the array and returns them
      in an array of headers. The headers are
      unfolded into single lines.
        i.e.
      Received: from hotmail.com ([64.216.248.129])
            by mail.mydomain.com (8.12.8/8.12.8)
            with SMTP id h2KIRcYC029373;
            Thu, 20 Mar 2003 10:27:39 -0800
    
      would be returned as one header line with 
      compressed white space
    
      input:        pointer to input line array
                    pointer to output header array
    
      returns:      number of headers
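
    The unfolding step can be illustrated with a short, self-contained sketch. unfold_headers is a hypothetical helper written for this example, not the module's implementation, and treating a blank line as the end of the headers is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: join continuation lines (those starting with white
# space) onto the preceding header and compress runs of white space.
sub unfold_headers {
    my ($lines, $headers) = @_;
    @$headers = ();
    for my $line (@$lines) {
        (my $text = $line) =~ s/\s+$//;     # drop trailing white space and newline
        last if $text eq '';                # assumption: a blank line ends the headers
        if ($text =~ /^\s/ && @$headers) {
            $text =~ s/^\s+//;
            $headers->[-1] .= " $text";     # continuation of the previous header
        } else {
            push @$headers, $text;          # start of a new header
        }
    }
    s/\s+/ /g for @$headers;                # compress internal white space
    return scalar @$headers;
}

my @lines = (
    "Received: from hotmail.com ([64.216.248.129])\n",
    "      by mail.mydomain.com (8.12.8/8.12.8) \n",
    "      with SMTP id h2KIRcYC029373;\n",
    "      Thu, 20 Mar 2003 10:27:39 -0800\n",
    "Subject: hello\n",
);
my @headers;
my $count = unfold_headers(\@lines, \@headers);
print "$_\n" for @headers;
```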
  • $hdrs = rfheaders(\@lines,\@headers);

    Similar in function to "headers" above. Parsing is "dirty" in the sense that extraneous leading characters such as:

      >> etc... 

    are ignored, and lines that your email client wrapped improperly, without leading white space, are still appended to the correct header so that the result can be parsed by "get_MTAs".

    This method is not as "pure" as plain "headers", but it does not require properly formatted header text free of leading spaces or characters.

      input:        pointer to input line array
                    pointer to output header array
    
      returns:      number of headers
  • $lines = skiphead(\@lines,\@discard);

      Removes lines from the text array until one
      or more blank lines are found. Leading blank
      lines are removed and the top of the array
      is positioned at the first line with text.
    
      Optionally, an array of the skipped lines
      is returned for use in bounce messages.
    
      input:        pointer to text lines,
                    [optional] ptr to skip lines
    
      returns:      number of lines remaining
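
    One reading of the behavior above, as a minimal sketch: drop any leading blank lines, move the first block of text to the discard array, then drop the blank separator lines. skip_head is a hypothetical name and the three-phase order is an assumption about the module.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the first header block from @$lines and leave
# the array positioned at the first line of text that follows it.
sub skip_head {
    my ($lines, $discard) = @_;
    # Phase 1: remove blank lines that precede the header block.
    shift @$lines while @$lines && $lines->[0] !~ /\S/;
    # Phase 2: move the header block to the optional discard array.
    while (@$lines && $lines->[0] =~ /\S/) {
        my $line = shift @$lines;
        push @$discard, $line if $discard;
    }
    # Phase 3: remove the blank separator line(s).
    shift @$lines while @$lines && $lines->[0] !~ /\S/;
    return scalar @$lines;
}

my @lines = ('', 'To: me@mydomain.com', 'Subject: report', '', '',
             'Received: from hotmail.com ([64.216.248.129])', 'Subject: spam');
my @skipped;
my $remaining = skip_head(\@lines, \@skipped);
print "$remaining lines remain, ", scalar(@skipped), " skipped\n";
```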
  • $mtas = get_MTAs(\@headers,\@mtas);

      Return an array of pointers to structures describing the
      "Received: from" MTAs found in the header lines.
    
      each array entry ->{from} = IP addr;
                    |--->{by}   = host or IP;
    
      input:        pointer to header array,
                    pointer to output MTA array
    
      returns:      number of MTA entries
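
    To make the {from}/{by} structure concrete, here is a rough sketch that extracts it from unfolded header lines. parse_received and its regular expressions are illustrative assumptions (sending IP in square brackets, receiving host after the word "by"); the module's own parsing rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build one { from => IP, by => host } entry for each
# "Received: from" header that matches the assumed layout.
sub parse_received {
    my ($headers, $mtas) = @_;
    @$mtas = ();
    for my $h (@$headers) {
        next unless $h =~ /^Received:\s+from\s/i;
        my ($from) = $h =~ /\[(\d{1,3}(?:\.\d{1,3}){3})\]/;   # bracketed dot.quad
        my ($by)   = $h =~ /\sby\s+(\S+)/;                    # receiving host
        next unless defined $from && defined $by;
        push @$mtas, { from => $from, by => $by };
    }
    return scalar @$mtas;
}

my @headers = (
    'Received: from hotmail.com ([64.216.248.129]) by mail.mydomain.com '
      . '(8.12.8/8.12.8) with SMTP id h2KIRcYC029373; Thu, 20 Mar 2003 10:27:39 -0800',
    'Subject: hello',
);
my @mtas;
my $found = parse_received(\@headers, \@mtas);
print "$_->{from} -> $_->{by}\n" for @mtas;
```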
  • $from = firstremote(\@MTAs,\@myhosts,$noprivate);

      Parse the "Received: from" structure for the first 
      remote MTA address that is not in @myhosts or is
      not part of a private network where:
    
      @myhosts = (
            '12.34.56.78',          # a dot.quad address
            '12.34.56.0/28',        # a net block
            'mail.mydomain.com',    # a domain name
            'etc...',
      );
    
      The IP addresses of "named" hosts will be resolved for
      multiple interfaces. If you do not want this behavior
      then always use dot.quad notation.
    
      The private networks listed below are automatically
      included in @myhosts by default. If you do not want
      this behavior, set $noprivate TRUE.
    
            127./8, 10./8, 172.16/12, 192.168/16
    
    
      input:        pointer to "Received: from" structure,
                    pointer to array of local host names,
                    [optional] no private nets = TRUE
    
      returns:      IP address of first "from" remote host,
                    or an empty string [''] if the
                    remote host cannot be determined.
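
    The selection rule can be sketched with plain integer arithmetic and the core Socket module. This is an illustration under stated assumptions, not the module's code: in_block and first_remote_ip are hypothetical names, the private blocks are written out as full CIDR strings, and host names in the local list are simply ignored rather than resolved.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# The private blocks listed above, written out in full.
my @PRIVATE = qw(127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16);
my $QUAD    = qr/^\d{1,3}(?:\.\d{1,3}){3}$/;

# Hypothetical helper: true if dot.quad $ip lies inside $cidr
# (a bare address is treated as a /32).
sub in_block {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    $bits = 32 unless defined $bits;
    return 0 unless $ip =~ $QUAD && $net =~ $QUAD;   # names are not resolved here
    my $packed_ip  = inet_aton($ip)  or return 0;
    my $packed_net = inet_aton($net) or return 0;
    my $mask = $bits ? (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF : 0;
    return (unpack('N', $packed_ip) & $mask) == (unpack('N', $packed_net) & $mask);
}

# Hypothetical helper: first {from} address not covered by the local list.
sub first_remote_ip {
    my ($mtas, $local, $noprivate) = @_;
    my @skip = @$local;
    push @skip, @PRIVATE unless $noprivate;
    for my $mta (@$mtas) {
        my $ip = $mta->{from};
        next if grep { in_block($ip, $_) } @skip;
        return $ip;
    }
    return '';                                       # nothing remote was found
}

my @mtas = (
    { from => '192.168.1.5',    by => 'mail1.mydomain.com' },
    { from => '12.34.56.3',     by => 'mail2.mydomain.com' },
    { from => '64.216.248.129', by => 'mail2.mydomain.com' },
);
print first_remote_ip(\@mtas, ['12.34.56.0/28']), "\n";
```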
  • $end = trimmsg(\%MAILFILTER,\@lines)

    If the message length is limited by the MAXMSG configuration setting, remove duplicate blank lines and return the $end pointer for further processing.

      input:        pointer to MAILFILTER hash,
                    pointer to @lines array
      returns:      ending line number
  • $string = array2string(\@array,$begin,$end);

    Makes a string from the array elements beginning with $begin and ending with $end. If $begin is undefined, 0 is assumed. If $end is undefined, $#array is assumed. An empty string is returned if $begin > $end.

    Unlike a 'join', 'array2string' adds an endline to the 'end' of the string in this manner:

      $string = join("\n",@array,"");
    
      input:        pointer to array of lines,
                    [optional] begin index,
                    [optional] end index
      returns:      string
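
    The slice-and-join rule above, including the defaults, fits in a few lines. lines_to_string is a hypothetical name used for this sketch; it mirrors the documented behavior rather than reproducing the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: join elements $begin..$end with newlines and add a
# trailing newline, following the defaults described above.
sub lines_to_string {
    my ($array, $begin, $end) = @_;
    $begin = 0        unless defined $begin;
    $end   = $#$array unless defined $end;
    return '' if $begin > $end;
    return join("\n", @{$array}[$begin .. $end], '');
}

my @lines = ('once upon a time', 'there were three', 'bears');
print lines_to_string(\@lines, 0, 1);
```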
  • $count = string2array($string,\@array);

    Convert a string into an array of separate lines. Suppresses multiple trailing blank lines. Considers a dangling line to be complete.

      i.e.  "once upon a time
             there were three"
    
      is the same as:
    
            "once upon a time 
             there were three
            "
    
      input:        string or string pointer,
                    pointer to array
      returns:      line count
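
    Perl's built-in split gives a quick way to see the equivalence described above, since it discards trailing empty fields. string_to_lines is a hypothetical helper, and dropping every trailing blank line (rather than, say, keeping one) is an assumption that may not match the module exactly.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string (or a reference to one) into lines.
sub string_to_lines {
    my ($string, $array) = @_;
    $string = $$string if ref $string;      # accept a string or a string pointer
    @$array = split /\n/, $string;          # split drops trailing empty fields
    return scalar @$array;
}

my (@dangling, @terminated);
string_to_lines("once upon a time\nthere were three",   \@dangling);
string_to_lines("once upon a time\nthere were three\n", \@terminated);
print "same result\n" if "@dangling" eq "@terminated";
```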

DEPENDENCIES

  NetAddr::IP::Lite version 0.02

EXPORT

        none

EXPORT_OK

        limitread
        dispose_of
        headers
        rfheaders
        skiphead
        get_MTAs
        firstremote
        array2string
        string2array
        trimmsg

AUTHOR

Michael Robinton <michael@bizsystems.com>

COPYRIGHT

Copyright 2003 - 2007, Michael Robinton <michael@bizsystems.com>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

SEE ALSO

perl(1)