
Mail::Sort - split incoming mail according to header matching conditions

use Mail::Sort;
$sort = new Mail::Sort(test => 0,
                       logfile => "$ENV{HOME}/sortlog",
                       loglevel => 2);
$spool = "$ENV{HOME}/Mail";
$sort->lock("$ENV{HOME}/one_at_a_time_please");
$sort->unlock("$ENV{HOME}/one_at_a_time_please");
$sort->deliver("| formail >> $spool/junk")
    if $sort->header_match('from', 'spammer');
$sort->deliver("| formail >> $spool/work", keep => 1)
    if $sort->destination_address('me@work.com');
$sort->deliver("| formail >> $spool/list", label => 'cool list')
    if $sort->sender_match('owner-coollist@lists.r.us');
$sort->forward("boss@work.com")
    if $sort->header_match('subject', 'accept credit cards');
$sort->ignore()
    if grep { $_ =~ /make money/ } @{$sort->{body}};
exit(1);

Yet another module intended to enable the writing of mail filters in the style of procmail(1).

creates a new Mail::Sort object after reading a mail message from stdin. test is a boolean; when set, any delivery methods subsequently called on the returned object will perform simulated delivery only (destination mailboxes are not modified, although they are still locked and opened). logfile is the name of a file where delivery and matching methods record their activity, or a pre-made FileHandle reference for the same purpose. However, this logging only occurs if the level of each particular log output is less than or equal to loglevel. The following level values are used by convention:
lockwait is the initial interval in seconds between retries while waiting for a lock to become available (default 5), and locktries is the total number of retries before giving up (default 5). Exponential back-off with randomness is used after the second try to avoid starvation scenarios. callback, if provided, should be a reference to a subroutine expecting 2 arguments. The subroutine will be called after each unsuccessful locking try, with the lockfile name and the number of tries so far as arguments.
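The retry schedule described above can be sketched as follows. This is only an illustration: the helper name retry_intervals, the doubling factor, and the 50% jitter range are assumptions, not the values Mail::Sort actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the wait intervals (in seconds) between
# locking attempts. The doubling factor and jitter range are assumptions.
sub retry_intervals {
    my ($lockwait, $locktries, $rand) = @_;
    $rand ||= sub { rand() };    # injectable random source, handy for testing
    my @waits;
    my $wait = $lockwait;
    for my $try (1 .. $locktries) {
        if ($try > 2) {
            # after the second try: double the interval, then add random jitter
            $wait *= 2;
            push @waits, $wait + $wait * 0.5 * $rand->();
        }
        else {
            push @waits, $wait;
        }
    }
    return @waits;
}

# With the jitter switched off the schedule is easy to read.
my @waits = retry_intervals(5, 5, sub { 0 });
print join(', ', @waits), "\n";    # prints "5, 5, 10, 20, 40"
```

The random component matters when several filter instances wait on the same lockfile: without it they would all retry at the same moments.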
If auto_dedupe is defined, it is interpreted as the name of a directory where a database of all delivered mail is kept. If the message passed to the constructor isn't new (i.e. it has already been remembered in the database), it will be delivered to /dev/null instead of any delivery that your calling filter script specifies. Because of concurrency considerations (see README for the gory details), this check is not done at construction time, but at delivery time (i.e. when deliver() is called, directly or indirectly).
The constructor inspects the first line of the message to determine if it is in the 'From ' format traditionally used to delimit messages in Unix mailbox files. If it is, it sets the envelope_from attribute of the new object accordingly. This can, however, be overridden by supplying the equally named parameter to the constructor.
This alternative form of the constructor reads a mail message from the FileHandle fh instead of stdin.
This alternative form of the constructor reads a mail message from the array reference arrayref instead of a filehandle.

The header_match method greps through the message headers, looking for a header line whose tag matches tag and whose content matches pattern, after leading context. tag, pattern, context are Perl regular expressions; pattern, context are optional with natural defaults, so this can also be used to simply retrieve a particular header. The matching is case-insensitive unless tag contains an uppercase ASCII letter. The method returns the list of indices of all matching header lines. The text of the header lines itself can be obtained with the next method.
Given a set of header indices matches (represented as a list), returns the list of actual corresponding headers. Because this library doesn't silently modify the headers (or body) in any way, the returned headers may be folded, i.e. each may consist of more than one line.
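Since returned headers may be folded, a filter that wants a single logical line has to unfold them itself. A minimal sketch in plain Perl follows; the helper name unfold is hypothetical and not part of Mail::Sort.

```perl
use strict;
use warnings;

# Hypothetical helper: joins the physical lines of a folded header by
# removing every line break that is followed by continuation whitespace.
sub unfold {
    my ($header) = @_;
    $header =~ s/\r?\n(?=[ \t])//g;
    return $header;
}

my $folded = "Received: from a.example\n\tby b.example\n";
print unfold($folded);    # one physical line, trailing newline kept
```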
match_group returns the string that matched the appropriate parenthetical group in pattern at the last call to header_match (or any of the header matching methods that follow and are based on it). For example:
@matches = $sort->header_start('from',
'(daemon|server)\@localhost');
# sets $which to 'daemon' or 'server'
$which = $sort->match_group(1);
Some caveats are in order. First, if multiple headers match and are returned from header_start (that can happen if you use a nontrivial regexp for the tag, use one of the following methods that do that behind your back, or match a repeated header such as Received), the last matching header wins and sets the match groups. Second, if you have groups in tag or context, you must use the (?:) syntax, otherwise they become part of the total match and their contents will be returned by match_group.
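The second caveat can be seen with plain Perl regular expressions, independent of Mail::Sort. The way tag and pattern are joined into one expression here is an assumption made for illustration only.

```perl
use strict;
use warnings;

my $header  = 'From: daemon@localhost';
my $pattern = '(daemon|server)\@localhost';

# Capturing group in the tag: it becomes group 1 and pushes the
# pattern's own group to position 2.
my @bad = $header =~ /^(from|sender):\s*$pattern/i;

# Non-capturing group in the tag: the pattern's group stays group 1.
my @good = $header =~ /^(?:from|sender):\s*$pattern/i;

print "capturing tag:     group 1 = $bad[0]\n";     # From
print "non-capturing tag: group 1 = $good[0]\n";    # daemon
```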
This method returns the list of header indices whose contents start with pattern. The call above is exactly equivalent to
@matches = $sort->header_match($tag, $pattern, '\s*');
This method returns the list of matching destination header indices. The call above is exactly equivalent to
@matches = $sort->header_match(
'(?:(?:original-)?(?:resent-)?(?:to|cc|bcc)|' .
'(?:x-envelope|apparently(?:-resent)?)-to)',
$pattern, $context);
This method returns the list of header indices with matching destination addresses. It corresponds to procmail's TO_ construct. The call above is exactly equivalent to
@matches = $sort->destination_match( $address, '(?:.*[^-a-z0-9_.])?');
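The effect of this context can be tried out with plain Perl. The sketch below assumes that tag, context and pattern are joined roughly as /^tag:contextpattern/i, uses a shortened tag list, and quotes the address with quotemeta; none of these details are confirmed by this document.

```perl
use strict;
use warnings;

my $tag     = '(?:to|cc|bcc)';          # shortened destination tag list
my $context = '(?:.*[^-a-z0-9_.])?';    # context from the equivalence above
my $address = quotemeta 'me@work.com';
my $re      = qr/^${tag}:${context}${address}/i;

for my $h ('To: me@work.com',
           'Cc: Joe <me@work.com>',
           'To: notme@work.com') {
    printf "%-25s %s\n", $h, ($h =~ $re ? 'match' : 'no match');
}
```

The context requires that whatever precedes the address ends in a character that cannot be part of an address, so the first two headers match and notme@work.com does not.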
This method returns the list of header indices with matching destination words. It corresponds to procmail's TO construct. The call above is exactly equivalent to
@matches = $sort->destination_match($word, '(?:.*[^a-z])?');
This method returns the list of matching sender header indices. It is intended mostly to help with matching mail from lists. The call above is exactly equivalent to
@matches = $sort->header_match( '(?:(?:resent-)?sender|resent-from|return-path)', $pattern, $context);
Logs a record into the logfile (that is, if level is at most that passed to the constructor). If the optional label is present, it is prepended to the record. The Mail::Sort matching and delivery methods call this internally to record their actions, but it can also be called directly by the user to produce customized logging.
Attempts to atomically create lockfile (using open(2)); dies unless successful.
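A single locking attempt of this kind can be sketched with sysopen and the O_EXCL flag. The helper name try_lock and the file mode are assumptions; the actual implementation may differ.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: one locking attempt. With O_CREAT|O_EXCL the open
# fails if the file already exists, so "check and create" is a single step.
sub try_lock {
    my ($lockfile) = @_;
    sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL, 0444)
        or return 0;
    close $fh;
    return 1;
}
```

A caller would repeat try_lock with increasing delays and die once the configured number of tries is exhausted.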
The Mail::Sort delivery methods call this internally to lock the destination mailbox, if applicable; but it can also be called directly by the user, either to produce a global lock (one that mutually serializes any two instances of the filter), or to temporarily lock a destination mailbox where the necessity of this cannot be deduced automatically.
As with other methods that can die, if you wish to treat this as a temporary error (and let your MTA queue the mail) you need to either wrap the entire Perl delivery program in a shell script, or use eval, to ensure that the delivery process as a whole exits with EX_TEMPFAIL.
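The eval approach might look like the sketch below. The wrapper run_filter is hypothetical, and the filter body is a stand-in that simply dies; in a real script it would contain the calls to lock(), deliver() and so on. The value 75 for EX_TEMPFAIL is the one defined in sysexits.h.

```perl
use strict;
use warnings;

use constant EX_TEMPFAIL => 75;    # value from sysexits.h

# Hypothetical wrapper: runs the filter body and maps any die to EX_TEMPFAIL.
sub run_filter {
    my ($body) = @_;
    my $ok = eval { $body->(); 1 };
    return 0 if $ok;
    warn "filter failed: $@";
    return EX_TEMPFAIL;
}

# Stand-in body that fails the way lock() does on timeout.
my $status = run_filter(sub { die "cannot lock mailbox\n" });
print "exit status would be $status\n";    # 75
```

A real script would end with exit($status), so the MTA sees a temporary failure and queues the message for a later attempt.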
This just calls unlink(2) on lockfile.
Locks are NOT automatically released when a filter process exits, because this leads to conceptual difficulties with fork(2).
Delivers the message to target. target is any string suitable for open(); thus this method can
This method tries to automatically perform any necessary locking. This applies in case 2. above (always), and in case 3. when the output from the pipe is itself appended to a file. Both cases are handled the same way: if target contains the substring '>>', the following sequence of non-whitespace characters is interpreted as the destination filename, and a lockfile is created whose name is the destination filename with '.lock' appended. This is the algorithm used by procmail to implement its :0 : construct, if its documentation can be trusted.
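The lockfile naming rule can be written as a short regular expression. The helper name guess_lockfile is hypothetical, and skipping whitespace between '>>' and the filename is an assumption based on the targets used in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper: derives the lockfile name from a delivery target,
# or returns undef when the target contains no '>>'.
sub guess_lockfile {
    my ($target) = @_;
    return $target =~ />>\s*(\S+)/ ? "$1.lock" : undef;
}

for my $target ('| formail >> /home/me/Mail/junk',
                '>>/var/mail/me',
                '| sendmail boss@work.com') {
    my $lock = guess_lockfile($target);
    printf "%-35s %s\n", $target, defined $lock ? $lock : '(no lock)';
}
```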
The automatically provided lockfile (or its absence) can be overridden by providing the argument lockfile.
This method logs a record of the delivery at loglevel 2; if label is present, it's made part of the record, as described above.
Unless keep is true, this method exits with status 0 upon successful completion. In case of failure it dies, so wrap the call in eval (or the whole filter in a shell script) if you need to handle the error.
Forwards the message to the Internet address target. This is almost exactly equivalent to
$sort->deliver("| sendmail $target",
keep => $keep, label => $label);
This method logs a record of the delivery at loglevel 2; if label is present, it's made part of the record, as described above.
Unless keep is true, this method exits with status 0 upon successful completion. In case of failure it dies, so wrap the call in eval (or the whole filter in a shell script) if you need to handle the error.
Trashes the message in the bit bucket for eternity. This is exactly equivalent to
$sort->deliver("| cat >/dev/null",
label => $label);
This method logs a record of the delivery (such as it is) at loglevel 2; if label is present, it's made part of the record, as described above.
This method exits with status 0 upon successful completion.
Passes the mail as input to an external program and re-initializes self with the output of the program. This is the way to use a program such as formail(1) as a filter.
Returns a fresh automatically generated Unix style From line, with the envelope sender and current local time.
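For illustration, a line of this shape can be produced in plain Perl as follows. The helper name from_line and the exact date layout (the one scalar localtime yields) are assumptions, not a description of what this method emits.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a Unix mailbox 'From ' line from an envelope
# sender and an epoch time (defaults to the current time).
sub from_line {
    my ($sender, $time) = @_;
    $time = time unless defined $time;
    return "From $sender " . localtime($time) . "\n";
}

print from_line('owner-coollist@lists.r.us');
```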
Adds a new header at position index in the header array. All existing headers above index move up by one. header_line must be the complete header line being inserted, including the header tag (such as Received:). You can also insert a RFC 822 conforming multi-line header, in which case header_line must be the concatenation of all (physical) lines that comprise the header. label is optional and is used for logging.
Just like the preceding method, but appends the new header at the end of the header array instead of splicing it in the middle. It is exactly equivalent to
$sort->add_header_at($header_line, scalar @{$sort->{head}});
Just like the preceding method, but appends the new header at the end of the header array only if no matching header already exists.
If there's exactly one header line with tag and index is 0, this will insert header_line just before tag (kicking it and all following headers up one). If there are multiple occurrences of tag, index selects before which one to insert. Important: index is the position in the list of headers with tag, not all headers.
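The difference between the two kinds of index can be made concrete with a small sketch that maps an occurrence index to a position in the full header array. The helper name absolute_index is hypothetical, and the case-insensitive match on the tag is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the position in the header array of the
# $index-th header whose tag is $tag, or undef if there is none.
sub absolute_index {
    my ($head, $tag, $index) = @_;
    my @positions = grep { $head->[$_] =~ /^${tag}:/i } 0 .. $#$head;
    return $positions[$index];
}

my @head = ("Received: from a\n",
            "From: someone\n",
            "Received: from b\n",
            "Subject: hi\n");

print absolute_index(\@head, 'received', 1), "\n";    # 2
print absolute_index(\@head, 'subject',  0), "\n";    # 3
```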
Just like the preceding method, but adds after the selected occurrence of tag.
Deletes the header at position index in the header array. All existing headers above index move down by one.
As before, index selects one of the occurrences of tag. The header with this occurrence is deleted.
This method deletes all occurrences of tag. It returns the list of deleted headers.
Replaces the header at position index in the header array with header_line. This must be the complete header line including tag, as before.
As before, index selects one of the occurrences of tag. The header with this occurrence is replaced with header_line.
xform must be a reference to a subroutine which expects a scalar reference as its only argument, and may modify the referent. This method applies xform to the header line at position index in the header array.
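A subroutine of the required shape might look like this. The sketch applies it to a plain string rather than through Mail::Sort, and the subject prefix is made up for the example.

```perl
use strict;
use warnings;

# An xform candidate: receives a reference to the header line and edits
# the referent in place.
my $xform = sub {
    my ($line_ref) = @_;
    $$line_ref =~ s/^Subject:\s*/Subject: [cool list] /i;
};

my $header = "Subject: meeting moved\n";
$xform->(\$header);
print $header;    # Subject: [cool list] meeting moved
```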
As before, index selects one of the occurrences of tag. The header with this occurrence is modified by passing it to xform.
Renames the header at position index in the header array to newtag.
As before, index selects one of the occurrences of tag. The header with this occurrence is renamed to newtag.
This method renames all occurrences of tag to newtag.
Appends header_line to the header array just like append_header, but in addition renames all existing same-named headers to "X-Original-tag", where tag is the tag of header_line.
Deletes all occurrences of tag except the first.
Deletes all occurrences of tag except the last.
Warning: This method is deprecated, instead use the auto_dedupe construction attribute, or use methods of the undocumented internal class Mail::Sort::Dedupe directly.
This method is intended to approximate the functionality of "formail -D n path". ttl determines how many messages to remember. Each clean_period (default: one day), this method scans all the remembered messages and forgets the ones older than ttl (default: one week). If the message isn't new (i.e. it has already been remembered when this method is called), this method returns true or "successfully ignores" the message, according to the value of keep.

The head attribute is a reference to an array of message headers. These headers are the ones used for the matching methods.
The body attribute represents the message body. It is a reference to an array of body lines. No transformation is applied to the body lines; in particular, no folding, unfolding, or From escaping.
The envelope_from attribute is a string normally set according to the original message's 'From ' line. It is useful for deliveries to Unix mailboxes or to the formail(1) filter.

/dev/stderr - default log file

procmail(1), procmailrc(5), sendmail(8), open(2), fork(2), maildir(5), formail(1)

These are the messages logged with loglevel 0 or 1.
What it says. loglevel 0; delivery is not attempted beyond this point. The most likely cause is insufficient privilege to create files in the target directory. See "BUGS" for one common case.
The lock() method timed out because the lockfile already existed. Most likely this is a stale lockfile left over from an errant process that needs to be removed manually.
What it says. loglevel 0; delivery is not attempted beyond this point. The most likely cause is insufficient privilege to write the target file.
This can only occur when delivering to a pipe. loglevel 0; delivery is not attempted beyond this point. The particular value of exit status is of some import; if exit status < 256, the subprocess probably received a fatal signal (though this is architecture-dependent).
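Assuming the logged value follows the layout of Perl's $? variable (signal number in the low 7 bits, exit code in the high byte), it can be decoded as sketched below. The helper name describe_status is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: interprets a wait status laid out like Perl's $?.
sub describe_status {
    my ($status) = @_;
    my $signal = $status & 127;
    return "killed by signal $signal" if $signal;
    return 'exited with code ' . ($status >> 8);
}

print describe_status(9),   "\n";    # killed by signal 9
print describe_status(256), "\n";    # exited with code 1
```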
What it says. loglevel 1; delivery continues.
What it says. loglevel 1; delivery continues.
What it says. loglevel 1; delivery continues.

Mail::Sort uses dotlocking for all its locking needs. This presents a slight problem with delivery to the main system spool, on systems (like Debian) where the mail spool is not world-writable. The obvious way:
$sort->deliver(">>/var/mail/$ENV{LOGNAME}");
won't work. There are two answers to this:
$sort->deliver("| procmail -d $ENV{LOGNAME} /dev/null");
No explicit locking is necessary in this case, because procmail knows how to do that itself, and has been installed with the required privileges to do that (one hopes). The /dev/null in the above command tells procmail to ignore any configuration files, not to trash your mail.
Mail::Sort has no built-in filename magic; there's no equivalent of procmail constructs like ORGMAIL or MAILDIR. The author considers this a feature. In a general-purpose language like Perl, it is trivial to do these things from the filter script itself. The real bug is the inclusion of this paragraph in the BUGS section. :-)
Mail::Sort is somewhat Unix-centric; it probably won't be useful without modification on systems where concepts such as MTA, mail queue, and .forward files don't make sense. The author doesn't consider this a feature, but he doesn't quite apologize for it, either.

Ian Zimmerman <itz@buug.org>

procmail by Stephen R. van den Berg <srb@cuci.nl> is the granddaddy and original inspiration for this code, and remains the best general-purpose mail filter around, in this writer's opinion.
Mail::Audit by Simon Cozens <simon@brecon.co.uk> is a Perl module with a procmail-like interface, based on some earlier code by Tom Christiansen <tchrist@jhereg.perl.com>. Unfortunately, Mail::Audit's interface seems to suffer from the overuse of object-oriented style, while also restricting the possible ways of header matching.
Mail::Procmail by Johan Vromans <jvromans@squirrel.nl> reverts to a simple procedural interface, but in doing so flushes the baby out together with the water: it is no longer possible to modify the original message and continue running the filter on the modified message.