
=head2 NAME

How To Use - SpamCannibal

=head2 PREFACE

Today's email systems are called upon to examine and classify 
incoming mail in ways they were never designed to do. DNSBL servers and 
sophisticated filters help immensely in this task by quickly 
identifying viruses, spam and spam sources, but there is no good way 
to stop this traffic from consuming bandwidth. The tactic of 
yesteryear, bouncing messages back to the envelope sender, only 
makes the matter worse, because virtually ALL spam and virus mail carries 
forged headers. This practice triples or quadruples the bandwidth consumed 
by the spam: first the inbound transit, second the bounce to the 
innocent envelope domain owner, third the return bounce from that domain's 
mailer daemon for the unknown envelope user, and potentially a fourth refusal from 
sites equipped with a double-bounce filter. Every time a piece 
of spam is received, even from a known source, this process is 
repeated, and it places no burden on the sender and gives them no 
incentive to stop.
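
The arithmetic behind "triples or quadruples" can be sketched as a rough model. It assumes that each bounce leg carries a full copy of the original message (many bounces do, but not all), and the function name spam_bytes_on_wire is hypothetical.

```python
def spam_bytes_on_wire(message_bytes: int, bounce: bool,
                       double_bounce_refusal: bool = False) -> int:
    """Rough count of bytes one spam message puts on the network.

    Assumption (illustrative only): every bounce leg carries a full copy
    of the original message, so each leg costs about message_bytes.
    """
    legs = 1  # inbound transit to our MTA
    if bounce:
        legs += 1  # our bounce to the innocent envelope domain
        legs += 1  # that domain's "unknown user" bounce back to us
        if double_bounce_refusal:
            legs += 1  # refusal from a site with a double-bounce filter
    return legs * message_bytes
```

Under this model a 10,000 byte spam costs 10,000 bytes if it is simply accepted or rejected, 30,000 bytes once it is bounced, and 40,000 bytes when a double-bounce filter refuses the returning bounce.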

Before discussing how SpamCannibal addresses this problem, let us 
consider the path that a message takes as it enters a well-designed
mail system.

=over 4

=item 1. connection

A connection is made to TCP/IP port 25 on the host and is handed off 
to the Mail Transfer Agent or front-end filter by the operating system.

=item 2. access control

The MTA examines the source of the message and checks it against 
remote DNSBL's and its own access list to see whether the source is in its 
reject list. If rejected, the message is usually returned to the 
envelope sender with an error code.

=item 3. content filtering

The message is filtered for spam content and is either marked for 
special delivery disposition or bounced to the envelope sender as in 
step 2.

=back
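
The DNSBL check in step 2 works by reversing the octets of the connecting IPv4 address, appending the DNSBL zone, and asking DNS for an "A" record: an answer means the address is listed, a non-existent name means it is not. The following is a minimal sketch of that lookup, not SpamCannibal's or any MTA's code. The zone names and function names are hypothetical, the resolver is passed in as a parameter so the logic can be exercised without network access, and a production check would also examine the returned address, since DNSBLs encode meaning in their 127.0.0.x answers.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def first_listing(ip: str, zones: Iterable[str],
                  resolve: Callable[[str], str] = socket.gethostbyname
                  ) -> Optional[str]:
    """Return the first zone that lists `ip`, or None if no zone does.

    Zones are queried in the order given; a resolver failure
    (e.g. NXDOMAIN) is treated as "not listed".
    """
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))  # any "A" record = listed
        except (socket.gaierror, socket.herror):
            continue
        return zone
    return None
```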

While these steps do a reasonable job of reducing the unwanted mail 
delivered to the end user, they do nothing to reduce or eliminate the
bandwidth consumed by the ever-increasing load of spam and virus mail, nor
do they impose any penalty on the feckless sender.

=head2 SpamCannibal, the missing piece

SpamCannibal provides the missing element in email system design: 
the piece needed to reduce and eliminate unwanted spam 
traffic. It does this with a surprisingly simple multi-step process. 
Because we will keep referring to the three steps that 
the MTA takes to receive mail, the SpamCannibal steps are labeled 
a), b), c), and so on.

With a SpamCannibal enhanced mail system, an incoming connection to 
TCP/IP port 25 goes through these steps.

=over 4

=item a. access control

The incoming host IP address is checked against a local database 
of banned hosts. If the IP address is acceptable OR UNKNOWN, it is 
logged in the archive database and the connection is passed to the MTA 
for step 1) of its process.

Let's assume for the sake of discussion that the UNKNOWN host 
delivered a spam load for which the MTA will complete steps 2) and 3) 
and provide some subsequent disposition.

=item b. c. d. skipped for normal messages

The connection is passed straight through to the MTA.

=item e. automated spam source identification

A few minutes later, a cron script checks all of the collected 
archive IP addresses against the same DNSBL list used by the MTA. 
Addresses for which the DNSBL's return "A" records are 
added to the database of banned hosts to be tarpitted. If you wish to 
be polite and impose only a minimal cost on the spam sender, SpamCannibal 
can instead be configured to simply ignore the incoming connection request as 
if port 25 had no service.

The spam source has now been identified. Let us repeat the steps for 
SpamCannibal.

=item a. (again) access control / tarpit action

The incoming host IP address is checked against a local database 
of banned hosts. The IP address is found to match an entry in the 
database. 

=item b. tarpit response

SpamCannibal ACKnowledges the connecting host's SYN packet with a 
small window size, then drops the packet.

=item c. tarpit acknowledgement

The connecting host responds with its own ACK and may attempt to 
send data using the reduced window size or simply ask for a larger 
window. Either way it will take some time before the connection is 
terminated.

=item d. persistent tarpit complete

The connecting host sends data. SpamCannibal ACKnowledges receipt of 
the data and further reduces the transmission window size. The remote 
host will now hang on indefinitely trying to send the balance of its 
payload.

=item e. never reached

The local host never sees the banned connection. What little traffic remains
is handled entirely by the tarpit daemon.

=back
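
The cron pass in step e can be pictured as a loop over the archive that promotes every DNSBL-listed address into the tarpit database. In this sketch a Python dict and set stand in for SpamCannibal's archive and tarpit databases, and is_listed is any caller-supplied DNSBL test; these names are illustrative assumptions, not the interface of sc_BLcheck.

```python
from typing import Callable, Dict, Iterable, Set


def promote_listed_hosts(archive: Dict[str, int],
                         tarpit: Set[str],
                         zones: Iterable[str],
                         is_listed: Callable[[str, str], bool]) -> Set[str]:
    """Add every archived IP that any DNSBL zone lists to the tarpit set.

    archive   -- IP -> last-seen time (stand-in for the archive database)
    tarpit    -- banned IPs (stand-in for the tarpit database); mutated
    is_listed -- caller-supplied test, e.g. a DNS "A" record lookup
    Returns the set of newly banned addresses.
    """
    zone_list = list(zones)  # reuse the same zones for every IP
    added: Set[str] = set()
    for ip in archive:
        if ip in tarpit:
            continue  # already banned, no need to query again
        if any(is_listed(ip, zone) for zone in zone_list):
            tarpit.add(ip)
            added.add(ip)
    return added
```

Running it repeatedly is harmless: addresses already in the tarpit set are skipped, so each pass only queries the DNSBL's about newly seen hosts.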

All of the steps that the SpamCannibal tarpit takes are stateless. 
There is no forked child, suspended job, or per-connection memory storage. 
Each incoming connection is treated anew based only on the information in 
the inbound packet. What SpamCannibal accomplishes is a threefold 
reduction in the traffic caused by spam and virus payloads, because 
they NEVER LEAVE THE TRANSMITTING HOST. The savings multiply on the 
local mail host, which no longer has to process the payload through the 
MTA, interrogate DNSBL's, run filters, or waste human time emptying 
overfilled mailboxes.
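
To make "stateless" concrete, here is a sketch of a per-packet decision function that looks only at the packet in hand and a set of banned addresses. It is not dbtarpit's code: the Packet and Reply types, the SMALL_WINDOW value of 10 bytes, and the choice to acknowledge every data segment with a zero window are assumptions made for illustration, and real packet capture and injection through iptables are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

SEQ_MOD = 2 ** 32   # TCP sequence numbers wrap at 32 bits
SMALL_WINDOW = 10   # hypothetical initial window advertised to banned hosts

ACCEPT = "accept"   # not banned: let the packet continue to the MTA
DROP = "drop"       # banned: the packet never reaches the MTA


@dataclass(frozen=True)
class Packet:
    src_ip: str
    syn: bool
    ack: bool
    seq: int
    payload_len: int = 0


@dataclass(frozen=True)
class Reply:
    syn: bool
    ack: bool
    ack_num: int
    window: int


def tarpit_verdict(pkt: Packet, banned: Set[str]) -> Tuple[str, Optional[Reply]]:
    """Decide using only this packet and the banned set; nothing is stored."""
    if pkt.src_ip not in banned:
        return ACCEPT, None
    if pkt.syn and not pkt.ack:
        # Step b: answer the SYN with a SYN/ACK advertising a tiny window.
        return DROP, Reply(True, True, (pkt.seq + 1) % SEQ_MOD, SMALL_WINDOW)
    if pkt.payload_len > 0:
        # Step d: acknowledge the data but close the window to zero,
        # so the sender must keep waiting and probing.
        return DROP, Reply(False, True, (pkt.seq + pkt.payload_len) % SEQ_MOD, 0)
    # Step c: a bare ACK completing the handshake needs no answer.
    return DROP, None
```

Because nothing is remembered between calls, the function treats the first SYN and the hundredth window probe with the same rules; all of the connection state lives in the sender's own TCP stack.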

The flip side of this is not so pleasant. The sending mail host has a loaded
task waiting for a response from its TCP/IP stack. The TCP/IP stack has full
buffers that have not been transmitted and the timeout mechanism is reset
each time it attempts to send data. Every additional thread caught by
SpamCannibal requires another task and additional resources on the TCP/IP
stack. This could easily stall the sending process on a host that distributes 
UBE, UCE or virus mail to a large number of sites where SpamCannibal has
been deployed.

=head2 USING SPAMCANNIBAL WITH YOUR MAIL SYSTEM

SpamCannibal has four core runtime elements, plus an optional fifth.

=over 4

=item 1. Front end "dbtarpit" daemon.

This daemon interfaces directly with Linux's "iptables" and receives every packet destined 
for port 25 before it is passed to the MTA. As far as human operators 
are concerned, this is the most passive looking of the operations 
since there is no external interface.

=item 2. The sc_BLcheck script

This script runs periodically to check the accumulated 
(logged) IP addresses that connected to port 25 against your 
preferred list of DNSBL's. This should be the same set of DNSBL's that are
used by your MTA. IP addresses with returned "A" records are 
added to the "tarpit" database for subsequent denial of access.

=item 3. Inbox robot

Spam that escapes DNSBL detection can be emailed to 
SpamCannibal's secure mail robot, B<sc_mailfilter>, to process the 
headers, extract the originating MTA IP address, and add that address 
to the tarpit database.

=item 4. Web administration tools

SpamCannibal's secure web administration tools allow the system 
administrator to add spam hosts through a simple cut-and-paste 
operation, or to add or delete individual hosts in the database by hand.

In addition to these tools, there's also a nifty statistics display 
that is borrowed from the LaBrea::Tarpit Perl module. It provides a 
realtime snapshot of the current and recent spam host activity on the 
mail host.

=item 5. Optional multi_dnsbl daemon

B<multi_dnsbl> is a DNS emulator daemon that increases the efficacy of DNSBL
look-ups in a mail system. B<multi_dnsbl> may be used as a stand-alone DNSBL
or as a plug-in for a standard BIND 9 installation. 
B<multi_dnsbl> shares a common configuration file format with the
Mail::SpamCannibal sc_BLcheck.pl script so that DNSBL's can be maintained in
a common configuration file for an entire mail installation.

It is recommended that SpamCannibal installations use B<multi_dnsbl> for
their MTA's DNSBL lookups, as this minimizes network traffic to the DNSBL's
and optimizes the order in which they are interrogated.

=back
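
The inbox robot's key job, finding the originating MTA address in a forwarded spam, can be approximated by walking the Received: headers from the top (newest) down and stopping at the first address that is not one of your own relays, since headers below that point may be forged. This sketch is not B<sc_mailfilter>'s actual algorithm: the originating_ip function is hypothetical, it handles IPv4 only, and it assumes the relaying MTA records the peer address in square brackets, which is common but not universal.

```python
import email
import ipaddress
import re
from typing import Iterable, Optional

# Assumption: relays record the peer address as "[a.b.c.d]".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def originating_ip(raw_message: str, trusted: Iterable[str]) -> Optional[str]:
    """Return the first untrusted IPv4 address found walking Received:
    headers from newest (top) to oldest, or None if none is found.

    trusted -- addresses of our own MX hosts and internal relays.
    Headers below the first untrusted hop may be forged, so stop there.
    """
    trusted_set = set(trusted)
    msg = email.message_from_string(raw_message)
    for header in msg.get_all("Received", []):
        for candidate in BRACKETED_IPV4.findall(header):
            try:
                addr = ipaddress.IPv4Address(candidate)
            except ipaddress.AddressValueError:
                continue  # e.g. "[999.1.1.1]" is not a real address
            if candidate in trusted_set or addr.is_loopback:
                continue
            return candidate
    return None
```

Listing your secondary MX in trusted matters: mail relayed through it would otherwise cause your own backup host to be reported as the spam source.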

=head2 TYPICAL INSTALLATIONS

The SpamCannibal site itself uses the two installations described below.

=head3 System 1

A standalone system incorporating an MTA, DNS daemon, web server, and 
SpamCannibal installation. This system runs three SpamCannibal 
daemons.

=over 4

=item 1. dbtarpit

Denies access to banned hosts and collects incoming 
connection IP addresses.

=item 2. dnsbls

Provides blacklist DNS service on an internally accessible 
port from the SpamCannibal databases. The primary DNS server (bind-9.xx) is 
slaved to dnsbls to provide external DNSBL service.

=item 3. bdbaccess

Provides privileged access to the SpamCannibal web 
pages running on the local web server.

=back

In addition, the sc_lbdaemon (LaBrea) data collection daemon runs on the 
localhost and provides statistics for the local web server pages.

=head3 System 2

A standalone system incorporating an MTA and SpamCannibal 
installation. This system runs two SpamCannibal daemons.

=over 4

=item 1. dbtarpit

Denies access to banned hosts and collects incoming 
connection IP addresses in the same manner as System 1.

=item 2. bdbaccess

Provides privileged remote access to the SpamCannibal 
web pages running on a remote host (actually in another city with a 
different network provider).

=back

System 2 uses System 1's DNSBL to check its IP archive database. 
System 2 is the secondary mail host for System 1 but, according to 
traffic analysis, carries roughly twice as much spam traffic.

In addition, System 2 runs the sc_lbdaemon and 
provides remote statistics access for a web process running on 
another host.