The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Apache::Traffic - Tracks hits and bytes transferred on a per-user basis

SYNOPSIS

  # Place this in your Apache's httpd.conf file
  PerlLogHandler Apache::Traffic

DESCRIPTION

This module tracks the total number of hits and bytes transferred per day by the Apache web server, on a per-user basis. This allows for real-time statistics without having to parse the log files.

After installation, add this to your Apache's httpd.conf file and restart the server:

        PerlLogHandler  Apache::Traffic

The statistics are then available through the 'traffic' script, which is included in this distribution. See the section VIEWING STATISTICS for more details.

PREREQUISITES

You need to have compiled mod_perl with the LogHandler hook in order to use this module. Additionally, the following modules are required:

        o IPC::Shareable
        o IPC::SysV
        o DB_File
        o Date::Parse 

Your OS must also support SysV IPC (shared memory and semaphores). If this is not the case, this module will be useless to you.

INSTALLATION

To install this module, move into the directory where this file is located and type the following:

        perl Makefile.PL
        make
        make test
        make install

This will install the module into the Perl library directory.

Once installed, you will need to modify your web server's configuration file so it knows to use Apache::Traffic during the logging phase:

        PerlLogHandler Apache::Traffic

Restart your web server.

As of this writing, there is a problem with IPC::Shareable which will cause segmentation faults in httpd processes if Apache::Traffic is run long enough (at least this is the case under Linux). This distribution contains a patch named 'share.patch', which will fix the problem.

If Apache::Traffic does not appear to work correctly (look in your server's error_log for problems), make sure the semaphore and shared memory segments are not already allocated for another purpose. If this is the case, you can change the constants SHMKEY, SEMKEY, and DBPATH at the top of the Apache::Traffic module, and reinstall.

HOW IT WORKS

Each time a request is served, the Apache::Traffic log handler is called which increments the byte and hit totals for the owner of the resource.

The owner of the resource is determined in the following way:

  o If the Perl variable Owner has been set for the directory, its
    value is used.  For example:

        <Directory /home/root/www/mark>
          PerlSetVar Owner mark
        </Directory>

    This would declare user mark as the owner of everything under
    the specified directory.  The value can be either the username
    or UID of the user.

    This value can also be a fake user (i.e. a username which is
    not present in the passwd file).  In this case, the username is
    stored (rather than the UID).

  o If the request is to a virtual host, the owner of the document
    root is used.

  o If neither of the above methods work, the owner of the file
    is used.

The hit and byte total information is stored in shared memory to minimize processing. On the first request of each day, all previous data in shared memory is automatically moved to permanent storage. This means that no more than one day's worth of information is ever stored in shared memory, and prevents performance degradation as data accumulates. This separation of data is transparent from the end-user perspective.

If you would rather not have the data moved into the dbm file, you can set USE_DBM to 0 at the top of the Traffic.pm module and reinstall.

Shared memory segments are not preserved through reboots. If you reboot your machine multiple times a day, Apache::Traffic will be of questionable value to you. I run Linux, so of course, I only reboot when I've upgraded the OS. ;-) This area may be improved in the future (at least for orderly shutdowns).

VIEWING STATISTICS

A script named 'traffic' is included in this distribution, which allows you to view the totals for a given user. Note that this script will not run properly until Apache::Traffic has recorded at least one page request.

The basic syntax for the script is:

        traffic [options] [username]

If username is not specified, the effective UID of the person running the script is used. By default, only data for the current day is displayed.

The following options are supported:

  -start=starting_date

        Specifies the starting date that you wish to see data for.  The
        date specifications can take any format supported by the 
        Date::Parse module.  If -end is not specified, all data between
        -start and the current day is displayed.

  -end=ending_date

        Specifies the ending date that you wish to see data for.  

  -days=num_days

        Specifies the number of days you want to see information for
        relative to the value of -start (or the current day if -start
        is not specified).  The value can be either positive
        or negative.

  -user=username

        Specifies the user you want to see data for.  Multiple
        -user specifications are allowed.  The users can also be
        specified as non-option arguments.  Both UIDs and usernames
        are allowed.

  -all

        Displays all data present within the given time period.
  
  -reverse

        If present, the information is sorted in descending order based
        on date.  

  -units=unit

        Specifies the unit to display transfer totals in.  Acceptable 
        values are 'Bytes', "Kilobytes', 'Megabytes', or 'Gigabytes'.
        Only the first character of the unit need be specified.  The
        default is Bytes.

  -summary

        If -summary is present, aggregate totals for the period 
        being viewed are displayed, rather than daily totals.

  -n

        If the -n option is present, the report displays UIDs rather than
        converting them to usernames.  In the case of a "fake" user,
        the username will still be displayed (which is a way to tell
        is a user is fake or not).

  -remove

        If the -remove option is present, all data within the specified  
        time period is permanently removed.  Only root is allowed to
        perform this operation (see the SECURITY NOTES section though).  
        The operation must be confirmed prior to being carried out.

ACCESSING INFORMATION DIRECTLY

If the supplied traffic script is not sufficient for your needs, you may access the raw data directly. The following functions are available for import into your scripts.

  fetch([START], [END], [WANTUID], [ALL], [USER LIST])

        This function retrieves all data between START and END times,
        inclusive, for the users specified in USER LIST. Both START and
        END should be UTC timestamps.  The function automatically
        normalizes the timestamps to be on day boundaries.

        If WANTUID is true, usernames are not looked up.  If ALL is true,
        data for all users is returned and USER_LIST is ignored.  
        If ALL is true, data for all users is returned and
        USER_LIST is ignored. 

        On success, the function returns a complex hash reference, which 
        contains the requested data:

                use Apache::Traffic qw( fetch remove error );

                $ref = fetch(time, time, 0, 0, 'maurice');
                foreach $day (%$ref) {
                  foreach $user (%{ $ref->{$day} }) {
                    print scalar gmtime $day, " $user\n";
                    print "  BYTES: $ref->{$day}{$user}{bytes}\n";
                    print "   HITS: $ref->{$day}{$user}{hits}\n\n";
                  }
                }

        Note that the timestamps are stored internally in GM time, 
        although START and END should be in local time.  We do this
        so we don't have to worry about daylight savings.

        The function returns undef on error, in which case you can call
        the error() function to determine what went wrong.

  remove([START], [END])

        This function removes all data between the START and END times,
        inclusive.

        The fuction returns true on success and undef on error, in which
        case you can call the error() function to determine what went
        wrong.

  error()

        Returns a string describing the last error condition encountered.

SECURITY NOTES

By default, the shared memory segments, semaphores, and DBM file are created with permissions of 0644. However, these resources must be owned by whatever user the server runs as (normally user 'nobody'). This means that your users could create CGI scripts to play with the data. For this reason, the information maintained by Apache::Traffic should not be relied upon for auditing purposes, and is intended mainly for use in friendly environments.

AUTHOR

Copyright (C) 1997, Maurice Aubrey <maurice@hevanet.com>. All rights reserved.

This module is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1), mod_perl(3)