Ave Wrigley > HTTPD-Log-Filter-1.08 > exclude_robot.pl

NAME

exclude_robot.pl - a simple filter script to filter robots out of logfiles

SYNOPSIS

    exclude_robot.pl
        -url <robot exclusions URL>
        [ -exclusions_file <exclusions file> ]
        <httpd log file>
    
    OR

    cat <httpd log file> | exclude_robot.pl -url <robot exclusions URL>

DESCRIPTION

This script filters HTTP log files to exclude entries that correspond to known webbots, spiders, and other undesirables. It requires a URL as a command-line option; the URL should point to a text file containing a linebreak-separated list of lowercase strings to match against when identifying bots. This is based on the format used by ABC (http://www.abc.org.uk/exclusions/exclude.html).

The script filters httpd logfile entries either from a filename specified on the command line, or from STDIN. It outputs filtered entries to STDOUT.
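The matching described above can be illustrated with a minimal sketch. It is not the script's actual implementation: the helper names (parse_exclusions, user_agent, is_robot), the sample exclusion strings, and the sample log lines are all hypothetical, and it assumes that the strings are matched case-insensitively as substrings of the User-Agent field, taken here to be the last double-quoted field of a combined-format log line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a linebreak-separated exclusion list into lowercase match strings.
sub parse_exclusions {
    my ($text) = @_;
    my @patterns;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # skip blank lines
        push @patterns, lc $line;
    }
    return @patterns;
}

# Return the User-Agent of a log entry, or undef if none is found.
# Assumption: the agent is the last double-quoted field on the line.
sub user_agent {
    my ($entry) = @_;
    return $entry =~ /"([^"]*)"\s*$/ ? $1 : undef;
}

# True if the entry's agent contains any of the exclusion strings.
sub is_robot {
    my ($entry, @patterns) = @_;
    my $agent = user_agent($entry);
    return 0 unless defined $agent;
    $agent = lc $agent;
    for my $pattern (@patterns) {
        return 1 if index($agent, $pattern) >= 0;
    }
    return 0;
}

# Hypothetical data standing in for the fetched list and a log file.
my @patterns = parse_exclusions("googlebot\nslurp\n");
my @log = (
    '10.0.0.1 - - [01/Jan/2001:00:00:01 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 5.5)"',
    '10.0.0.2 - - [01/Jan/2001:00:00:02 +0000] "GET /robots.txt HTTP/1.0" 200 64 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
);

my (@kept, @excluded);
for my $entry (@log) {
    if (is_robot($entry, @patterns)) {
        push @excluded, $entry;
    }
    else {
        push @kept, $entry;
    }
}

print "$_\n" for @kept;                       # filtered entries go to STDOUT
print STDERR "excluded: $_\n" for @excluded;  # stand-in for an exclusions file
```

In this sketch the excluded entries are written to STDERR only to make them visible; the script itself saves them to a file when -exclusions_file is given.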

OPTIONS

-url <robot exclusions URL>

Specify the URL of the file to fetch, which contains the list of agents to exclude. This option is REQUIRED.

-exclusions_file <exclusions file>

Specify a file in which to save the entries excluded from the logfile. This option is OPTIONAL.

AUTHOR

Ave Wrigley <Ave.Wrigley@itn.co.uk>

COPYRIGHT

Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.