NAME

Apache::Log::Parser - Parser for Apache logs (common, combined, and any other custom style defined with LogFormat).

SYNOPSIS

  my $parser = Apache::Log::Parser->new( fast => 1 );

  my $log = $parser->parse($logline);
  $log->{rhost}; #=> remote host
  $log->{agent}; #=> user agent

DESCRIPTION

Apache::Log::Parser is a parser module for Apache logs. It accepts the 'common' and 'combined' formats as well as any other custom style. It works relatively fast and, with the 'strict' parser, handles backslash-escaped double quotes properly.

Once instantiated, a parser can handle every format it was configured with through the single method 'parse'.

USAGE

This module requires either the 'fast' or the 'strict' option at instantiation.

The 'fast' parser works relatively fast. It can process only 'common', 'combined', and custom styles that are compatible with 'common' (additional fields appended after the 'common' fields), and it cannot handle backslash-escaped double quotes inside fields.

  # Default: handles both 'combined' and 'common'
  my $parser = Apache::Log::Parser->new( fast => 1 );
  
  my $log1 = $parser->parse(<<COMBINED);
  192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /path/to/file.html HTTP/1.1" 200 9891 "-" "DoCoMo/2.0 P03B(c500;TB;W24H16)"
  COMBINED
  
  # $log1->{rhost}, $log1->{date}, $log1->{path}, $log1->{referer}, $log1->{agent}, ...
  
  my $log2 = $parser->parse(<<COMMON); # parsed as 'common'
  192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /path/to/file.html HTTP/1.1" 200 9891
  COMMON
  
  # For a custom style (additional fields after 'common'), plus 'combined' and 'common'
  # custom style: LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%v\" \"%{cookie}n\" %D"
  my $c_parser = Apache::Log::Parser->new( fast => [[qw(referer agent vhost usertrack request_duration)], 'combined', 'common'] );
  
  my $log3 = $c_parser->parse(<<CUSTOM);
  192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257 "http://example.com/referrer" "Any User-Agent" "example.com" "192.168.0.1201102091208001" 901
  CUSTOM
  
  # $log3->{agent}, $log3->{vhost}, $log3->{usertrack}, ...
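
The list passed to 'fast' above names several formats for a single parser. One way to picture such a list is as an ordered set of candidates tried one after another until one fits the line. The following is a minimal sketch of that idea only, not the module's actual implementation: the function parse_with_fallback, the regexes, and the 'format' key are hypothetical, and the field names are borrowed from the examples in this document.

```perl
use strict;
use warnings;

# Hypothetical illustration only: NOT the internals of Apache::Log::Parser.
# Fields shared by 'common' and 'combined'.
my $common = qr/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\S+) (\S+)/;

# Candidates are tried in order, most specific first.
my @candidates = (
    [ combined => qr/$common "([^"]*)" "([^"]*)"$/,
      [qw(rhost logname user datetime request status bytes referer agent)] ],
    [ common   => qr/$common$/,
      [qw(rhost logname user datetime request status bytes)] ],
);

sub parse_with_fallback {
    my ($line) = @_;
    chomp $line;
    for my $candidate (@candidates) {
        my ($name, $regex, $fields) = @$candidate;
        # An empty match list counts as false, so move on to the next candidate.
        my @values = $line =~ $regex or next;
        my %log = (format => $name);
        @log{@$fields} = @values;    # hash slice: pair field names with captures
        return \%log;
    }
    return;    # no candidate matched
}
```

Placing the most specific format first matters in this sketch: each pattern is anchored at the end of the line, so a 'combined' line can never be mistaken for a 'common' one.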

The 'strict' parser is relatively slow. It can process logs in any format, given a field separator and a checker subroutine that verifies the parsed fields are complete. It also handles backslash-escaped double quotes properly.

  # The 'strict' parser also handles log formats that are not compatible with 'common', such as 'vhost_common' ("%v %h %l %u %t \"%r\" %>s %b")
  my @customized_fields = qw( rhost logname user datetime request status bytes referer agent vhost usertrack request_duration );
  my $strict_parser = Apache::Log::Parser->new( strict => [
      ["\t", \@customized_fields, sub{my $x=shift;defined($x->{vhost}) and defined($x->{usertrack}) }], # TABs as separator
      [" ", \@customized_fields, sub{my $x=shift;defined($x->{vhost}) and defined($x->{usertrack}) }],
      'combined',
      'common',
      'vhost_common',
  ]);
  
  my $log4 = $strict_parser->parse(<<CUSTOM);
  192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257 "http://example.com/referrer" "Any \"Quoted\" User-Agent" "example.com" "192.168.0.1201102091208001" 901
  CUSTOM
  
  $log4->{agent}; #=> 'Any "Quoted" User-Agent'
  
  my $log5 = $strict_parser->parse(<<VHOST);
  example.com 192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257
  VHOST
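
The difference between the two parsers on backslash-escaped double quotes can be illustrated with two regular expressions. This is a sketch of the general technique, not code taken from the module: the patterns and the helper quoted_field are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Illustrative only: two ways to capture a double-quoted field.
# Naive pattern: stops at the first double quote, escaped or not.
my $naive = qr/"([^"]*)"/;
# Escape-aware pattern: a backslash plus the following character stays inside the field.
my $escape_aware = qr/"((?:[^"\\]|\\.)*)"/;

sub quoted_field {
    my ($pattern, $text) = @_;
    my ($value) = $text =~ $pattern;
    return unless defined $value;
    $value =~ s/\\"/"/g;    # turn \" back into "
    return $value;
}

my $field = q{"Any \"Quoted\" User-Agent"};

quoted_field($naive, $field);        #=> 'Any \' (cut off at the first escaped quote)
quoted_field($escape_aware, $field); #=> 'Any "Quoted" User-Agent'
```

The escape-aware pattern has to inspect every backslash, which is one plausible reason a quote-safe parser costs more per line than a simple one.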

LICENSE

This software is licensed under the same terms as Perl itself.

AUTHOR

TAGOMORI Satoshi <tagomoris at gmail.com>

SEE ALSO

http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats