Module Version: 0.02

NAME

Regexp::Log::DateRange - construct regexps for filtering log data by date range

SYNOPSIS

Code:

        use Regexp::Log::DateRange;
        my $rx = Regexp::Log::DateRange-> new('syslog', [qw(1 1 0 0)], [qw(3 18 13 59)]);
        print "Testing against $rx\n";
        $rx = qr/$rx/i;  # <-- note the 'i' qualifier
        for (
                'Feb  4 00:00:01',
                'May  4 00:00:01'
        ) {
                print "Date '$_' ",  (m/$rx/ ? 'matched' : 'not matched'), "\n";
        }

Result:

        Testing against (?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))
        Date 'Feb  4 00:00:01' matched
        Date 'May  4 00:00:01' not matched

DESCRIPTION

The module was written as a hack for the task at hand: to scan a log file and account for the lines within a date range. The initial trivial implementation, for a log file produced by syslog

  Feb  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]
  Feb  4 00:00:01 moof postfix/smtpd[1138]: BED3B70625: client=localhost[127.0.0.1]

is as simple as it gets, where the line filtering condition would be written as

   /^(\w+)\s+(\d+)\s+(\d\d):(\d\d)/ and $months{lc $1} < $some_month and $2 < 15

and so on; you get the idea. That was considered not fun enough, so this module was written instead to construct a regexp that tells whether a date falls within a particular date range, and does it fast, too. The USAGE section explains how to construct something along the lines of

  (?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))

that matches a given date range in a single regexp match.
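
For comparison, the hand-written condition quoted above could be completed roughly as follows. This is only a sketch of the "trivial implementation" the module replaces, not code from the module; the helper name in_range and the %months table are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Month abbreviation (lower case) => month number.
my @names = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %months;
@months{@names} = (1 .. 12);

my $fmt = '%02d%02d%02d%02d';   # month, day, hour, minute as one sortable key

# Hypothetical helper: true if the date at the start of $line lies between
# the [month, day, hour, minute] arrays $from and $to, inclusive.
sub in_range {
    my ($line, $from, $to) = @_;
    return 0 unless $line =~ /^(\w+)\s+(\d+)\s+(\d\d):(\d\d)/;
    my $month = $months{lc $1} or return 0;
    my $key = sprintf $fmt, $month, $2, $3, $4;
    my $lo  = sprintf $fmt, @$from;
    my $hi  = sprintf $fmt, @$to;
    # Fixed-width keys compare correctly as strings.
    return ($key ge $lo && $key le $hi) ? 1 : 0;
}

my @from = (1, 1, 0, 0);
my @to   = (3, 18, 13, 59);
for my $line ('Feb  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]',
              'May  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]') {
    printf "%-4s %s\n", (in_range($line, \@from, \@to) ? 'in' : 'out'), $line;
}
```

Every line pays for a capture, a hash lookup and a comparison here, which is the overhead a single precompiled range regexp is meant to avoid.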

USAGE

The module sees a date range as two integer arrays, where each integer is one order of the date (month, day, hour, minute), such that

    [ 4, 1, 12, 00 ]

is the 1st of April, 12:00 (thus, the format allows constructing various range regexps, not necessarily only date range regexps). Two such date arrays, plus a template that defines the orders and the matching code between them, are enough to generate a regexp for arbitrary multi-order range matching.

First, we select the date range. Say, from January 1 to March 18, 13:59. The module doesn't convert dates to date arrays; one has to do that by other means. Here I'll simply hard-code what a magic date converter would return:

        my $date1 = [ qw(1 1 0 0) ]; # my_magic_date_converter( 'Jan 1');
        my $date2 = [ qw(3 18 13 59) ]; # my_magic_date_converter( 'Mar 18 13:59');
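
The module ships no such converter, so the following is merely one way my_magic_date_converter might look. It is a minimal sketch that assumes input of the form 'Mon D' or 'Mon D HH:MM' with English three-letter month names; the function body and the %month_number table are not part of the module.

```perl
use strict;
use warnings;

my %month_number;
@month_number{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = (1 .. 12);

# Hypothetical converter: 'Mar 18 13:59' becomes [3, 18, 13, 59].
# A missing time defaults to 00:00; unparsable input yields undef.
sub my_magic_date_converter {
    my ($text) = @_;
    my ($mon, $day, $hour, $min) =
        $text =~ /^\s*([A-Za-z]{3})\s+(\d{1,2})(?:\s+(\d{1,2}):(\d{2}))?/
        or return;
    my $month = $month_number{lc $mon} or return;
    return [ $month, $day + 0, ($hour // 0) + 0, ($min // 0) + 0 ];
}

my $date1 = my_magic_date_converter('Jan 1');
my $date2 = my_magic_date_converter('Mar 18 13:59');
print "date1: @$date1\n";
print "date2: @$date2\n";
```

Note that the sketch performs no range checks (it would accept day 99); it only reshapes the text into the array form the module expects.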

Second, we select a template describing how to match the log entries. The module currently contains only one template, 'syslog', which defines the date array item ranges and the regexp codes between them:

        syslog => [
                [ '\\s+', 1, 12, [ qw(. jan feb mar apr may jun jul aug sep oct nov dec)]],
                [ '\\s+', 1, 31, undef, '0?'],
                [ '\\:', 0, 23, undef, '0?' ],
                [ '\\:', 0, 59, undef, '0?' ],
        ],

This basically means that the first entry defines months, so the final regexp must match a month name followed by \s+, then a day in the range 1-31 followed by whitespace again, then hours and minutes. The module doesn't provide a seconds entry, but it is trivial to construct a template with one (the date arrays must then contain 5 elements).
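
A template with a seconds entry might look like the sketch below. The field meanings are inferred from the 'syslog' template above and are an assumption: the first field is taken to be the regexp code expected after the item, so minutes keep '\\:' and seconds are followed by '\\s+'. The variable name $syslog_with_seconds is made up for the example.

```perl
use strict;
use warnings;

# The built-in 'syslog' layout plus a fifth entry for seconds.
# ASSUMPTION: field 1 is the regexp code that follows the item,
# fields 2 and 3 are the item's minimum and maximum values.
my $syslog_with_seconds = [
    [ '\\s+', 1, 12, [ qw(. jan feb mar apr may jun jul aug sep oct nov dec) ] ],
    [ '\\s+', 1, 31, undef, '0?' ],
    [ '\\:',  0, 23, undef, '0?' ],
    [ '\\:',  0, 59, undef, '0?' ],
    [ '\\s+', 0, 59, undef, '0?' ],   # seconds (not in the shipped template)
];

# Date arrays now carry five elements: month, day, hour, minute, second.
my @from = (1, 1, 0, 0, 0);
my @to   = (3, 18, 13, 59, 30);

# Guarded so the sketch still runs where the module is not installed.
if (eval { require Regexp::Log::DateRange; 1 }) {
    my $rx = eval { Regexp::Log::DateRange->new($syslog_with_seconds, \@from, \@to) };
    print defined $rx ? "Generated: $rx\n" : "Template rejected: $@\n";
}
else {
    print "Regexp::Log::DateRange is not installed; template defined only\n";
}
```

Passing the array reference in place of the name 'syslog' relies on new accepting a template array as its first parameter.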

Finally, to construct a regexp (all code together):

        use Regexp::Log::DateRange;
        my $rx = Regexp::Log::DateRange-> new('syslog', [qw(1 1 0 0)], [qw(3 18 13 59)]);
        print "Testing against $rx\n";
        $rx = qr/$rx/i;  # <-- note the 'i' qualifier
        for (
                'Feb  4 00:00:01',
                'May  4 00:00:01'
        ) {
                print "Date '$_' ",  (m/$rx/ ? 'matched' : 'not matched'), "\n";
        }

And the result is

        Testing against (?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))
        Date 'Feb  4 00:00:01' matched
        Date 'May  4 00:00:01' not matched

If the first parameter of new is not a template array, it is looked up in the list of existing templates (that way, the module can easily be extended for other log formats). The result of new is a string that is meant to be interpolated into qr//i; note the i, since month names in log files come in varying letter case.
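
Applied to the original task of counting log lines, the interpolated regexp could be used as sketched here. To stay runnable without the module, the string printed in the result above is pasted in literally; the ^ anchor is an addition of this sketch, and the two 'Mar 18' lines are invented sample data.

```perl
use strict;
use warnings;

# Normally the return value of Regexp::Log::DateRange->new(...).
my $range = '(?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))';

# Anchor at the start of the line and ignore case.
my $rx = qr/^$range/i;

my @log = (
    'Feb  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]',
    'Mar 18 13:20:07 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]',
    'Mar 18 14:00:00 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]',
    'May  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]',
);

# grep in scalar context returns the number of matching lines.
my $count = grep { /$rx/ } @log;
print "$count of ", scalar(@log), " lines fall within the range\n";
```

Without the anchor the pattern may match anywhere in the line, for example inside a message text that happens to contain a month name.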

The resulting regexp cannot be used to validate dates: as can be seen in the example, a line beginning with May is discarded very quickly and is not checked in full. Rather, think of the regexp as two tests combined, matching only if the date is both correct AND within the given range.
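
The limits of the regexp as a validator can be seen with the string from the result above, pasted in literally here. Because any date in January or February satisfies the range, the first alternative stops after the month and never looks at the day. The $well_formed pattern is a hypothetical separate check, not something the module generates.

```perl
use strict;
use warnings;

my $range = '(?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))';
my $rx    = qr/^$range/i;

# Hypothetical shape check: Mon, day 1-31, HH:MM.
my $well_formed = qr/^[A-Za-z]{3}\s+(?:0?[1-9]|[12]\d|3[01])\s+(?:[01]\d|2[0-3]):[0-5]\d/;

for my $date ('Feb  4 00:00:01', 'Feb 99 00:00:01', 'Mar 18 14:00:00') {
    printf "%-17s range: %-3s shape: %s\n",
        "'$date'",
        ($date =~ $rx          ? 'yes' : 'no'),
        ($date =~ $well_formed ? 'ok'  : 'bad');
}
```

If malformed dates are a concern, a shape check of this kind can run alongside the range regexp rather than being folded into it.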

EXTENSIBILITY

The code should be extensible enough to handle other kinds of log formats by defining a match template. It is not possible, though, to extend it so as to capture the date elements in $1, $2, etc.
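
When the date elements are needed, one workaround is to use the range regexp purely as a filter and extract the fields with a second, capturing pattern. The sketch below pastes the regexp string from the USAGE result in literally and reuses the capturing pattern from the DESCRIPTION; the helper name date_fields is an assumption.

```perl
use strict;
use warnings;

my $range    = '(?:(?:jan|feb)\s+|mar\s+(?:(?:0?[1-9]|1[0-7])\s+|18\s+(?:0?\d|1[0-3])\:))';
my $in_range = qr/^$range/i;                          # fast filter, no captures
my $fields   = qr/^(\w+)\s+(\d+)\s+(\d\d):(\d\d)/;    # month, day, hour, minute

# Hypothetical helper: returns the four date fields of an in-range line,
# or an empty list if the line is out of range or does not parse.
sub date_fields {
    my ($line) = @_;
    return unless $line =~ $in_range;
    my @f = $line =~ $fields or return;
    return @f;
}

my @f = date_fields('Feb  4 00:00:01 moof postfix/smtpd[1138]: connect from localhost[127.0.0.1]');
print "month=$f[0] day=$f[1] hour=$f[2] minute=$f[3]\n" if @f;
```

The second match runs only on lines that already passed the filter, so the extra cost is limited to the lines of interest.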

COPYRIGHT

Copyright (c) 2005 catpipe Systems ApS. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Dmitry Karasik <dk@catpipe.net>
