加藤 和良 > Regexp-Optimizer > Regexp::Optimizer

Download:
Regexp-Optimizer-0.16_001.tar.gz

Dependencies

Annotate this POD

CPAN RT

Open  0
View/Report Bugs
Module Version: 0.16_001   Source   Latest Release: Regexp-Optimizer-0.23

NAME ^

Regexp::Optimizer - optimizes regular expressions

SYNOPSIS ^

  use Regexp::Optimizer;
  my $o  = Regexp::Optimizer->new;
  my $re = $o->optimize(qr/foobar|fooxar|foozap/);
  # $re is now qr/foo(?:[bx]ar|zap)/

ABSTRACT ^

This module does, ahem, attempts to, optimize regular expressions.

INSTALLATION ^

To install this module type the following:

   perl Makefile.PL
   make
   make test
   make install

DESCRIPTION ^

Here is a quote from perltodo.

Factoring out common suffices/prefices in regexps (trie optimization)

Currently, the user has to optimize "foo|far" and "foo|goo" into "f(?:oo|ar)" and "[fg]oo" by hand; this could be done automatically.

This module implements just that.

EXPORT

Since this is an OO module there is no symbol exported.

METHODS ^

This module is implemented as a subclass of Regexp::List. For methods not listed here, see Regexp::List.

$o = Regexp::Optimizer->new;
$o->set(key => value, ...)

Just the same us Regexp::List except for the attribute below;

unexpand

When set to one, $o->optimize() tries to $o->expand before actually starting the operation.

  # cases you need to set expand => 1
  $o->set(expand => 1)->optimize(qr/
                                   foobar|
                                   fooxar|
                                   foozar
                                   /x);
$re = $o->optimize(regexp);

Does the job. Note that unlike ->list2re() in Regexp::List, the argument is the regular expression itself. What it basically does is to find groups will alterations and replace it with the result of $o->list2re.

$re = $o->list2re(list of words ...)

Same as list2re() in Regexp::List in terms of functionality but how it tokenize "atoms" is different since the arguments can be regular expressions, not just strings. Here is a brief example.

  my @expr = qw/foobar fooba+/;
  Regexp::List->new->list2re(@expr) eq qr/fooba[\+r]/;
  Regexp::Optimizer->new->list2re(@expr) eq qr/foob(?:a+ar)/;

CAVEATS ^

This module is still experimental. Do not assume that the result is the same as the unoptimized version.

BUGS ^

PRACTICALITY ^

Though this module is still experimental, It is still good enough even for such deeply nested regexes as the followng.

  # See 3.2.2 of  http://www.ietf.org/rfc/rfc2616.txt
  # BNF faithfully turned into a regex
  http://(?:(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|(?:(?:[a-z]|[A-Z])|[0-9])(?:(?:(?:[a-z]|[A-Z])|[0-9])|-)*(?:(?:[a-z]|[A-Z])|[0-9]))\.)*(?:(?:[a-z]|[A-Z])|(?:[a-z]|[A-Z])(?:(?:(?:[a-z]|[A-Z])|[0-9])|-)*(?:(?:[a-z]|[A-Z])|[0-9]))\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*(?:;(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*)*(?:/(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*(?:;(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*)*)*(?:\\?(?:[;/?:@&=+$,]|(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f]))*)?)?

  # and optimized
  http://(?::?[a-zA-Z0-9](?:[a-zA-Z0-9\-]*[a-zA-Z0-9])?\.[a-zA-Z]*(?:[a-zA-Z0-9\-]*[a-zA-Z0-9])?\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*(?:;(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*)*(?:/(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*(?:;(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*)*)*(?:\\?(?:(?:[;/?:@&=+$,a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f]))*)?)?

By carefully examine both you can find that character classes are properly aggregated.

SEE ALSO ^

Regexp::List -- upon which this module is based

eg/ directory in this package contains example scripts.

Perl standard documents
 L<perltodo>, L<perlre>
CPAN Modules

Regexp::Presuf, Text::Trie

Books

Mastering Regular Expressions http://www.oreilly.com/catalog/regex2/

AUTHOR ^

Dan Kogai <dankogai@dan.co.jp>, Kato Kazuyoshi <kato.kazuyoshi@gmail.com>

COPYRIGHT AND LICENSE ^

Copyright 2003 by Dan Kogai

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

syntax highlighting: