NAME

String::MatchInterpolate - named regexp capture and interpolation from the same template.

SYNOPSIS

 use String::MatchInterpolate;

 my $smi = String::MatchInterpolate->new( 'My name is ${NAME/\w+/}' );

 my $vars = $smi->match( "My name is Bob" );
 my $name = $vars->{NAME};

 print $smi->interpolate( { NAME => "Jim" } ) . "\n";

DESCRIPTION

This module provides an object class which represents a string matching and interpolation pattern. It contains named-variable placeholders which include a regexp pattern to match them on. An instance of this class represents a single pattern, which can be matched against or interpolated into.

Objects in this class are not modified once constructed; they do not store any runtime state other than data derived from the arguments passed to the constructor.

Template Format

The template consists of a string with named variable placeholders embedded in it. It looks similar to a perl or shell string with interpolation:

 A string here with ${NAME/pattern/} interpolations

The embedded variable is delimited by perl-style ${ } braces, and contains a name and a pattern. The pattern is a normal perl regexp fragment that will be used by the match() method. This regexp should not contain any capture brackets ( ) as these will confuse the parsing logic. If the variable is not named, it will be assigned a name based on its position, starting from 1 (i.e. similar to regexp capture buffers). If a variable does not provide a matching pattern but the constructor was given a default with the default_re option, this default will be used instead.
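As a small illustration of the default_re fallback described above, the following sketch mixes one variable that relies on the default with one that carries its own pattern. The template text, the variable names HOST and PORT, and the input string are invented for this example, and it assumes that a placeholder written without a /pattern/ part (here ${HOST}) is the form that picks up the default.

```perl
use strict;
use warnings;
use String::MatchInterpolate;

# Assumption: ${HOST} (no /pattern/ part) falls back to default_re,
# while ${PORT/\d+/} keeps its own pattern.
my $smi = String::MatchInterpolate->new(
   'host=${HOST} port=${PORT/\d+/}',
   default_re => qr/\w+/,
);

my $vars = $smi->match( "host=alpha port=8080" );
print "$vars->{HOST}:$vars->{PORT}\n" if $vars;
```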

Outside of the embedded variables, the string is interpreted literally; i.e. not as a regexp pattern. A backslash \ may be used to escape the following character, allowing literal backslashes or dollar signs to be used.
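The escaping rule can be sketched as follows. The template and the variable name AMOUNT are made up for the example; note that the template is given in a single-quoted perl string, so the backslash reaches the constructor unchanged.

```perl
use strict;
use warnings;
use String::MatchInterpolate;

# In the template, \$ stands for a literal dollar sign, and the following
# ${AMOUNT/\d+/} is an ordinary variable placeholder.
my $smi = String::MatchInterpolate->new( 'Total: \$${AMOUNT/\d+/}' );

my $vars = $smi->match( 'Total: $42' );
print "Amount is $vars->{AMOUNT}\n" if $vars;

my $line = $smi->interpolate( { AMOUNT => 99 } );
print "$line\n";
```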

The intended use for this object class is that the template strings would come from a configuration file, or some other source of "trusted" input. In the current implementation, there is nothing to stop a carefully crafted string from containing arbitrary perl code, which would be executed every time the match() or interpolate() methods are called. (See the "SECURITY CONSIDERATIONS" section.) This may change in a later version.

Suffixes

By default, the beginning and end of the string match are both anchored. If the allow_suffix option is passed to the constructor, then the end of the string is not anchored, and instead, any suffix found by the match() method will be returned in a hash key called _suffix. This may be useful, for example, when matching directory names, URLs, or other cases of strings with unconstrained suffixes. The interpolate() method will not recognise this hash key; instead just use normal string concatenation on the result.

 my $userhomematch = String::MatchInterpolate->new(
    '/home/${USER/\w+/}/',
    allow_suffix => 1
 );

 my $vars = $userhomematch->match( "/home/fred/public_html" );
 print "Need to fetch file $vars->{_suffix} from $vars->{USER}\n";
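Since interpolate() does not recognise _suffix, rebuilding a full path means appending the suffix by hand. A minimal sketch, reusing the template above with an invented replacement user name:

```perl
use strict;
use warnings;
use String::MatchInterpolate;

my $userhomematch = String::MatchInterpolate->new(
   '/home/${USER/\w+/}/',
   allow_suffix => 1,
);

my $vars = $userhomematch->match( "/home/fred/public_html" );

# interpolate() rebuilds only the templated prefix; the suffix captured by
# match() is appended with ordinary string concatenation.
my $path = $userhomematch->interpolate( { USER => "jim" } ) . $vars->{_suffix};
print "$path\n";
```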

CONSTRUCTOR

$smi = String::MatchInterpolate->new( $template, %opts )

Constructs a new String::MatchInterpolate object that represents the given template and returns it.

$template

A string containing the template, in the format given above.

%opts

A hash containing extra options. The following options are recognised:

allow_suffix => BOOL

A boolean flag. If true, then the end of the string will not be anchored, and instead, an extra suffix will be allowed to follow the matched portion. It will be returned as _suffix by the match() method.

default_re => Regexp or STRING

A precompiled Regexp or string defining a regexp to use if a variable does not provide a pattern of its own.

delimiters => ARRAY of [Regexp or STRING]

An array containing two precompiled Regexps or strings, giving the variable opening and closing delimiters. These default to qr/\$\{/ and qr/\}/ respectively, but by passing other values, other styles of template string may be parsed.

 delimiters => [ qr/\{/, qr/\}/ ]   # To match {name/pattern/}
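A complete constructor call using these brace-only delimiters might look like the sketch below. The template text and names are invented for the example.

```perl
use strict;
use warnings;
use String::MatchInterpolate;

# Variables are written {NAME/pattern/} rather than ${NAME/pattern/}
my $smi = String::MatchInterpolate->new(
   'Hello {NAME/\w+/}',
   delimiters => [ qr/\{/, qr/\}/ ],
);

my $vars = $smi->match( "Hello Bob" );
print "Matched $vars->{NAME}\n" if $vars;

my $greeting = $smi->interpolate( { NAME => "Jim" } );
print "$greeting\n";
```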

METHODS

@values = $smi->match( $str )

$vars = $smi->match( $str )

Attempts to match the given string against the template. In list context it returns a list of the captured variables, or an empty list if the match fails. In scalar context, it returns a HASH reference containing all the captured variables, or undef if the match fails.
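The two calling contexts, and the failure case, can be sketched like this. The key=value template is invented for the example, and the list-context values are taken to come back in template order.

```perl
use strict;
use warnings;
use String::MatchInterpolate;

my $smi = String::MatchInterpolate->new( '${KEY/\w+/}=${VALUE/\d+/}' );

# List context: the captured values (assumed here to be in template order)
my ( $key, $value ) = $smi->match( "port=8080" );

# Scalar context: a HASH reference keyed by variable name
my $vars = $smi->match( "port=8080" );

# A failed match gives undef in scalar context, so test before dereferencing
my $failed = $smi->match( "port=http" );
print defined $failed ? "matched\n" : "no match\n";
```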

$str = $smi->interpolate( @values )

$str = $smi->interpolate( \%vars )

Interpolates the given variable values into the template and returns the generated string. The values may either be given as a list of strings, or in a single HASH reference containing named string values.
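Both calling styles are shown in the sketch below, using an invented key=value template. The list form is assumed to take its values in the order the variables appear in the template.

```perl
use strict;
use warnings;
use String::MatchInterpolate;

my $smi = String::MatchInterpolate->new( '${KEY/\w+/}=${VALUE/\d+/}' );

# Positional values (assumed to follow template order)
my $from_list = $smi->interpolate( "port", 8080 );

# Named values in a single HASH reference
my $from_hash = $smi->interpolate( { KEY => "port", VALUE => 8080 } );

print "$from_list\n$from_hash\n";
```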

@vars = $smi->vars()

Returns the list of variable names defined / used by the template, in the order in which they appear.
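For instance, with an invented two-variable template:

```perl
use strict;
use warnings;
use String::MatchInterpolate;

my $smi = String::MatchInterpolate->new( '${KEY/\w+/}=${VALUE/\d+/}' );

# Names come back in the order they appear in the template
my @names = $smi->vars;
print join( ", ", @names ), "\n";
```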

BENCHMARKS

The template is compiled into a pair of strings containing perl code, which implement the matching and interpolation operations using normal perl regexps and string concatenation. These strings are then eval()ed into CODE references which the object stores. This makes it faster than a simple regexp that operates over the template string each time a match or interpolation needs to be performed. The following output compares the speed of String::MatchInterpolate against both direct hard-coded perl, and simple regexp operations.

 Comparing 'interpolate':
 
            Rate   s///  S::MI native
 s///    81938/s     --   -44%   -90%
 S::MI  145232/s    77%     --   -82%
 native 806800/s   885%   456%     --
 
 Comparing 'match':
 
            Rate    m//  S::MI native
 m//     35354/s     --   -46%   -73%
 S::MI   65749/s    86%     --   -50%
 native 131885/s   273%   101%     --

(This was produced by the benchmark.pl file in the module's distribution.)
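The general technique of generating perl source once and eval()ing it into a CODE reference can be illustrated with the standalone sketch below. It is not the module's actual generated code: compile_interpolate() and its parts format are hypothetical, invented purely to show the idea for the interpolation side.

```perl
use strict;
use warnings;

# Hypothetical helper: build perl source for an interpolation function from
# a list of parts, then eval that source once into a CODE reference.
# Each part is either [ literal => $text ] or [ var => $name ].
sub compile_interpolate
{
   my @parts = @_;

   my @exprs;
   foreach my $part ( @parts ) {
      my ( $type, $value ) = @$part;
      if( $type eq "literal" ) {
         # Escape backslashes and quotes for a single-quoted perl string
         ( my $quoted = $value ) =~ s/([\\'])/\\$1/g;
         push @exprs, "'" . $quoted . "'";
      }
      else {
         # Restrict names to word characters so they are safe to embed
         die "Bad variable name" unless $value =~ m/^\w+$/;
         push @exprs, '$_[0]{"' . $value . '"}';
      }
   }

   # The generated source is a plain concatenation of the parts
   my $source = "sub { " . join( " . ", @exprs ) . " }";
   my $code = eval $source or die "Cannot compile: $@";
   return $code;
}

my $interp = compile_interpolate(
   [ literal => "My name is " ],
   [ var     => "NAME" ],
);

print $interp->( { NAME => "Jim" } ), "\n";
```

Once compiled, each call runs only the generated concatenation; the template itself is never re-parsed.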

SECURITY CONSIDERATIONS

Because of the way the optimised match and interpolate functions are generated, it is possible to inject arbitrary perl code via the template given to the constructor. As such, this object should not be used when the source of that template is considered untrusted.

Neither the match() nor interpolate() methods suffer this problem; any input into these is safe from exploit in this way.

SEE ALSO

The following may be used to provide just interpolate()-style operations:

The following may be used to provide just match()-style operations:

  • Regexp::NamedCaptures - Saves capture results to your own variables

  • perlre(1) - named capture buffers in perl 5.10 (the (?<NAME>pattern) format)

AUTHOR

Paul Evans <leonerd@leonerd.org.uk>