NAME

Test::Regexp - Test your regular expressions

SYNOPSIS

 use Test::Regexp 'no_plan';

 match    subject      => "Foo",
          pattern      => qr /\w+/;

 match    subject      => "Foo bar",
          keep_pattern => qr /(?<first_word>\w+)\s+(\w+)/,
          captures     => [[first_word => 'Foo'], ['bar']];

 no_match subject      => "Baz",
          pattern      => qr /Quux/;

 $checker = Test::Regexp -> new -> init (
    keep_pattern => qr /(\w+)\s+\g{-1}/,
    name         => "Double word matcher",
 );

 $checker -> match    ("foo foo", ["foo"]);
 $checker -> no_match ("foo bar");

DESCRIPTION

This module is intended to test your regular expressions. Given a subject string and a regular expression (aka pattern), the module not only tests whether the regular expression completely matches the subject string, it also performs a utf8::upgrade or utf8::downgrade on the subject string and runs the tests again, if necessary. Furthermore, given a pattern with capturing parentheses, it checks whether all captures are present, and in the right order. Both named and unnamed captures are checked.

By default, the module exports two subroutines, match and no_match. The latter is actually a thin wrapper around match, calling it with match => 0.

"Complete matching"

A match is considered successful only if the entire string is matched, that is, if $& equals the subject string. So:

  Subject    Pattern

  "aaabb"    qr /a+b+/     # Considered ok
  "aaabb"    qr /a+/       # Not considered ok

For efficiency reasons, when the matching is performed the pattern is actually anchored at the start. It's not anchored at the end as that would potentially influence the matching.
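The table above can be written as a small test script. This is a minimal sketch that uses only the match and no_match subroutines described in this document; it assumes Test::Regexp is installed.

```perl
use strict;
use warnings;
use Test::Regexp tests => 'no_plan';

# The pattern consumes the whole subject, so this is a complete match.
match    subject => "aaabb",
         pattern => qr /a+b+/;

# qr /a+/ only matches the leading "aaa". That is not a complete match,
# so the subject counts as not matching and no_match passes.
no_match subject => "aaabb",
         pattern => qr /a+/;
```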

UTF8 matching

Certain regular expression constructs match differently depending on whether UTF8 matching is in effect or not. This is only relevant if the subject string has characters with code points between 128 and 255, and no characters above 255 -- in such a case, matching may be different depending on whether the subject string has the UTF8 flag on or not. Test::Regexp detects such a case, and will then run the tests twice; once with the subject string utf8::downgraded, and once with the subject string utf8::upgraded.
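The effect can be seen with core Perl alone. The sketch below assumes that neither the unicode_strings feature nor a /u, /a or /l modifier is in effect, so the pattern uses Perl's default matching rules; the variable names are only illustrative.

```perl
use strict;
use warnings;

# U+00E9 (e with acute accent) has a code point between 128 and 255.
my $subject = "\xE9";

# Without the UTF8 flag, \w only matches ASCII word characters.
utf8::downgrade ($subject);
my $downgraded = $subject =~ /^\w$/ ? 1 : 0;

# The same character with the UTF8 flag on is matched using
# Unicode rules, under which it is a word character.
utf8::upgrade ($subject);
my $upgraded = $subject =~ /^\w$/ ? 1 : 0;

print "downgraded: $downgraded, upgraded: $upgraded\n";
# prints "downgraded: 0, upgraded: 1"
```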

Number of tests

There's no fixed number of tests that is run. The number of tests depends on the number of captures, the number of different names of captures, and whether there is the need to up- or downgrade the subject string.

It is therefore recommended to load the module with use Test::Regexp tests => 'no_plan';. In a later version, Test::Regexp will use a version of Test::Builder that allows for nested tests.

Details

The number of tests is as follows:

If no match is expected (match => 0, or no_match is used), only one test is performed.

Otherwise (we are expecting a match), if pattern is used, there will be three tests.

For keep_pattern, there will be four tests, plus one test for each capture, an additional test for each named capture, and a test for each name used in the set of named captures. So, if there are N captures, there will be at least 4 + N tests, and at most 4 + 3 * N tests.

If both pattern and keep_pattern are used, the numbers of tests add up.

If Test::Regexp decides to upgrade or downgrade, the number of tests doubles.
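As a worked example of the keep_pattern rule, the sketch below counts tests for a given captures list. The helper keep_pattern_tests is hypothetical and not part of Test::Regexp; it only applies the arithmetic stated above and assumes that a named capture is written as a two element array.

```perl
use strict;
use warnings;

# Hypothetical helper: number of tests for one expected match against
# a keep_pattern, following the rules listed above.
sub keep_pattern_tests {
    my ($captures, $recoded) = @_;

    # Named captures are two element arrays: [name => value].
    my @named = grep {ref $_ eq 'ARRAY' && @$_ == 2} @$captures;
    my %names = map  {$_ -> [0] => 1} @named;

    my $count = 4                       # Fixed tests
              + scalar (@$captures)     # One per capture
              + scalar (@named)         # One per named capture
              + scalar (keys %names);   # One per distinct name

    # Tests are run twice if the subject is up- or downgraded.
    return $recoded ? 2 * $count : $count;
}

my $captures = ["Eland", [a => "Wapiti"], "Caribou"];
print keep_pattern_tests ($captures),    "\n";   # prints 9
print keep_pattern_tests ($captures, 1), "\n";   # prints 18
```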

use options

When using Test::Regexp, there are a few options you can give it.

tests => 'no_plan', tests => 123

The number of tests you are going to run. Since it takes some work to figure out how many tests will be run, for now the recommendation is to use tests => 'no_plan'.

import => [methods]

By default, the subroutines match and no_match are exported. If you want to import a subset, use the import tag, and give it an arrayref with the names of the subroutines to import.
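For instance, a script that only needs match could combine both options as follows. This is a sketch based on the option descriptions above, not on a tested configuration.

```perl
use strict;
use warnings;

# Import only match; no_match is not exported into this package.
use Test::Regexp tests => 'no_plan', import => ['match'];

match subject => "Foo",
      pattern => qr /\w+/;
```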

match

The subroutine match is the workhorse of the module. It takes a number of named arguments, most of them optional, and runs one or more tests. It returns 1 if all tests were run successfully, and 0 if one or more tests failed. The following options are available:

subject => STRING

The string against which the pattern is tested is passed to match using the subject option. It's an error to not pass in a subject.

pattern => PATTERN, keep_pattern => PATTERN

A pattern (aka regular expression) to test can be passed with one of pattern or keep_pattern. The former should be used if the pattern does not have any capturing parentheses; the latter if the pattern does have capturing parentheses. If both pattern and keep_pattern are provided, the subject is tested against both. It's an error to not give either pattern or keep_pattern.

captures => [LIST]

If a regular expression is passed with keep_pattern you should pass in a list of captures using the captures option.

This list should contain all the captures, in order. For unnamed captures, this should just be the string matched by the capture; for a named capture, this should be a two element array, the first element being the name of the capture, the second element the capture. Named and unnamed captures may be mixed, and the same name for a capture may be repeated.

Example:

 match  subject      =>  "Eland Wapiti Caribou",
        keep_pattern =>  qr /(\w+)\s+(?<a>\w+)\s+(\w+)/,
        captures     =>  ["Eland", [a => "Wapiti"], "Caribou"];

name => NAME

The "name" of the test. It is used in the test comment.

comment => NAME

An alternative for name. If both are present, comment is used.

utf8_upgrade => 0, utf8_downgrade => 0

As explained in "UTF8 matching", Test::Regexp detects whether a subject may provoke different matching depending on its UTF8 flag, and then it utf8::upgrades or utf8::downgrades the subject string and runs the test again. Setting utf8_downgrade to 0 prevents Test::Regexp from downgrading the subject string, while setting utf8_upgrade to 0 prevents Test::Regexp from upgrading the subject string.
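A sketch of how this might look in practice, assuming Test::Regexp is installed. The subject contains a character between 128 and 255, so it would normally be tested in both encodings; the pattern is chosen so that it matches the same way in either case.

```perl
use strict;
use warnings;
use Test::Regexp tests => 'no_plan';

# "\xE9" would normally trigger a second run on an upgraded copy of
# the subject; utf8_upgrade => 0 asks Test::Regexp to skip that run.
match subject      => "caf\xE9",
      pattern      => qr /caf./,
      utf8_upgrade => 0;
```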

match => BOOLEAN

By default, match assumes the pattern should match. But it is also important to test which strings do not match a regular expression. This can be done by calling match with match => 0 as parameter (or by calling no_match instead of match). In this case, the test is a failure if the pattern completely matches the subject string. A captures argument is ignored.

reason => STRING

If the match is expected to fail (so, when match => 0 is passed, or if no_match is called), a reason may be provided with the reason option. The reason is then printed in the comment of the test.

test => STRING

If the match is expected to pass (when match is called, without match being false), and this option is passed, a message is printed indicating what this specific test is testing (the argument to test).
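The reason and test options can be used to annotate the two kinds of checks. The following sketch assumes Test::Regexp is installed; the date pattern and the messages are only illustrative.

```perl
use strict;
use warnings;
use Test::Regexp tests => 'no_plan';

my $date = qr /[0-9]{4}-(?:0[1-9]|1[0-2])-[0-9]{2}/;

# Expected to match: 'test' describes what is being tested.
match    subject => "2023-12-01",
         pattern => $date,
         test    => "Month in range";

# Expected not to match: 'reason' says why.
no_match subject => "2023-13-01",
         pattern => $date,
         reason  => "Month out of range";
```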

todo => STRING

If the todo parameter is used (with a defined value), the tests are assumed to be TODO tests. The argument is used as the TODO message.
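As an illustration, the sketch below marks a check as TODO because the pattern does not yet accept more than one space between words. It assumes Test::Regexp is installed; the message text is made up for the example.

```perl
use strict;
use warnings;
use Test::Regexp tests => 'no_plan';

# The subject has two spaces, which the pattern does not allow yet.
# The failure is reported as a TODO test instead of a real failure.
match subject => "foo  bar",
      pattern => qr /\w+ \w+/,
      todo    => "Allow multiple spaces between words";
```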

full_text => BOOL

By default, long test messages are truncated; if a true value is passed, the message will not get truncated.

escape => INTEGER

Controls how non-ASCII and non-printables are displayed in generated test messages:

0

No characters are escaped; everything is displayed as is.

1

Show newlines, carriage returns and tabs using their usual escape sequences (\n, \r, and \t).

2

Show any character outside of the printable ASCII characters as named escapes (\N{UNICODE NAME}), or a hex escape if the Unicode name is not found (\x{XX}). This is the default if -CO is not in effect (${^UNICODE} is false).

Newlines, carriage returns and tabs are displayed as above.

3

Show any character outside of the printable ASCII characters as hex escapes (\x{XX}).

Newlines, carriage returns and tabs are displayed as above.

4

Show the non-printable ASCII characters as hex escapes (\x{XX}); any non-ASCII character is displayed as is. This is the default if -CO is in effect (${^UNICODE} is true).

Newlines, carriage returns and tabs are displayed as above.
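To make the levels more concrete, the following sketch mimics the display rules described for escape => 3. The helper escape_hex is hypothetical: it is not Test::Regexp's own code, and the exact output of the module may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: \n, \r and \t keep their usual escape sequences,
# any other character outside printable ASCII becomes a hex escape.
sub escape_hex {
    my ($string) = @_;
    my %short = ("\n" => '\n', "\r" => '\r', "\t" => '\t');
    $string =~ s{([^\x20-\x7E])}
                {exists $short{$1} ? $short{$1}
                                   : sprintf ('\x{%02X}', ord $1)}ge;
    return $string;
}

print escape_hex ("na\xEFve\tcaf\xE9\n"), "\n";
# prints na\x{EF}ve\tcaf\x{E9}\n
```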

no_keep_message => BOOL

If matching against a keeping pattern, a message (with -Keep) is added to the comment. Setting this parameter suppresses this message. Mostly useful for Regexp::Common510.

no_match

Similar to match, except that it tests whether a pattern does not match a string. Accepts the same arguments as match, except for match.

OO interface

Since one typically checks a pattern with multiple strings, and it can be tiresome to repeatedly call match or no_match with the same arguments, there's also an OO interface. Using a pattern, one constructs an object and can then repeatedly call the object to match a string.

To construct and initialize the object, call the following:

 my $checker = Test::Regexp -> new -> init (
    pattern      => qr  /PATTERN/,
    keep_pattern => qr /(PATTERN)/,
    ...
 );

init takes exactly the same arguments as match, with the exception of subject and captures. To perform a match, call match (or no_match) on the object. The first argument should be the subject the pattern should match against (see the subject argument of match discussed above). If there is a match against a capturing pattern, the second argument is a reference to an array with the matches (see the captures argument of match discussed above).

Both match and no_match can take additional (named) arguments, identical to the non-OO match and no_match routines.
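A sketch of the OO interface applied to several subjects, assuming Test::Regexp is installed. Passing reason as a named argument to the no_match method follows the description above; the subjects and the message are only illustrative.

```perl
use strict;
use warnings;
use Test::Regexp tests => 'no_plan';

my $checker = Test::Regexp -> new -> init (
    keep_pattern => qr /(\w+)\s+\g{-1}/,
    name         => "Double word matcher",
);

# Each case is a subject together with the expected capture.
my @doubled = (["foo foo" => "foo"], ["bar\tbar" => "bar"]);
for my $case (@doubled) {
    my ($subject, $word) = @$case;
    $checker -> match ($subject, [$word]);
}

# "foo food" only matches up to "foo foo", which is not a complete match.
$checker -> no_match ($_, reason => "Not a doubled word")
    for "foo bar", "foo food";
```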

RATIONALE

The reason Test::Regexp was created is to aid testing for the rewrite of Regexp::Common.

DEVELOPMENT

The current sources of this module are found on GitHub, git://github.com/Abigail/Test-Regexp.git.

AUTHOR

Abigail, test-regexp@abigail.be.

COPYRIGHT and LICENSE

Copyright (C) 2009 by Abigail

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

INSTALLATION

To install this module, run, after unpacking the tar-ball, the following commands:

   perl Makefile.PL
   make
   make test
   make install