YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions
use YAPE::Regex; use strict; my $regex = qr/reg(ular\s+)?exp?(ression)?/i; my $parser = YAPE::Regex->new($regex); # here is the tokenizing part while (my $chunk = $parser->next) { # ... }
YAPE
The YAPE hierarchy of modules is an attempt at a unified means of parsing and extracting content. It attempts to maintain a generic interface, to promote simplicity and reusability. The API is powerful, yet simple. The modules do tokenization (which can be intercepted) and build trees, so that extraction of specific nodes is doable.
This module is yet another (?) parser and tree-builder for Perl regular expressions. It builds a tree out of a regex, but at the moment, the extent of the extraction tool for the tree is quite limited (see "Extracting Sections"). However, the tree can be useful to extension modules.
In addition to the base class, YAPE::Regex, there is the auxiliary class YAPE::Regex::Element (common to all YAPE base classes) that holds the individual nodes' classes. There is documentation for the node classes in that module's documentation.
YAPE::Regex
YAPE::Regex::Element
use YAPE::Regex;
use YAPE::Regex qw( MyExt::Mod );
If supplied no arguments, the module is loaded normally, and the node classes are given the proper inheritence (from YAPE::Regex::Element). If you supply a module (or list of modules), import will automatically include them (if needed) and set up their node classes with the proper inheritence -- that is, it will append YAPE::Regex to @MyExt::Mod::ISA, and YAPE::Regex::xxx to each node class's @ISA (where xxx is the name of the specific node class).
import
@MyExt::Mod::ISA
YAPE::Regex::xxx
@ISA
xxx
package MyExt::Mod; use YAPE::Regex 'MyExt::Mod'; # does the work of: # @MyExt::Mod::ISA = 'YAPE::Regex' # @MyExt::Mod::text::ISA = 'YAPE::Regex::text' # ...
my $p = YAPE::Regex->new($REx);
Creates a YAPE::Regex object, using the contents of $REx as a regular expression. The new method will attempt to convert $REx to a compiled regex (using qr//) if $REx isn't already one. If there is an error in the regex, this will fail, but the parser will pretend it was ok. It will then report the bad token when it gets to it, in the course of parsing.
$REx
new
qr//
my $text = $p->chunk($len);
Returns the next $len characters in the input string; $len defaults to 30 characters. This is useful for figuring out why a parsing error occurs.
$len
my $done = $p->done;
Returns true if the parser is done with the input string, and false otherwise.
my $errstr = $p->error;
Returns the parser error message.
my $backref = $p->extract;
Returns a code reference that returns the next back-reference in the regex. For more information on enhancements in upcoming versions of this module, check "Extracting Sections".
my $node = $p->display(...);
Returns a string representation of the entire content. It calls the parse method in case there is more data that has not yet been parsed. This calls the fullstring method on the root nodes. Check the YAPE::Regex::Element docs on the arguments to fullstring.
parse
fullstring
my $node = $p->next;
Returns the next token, or undef if there is no valid token. There will be an error message (accessible with the error method) if there was a problem in the parsing.
undef
error
my $node = $p->parse;
Calls next until all the data has been parsed.
next
my $node = $p->root;
Returns the root node of the tree structure.
my $state = $p->state;
Returns the current state of the parser. It is one of the following values: alt, anchor, any, backref, capture(N), Cchar, class, close, code, comment, cond(TYPE), ctrl, cut, done, error, flags, group, hex, later, lookahead(neg|pos), lookbehind(neg|pos), macro, named, oct, slash, text, and utf8hex.
alt
anchor
any
backref
capture(N)
Cchar
class
close
code
comment
cond(TYPE)
ctrl
cut
done
flags
group
hex
later
lookahead(neg|pos)
lookbehind(neg|pos)
macro
named
oct
slash
text
utf8hex
For capture(N), N will be the number the captured pattern represents.
For cond(TYPE), TYPE will either be a number representing the back-reference that the conditional depends on, or the string assert.
assert
For lookahead and lookbehind, one of neg and pos will be there, depending on the type of assertion.
lookahead
lookbehind
neg
pos
my $node = $p->top;
Synonymous to root.
root
While extraction of nodes is the goal of the YAPE modules, the author is at a loss for words as to what needs to be extracted from a regex. At the current time, all the extract method does is allow you access to the regex's set of back-references:
extract
my $extor = $parser->extract; while (my $backref = $extor->()) { # ... }
japhy is very open to suggestions as to the approach to node extraction (in how the API should look, in addition to what should be proffered). Preliminary ideas include extraction keywords like the output of -Dr (or the re module's debug option).
japhy
re
debug
YAPE::Regex::Explain 3.011
YAPE::Regex::Explain
Presents an explanation of a regular expression, node by node.
YAPE::Regex::Reverse (Not released)
YAPE::Regex::Reverse
Reverses the nodes of a regular expression.
This is a listing of things to add to future versions of this module.
Create a robust extract method
Open to suggestions.
Following is a list of known or reported bugs.
use charnames ':full'
To understand \N{...} properly, you must be using 5.6.0 or higher. However, the parser only knows how to resolve full names (those made using use charnames ':full'). There might be an option in the future to specify a class name.
\N{...}
The YAPE::Regex::Element documentation, for information on the node classes. Also, Text::Balanced, Damian Conway's excellent module, used for the matching of (?{ ... }) and (??{ ... }) blocks.
Text::Balanced
(?{ ... })
(??{ ... })
Jeff "japhy" Pinyan CPAN ID: PINYAN PINYAN@cpan.org
To install YAPE::Regex, copy and paste the appropriate command in to your terminal.
cpanm
cpanm YAPE::Regex
CPAN shell
perl -MCPAN -e shell install YAPE::Regex
For more information on module installation, please visit the detailed CPAN module installation guide.