package Text::RewriteRules;
use Data::Dumper;
use Filter::Simple;
use warnings;
use strict;
use 5.010000; # 5.10.0
=encoding UTF-8
=head1 NAME
Text::RewriteRules - A system to rewrite text using regexp-based rules
=cut
our $VERSION = '0.25';
=head1 SYNOPSIS
use Text::RewriteRules;
RULES email
\.==> DOT
@==> AT
ENDRULES
print email("ambs@cpan.org") # prints ambs AT cpan DOT org
RULES/m inc
(\d+)=e=> $1+1
ENDRULES
print inc("I saw 11 cats and 23 dogs") # prints I saw 12 cats and 24 dogs
=head1 ABSTRACT
This module uses a simplified syntax for regexp-based rules for
rewriting text. You define a set of rules, and the system applies them
until no more rule can be applied.
Two variants are provided:
=over 4
=item 1
traditional rewrite (RULES function):
while it is possible do substitute
| apply first substitution rule
=item 2
cursor based rewrite (RULES/m function):
add a cursor to the beginning of the string
while not reach end of string
| apply substitute just after cursor and advance cursor
| or advance cursor if no rule can be applied
=back
=head1 DESCRIPTION
A lot of computer science problems can be solved using rewriting
rules.
Rewriting rules consist of mainly two parts: a regexp (LHS: Left Hand
Side) that is matched with the text, and the string to use to
substitute the content matched with the regexp (RHS: Right Hand Side).
Now, why don't use a simple substitute? Because we want to define a
set of rules and match them again and again, until no more regexp of
the LHS matches.
A point of discussion is the syntax to define this system. A brief
discussion shown that some users would prefer a function to receive an
hash with the rules, some other, prefer some syntax sugar.
The approach used is the last: we use C<Filter::Simple> such that we
can add a specific non-perl syntax inside the Perl script. This
improves legibility of big rewriting rules systems.
This documentation is divided in two parts: first we will see the
reference of the module. Kind of, what it does, with a brief
explanation. Follows a tutorial which will be growing through time and
releases.
=head1 SYNTAX REFERENCE
Note: most of the examples are very stupid, but that is the easiest
way to explain the basic syntax.
The basic syntax for the rewrite rules is a block, started by the
keyword C<RULES> and ended by the C<ENDRULES>. Everything between
them is handled by the module and interpreted as rules or comments.
The C<RULES> keyword can handle a set of flags (we will see that
later), and requires a name for the rule-set. This name will be used
to define a function for that rewriting system.
RULES functioname
...
ENDRULES
The function is defined in the main namespace where the C<RULES>
block appears.
In this block, each line can be a comment (Perl style), an empty line
or a rule.
=head2 Basic Rule
A basic rule is a simple substitution:
RULES foobar
foo==>bar
ENDRULES
The arrow C<==E<gt>> is used as delimiter. At its left is the regexp
to match, at the right side, the substitution. So, the previous block
defines a C<foobar> function that substitutes all C<foo> by
C<bar>.
Although this can seems similar to a global substitution, it is
not. With a global substitution you can't do an endless loop. With
this module it is very simple. I know you will get the idea.
You can use the syntax of Perl both on the left and right hand side of
the rule, including C<$1...>.
=head2 Execution Rule
If the Perl substitution supports execution, why not to support it,
also? So, you got the idea. Here is an example:
RULES foo
(\d+)b=e=>'b' x $1
(\d+)a=eval=>'a' x ($1*2)
ENDRULES
So, for any number followed by a C<b>, we replace by that number of
C<b's>. For each number followed by an C<a>, we replace them by twice
that number of C<a's>.
Also, you mean evaluation using an C<e> or C<eval> inside the arrow. I
should remind you can mix all these rules together in the same
rewriting system.
=head2 Conditional Rule
On some cases we want to perform a substitution if the pattern matches
B<and> a set of conditions about that pattern (or not) are true.
For that, we use a three part rule. We have the common rule plus the
condition part, separated from the rule by C<!!>. These conditional
rules can be applied both for basic and execution rules.
RULES translate
([[:alpha:]]+)=e=>$dic{$1}!! exists($dic{$1})
ENDRULES
The previous example would translate all words that exist on the
dictionary.
=head2 Begin Rule
Sometimes it is useful to change something on the string before
starting to apply the rules. For that, there is a special rule named
C<begin> (or C<b> for abbreviate) just with a RHS. This RHS is Perl
code. Any Perl code. If you want to modify the string, use C<$_>.
RULES foo
=b=> $_.=" END"
ENDRULES
=head2 Last Rule
As you use C<last> on Perl to skip the remaining code on a loop, you
can also call a C<last> (or C<l>) rule when a specific pattern
matches.
Like the C<begin> rule with only a RHS, the C<last> rule has only a
LHS:
RULES foo
foobar=l=>
ENDRULES
This way, the rules iterate until the string matches with C<foobar>.
You can also supply a condition in a last rule:
RULES bar
f(o+)b(a+)r=l=> !! length($1) == 2 * length($2);
=head2 Rules with /x mode
It is possible to use the regular expressions /x mode in the rewrite rules.
In this case:
=over 4
=item 1
there must be an empty line between rules
=item 2
you can insert space and line breaks into the regular expression:
RULES/x f1
(\d+)
(\d{3})
(000)
==>$1 milhao e $2 mil!! $1 == 1
ENDRULES
=back
=head1 POWER EXPRESSIONS
To facilitate matching complex languages Text::RewriteRules defines a
set of regular expressions that you can use (without defining them).
=head2 Parenthesis
There are three kind of usual parenthesis: the standard parenthesis,
brackets or curly braces. You can match a balanced string of
parenthesis using the power expressions C<[[:PB:]]>, C<[[:BB:]]> and
C<[[:CBB:]]> for these three kind of parenthesis.
For instance, if you apply this rule:
[[:BB:]]==>foo
to this string
something [ a [ b] c [d ]] and something more
then, you will get
something foo and something more
Note that if you apply it to
something [[ not ] balanced [ here
then you will get
something [foo balanced [ here
=head2 XML tags
The power expression C<[[:XML:]]> match a XML tag (with or without
children XML tags. Note that this expression matches only well formed
XML tags.
As an example, the rule
[[:XML:]]=>tag
applied to the string
<a><b></a></b> and <more><img src="foo"/></more>
will result in
<a><b></a></b> and tag
=cut
our $DEBUG = 0;
our $count = 0;
our $NL = qr/\r?\n\r?/;
my %pseudo_classes=(
TEXENV => 'TEXENV',
PB => 'PB',
BB => 'BB',
CBB => 'CBB',
XML => 'XMLtree',
'XML+1' => \&_tag_re,
);
sub _regular_expressions {
return <<'EORE';
our $__XMLattrs = qr/(?:
\s+[a-zA-Z0-9:-]+\s*
=
\s*(?: '[^']+' | "[^"]+" ))*/x;
### This (?<PCDATA>\n) is a BIG hack!
our $__XMLempty = qr/<(?<TAGNAME>[a-zA-Z0-9:-]+)(?<PCDATA>\b)$__XMLattrs\/>/x;
our $__XMLtree2 = qr/$__XMLempty |
(?<XML>
<(?<TAG>[a-zA-Z0-9:-]+)$__XMLattrs>
(?: $__XMLempty | [^<]++ | (?&XML) )*+
<\/\k<TAG>>
)/x;
our $__XMLtree = qr/$__XMLempty |
(?<XML>
<(?<TAGNAME>[a-zA-Z0-9:-]+)$__XMLattrs>
(?<PCDATA>(?: $__XMLempty | [^<]++ | $__XMLtree2 )*+)
<\/\k<TAGNAME>>
)/x;
our $__XMLinner = qr/(?: [^<]++ | $__XMLempty | $__XMLtree2 )*+/x;
our $__CBB = qr{ (?<cbb1> \{ (?<CBB>(?:[^\{\}]++|(?&cbb1))*+) \} ) }sx;
our $__BB = qr{ (?<bb1> \[ (?<BB> (?:[^\[\]]++|(?&bb1) )*+) \] ) }sx;
our $__PB = qr{ (?<pb1> \( (?<PB> (?:[^\(\)]++|(?&pb1) )*+) \) ) }sx;
our $__TEXENV = qr{\\begin\{(\w+)\}(.*?)\\end\{\1\}}s; ## \begin{$1}$2\end
our $__TEXENV1 = qr{\\begin\{(\w+)\}($__BB?)($__CBB)(.*?)\\end\{\1\}}s; ## \begin{$1}[$2]{$3}$4\end
EORE
}
sub _tag_re {
my $tagname = shift;
return "<$tagname\$__XMLattrs(?:\/>|>\$__XMLinner<\/$tagname>)";
}
sub _expand_pseudo_classes {
my $rules = shift;
$rules =~ s/(\[\[:(\w+):\]\])/$pseudo_classes{$2}?"\$__$pseudo_classes{$2}":$1/ge;
$rules =~ s/\[\[:(\w+)\(([^,\(\)]+)\):\]\]/$pseudo_classes{"$1+1"}->($2)/ge;
return $rules;
}
sub _mrules {
my ($conf, $name, $rules) = @_;
++$count;
my $code = "sub $name {\n";
$code .= " my \$p = shift;\n";
$code .= " my \$_M = \"\\x01\";\n";
$code .= " for (\$p) {\n";
$code .= " my \$modified = 1;\n";
$code .= " \$_ = \$_M.\$_;\n";
$code .= " #__$count#\n";
$code .= " my \$iteration = 0;\n";
$code .= " MAIN: while (\$modified) {\n";
$code .= " \$iteration++;\n";
if ($DEBUG) {
$code .= " print STDERR \" >\$_\\n\";\n"
}
$code .= " \$modified = 0;\n";
my $ICASE = exists($conf->{i})?"i":"";
my $DX = exists($conf->{x})?"x":"";
if (exists($conf->{d})) {
$code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";";
}
my @rules;
if ($DX eq "x") {
@rules = split /$NL$NL/, $rules;
} else {
@rules = split /$NL/, $rules;
}
for my $rule (@rules) {
$rule =~ s/$NL$//;
if ($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n";
$code .= " if ($cond) {\n";
$code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n";
$code .= " pos = undef;\n";
$code .= " \$modified = 1;\n";
$code .= " next MAIN\n";
$code .= " }\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n";
$code .= " if ($cond) {\n";
$code .= " s{\${_M}(?:$ant)}{eval{$con}.\${_M}}e$ICASE;\n";
$code .= " pos = undef;\n";
$code .= " \$modified = 1;\n";
$code .= " next MAIN\n";
$code .= " }\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=i?=>)(.*)/) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n";
$code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n";
$code .= " \$modified = 1;\n";
$code .= " next\n";
$code .= " }\n";
} elsif($rule =~ m/=b(?:egin)?=>(.*)/s) {
my $ac = $1;
$code =~ s/(#__$count#\n)/$ac;\n$1/;
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n";
$code .= " s{\${_M}(?:$ant)}{eval{$con}.\"\$_M\"}e$ICASE;\n";
$code .= " \$modified = 1;\n";
$code .= " next\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>\s*!!(.*))/s) {
my ($ant,$cond) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{\${_M}}{};\n";
$code .= " last\n";
$code .= " }\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>)/s) {
my ($ant) = ($1);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n";
$code .= " s{\${_M}}{};\n";
$code .= " last\n";
$code .= " }\n";
} else {
warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!;
}
}
##---
# Make it walk...
$code .= " if (m{\${_M}(.|\\n)}) {\n";
$code .= " s{\${_M}(.|\\n)}{\$1\${_M}};\n";
$code .= " \$modified = 1;\n";
$code .= " next\n";
$code .= " }\n";
$code .= " }\n";
$code .= " s/\$_M//;\n";
$code .= " }\n";
$code .= " return \$p;\n";
$code .= "}\n";
$code;
}
sub _rules {
my ($conf, $name, $rules) = @_;
++$count;
my $code = "sub $name {\n";
$code .= " my \$p = shift;\n";
$code .= " for (\$p) {\n";
$code .= " my \$modified = 1;\n";
$code .= " #__$count#\n";
$code .= " my \$iteration = 0;\n";
$code .= " MAIN: while(\$modified) {\n";
$code .= " print STDERR \$_;\n" if $DEBUG > 1;
$code .= " \$modified = 0;\n";
$code .= " \$iteration++;\n";
##---
my $DICASE = exists($conf->{i})?"i":"";
my $DX = exists($conf->{x})?"x":"";
if (exists($conf->{d})) {
$code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";";
}
my @rules;
if ($DX eq "x") {
@rules = split /$NL$NL/, $rules;
} else {
@rules = split /$NL/, $rules;
}
for my $rule (@rules) {
$rule =~ s/$NL$//;
my $ICASE = $DICASE;
if($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " while (m{$ant}g$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{$ant\\G}{$con}$ICASE$DX;\n";
$code .= " pos = undef;\n";
$code .= " \$modified = 1;\n";
$code .= " next MAIN\n";
$code .= " }\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " while (m{$ant}g$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{$ant\\G}{$con}e${ICASE}${DX};\n";
$code .= " pos = undef;\n";
$code .= " \$modified = 1;\n";
$code .= " next MAIN\n";
$code .= " }\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=i?=>)(.*)/s) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{$ant}$ICASE$DX) {\n";
$code .= " s{$ant}{$con}$ICASE$DX;\n";
$code .= " \$modified = 1;\n";
$code .= " next\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{$ant}$ICASE$DX) {\n";
$code .= " s{$ant}{$con}e$ICASE$DX;\n";
$code .= " \$modified = 1;\n";
$code .= " next\n";
$code .= " }\n";
} elsif($rule =~ m/=b(?:egin)?=>(.*)/s) {
my $ac = $1;
$code =~ s/(#__$count#\n)/$ac;\n$1/;
} elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>\s*!!(.*))/s) {
my ($ant,$cond) = ($1,$5);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{$ant}$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " last\n";
$code .= " }\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>)/s) {
my ($ant) = ($1);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{$ant}$ICASE$DX) {\n";
$code .= " last\n";
$code .= " }\n";
} else {
warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!;
}
}
##---
$code .= " }\n";
$code .= " }\n";
$code .= " return \$p;\n";
$code .= "}\n";
$code;
}
sub _lrules {
my ($conf, $name, $rules) = @_;
++$count;
my $code = "my \$${name}_input = \"\";\n";
$code .= "sub ${name}_init {\n";
$code .= " \$${name}_input = shift;\n";
$code .= " return 1;\n";
$code .= "}\n\n";
$code .= "sub $name {\n";
$code .= " return undef if not defined \$${name}_input;\n";
$code .= " print STDERR \$_;\n" if $DEBUG > 1;
$code .= " for (\$${name}_input) {\n";
##---
my $DICASE = exists($conf->{i})?"i":"";
my $DX = exists($conf->{x})?"x":"";
my @rules;
if ($DX eq "x") {
@rules = split /$NL$NL/, $rules;
} else {
@rules = split /$NL/, $rules;
}
for my $rule (@rules) {
$rule =~ s/$NL$//;
my $ICASE = $DICASE;
if ($rule =~ m/=EOF=>(.*)/s) {
my $act = $1;
$code .= " if (m{^\$}) {\n";
$code .= " \$${name}_input = undef;\n";
$code .= " return \"$act\";\n";
$code .= " }\n";
} elsif ($rule =~ m/=EOF=e=>(.*)/s) {
my $act = $1;
$code .= " if (m{^\$}) {\n";
$code .= " \$${name}_input = undef;\n";
$code .= " return $act;\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)!!(.*)/s) {
my ($ant,$cond) = ($1, $4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{$ant}{}$ICASE$DX;\n";
$code .= " pos = undef;\n";
$code .= " return $name();\n";
$code .= " }\n";
$code .= " }\n";
} elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)/s) {
my ($ant) = ($1);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " s{$ant}{}$ICASE$DX;\n";
$code .= " pos = undef;\n";
$code .= " return $name();\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{$ant}{}$ICASE$DX;\n";
$code .= " pos = undef;\n";
$code .= " return \"$con\"\n";
$code .= " }\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) {
my ($ant,$con,$cond) = ($1,$3,$4);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " if ($cond) {\n";
$code .= " s{$ant}{}${ICASE}${DX};\n";
$code .= " pos = undef;\n";
$code .= " return $con;\n";
$code .= " }\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=i?=>)(.*)/s) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " s{$ant}{}$ICASE$DX;\n";
$code .= " pos = undef;\n";
$code .= " return \"$con\"\n";
$code .= " }\n";
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) {
my ($ant,$con) = ($1,$3);
$ICASE = "i" if $2 =~ m!i!;
$ant = _expand_pseudo_classes($ant);
$code .= " if (m{^$ant}g$ICASE$DX) {\n";
$code .= " s{$ant}{}${ICASE}${DX};\n";
$code .= " pos = undef;\n";
$code .= " return $con;\n";
$code .= " }\n";
} else {
warn "Unknown rule in lexer mode: $rule\n" unless $rule =~ m!^\s*(#|$)!;
}
}
##---
$code .= " }\n";
$code .= " return undef;\n";
$code .= "}\n";
$code;
}
FILTER {
return if m!^(\s|\n)*$!;
s!^!_regular_expressions()!e;
print STDERR "BEFORE>>>>\n$_\n<<<<\n" if $DEBUG;
s!^MRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem;
s!^LRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem;
s{^RULES((?:\/\w+)?) +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES}{
my ($a,$b,$c) = ($1,$2,$3);
my $conf = {map {($_=>$_)} split //,$a};
if (exists($conf->{'l'})) {
_lrules($conf, $b, $c)
} elsif (exists($conf->{'m'})) {
_mrules($conf,$b,$c)
} else {
_rules($conf,$b,$c)
}
}gem;
print STDERR "AFTER>>>>\n$_\n<<<<\n" if $DEBUG;
$_
};
sub _compiler{
local $/ = undef;
$_ = <>;
print __compiler($_);
}
sub __compiler {
my $str = shift;
for ($str) {
s!use Text::RewriteRules;!_regular_expressions()!e;
s!^MRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem;
s!^LRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem;
s{^RULES((?:\/\w+)?) +(\w+)\s*\n((?:.|\n)*?)^ENDRULES}{
my ($a,$b,$c) = ($1,$2,$3);
my $conf = {map {($_=>$_)} split //,$a};
if (exists($conf->{'l'})) {
_lrules($conf,$b,$c)
} elsif (exists($conf->{'m'})) {
_mrules($conf,$b,$c)
} else {
_rules($conf,$b,$c)
}
}gem;
}
return $str;
}
=head1 TUTORIAL
At the moment, just a set of commented examples.
Example1 -- from number to portuguese words (using traditional rewriting)
Example2 -- Naif translator (using cursor-based rewriting)
=head1 Conversion between numbers and words
Yes, you can use L<Lingua::PT::Nums2Words> and similar (for other
languages). Meanwhile, before it existed we needed to write such a
conversion tool.
Here I present a subset of the rules (for numbers bellow 1000). The
generated text is Portuguese but I think you can get the idea. I'll
try to create a version for English very soon.
You can check the full code on the samples directory (file
C<num2words>).
use Text::RewriteRules;
RULES num2words
100==>cem
1(\d\d)==>cento e $1
0(\d\d)==>$1
200==>duzentos
300==>trezentos
400==>quatrocentos
500==>quinhentos
600==>seiscentos
700==>setecentos
800==>oitocentos
900==>novecentos
(\d)(\d\d)==>${1}00 e $2
10==>dez
11==>onze
12==>doze
13==>treze
14==>catorze
15==>quinze
16==>dezasseis
17==>dezassete
18==>dezoito
19==>dezanove
20==>vinte
30==>trinta
40==>quarenta
50==>cinquenta
60==>sessenta
70==>setenta
80==>oitenta
90==>noventa
0(\d)==>$1
(\d)(\d)==>${1}0 e $2
1==>um
2==>dois
3==>três
4==>quatro
5==>cinco
6==>seis
7==>sete
8==>oito
9==>nove
0$==>zero
0==>
==>
,==>,
ENDRULES
num2words(123); # returns "cento e vinte e três"
=head2 Naif translator (using cursor-based rewriting)
use Text::RewriteRules;
%dict=(driver=>"motorista",
the=>"o",
of=>"de",
car=>"carro");
$word='\b\w+\b';
if( b(a("I see the Driver of the car")) eq "(I) (see) o Motorista do carro" )
{print "ok\n"}
else {print "ko\n"}
RULES/m a
($word)==>$dict{$1}!! defined($dict{$1})
($word)=e=> ucfirst($dict{lc($1)}) !! defined($dict{lc($1)})
($word)==>($1)
ENDRULES
RULES/m b
\bde o\b==>do
ENDRULES
=head1 AUTHOR
Alberto Simões, C<< <ambs@cpan.org> >>
José João Almeida, C<< <jjoao@cpan.org> >>
=head1 BUGS
We know documentation is missing and you all want to use this module.
In fact we are using it a lot, what explains why we don't have the
time to write documentation.
Please report any bugs or feature requests to
C<bug-text-rewrite@rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org>. I will be notified, and then you'll automatically
be notified of progress on your bug as I make changes.
=head1 ACKNOWLEDGEMENTS
Damian Conway for Filter::Simple
=head1 COPYRIGHT & LICENSE
Copyright 2004-2012 Alberto Simões and José João Almeida, All Rights Reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
=cut
1; # End of Text::RewriteRules