The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
# Copyright (C) 2001-2012, Parrot Foundation.

=pod

=head1 NAME

compiler/pge/README.pod - Readme to the 'compilers/pge' directory. 

=head1 DESCRIPTION

This is a regular expression/rules/grammar engine/parser designed to
run in Parrot.  It's still a work in progress, but has a lot of
nice features in it, including support for perl 6 regexes, globs,
shift-reduce parsing, and some support for perl 5 regular expressions.
It also includes the "pgc.pir" grammar compiler, which can convert
an entire grammar specification into the appropriate PIR code
for execution.

A nice feature of PGE is that one can easily combine many
different parsing styles into a single interface.  PGE uses
perl 6 rules for its top-down parsing, an operator precedence
parser for bottom-up (shift/reduce) parsing, and allows control
to pass freely between the two styles as well as to custom parsing
subroutines.

Note that PGE is currently in update-only mode. If you're writing a language
from scratch, you'd be better off using NQP or NQP-rx.

=head1 Installation

PGE assumes that it is part of the parrot distribution in the
F<compilers/pge> directory.   Simply type C<make> in this directory
to build the various *.pbc files and install them into
runtime/parrot/library.

The distribution comes with a small F<demo.pir> program that gives an
example of using PGE.  To run the demo, simply do
C<parrot demo.pir>.  The demo understands the following commands:

    rule pattern      - compile a Perl 6 rule from "pattern"
    save name         - save the current rule as "name"
    text              - a text string to match against previously entered rule
    pir               - display the PIR code generated for current rule
    exp               - display the expression tree for the current rule
    trace             - toggle pattern execution tracing
    next              - repeat last match on target string

=head1 PGE's rule engine  (PGE::Perl6Regex)

Once PGE is compiled and installed, you generally load it using
the load_bytecode operation, as in

    load_bytecode 'PGE.pbc'

This imports the PGE::Perl6Regex compiler, which can be used to compile
strings of Perl 6 regexes.  A sample compile sequence would be:

    .local pmc p6regex_compile
    p6regex_compile = compreg 'PGE::Perl6Regex'         # get the compiler

    .local string pattern
    .local pmc rulesub
    pattern = '^(From|Subject)\:'                  # pattern to compile
    rulesub = p6regex_compile(pattern)             # compile it to rulesub

Then, to match a target string we simply call the subroutine
to get back a C<PGE::Match> object:

    .local pmc match
    $S0 = 'From: pmichaud@pobox.com'               # target string
    match = rulesub($S0)                           # execute rule

The Match object is true if it successfully matched, and contains
the strings and subpatterns that were matched as part of the capture.
Parrot's "Data::Dumper" can be used to quickly view the results
of the match:

    load_bytecode 'dumper.pir'
    load_bytecode 'PGE/Dumper.pir'

  match_loop:
    unless match goto match_fail                   # if match fails stop
    print "match succeeded\n"
    _dumper(match)
    match.'next'()                                 # find the next match
    goto match_loop

  match_fail:
    print "match failed\n"

One can also get the intermediate PIR code that PGE generates for
the rule subroutine -- just use

    $S0 = p6regex_compile(pattern, 'target'=>'PIR')

and you can print/inspect the contents of $S0 to see the generated code.

See the STATUS file for a list of implemented and yet-to-be-implemented
features.

=head1 Known limitations of the rule engine

PGE doesn't (yet) properly handle nested repetitions of zero-length
patterns in groups -- that's coming soon.

Many well-known optimizations (e.g., Boyer-Moore) aren't
implemented yet, although a variety of optimizations are being
added as we generate code.

Lastly, error handling needs to be improved, but this will likely
be decided as we discover how PGE integrates with the rest of
Parrot.

=head1 Implementation notes

Basically, PGE is a compiler just like any other, except that its
"language" is the Perl 6 regex syntax and its output is a subroutine
that can match strings.  So, PGE consists of a series of parsers
(for each pattern matching language), an intermediate expression
format, and a code generator.

The parsers can be written using PIR subroutines or PGE's built-in
operator precedence (shift/reduce) parser; the parser for Perl 6
regexes is built with the operator precedence parser.  This parser
produces a parse tree (in the form of a Match object) for a given
Perl 6 regex.  The parse tree then goes through semantic analysis
and reduction phases before being sent to code generation to
produce a PIR subroutine.

The generated PIR code uses Parrot calling conventions for
all interfaces with external callers/callees such as subrules.

PGE also uses Parrot coroutines for the matching
engine, so that after a successful match is found, the
next match within the same string can be found by simply
returning control to the matching coroutine, which then
picks up from where it had previously left off until
another match is discovered.

The code still needs a fair amount of commenting.  In general,
if you have a question about a particular section of code,
send Pm an email and he'll write the comments for it.

=head1 AUTHOR

Patrick Michaud (pmichaud@pobox.com) is the author and maintainer.
Patches and suggestions should be sent to the Perl 6 compiler list
(perl6-compiler@perl.org).

=cut