The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=pod

=for comment
DO NOT EDIT. This Pod was generated by Swim v0.1.30.
See http://github.com/ingydotnet/swim-pm#readme

=encoding utf8

=head1 Pegex Tutorial Calculator

A Pegex Calculator

When you look around the web for stuff about parsing, you inevitably find a
bunch of examples about how to write a precedence parser for mathematical
expressions.

=over

=item * L<http://en.wikipedia.org/wiki/Operator-precedence_parser>

=item * L<http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/>

=item * L<http://www.hokstad.com/operator-precedence-parser.html>

=item * L<http://www.perlmonks.org/?node_id=554516>

=back

This tutorial is the Pegex version of that. Pegex actually comes with an
examples directory that contains two arithmetic expression parser/evaluator
caclulator programs:

=over

=item * L<https://github.com/ingydotnet/pegex-pm/blob/master/examples/calculator1.pl>

=item * L<https://github.com/ingydotnet/pegex-pm/blob/master/examples/calculator2.pl>

=back

They both do the same thing but using different parsing approaches. We'll
cover both of them in detail. I hope you'll find that Pegex handles operator
precedence parsing elegantly.

=head1 The Problem

Precedence parsers are interesting because you need to deal with operators
that have the same precedence, different precedences and both left and right
associativity.

Consider the equation:

    1 - 2 ^ 3 ^ 4 + 5 * 6 / 7

Normal precedence and associativity rules make this the same as:

    (1 - (2 ^ (3 ^ 4)) + ((5 * 6) / 7))

Our calculator should produce the same result for both. Note that this means
we will be parsing numbers, 5 operators, parentheses, and separating
whitespace.

Here's an example of the calculator program running:

    > perl examples/calculator1.pl
    Enter an equation: 1+2*3
    1+2*3 = 7
    Enter an equation: (1 + 2) * 3
    (1 + 2) * 3 = 9
    Enter an equation:

=head1 The Solutions

Most of the solutions that you'll read about on the web, involve (or assume) a
lexing/tokenizing step before parsing. Pegex always parses an input stream
directly, pulling out "tokens" that it needs using regex captures. So the
parse happens as one operation, which has many advantages.

But how do we get the operator precedence rules into this? Well, we have 2
different ways:

=head2 calculator1.pl - Operator Precedence Climbing

Our first example calculator uses what is known as the Operator Precedence
Climbing method. See: L<http://en.wikipedia.org/wiki/Operator-
precedence_parser#Precedence_climbing_method>.

This is basically a clever technique of specifying our grammar rules such that
they imply precedence. Here's the pegex grammar from the code:

    expr: add_sub
    add_sub: mul_div+ % /- ([ PLUS DASH ]) -/
    mul_div: exp+ % /- ([ STAR SLASH ]) -/
    exp: token+ % /- CARET -/
    token: /- LPAREN -/ expr /- RPAREN -/ | number
    number: /- ( DASH? DIGIT+ ) -/

It's a little bit wonky but it works. It says that any expression is an
addI<subtract and that an add>subtract is really a multiply/divide etc.
Finally after the last operator comes the number token and the parens.

It feels a bit backwards. One of the biggest drawbacks of PCM is that it
becomes less and less efficient with more and more operators. It needs to go
through the whole tree, just to find each number.

But it works and the code is minimal. The receiver class gets the numbers in
the correct order, immediately evaluates the answer and returns the answer for
each level. Whatever the return value of the final operation is, becomes the
result of the parse. Here's the receiver class:

    {
        package Calculator;
        use base 'Pegex::Tree';

        sub got_add_sub {
            my ($self, $list) = @_;
            $self->flatten($list);
            while (@$list > 1) {
                my ($a, $op, $b) = splice(@$list, 0, 3);
                unshift @$list, ($op eq '+') ? ($a + $b) : ($a - $b);
            }
            @$list;
        }

        sub got_mul_div {
            my ($self, $list) = @_;
            $self->flatten($list);
            while (@$list > 1) {
                my ($a, $op, $b) = splice(@$list, 0, 3);
                unshift @$list, ($op eq '*') ? ($a * $b) : ($a / $b);
            }
            @$list;
        }

        sub got_exp {
            my ($self, $list) = @_;
            $self->flatten($list);
            while (@$list > 1) {
                my ($a, $b) = splice(@$list, -2, 2);
                push @$list, $a ** $b;
            }
            @$list;
        }
    }

As you can see, it has an action method for each level or precedence. It loops
over the expression, evaluating it. Whether it loops from left to right or
right to left depends on the associativity that we want to use.

Our runner code looks like this:

    while (1) {
        print "\nEnter an equation: ";
        my $input = <>;
        chomp $input;
        last unless length $input;
        calc($input);
    }

    sub calc {
        my $expr = shift;
        my $calculator = pegex($grammar, 'Calculator');
        my $result = eval { $calculator->parse($expr) };
        print $@ || "$expr = $result\n";
    }

And that's the whole thing. We have a working calculator as specced!

However the real point of this is to explore good parsing techniques, and the
PCM leaves us wanting to try something more efficient. Let's try another
approach...

=head2 calculator2.pl - Shunting Yard Algorithm

An age old way of parsing expressions is to somehow get the numbers and
operators into an RPN (Reverse Polish Notation) stack, which is each operand
follow by its operator. Once in that form, precedence and associativity are
accounted for.

For example:

    1 / 2 - ( -3 * 4 )

becomes:

    1, 2, /, -3, 4, *, -

To evaluate an RPN you pop off an operator and then attempt to pop off and
operand. If the operand is another operator you recurse. When you have 2
operands you do the operation and put the result back on the stack. When there
is only 1 element on the stack, you are done. That's your result.

Let's look at our new grammar in C<calculator2.pl>:

    expr: operand ( operator operand )*
    operator: /- ([ PLUS DASH STAR SLASH CARET ]) -/
    operand: num | /- LPAREN -/ expr /- RPAREN -/
    num: /- ( DASH? DIGIT+ ) -/

This is much easier to understand. We are just parsing out the tokens. In a
(very real) sense, we are using Pegex as a lexer.

Now let's look at the receiver class:

    {
        package Calculator;
        use base 'Pegex::Tree', 'Precedence';

        my $operator_precedence_table = {
            '+' => {p => 1, a => 'l'},
            '-' => {p => 1, a => 'l'},
            '*' => {p => 2, a => 'l'},
            '/' => {p => 2, a => 'l'},
            '^' => {p => 3, a => 'r'},
        };

        sub got_expr {
            my ($self, $expr) = @_;
            $self->precedence_rpn($expr, $operator_precedence_table);
        }
    }

This is also much simpler. There's only one method. What's going on? Well the
secret is that I put the code to turn the tokens into RPN in a separate base
class called L<examples/lib/Precedence.pm>.

This is an implementation of Edsger Dijkstra's famous Shunting-yard Algorithm
from 1961! It's only 20 lines of Perl. I won't include it inline here, but
have a look at it for yourself. L<https://github.com/ingydotnet/pegex-
pm/blob/master/examples/lib/Precedence.pm>

The Shunting-yard algorithm simply takes a list of expression tokens and
transforms them into an RPN stack. It uses information from a
precedence/associativity table like the one above.

Unlike L<calculator1.pl> where we evaluated as we parsed, L<calculator2.pl>
creates an RPN which is akin to an AST. In other words, it's more like
something an actually language compiler would do.

But we are writing a calculator and we still need to evaluate this pupy. I
changed the runner code to look like this:

    sub calc {
        my $expr = shift;
        my $calculator = pegex($grammar, 'Calculator');
        my $rpn = eval { $calculator->parse($expr) };
        my $result = RPN::evaluate($rpn);
        print $@ || "$expr = $result\n";
    }

As you can see I also moved the evaluator code into a separate/reusable
module: L<examples/lib/RPN.pm>. This is another 30 lines of Perl. Please take
a look at it. L<https://github.com/ingydotnet/pegex-
pm/blob/master/examples/lib/RPN.pm>

So overall, this second solution was a bit more code, but also feels more
solid on several levels.

=head1 Conclusion

Pegex strives to be the nicest and most reusable way to write new parsers.
Operator precedence parsers are a necessary part of parsing mathematical
expressions and computer languages. This tutorial showed you 2 ways to do it.
As the demands for Pegex grow, we may see even more ways to do it.

=cut