The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
NAME
    Parser::Combinators - A library of building blocks for parsing text

SYNOPSIS
      use Parser::Combinators;

      my $parser = < a combination of the parser building blocks from Parser::Combinators >
      (my $status, my $rest, my $matches) = $parser->($str);
      my $parse_tree = getParseTree($matches);

DESCRIPTION
    Parser::Combinators is a library of parser building blocks ('parser
    combinators'), inspired by the Parsec parser combinator library in
    Haskell (http://legacy.cs.uu.nl/daan/download/parsec/parsec.html). The
    idea is that you build a parsers not by specifying a grammar (as in
    yacc/lex or Parse::RecDescent), but by combining a set of small parsers
    that parse well-defined items.

  Usage
    Each parser in this library , e.g. "word" or "symbol", is a function
    that returns a function (actually, a closure) that parses a string. You
    can combine these parsers by using special parsers like "sequence" and
    "choice". For example, a JavaScript variable declaration

         var res = 42;

    could be parsed as:

        my $p =
            sequence [
                symbol('var'),
                word,
                symbol('='),
                natural,
                semi
            ]

    if you want to express that the assignment is optional, i.e. " var res;"
    is also valid, you can use "maybe()":

        my $p =
            sequence [
                symbol('var'),
                word,
                maybe(
                    sequence [
                       symbol('='),
                       natural
                       ]
                ),
                semi
            ]

    If you want to parse alternatives you can use "choice()". For example,
    to express that either of the next two lines are valid:

        42
        return(42)

    you can write

        my $p = choice( number, sequence [ symbol('return'), parens( number ) ] )

    This example also illustrates the `parens()` parser to parse anything
    enclosed in parenthesis

  Provided Parsers
    The library is not complete in the sense that not all Parsec combinators
    have been implemented. Currently, it contains:

            whiteSpace : parses any white space, always returns success. 

            * Lexeme parsers (they remove trailing whitespace):

            word : (\w+)
            natural : (\d+)
            symbol : parses a given symbol, e.g. symbol('int')
                    comma : parses a comma
            semi : parses a semicolon
        
            char : parses a given character

            * Combinators:

            sequence( [ $parser1, $parser2, ... ], $optional_sub_ref )
            choice( $parser1, $parser2, ...) : tries the specified parsers in order
            try : normally, the parser consums matching input. try() stops a parser from consuming the string
            maybe : is like try() but always reports success
            parens( $parser ) : parser '(', then applies $parser, then ')'
            many( $parser) : applies $parser zero or more times
            many1( $parser) : applies $parser one or more times
            sepBy( $separator, $parser) : parses a list of $parser separated by $separator
            oneOf( [$patt1, $patt2,...]): like symbol() but parses the patterns in order

            * Dangerous: the following parsers take a regular expression, so you can mix regexes and other combinators ...                                       

            upto( $patt )
            greedyUpto( $patt)
            regex( $patt)

  Labeling
    You can label any parser in a sequence using an anonymous hash, for
    example:

        sub type_parser {   
                    sequence [
            {Type =>        word},
            maybe parens choice(
                    {Kind => natural},
                                                    sequence [
                                                            symbol('kind'),
                                                            symbol('='),
                                {Kind => natural}
                                                    ] 
                                            )        
                    ] 
        }

    Applying this parser returns a tuple as follows:

      my $str = 'integer(kind=8), '
      (my $status, my $rest, my $matches) = type_parser($str);

    Here,$status is 0 if the match failed, 1 if it succeeded. $rest contains
    the rest of the string. The actual matches are stored in the array
    $matches. As every parser returns its resuls as an array ref, $matches
    contains the concrete parsed syntax, i.e. a nested array of arrays of
    strings.

        Dumper($matches) ==> [{'Type' => 'integer'},['kind','\\=',{'Kind' => '8'}]]

    You can remove the unlabeled matches and convert the raw tree into
    nested hashes using "getParseTree":

      my $parse_tree = getParseTree($matches);

        Dumper($parse_tree) ==> {'Type' => 'integer','Kind' => '8'}

  A more complete example
    I wrote this library because I needed to parse argument declarations of
    Fortran-95 code. Some examples of valid declarations are:

          integer(kind=8), dimension(0:ip, -1:jp+1, kp) , intent( In ) :: u, v,w
          real, dimension(0:7) :: f 
          real(8), dimension(0:7,kp) :: f,g

    I want to extract the type and kind, the dimension and the list of
    variable names. For completeness I'm parsing the `intent` attribute as
    well. The parser is a sequence of four separate parsers "type_parser",
    "dim_parser", "intent_parser" and "arglist_parser". All the optional
    fields are wrapped in a "maybe()".

        my $F95_arg_decl_parser =    
        sequence [
            whiteSpace,
            {TypeTup => &type_parser},
                maybe(
                        sequence [
                                comma,
                    &dim_parser
                    ], 
            ),
                maybe(
                    sequence [
                            comma,
                            &intent_parser
                    ], 
                ),
            &arglist_parser
            ];

        # where

        sub type_parser {   
                    sequence [
            {Type =>        word},
            maybe parens choice(
                    {Kind => natural},
                                                    sequence [
                                                            symbol('kind'),
                                                            symbol('='),
                                {Kind => natural}
                                                    ] 
                                            )        
                    ] 
        }

        sub dim_parser {
                    sequence [
                            symbol('dimension'),
            {Dim => parens sepBy(',', regex('[^,\)]+')) }
                    ] 
        }

        sub intent_parser {
                sequence [
                symbol('intent'),
             {Intent => parens word}
                    ] 
        }

        sub arglist_parser {
            sequence [
                    symbol('::'),
                {Vars => sepBy(',',&word)}
            ]
        }

    Running the parser and calling "getParseTree()" on the first string
    results in

        {
        'TypeTup' => {
                    'Type' => 'integer',
                    'Kind' => '8'
                },
        'Dim' => ['0:ip','-1:jp+1','kp'],
        'Intent' => 'In',
        'Vars' => ['u','v','w']
        }

    See the test fortran95_argument_declarations.t for the source code.

   No Monads?!
    As this library is inspired by a monadic parser combinator library from
    Haskell, I have also implemented "bindP()" and "returnP()" for those who
    like monads ^_^ So instead of saying

        my $pp = sequence [ $p1, $p2, $p3 ]

    you can say

        my $pp = bindP( 
            $p1, 
            sub { (my $x) =@_;
                bindP( 
                    $p2,  
                    sub {(my $y) =@_;
                    bindP(
                        $p3,
                        sub { (my $z) = @_;
                            returnP->($z);
                            }
                        )->($y)
                    }
                    )->($x);
                }
            );

    which is obviously so much better :-)

AUTHOR
    Wim Vanderbauwhede <Wim.Vanderbauwhede@gmail.com>

COPYRIGHT
    Copyright 2013- Wim Vanderbauwhede

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    - The original Parsec library:
    <http://legacy.cs.uu.nl/daan/download/parsec/parsec.html> and
    <http://hackage.haskell.org/package/parsec>