NAME

Parse::Gnaw - Write extensible, recursive grammars in pure Perl code (grammar rules are Perl arrays) and apply them to whatever parsee you want.

SYNOPSIS

Write extensible, recursive grammars using pure Perl code. Grammar rules are Perl arrays. Apply them to whatever parsee you want. A normal parsee would be a string. A more interesting parsee might be a three-dimensional array of characters.

        no strict 'vars';
        use Parse::Gnaw;
        use Parse::Gnaw::String;

        rule('SayHello', 'Hello', 'World');
        my $string=Parse::Gnaw::String->New('So Hello World of mine');
        $string->parse('SayHello');

This is the second generation of Parse::Gnaw, starting from revision 0.600. Gen1 stored rules as code references, which prevented recursive calls within a rule: calling the code ref for the rule would go into an infinite loop. Gen2 uses array references to store rules, with the name of the array reference variable matching the name of the rule.

        our $rulename = [ .... rule content .... ];

Gen2 should allow recursive rules, although it will probably hang in an infinite loop when trying to match a left-recursive rule.

Define a Grammar

Before you can parse anything, you have to create a grammar. Grammars are created with the "rule" subroutine, which is imported when you use Parse::Gnaw.

        # see t/doc_ex_rule_hi.t
        use Parse::Gnaw;
        rule('SayHello', 'H', 'I');

This will create a package scalar in your current package. The name of the scalar will be the name of the rule. The scalar will be a reference to an array that contains the rule. You can treat it like any other perl variable.

        use Data::Dumper;
        print Dumper $SayHello;

This will print out something like:

        $VAR1 = [
            [ 'rule', 'rule1', {
                'methodname' => 'rule',
                'filename'   => 't/doc_ex_rule_hi.t',
                'linenum'    => 18,
                'payload'    => 'rule1',
                'quantifier' => '',
                'package'    => 'main'
            } ],
            [ 'lit', 'H', {
                'methodname' => 'lit',
                'filename'   => 't/doc_ex_rule_hi.t',
                'linenum'    => 18,
                'payload'    => 'H',
                'package'    => 'main'
            } ],
            [ 'lit', 'I', {
                'methodname' => 'lit',
                'filename'   => 't/doc_ex_rule_hi.t',
                'linenum'    => 18,
                'payload'    => 'I',
                'package'    => 'main'
            } ]
        ];

The array shows three elements. The first is a "rule" which defines the name of the rule and also holds extra information about the rule. The next two elements are literals looking for 'H' and then 'I'.
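
The idea of storing a rule as an array reference in a package scalar can be sketched in plain Perl without Parse::Gnaw. This is only an illustration: my_rule() is a hypothetical stand-in, not the module's rule() implementation, and it records far less information per element than the dump above shows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rule(): stores the rule as an array reference
# in a package scalar named after the rule, in the caller's package.
sub my_rule {
    my ($name, @elements) = @_;
    my $package = caller;                   # package name of whoever called us
    my @rule = ( [ 'rule', $name ], map { [ 'lit', $_ ] } @elements );
    no strict 'refs';                       # needed for the symbolic reference
    ${"${package}::${name}"} = \@rule;
    return \@rule;
}

my_rule('SayHello', 'H', 'I');

our $SayHello;                              # lets strict code see the package scalar
print scalar(@$SayHello), "\n";             # 3: the rule header plus two literals
print $SayHello->[1][1], "\n";              # H
```

Because the rule is an ordinary array reference, it can be inspected, copied, or dumped like any other Perl data structure.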

Create Something to be Parsed

A grammar is half of the puzzle. You also need to create the thing you want to parse. A simple example is a string:

        # see t/doc_ex_string_dog.t
        use Parse::Gnaw::LinkedListDimensions1;
        my $ab_string=Parse::Gnaw::LinkedListDimensions1->new("dog");
        $ab_string->display();

What this does is take the string 'dog' and turn it into a linked list that can be parsed.

Because Data::Dumper does not handle linked lists well (they do not display in an easy-to-read format), the display() method was created. It prints a Parse::Gnaw string-like object in a more readable format:

        Dumping LinkedList object
        LETPKG => Parse::Gnaw::Blocks::Letter # package name of letter objects
        CONNMIN1 => 0 # max number of connections, minus 1
        HEADING_DIRECTION_INDEX => 0
        HEADING_PREVNEXT_INDEX  => 0    
        FIRSTSTART => 

                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa08c820)
                payload: 'FIRSTSTART'
                from: unknown
                connections:
                         [ ........... , ........... ]
        
        LASTSTART => 
        
                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa18d70c)
                payload: 'LASTSTART'
                from: unknown
                connections:
                         [ ........... , ........... ]
        
        CURRPTR => 
        
                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa08c820)
                payload: 'FIRSTSTART'
                from: unknown
                connections:
                         [ ........... , ........... ]
        
        
        letters, by order of next_start_position()
        letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa252d2c)
                payload: 'd'
                from: file t/doc_ex_string_dog.t, line 22, column 0     
                connections:
                         [ ........... , (0xa252de0) ]


                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa252de0)
                payload: 'o'
                from: file t/doc_ex_string_dog.t, line 22, column 1
                connections:            
                         [ (0xa252d2c) , (0xa252ef8) ]


                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa252ef8)
                payload: 'g'
                from: file t/doc_ex_string_dog.t, line 22, column 2
                connections:
                         [ (0xa252de0) , ........... ]
        

                letterobject: Parse::Gnaw::Blocks::Letter=ARRAY(0xa18d70c)
                payload: 'LASTSTART'
                from: unknown
                connections:
                         [ ........... , ........... ]
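
The output above shows each letter holding a payload and connections to its previous and next neighbours. A minimal sketch of that shape in plain Perl follows. It is an assumption-laden illustration: string_to_letters() is a hypothetical helper, it uses hashes instead of the module's Parse::Gnaw::Blocks::Letter arrays, and it omits the FIRSTSTART and LASTSTART marker letters.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string into a doubly linked list of letters.
# Each letter is a hash with a payload and prev/next connections.
sub string_to_letters {
    my ($string) = @_;
    my ($first, $prev);
    for my $char (split //, $string) {
        my $letter = { payload => $char, prev => $prev, next => undef };
        $prev->{next} = $letter if $prev;   # link the previous letter forward
        $first //= $letter;                 # remember the head of the list
        $prev = $letter;
    }
    return $first;
}

# Walk the list from the first letter by following the next connections.
my $letter = string_to_letters('dog');
my @payloads;
while ($letter) {
    push @payloads, $letter->{payload};
    $letter = $letter->{next};
}
print join(' -> ', @payloads), "\n";        # d -> o -> g
```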

Apply the Grammar to the Grammee

Now that you have a Grammar and a Grammee, you can parse.

The parse() method is available on Parse::Gnaw::LinkedList type objects. It takes one argument, a string containing the name of the top-level rule or grammar that you want to apply to the string. If the rule matches the string, parse() returns true (1). If the rule does NOT match the string, parse() returns false ('').

        $string->parse('rulename');

The parse() method is used for parsing an entire string from the beginning. It is similar to putting ^ or \A at the front of a regular expression:

        m/^(rule)/ or m/\A(rule)/ 
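
For comparison, here is how the anchored and unanchored forms behave with ordinary Perl regular expressions. The example uses only core Perl and the sample strings are chosen for illustration.

```perl
use strict;
use warnings;

# An anchored pattern only matches when the text starts with 'HI';
# an unanchored pattern matches 'HI' anywhere in the text.
my @subjects = ('HI THERE', 'OH HI THERE');
for my $subject (@subjects) {
    my $anchored   = $subject =~ m/\A(HI)/ ? 'match' : 'no match';
    my $unanchored = $subject =~ m/(HI)/   ? 'match' : 'no match';
    print "'$subject': anchored=$anchored, unanchored=$unanchored\n";
}
```

The first string matches both patterns, while the second matches only the unanchored one.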

Here's a full example of parsing a string:

        # see t/doc_ex_rule_and_string.t
        use Test::More tests => 1;      # provides ok()
        use Parse::Gnaw;
        use Parse::Gnaw::LinkedListDimensions1;

        # A Simple Rule Example
        rule( 'rule1', 'H', 'I' );

        # A simple string example
        my $histring=Parse::Gnaw::LinkedListDimensions1->new("HI THERE");

        ok($histring->parse('rule1'), "This is like regex   'HI THERE' =~ m/^HI/ ");

rule

The rule function is used to create rules. Each rule is created as a package scalar in the caller's namespace. The name of the scalar is the name of the rule.

        package main;
        rule( 'rule1', 'H', 'I' );

The above example will create a rule called "main::rule1". You can call Data::Dumper on $rule1 and see that it is an array reference.

Rules by themselves don't match anything in a string or block of text. Rules are just a way to handle a grammar in manageable chunks. They can be thought of as similar to a Perl subroutine: a container for the code that does something.

The first parameter is a string with the name of the rule.

Everything after that defines what the rule does. These can be string literals or character classes or alternations or quantifiers, and so on. Another thing you can do inside a rule is call another rule.

call

Use the call() subroutine to have one rule call another rule.

        rule( 'rule1', 'a', 'b');
        rule( 'rule2', 'c', call('rule1') );

Note: if you call a rule that doesn't exist yet, the script will emit a warning. You can pre-declare a rule with the predeclare() function:

        predeclare('rule1');
        rule( 'rule2', 'c', call('rule1') );
        rule( 'rule1', 'a', 'b');

predeclare

If you declare rule1 so that it calls rule2, and rule2 has not been declared yet, you will get a warning message about the rule not existing. To avoid that warning, call predeclare() first and pass in the name of the rule you want to predeclare.

lit

Pass this function a string containing the literal value you want to match.

        rule( 'greeting', lit('hello') );

Any string passed into rule() will be assumed to be a lit().

        rule( 'greeting', 'hello' );

cc

Call this and pass in a string defining a character class.

        cc('aeiou');

This is like [aeiou] in perl regular expressions.

notcc

Call this and pass in a string defining an inverted character class.

        notcc('aeiou');

This is like [^aeiou] in perl regular expressions.
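
The regular-expression equivalents can be sketched as follows. cc_regex() and notcc_regex() are hypothetical helpers written for this illustration, not part of Parse::Gnaw. They treat every character in the string literally, so whether cc() itself accepts ranges such as 'a-z' is not something this sketch assumes.

```perl
use strict;
use warnings;

# Hypothetical helpers mapping a character-class string onto a Perl regex.
# \Q...\E escapes the characters so each one is taken literally.
sub cc_regex    { my ($chars) = @_; return qr/[\Q$chars\E]/; }
sub notcc_regex { my ($chars) = @_; return qr/[^\Q$chars\E]/; }

my $vowel     = cc_regex('aeiou');
my $not_vowel = notcc_regex('aeiou');

for my $char ('e', 'z') {
    my $in_cc    = $char =~ /\A$vowel\z/     ? 'yes' : 'no';
    my $in_notcc = $char =~ /\A$not_vowel\z/ ? 'yes' : 'no';
    print "$char: cc=$in_cc notcc=$in_notcc\n";
}
```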

thrifty

Quantifier. Pass a series of subrules to thrifty() and it will attempt to match that series the number of times specified by the last argument.

All arguments but the last one are essentially put in parentheses and associated with the quantity specifier, i.e. the Perl regular expression /(abc)+/ becomes thrifty('a', 'b', 'c', '+').

Note the only quantifier mode supported is thrifty. Parse::Gnaw does not support greedy quantifiers.

Here is a list of ways you can define the last element passed into thrifty:

        thrifty( ... , [3,9] );     # 3 to 9
        thrifty( ... , [3,]  );     # 3 or more
        thrifty( ... , [,9]  );     # 0 to 9
        thrifty( ... , '3,9' );     # 3 to 9
        thrifty( ... , '3,'  );     # 3 or more
        thrifty( ... , ',9'  );     # 0 to 9
        thrifty( ... , '+'   );     # 1 or more
        thrifty( ... , '*'   );     # 0 or more
        thrifty( ... , '?'   );     # 0 or 1

The thrifty function return value can be placed in a rule declaration.
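
One way to picture the quantity specifiers is as a (minimum, maximum) pair. The sketch below normalises the forms listed above into such a pair. parse_quantifier() is a hypothetical helper for illustration only; how thrifty() processes its last argument internally is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise a quantifier spec to (min, max).
# max is undef when there is no upper bound.
sub parse_quantifier {
    my ($spec) = @_;
    if (!ref($spec)) {
        return (1, undef) if $spec eq '+';
        return (0, undef) if $spec eq '*';
        return (0, 1)     if $spec eq '?';
    }
    # Array form [min, max] or string form 'min,max'; either side may be empty.
    my ($min, $max) = ref($spec) eq 'ARRAY' ? @$spec : split(/,/, $spec, 2);
    $min = 0     if !defined($min) || $min eq '';
    $max = undef if defined($max) && $max eq '';
    return ($min, $max);
}

for my $spec ('+', '*', '?', '3,9', '3,', ',9', [3, 9]) {
    my ($min, $max) = parse_quantifier($spec);
    my $label = ref($spec) ? '[' . join(',', @$spec) . ']' : "'$spec'";
    printf "%-6s => min %d, max %s\n", $label, $min, defined($max) ? $max : 'unbounded';
}
```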

process_first_arguments_and_return_hash_ref

Internal subroutine.

This processes the various ways to call the rule() function and fills in the pieces the caller doesn't pass in. It should always return a hash reference with all info filled in.

fragment_a_rule

Internal subroutine. Used to break up a rule into pieces so that a quantifier can operate correctly.

The rest of the code in this subroutine "reorders" the grammar. For example, this grammar:

        rule1 : 'a' rule2 'b'
        rule2 : 'c' thrifty('d') 'e'

needs to rearrange the thrifty so that it can try to match a number of 'd', then match 'e', then match 'b' from the previous rule. If the thrifty quantifier fails, it has to try to match another 'd', then match 'e', then match 'b' from the previous rule.

This can't be done by treating each rule as a subroutine/function as it appears, because a quantifier can't return after it has matched 'd'. It has to match 'd', then match everything in the grammar that occurs after it, and THEN it can return.

The way we do this is by fragmenting (chopping up) the rules. Any time we have a CALL or QUANTIFIER (quantifiers are actually calls), we take everything AFTER THE CALL and put it in its own rule fragment. The original call gets modified with a thencall=>rulefragment added to it.

For example:

        rule1    : 'a' call('rule2') 'c' qty(thrifty1) 'e'
        rule2    : 'b'
        thrifty1 : 'd'

We need to fragment rule1:

        rule1 : 'a' call('rule2') 'c' qty(thrifty1) 'e'

It can be viewed as getting fragmented as follows:

        rule1 : 'a' call('rule2') [ 'c' qty(thrifty1) [ 'e' ] ]
                                  ^frag1              ^frag2

Therefore it becomes:

        rule1      : 'a' call('rule2', thencall=>rule1frag1)
        rule1frag1 : 'c' call('thrifty1', thencall=>rule1frag2)
        rule1frag2 : 'e'
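
The splitting step described above can be sketched in a few lines of plain Perl. This is a simplified illustration, not the module's fragment_a_rule(): fragment_rule() is a hypothetical function, rule elements are reduced to [type, payload] pairs, quantifiers are written directly as calls, and the fragment names simply append 'frag' and a counter.

```perl
use strict;
use warnings;

# Hypothetical fragmenter: after every 'call' element, move the remaining
# elements into a new fragment rule and record its name as thencall.
sub fragment_rule {
    my ($name, @elements) = @_;
    my %rulebook;
    my $counter = 0;
    my $current = $name;                    # rule currently being filled
    while (@elements) {
        my ($type, $payload) = @{ shift @elements };
        if ($type eq 'call' && @elements) {
            my $fragment = $name . 'frag' . ++$counter;
            push @{ $rulebook{$current} }, [ 'call', $payload, { thencall => $fragment } ];
            $current = $fragment;           # everything after the call goes here
        }
        else {
            push @{ $rulebook{$current} }, [ $type, $payload ];
        }
    }
    return \%rulebook;
}

my $rulebook = fragment_rule(
    'rule1',
    [ lit => 'a' ], [ call => 'rule2' ], [ lit => 'c' ], [ call => 'thrifty1' ], [ lit => 'e' ],
);

for my $rule (sort keys %$rulebook) {
    my @parts = map {
        my ($type, $payload, $info) = @$_;
        $type eq 'call'
            ? "call('$payload'" . ($info ? ", thencall=>$info->{thencall}" : '') . ')'
            : "'$payload'";
    } @{ $rulebook->{$rule} };
    print "$rule : @parts\n";
}
```

Run on the rule1 example, the sketch prints the same three fragments: rule1, rule1frag1, and rule1frag2.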

This allows all calls and quantifiers to treat the rest of the grammar after the call/quantifier as if it were part of a nested function call. The "thrifty" call doesn't return until it matches all the way to the end of the grammar; therefore, everything after the thrifty call needs to be treated as part of the thrifty function call.

In the above example, when we call rule 'thrifty1', we also pass in the fact that the rule after it is 'rule1frag2'. This means 'thrifty1' can match one 'd' and then call rule1frag2 to see if the rest of the grammar matches. If that fails, we can trap the failure in the 'thrifty1' call, try to match another 'd', and then call rule1frag2 again to see if the rest of the rule matches THAT.
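
The retry behaviour described here can be sketched with a continuation passed as a code reference. Everything in this example is an assumption made for illustration: match_thrifty() and $then_e are hypothetical names, the text is a plain string with integer positions rather than a linked list, and the module itself stores rules as arrays rather than closures.

```perl
use strict;
use warnings;

# Hypothetical matcher: try the continuation as soon as the minimum number
# of repetitions is reached, and consume one more $literal only when the
# continuation fails. Returns the end position, or undef on failure.
sub match_thrifty {
    my ($text, $pos, $literal, $min, $max, $then) = @_;
    my $count = 0;
    while (1) {
        if ($count >= $min) {
            my $end = $then->($text, $pos);
            return $end if defined $end;    # the rest of the grammar matched
        }
        return if defined($max) && $count >= $max;
        return if substr($text, $pos, length($literal)) ne $literal;
        $pos += length($literal);           # consume one more repetition
        $count++;
    }
}

# Continuation standing in for rule1frag2: match a literal 'e'.
my $then_e = sub {
    my ($text, $pos) = @_;
    return substr($text, $pos, 1) eq 'e' ? $pos + 1 : undef;
};

my $end = match_thrifty('ddde', 0, 'd', 1, undef, $then_e);
print defined($end) ? "matched up to position $end\n" : "no match\n";
```

With the text 'ddde', the continuation fails after one and two 'd' characters and succeeds after the third, so the sketch reports a match ending at position 4.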

copy_location_info_and_make_new_hash_ref

Internal subroutine.

Given a hash created by process_first_arguments_and_return_hash_ref(), extract all location information and copy it to a newly created hash.

eval_string

Internal subroutine.

Pass in a string and eval_string() will call eval("") on it. If you want the eval to return a value, assign it to the special variable $eval_return. The value of $eval_return will be returned by the eval_string() function.
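
A minimal sketch of this convention is shown below. eval_string_sketch() is a hypothetical re-creation, not the module's code, and the choice to die when the eval fails is an assumption.

```perl
use strict;
use warnings;

# Hypothetical re-creation: evaluate a string of Perl and hand back whatever
# the string assigned to $eval_return.
sub eval_string_sketch {
    my ($code) = @_;
    my $eval_return;                        # visible to the string eval below
    eval $code;
    die "eval failed: $@" if $@;
    return $eval_return;
}

print eval_string_sketch('$eval_return = 6 * 7;'), "\n";    # 42
```

The string eval is compiled in the lexical scope of the subroutine, which is why the evaluated code can assign to $eval_return.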

fragment_suffix

Internal subroutine. Returns a string that will be used when naming rule fragments.

When a rule is fragmented, the fragments are named

        originalrule.fragment_suffix().integer_counter

Call this subroutine to return the string value for fragment_suffix().

get_ref_to_rulebook

Internal subroutine. Used to get a reference to the rulebook in the caller's package.

All rules for a package are placed into a package variable called (packagename)::rulebook.

This variable is a hash reference where the keys are the names of the rules and the data is an array reference for each rule.
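
The layout of such a rulebook can be sketched with a symbolic reference to the package variable. rulebook_for() is a hypothetical accessor written for this illustration; the module's get_ref_to_rulebook() may differ in how it finds the caller's package and in what it stores per rule.

```perl
use strict;
use warnings;

# Hypothetical accessor: return (creating it if needed) the hash reference
# stored in the package scalar named rulebook in the given package.
sub rulebook_for {
    my ($package) = @_;
    no strict 'refs';                       # needed for the symbolic reference
    return ${"${package}::rulebook"} //= {};
}

# Keys are rule names; values are array references holding the rule.
rulebook_for('main')->{rule1} = [ [ 'lit', 'H' ], [ 'lit', 'I' ] ];

our $rulebook;                              # lets strict code see the package scalar
print join(', ', sort keys %$rulebook), "\n";    # rule1
```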

get_ref_to_rulename

Internal subroutine. Used to get a reference to a specific rulename in the caller's package.

Each rule generated for a package is placed into the package as a scalar containing an array reference.

The array reference contains the rule information needed to parse a string.

format_package

Internal subroutine. Formats the package name into a consistent string.

format_filename

Internal subroutine. Formats the filename into a consistent string.

format_linenum

Internal subroutine. Formats the line number into a consistent string.

BUGS

Please report any bugs or feature requests to bug-parse-Gnaw at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-Gnaw. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Parse::Gnaw

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2013 Greg London.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.