Parse::FSM - Deterministic top-down parser based on a Finite State Machine
use Parse::FSM; $fsm = Parse::FSM->new; $fsm->prolog($text); $fsm->epilog($text); $fsm->add_rule($name, @elems, $action); $fsm->start_rule($name); $fsm->parse_grammar($text); $fsm->write_module($module); $fsm->write_module($module, $file); $parser = $fsm->parser; # isa Parse::FSM::Driver $parser->input(\&lexer); $result = $parser->parse; # script perl -MParse::FSM - Grammar.yp Parser::Module perl -MParse::FSM - Grammar.yp Parser::Module lib\Parser\Module.pm
This module compiles the Finite State Machine used by the Parse::FSM::Driver parser module.
It can be used by by a sequence of add_rule calls, or by parsing a yacc-like grammar in one go with parse_grammar.
add_rule
parse_grammar
Can be used as a script to generate a module from a grammar file.
The result of compiling the parser can be used immediately by retrieving the parser object, or a pre-compiled module can be written to disk by write_module. This module can then be used by the client code of the parser.
parser
write_module
As usual in top-down parsers, left recursion is not supported and generates an infinite loop. This parser is deterministic and does not implement backtracking.
Creates a new object.
Name of the grammar start rule. It defaults to the first rule added by add_rule.
Perl code to include in the generated module near the start of the generated module and near the end of it.
Adds one rule to the parser.
$fsm->add_rule($name, @elems, $action);
$name is the name of the rule, i.e. the syntatic object recognized by the rule.
$name
@elems is the list of elements in sequence needed to recognize this rule. Each element can be one of:
@elems
A string that will match with that token type from the lexer.
The empty string is used to match the end of input and should be present in the grammar to force the parser to accept all the input;
An array refernce of a list of all possible tokens to accept at this position.
A subrule name inside square brackets, optionally followed by a repetion character, that asks the parser to recursively descend to match that subrule at the current input location.
The accepted forms are:
[term] - recurse to the term rule;
[term]
[term]? - term is optional;
[term]?
[term]* - accept zero or more terms;
[term]*
[term]+ - accept one or more terms;
[term]+
[term]<+,> - accept one or more terms separated by commas, any token type can be used instead of the comma;
[term]<+,>
$action is the Perl text of the action executed when the rule is recognized, i.e. all elements were found in sequence.
$action
It has to be enclosed in brackets {}, and can use the following lexical variables, that are declared by the generated code:
{}
$self : object pointer;
$self
@item : values of all the tokens or rules identified in this rule. The subrule call with repetions return an array reference containing all the found items in the subrule;
@item
Parses the given grammar text and adds to the parser. Example grammar follows:
{ # prolog use MyLibrary; } main : (number | name)+ <eof> ; number : 'NUMBER' { $item[0][1] } ; # comment name : 'NAME' { $item[0][1] } ; # comment expr : <list: number '+' number > ; <start: main > { # epilog sub util_method {...} }
If the text contains a code block surronded by braces before the first rule definition, the text is copied without the external braces to the prolog of generated module.
If the text contains a code block surronded by braces after the last rule definition, the text is copied without the external braces to the epilog of generated module.
Statement are either rule definitions of directives and end with a semi-colon ;. Comments are as in Perl, from a hash # sign to the end of the line.
;
#
A rule defines one sentence to match in the grammar. The first rule defined is the default start rule, i.e. the rule parsed by default on the input. A rule name must start with a letter and contain only letters, digits and the underscore character.
The rule definition follows after a colon and is composed of a sequence of tokens (quoted strings) and sub-rules, to match in sequence. The rule matches when all the tokens and sub-rules in the definition match in sequence.
The top level rule should end with <eof> to make sure all input is parsed.
<eof>
The rule can define several alternative definitions separated by '|'.
The rule definition finishes with a semi-colon ';'.
A rule can call an anonymous sub-rule eclosed in parentheses.
The last item in the rule definition is a text delimited by {} with the code to execute when the rule is matched. The code can use $self to refer to the Parser object, and @item to refer to the values of each of the tokens and sub-rules matched. The return value from the code defines the value of the rule, passed to the upper level rule, or returned as the parse result.
If no action is supplied, a default action returns an array reference with the result of all tokens and sub-rules of the matched sentence.
Every token or sub-rule can be followed by a repetition specification: '?' (zero or one), '*' (zero or more), '+' (one or more), or '<+,>' (comma-separated list, comma can be replaced by any token).
Directives are written with angle brackets.
Can be used in a rule instead of the empty string to represent the end of input.
Shortcut for creating lists of operators separated by tokens, returns the list of rule and token values.
Defines the start rule of the grammar. By default the first defined rule is the start rule; use <start:> to override that.
<start:>
Computes the Finite State Machine to execute the parser and returns a Parse::FSM::Driver object that implements the parser.
Usefull to build the parser and execute it in the same program, but with the run-time penalty of the time to setup the state tables.
Receives as input the module name and the output file name and writes the parser module.
The file name is optional; if not supplied is computed from the module name by replacing :: by / and appending .pm, e.g. Parse/Module.pm.
::
/
.pm
Parse/Module.pm
The generated code includes parse_XXX functions for every rule XXX found in the grammar, as a short-cut for calling parse('XXX').
parse_XXX
XXX
parse('XXX')
The setup of the parsing tables and creating the parsing module may take up considerable time. Therefore it is usefull to separate the parser generation phase from the parsing phase.
A parser module can be created from a yacc-like grammar file by the following command. The generated file (last parameter) is optional; if not supplied is computed from the module name by replacing :: by / and appending .pm, e.g. Parse/Module.pm:
perl -MParse::FSM - Grammar.yp Parser::Module perl -MParse::FSM - Grammar.yp Parser::Module lib\Parser\Module.pm
This is equivalent to the following Perl program:
#!perl use Parse::FSM; Parse::FSM->precompile(@ARGV);
The class method precompile receives as argumens the grammar file, the generated module name and an optional file name, and creates the parsing module.
precompile
Paulo Custodio, <pscust at cpan.org>
<pscust at cpan.org>
Calling pre-compiler on import borrowed from Parse::RecDescent.
import
Please report any bugs or feature requests through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Parse-FSM.
Copyright (C) 2010-2011 Paulo Custodio.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.
To install Parse::FSM, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Parse::FSM
CPAN shell
perl -MCPAN -e shell install Parse::FSM
For more information on module installation, please visit the detailed CPAN module installation guide.