Object::Relation::Parser::Overview - Developer's overview of Object::Relation::Parser
See Object::Relation::Parser.
This document explains how the parsing system turns search requests into an Intermediate Representation (IR) suitable for searching a datastore.
This discussion will be a high-level overview of how the Object::Relation::Parser class operates.
Object::Relation::Parser uses the Parser module described in "Higher Order Perl," by Mark Jason Dominus, to transform a series of tokens into an intermediate representation that the Object::Relation::Handle subclasses can use to initiate searches against a datastore.
The following non-standard BNF grammar describes the legal grammar allowed for searches. I say "non-standard" because of the existence of regular expressions to explain some of the complicated terminals. The regular expressions in the %RE hash used in search_value are from Regexp::Common.
%RE
entire_input ::= statements 'End_Of_Input' statements ::= statement | statement ',' statements statement ::= statement_list | 'AND' '(' statement_list ')' | 'OR' '(' statement_list ')' statement_list ::= search | search ',' # allow trailing commmas | search ',' statement | search ',' statement ',' # allow trailing commmas search ::= identifier '=>' normal_value | identifier '=>' 'NOT' normal_value | identifier '=>' between_value | identifier '=>' 'NOT' between_value | identifier normal_value | identifier 'NOT' normal_value | identifier between_value | identifier 'NOT' between_value normal_value ::= value | compare value | any between_value ::= 'BETWEEN' '[' value ',' value ']' | 'BETWEEN' '[' value '=>' value ']' | '[' value ',' value ']' | '[' value '=>' value ']' | 'BETWEEN' '(' value '=>' value ')' | 'BETWEEN' '(' value ',' value ')' # BETWEEN is not optional when using parens any ::= 'ANY' '(' any_list ')' any_list ::= search_value | search_value ',' any_list | search_value '=>' any_list | search_value ',' any_list ',' # allow trailing commas | search_value '=>' any_list '=>' # allow trailing commas value ::= search_value | undef search_value ::= /(?!\.(?!\d))(?:$RE{quoted}|$RE{num}{real})/; identifier ::= /[[:alpha:]][.[:word:]]*/ compare ::= 'LIKE' | 'GT' | 'LT' | 'GE' | 'LE' | 'NE' | 'MATCH' | 'EQ'
Traditionally, parsing code involves three steps:
Tokenizing
Breaking code into a series of discrete tokens. Whitespace is often discarded.
# name => LIKE 'foo', age => GT 21 @tokens = qw( name => LIKE ' foo ' , age => GT 21 );
Lexing
Then meaning is assigned to the tokens. Some tokens may be combined and others discarded.
[ IDENTIFIER => 'name' ], [ OP => '=>' ], [ COMPARE => 'LIKE' ], [ OP => ',' ], [ VALUE => 'foo' ], [ OP => ',' ], [ IDENTIFIER => 'age' ], [ OP => '=>' ], [ COMPARE => 'GT' ], [ VALUE => 3 ],
Parsing
Parsing is the act of converting the lexed tokens into a data structure that suits a given task. This structure is often referred to as an intermediate representation, or IR. This document will briefly discuss the IR created by Object::Relation::Parser and how it's created.
The Object::Relation lexers Object::Relation:Store::Lexer::Code and Object::Relation::Lexer::String both combine the tokenizing and lexing into a single step. See Object::Relation::Lexer::Overview for details.
Object::Relation:Store::Lexer::Code
This module has only one public function, parse, which expects a stream as returned by a lexer and a Object::Relation store object:
parse
my $stream = code_lexer_stream($search_request); my $ir = parse($stream, $store);
The parse() function will then either throw a Object::Relation::Exception::Fatal::Search exception or return an IR.
parse()
Object::Relation::Exception::Fatal::Search
The store object is used in the final step of building the IR to determine the names of the attributes on which the actual search will take place. Attempting to search on an attribute that the current "search class" does not have will throw an exception.
In the grammar above, the LHS symbols usually represent small "sub-parsers", each capable of handling the parsing of the RHS. Thus, the first step in parse() is to pass the input stream to the $entire_input parser:
$entire_input
my $entire_input = T( concatenate( $statements, \&End_of_Input ), sub { $_[0] } );
That should be fairly straightforward to read. The T() function expects a parser (generated by the concatenate function) and an anonymous subroutine that transforms the input (T being short for transform). In this case, it's telling us that we merely wish to return the parse for the statements and not the End_of_Input.
T()
concatenate
T
End_of_Input
Now let's take a look at a more complicated example. Here's the section of the grammar for "between" values.
between_value ::= 'BETWEEN' '[' value ',' value ']' | 'BETWEEN' '[' value '=>' value ']' | '[' value ',' value ']' | '[' value '=>' value ']'
This will match any of the following:
BETWEEN [ 'foo', 'bar' ] BETWEEN [ 'foo' => 'bar' ] [ 'foo', 'bar' ] [ 'foo' => 'bar' ]
Here's how the subparser looks:
my $between_value = T( concatenate( star(match(KEYWORD => 'BETWEEN')), # 0 $lbracket, # 1 $Value, # 2 alternate( $fat_comma, # 3 $comma, ), $Value, # 4 $rbracket, ), sub { ['BETWEEN', [_normalize_value($_[2]), _normalize_value($_[4])]] } );
What this says is "concatenate this stuff". The star function tells us that the "BETWEEN" keyword is optional. Then we match an $lbracket ("["), a value ("foo"), either a regular comma (",") or a fat comma ("=>"), another value ("bar"), and an $rbracket, ("]"). The transformation returns a new token with the string "BETWEEN" and two normalized values.
star
$lbracket
$rbracket
As of this writing, the normalized value function merely examines the values to see if they match a Object::Relation::DataType::DateTime::Incomplete string and, if they do, return an object representing them.
It's not until the parser reaches the "search" LHS bit of the grammar that it has enough information to create a Object::Relation::Search object.
search ::= identifier '=>' normal_value | identifier '=>' 'NOT' normal_value | identifier '=>' between_value | identifier '=>' 'NOT' between_value
At this point, it calls the _make_search function with the data it has matched to build and return a search object. This is when it uses the Object::Relation::Handle object that was passed in to validate that the search can actually be performed on the Object::Relation class on which we are currently searching.
_make_search
The IR is merely an array reference of Object::Relation::Search objects and groups of these Search objects. Groups may be simple groups, "AND" groups, or "OR" groups.
Simple groups are merely a reference to an array of search objects.
[ $search1, $search2, $search3, ]
All searches in a simple group must succeed for the search to succeed.
"AND" groups are array references with the string "AND" as the first element.
[ [ 'AND', $search2, $search3, ] ]
An "OR" group is the string "OR" followed by an array reference.
[ 'OR', [ $search1, $search2, ] ]
Any of the above groups may be nested.
AND( name => 'foo', l_name => 'something', ), OR( age => GT 3 ), OR( one__type => LIKE 'email', fav_number => GE 42 )
The above search, whether it is a code search or a string search, should produce the same IR:
[ [ 'AND', $name_search, $lname_search, ], 'OR', [ $age_search, ], 'OR', [ $one_type_search, $fav_number_search, ] ]
Copyright (c) 2004-2006 Kineticode, Inc. <info@obj_relode.com>
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Object::Relation, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Object::Relation
CPAN shell
perl -MCPAN -e shell install Object::Relation
For more information on module installation, please visit the detailed CPAN module installation guide.