Parse::Gnaw::Blocks::ParsingMethods - A base package containing all the methods that derived "letter" type classes will inherit.
We get a starting letter, and a reference to a rule. Because rules can be called like subroutines, we have to process the rule in such a way that it can be called from anywhere and we don't know who is calling us (or where we will return to).
Note that we need to keep track of whether a rule consumes the current letter or not. some rules might not consume anything (rules that execute callbacks, for example, or configure flags) So, we need to call rules until we consume at least the current letter.
if that is the last subrule, then return current letter.
if there are more subrules, we need to get a list of possible connections to go to next. then loop through each possible connection and try the subrule. If it succeeds, great. If it fails, eval/trap the failure and loop to next connection.
============================ This is a recursive call. ============================
grammar is ('a', 'b', 'c') text is c b b j a k m n o start at "g", look for 'a'. fail, move to next starting position. repeat until hit 'a' in center position. match 'a' at center. look a next rule, it's defined and true, so we need to look for it. Howerver, we need to try every possible direction from "A". 'b', 'b', 'k', 'o', 'n', 'm', 'j', 'c' This means every connection needs to trap for die "GRAMMARFAIL" if we try direction 'h', and it dies, we need to trap the error and try the next option.
This runs an entire rule exactly one time.
It does not call then_call.
It does not address quantifier issues.
The "rule" method is just a placeholder for the first index into each rule array. This is where we store the name of the rule and any othe rule-specific info. For now, it doesn't do anything.
When one grammar rule needs to call another rule (including itself), this method will get executed.
Note that this supports recursive calling. A rule can call itself. A first rule can call a second rule which calls the first rule.
The main reason this works is that when a rule "calls" another rule, it doesn't actually CONTAIN the rule. A rule is actually made up of a perl array.
my $rule1=[ [ 'lit', 'a' ], [ 'lit', 'b' ], ];
If a rule "calls" itself, it simply points to the name of the rule its calling:
my $rule1=[ [ 'lit', 'a' ], [ 'call', 'rule1' ], [ 'lit', 'b' ], ];
If a call to a rule resulted in the rule being called being expanded and embedded into the original rule, then recursive rules would explode.
my $rule1; $rule1=[ [ 'lit', 'a' ], [ 'call', $rule1 ], [ 'lit', 'b' ], ];
This would become problematic because it would want to expand itself forever (which would be grammatically correct, but explode your memory) or it would only expand one level (which would fit in memory, and be grammatically incorrect).
If a call to another rule only contains the NAME of the rule being called, then it won't explode memory.
A rule will call a rule only when it needs to, and not explode memory.
So then the only other issue that can cause a recursive rule ot explode is if a rule calls itself before matching any text in the source string.
This rule will explode when we try to match it against some text:
$rule1=[ [ 'call', 'rule1' ], [ 'lit', 'b' ], ];
The above example will break because rule1 will keep calling itself infinitely without ever matching anything.
For the above example NOT to crash, we will eventually have to upgrade the "call" method to detect whether a recursive call is taking place, and if so, check to see that at least SOME text has been consumed. If not, skip the call and look for an alternation or something.
For now, we can handle recurssion, but only if we match some text first:
$rule1=[ ['lit','a'], ['call', 'rule1'], ];
lit is short for literal. It is looking for the current letter object to match the letter value in $subrule.
my $rule1=[ [ 'lit', 'a' ], [ 'lit', 'b' ], ];
The above example is looking for 'a' followed by 'b'.
This is short for "character class". In perl regular expressions, this is represented with . The letters in the square brackets are letters in teh character class you wnat to match. For example, [aeiou] would match a character class of any single vowel.
This is short for "not character class". In perl regular expressions, this is represented with [^ ]. The letters in the square brackets are letters in teh character class you do NOT want to match. For example, [^aeiou] would NOT match a character class of any single vowel. Or it WOULD match any character that is NOT a vowel.
perform a thrifty quantifier match
Note: Since we want to be able to read petabytes of streamed data, we will default to using thrifty matching. i.e. match as little as possible and move on. if we do greedy matching, then the first .* we run into will read in the entire stream (petabytes) into memory and crash the system. if it doesn't crash, it will back up until it finds amatch. We default to thrifty matching, meaning we only read in as little as possible to still find a match. This means we only read in just as much of the stream as we need to find a match. We can DO greedy matching, but it can be a problem if we're streaming massive quantities of data.
basic thrifty algorithm: try the rule at least min times. if that matches, then return and let rest of grammar try. If rest of grammar dies, then revert to min location and try matching one more time. if that passes, then return and let rest of grammar try. if rest of grammar dies, then revert to min+1 location and try another rule.
keep doing this until you reach "max" number of matches. if that doesn't make things happy, then quantifier dies and the expression fails.
rule1 : 'a' rule2 'b'
rule2 : 'c' d+ rule3 e+
rule3 : f g+ rule4 h
rule4 : i*
basic greedy algorithm. try the rule max times. if not even zero match, die. at the end of every match, record the letter location of that specific match.
return and let rest of grammar try. if rest of grammar dies, then revert to max-1 location, and try another rule. return and let rest of grammar try. if rest of grammar dies, then revert to max-2 location and try another rule.
keep doing this until you reach "min" number of matches. we can't find a match even at "min", then quantifier dies and the expression fails.