Marpa::XS::Semantics::Order - How Marpa ranks ambiguous parses
Marpa allows ambiguous parses. While an unambiguous parse can produce at most one parse tree and one parse result, an ambiguous parse will produce a parse series. A parse series is a sequence of parse trees, each of which will have its own parse result.
This document describes ways of controlling the order in which the recognizer's
value method evaluates the parse trees of an ambiguous parse.
It also describes ways to exclude selected parse trees from the parse series.
When evaluating the parse trees in a parse series, Marpa never evaluates the same parse tree twice. Marpa considers two parse trees to be the same if they are semantic duplicates.
Two parse trees are semantic duplicates if and only if a recursive, top-down evaluation of each applies the same rules in the same order at the same earleme locations. This definition implies that, given any deterministic semantics, two parse trees which are semantic duplicates will always produce the same parse result -- hence the term "semantic duplicates". When the Marpa documentation refers to duplicate parses, it will mean semantic duplicates unless otherwise stated.
By calling the recognizer's
value method repeatedly,
Marpa can produce all the parse results in the current parse series.
The default is for the parse results to be returned in an arbitrary order.
This corresponds to the "
none" value of the recognizer's
ranking_method named argument.
When this document calls a behavior "arbitrary" it means that an application must not rely on that behavior being repeated. An arbitrary behavior is allowed to change from version to version of the software, from run to run of the application, and even from call to call of the methods.
"Arbitrary parse order", in particular, means that the particular order in which parse trees are evaluated may not be repeated. But an application can expect, even when the order of evaluation of the parse trees is arbitrary, that all appropriate parse trees will be included in the parse series, and that none of the parse trees in the parse series will be the semantic duplicate of any other tree.
Marpa grammar objects have a
ranking_method named argument,
whose value can be the name of a ranking method,
indicating that the default ranking method is to be used.
The rule method ranks alternative parses according to their rules.
Every rule has a rule rank.
A rule's rank can be specified using the the
rank named argument for that rule.
Rule ranks must be integers.
If no rule rank is specified,
the rule rank is 0.
When being ranked,
parse trees are traversed top-down,
high_rule_only ranking method is similar to the
rule ranking method,
at every choice point,
it discards all of the choices which have a rank lower than that of the highest ranked alternative.
In carrying out the ranking,
the choice points of the parse trees are visited in top-down,
high_rule_only ranking method eliminates some parse trees,
it can reduce or eliminate the ambiguity of a parse,
but it does not necessarily do either.
This is because,
at each choice point among the parse trees,
it is possible that several of the choices,
or all of them,
will have the same rank as the highest ranked alternative.
Some rules have a RHS which contains proper nullables: symbols which may be nulled, but which are not nulling symbols. (Nulling symbols are symbols which are always nulled.)
When a rule contains proper nullables, each application of that rule to a section of input creates a nulling variant -- that rule with a specific pattern of null and non-null symbols. In many cases, this creates an ambiguity -- different nulling variants could apply to the same substring of input. In ambiguous parsings of this kind, some applications may want to rank nulling variants that start with non-null symbols higher. Other applications may want to do the opposite -- to rank nulling variants that start with null symbols higher.
null_ranking named argument for rules specifies which nulling variants are ranked high or low.
Ranking of nulling variants is done left-to-right,
with the null preference as indicated by the
null_ranking named argument.
null_ranking is "
then the closer a nulling variant places its non-null symbols to the start of the rule,
the higher it ranks.
null_ranking is "
then the closer a nulling variant places its null symbols to the start of the rule,
the higher it ranks.
A rule is null ranked if and only if its
null_ranking is defined.
Note that this definition means that a rule can be considered null ranked even if it does not have any nullable symbols.
A rule which is not null ranked is called non-null-ranked.
Null ranking, in the current implementation, is targeted at situations where the alternatives are different applications of a single rule. In situations where the choice is between two different rules, an application which cares about the order will typically want to override the null ranking using rule ranks. A more detailed description of the rule comparison logic can be found below.
null_ranking is undefined,
which means the order of the null variants is arbitrary.
This is fine for most applications.
the logic descends the parse trees top-down and left-to-right.
Places where there is more than one alternative are called choice points.
Alternatives are ranked (in the
rule ranking method) or selected (in the
high_rule_only ranking method),
by comparing them as follows:
The most general way to sort Marpa parses is for the application to take control.
The application can set up the Marpa semantic actions so that the parse result of every parse tree is a
The duples can then be sorted by
Once the resuls are sorted,
rank element of the duple can be discarded.
(Those familiar with the Schwartzian transform may note a resemblance.
duples can be implemented as references to arrays of 2 elements.)
The user needs to be careful. In theory, ambiguity can cause an exponential explosion in the number of results. In practice, ambiguity tends to get out of hand very easily. Producing and sorting all the parses can take a very long time.
constant ranking method is no longer implemented.
An initial attempt to find the ideal compromise between flexibility and efficiency,
constant ranking method proved too ambitious,
and had several drawbacks.
its overhead was high in comparison to the flexibility it offered.
it was not intuitive.
Even when the
constant ranking method was powerful enough to solve a problem,
it was not always obvious how to actually use it.
its implementation was extremely complex.
rule ranking method is far more efficient,
vastly more intuitive,
and far easier to implement and maintain.
But the new
rule ranking method is also very flexible -- it passes the test suite which was written for the
constant ranking method.
Copyright 2012 Jeffrey Kegler This file is part of Marpa::XS. Marpa::XS is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Marpa::XS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with Marpa::XS. If not, see http://www.gnu.org/licenses/.