The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Marpa::R3::Event - Parse events

Synopsis

    my @events = ();
    my @event_history = ();
    my $next_lexeme_length = undef;

    my $slr = Marpa::R3::Scanless::R->new( { grammar => $grammar,
        event_handlers => {
            "'default" => sub () {
                my ($slr, $event_name, undef, undef, undef, $length) = @_;
                if ($event_name eq 'insert d') {
                   $next_lexeme_length = $length;
                }
                push @events, $event_name;
                'pause';
            }
        }
    } );

    push @event_history, join q{ }, "Events at position 0:", sort @events
       if @events;
    @events = ();

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    READ: while (1) {

        push @event_history, join q{ }, "Events at position $pos:", sort @events
           if @events;
        @events = ();

        if (defined $next_lexeme_length) {
            $slr->lexeme_read_literal('real d', undef, undef, $next_lexeme_length);
            (undef, $pos) = $slr->block_progress();
            $next_lexeme_length = undef;
            next READ;
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

The synopsis is extracted from an example given in full below.

About this document

This document is an overview of parse events. One example of a circumstance for which an event can be created is the prediction of a symbol in the parse. Another example is the recognition of a symbol in the parse. A third example is parse exhaustion.

Parse events are often used to allow an application to switch over to its own custom procedural logic. Among other things, an application can do its own "external" scanning of lexemes. An application may ask Marpa to resume internal scanning at any point.

Terminology

  • In contexts where the meaning is clear, Parse events are called events.

  • In this document, an instance of a symbol in a parse means an occurrence of the symbol in that parse with a specific start location and a specific length. An instance of a symbol is also called a symbol instance. A consequence of this definition is that every symbol instance has exactly one end location.

  • A potential symbol instance is one which might appear in the parse.

  • An actual symbol instance is one which actually appears in the parse.

  • For a specific string and a specific grammar, we say that a parse is valid if some parse of the string exists according to the grammar.

  • In a parse, a nulled symbol instance, or nulled symbol, is a symbol instance with a length of zero.

  • A non-nulled symbol instance, non-nulled instance, or non-nulled symbol, is a symbol instance which is not nulled.

  • A symbol in a grammar is a nullable symbol if it has at least one nulled instance in at least one valid parse of at least one string.

  • A symbol in a grammar is a nulling symbol if, for all strings, all of its instances in valid parses are nulled instances.

  • A symbol in a grammar is non-nulling if it is not a nulling symbol.

  • A string is actual to location L if we know the actual input as far as L, and the string is consistent with the actual input.

  • If we know the input as far as location L and a symbol <S> that starts at L is in some parse of some string actual to L, then we say that symbol <S> is acceptable at a location L. Where location L is understood, we just say that symbol <S> is acceptable.

  • If a symbol <S> ends at L, and we know the parse as far as location L, we say that symbol <S> is recognized at a location L.

For those readers who find formal definitions useful, some of this terminology is defined more carefully below.

Types of parse event

Prase events are of two types:

  • Instance events report the status of one particular symbol instance, which may be actual or potential. Prediction events, for example, report potential symbol instances. Instance events are declared in the Marpa::R3 DSL.

  • Location events report the general status of the parse at particular location, rather than the status of one particular instance. The symbol instance whose status the event reports does not have to be an actual symbol instance.

Any event which is not an instance event, is a location event. There are two types of location event: rejection and exhaustion. A rejection event is declared using the grammar's rejection named argument. A exhaustion event is declared using the grammar's exhaustion named argument.

An instance event is either a lexeme event or a non-lexeme event. A lexeme event is declared using a :lexeme pseudo-rule. A non-lexeme event is declared using a named event statement.

The various types of parse events are described in detail below. The description of each type of parse event will state whether it is a location event, a lexeme event or a non-lexeme event.

The life cycle of events

  • An parse event must be declared.

  • A declared event may trigger.

Once declared, a parse event may trigger during any event-triggering recognizer method. The event-triggering recognizer methods are the recognizer's new() constructor, read(), resume(), lexeme_read_block(), lexeme_read_literal(), lexeme_read_string() and the lexeme_complete() method.

The location at which a parse event triggers is the trigger location. An event may trigger at any parse location, including location 0. It is important to keep in mind that location 0 events will trigger in the recognizer's new() constructor.

In addition to a trigger location, an event has an event location. The event location is the same as the trigger location, except for pre-lexeme events. For pre-lexeme events, the event location is before the trigger location.

When an event triggers, the event handler may pause the recognizer. If an event handler pauses the recognizer, it causes the event-triggering method to return as soon as all processing at the event location is finished, but before any processing at later locations.

Some event-triggering methods always return after processing at each parse location is finished, and for those methods, it makes no difference if one of the event handlers pauses the recognizer. But, in the case of the read() and the resume() methods, if an event handler pauses the recognizer, the method may return control to the user earlier than it otherwise would.

Recall that, in the case of pre-lexeme events, the event location is before the trigger location. This means, by using a pre-lexeme event to pause the recognizer, a Marpa::R3 app can enforce a form of look-ahead.

An event will only trigger if activated. By default, all events are activated when declared. The triggering of parse instance events may be controlled with the activate() method.

Event handlers

Events are processed using Perl closures called event handlers. Event handlers are specified using the event_handlers named argument of the recognizer's new() constructor.

Two or more arguments are passed to the event handler. The first argument is always the recognizer. Subsequent arguments are the event data. The first element of the event data, which is the second argument of the event handler, is always the event name. Whether there is more than one element in the event data, and therefore what the third and later arguments to the event handler are, depends on the type of event. Below, the details of the event data arguments are described for each type of event.

An event handler is expected to return a single value. This value may be either a string, or a Perl undef. If the return value of the event handler is a string, it must be either 'pause' or 'ok'. If the return value of the event handler is a Perl undef, it will be treated as if it was the string 'ok'.

Let L be a parse location. The recognizer pauses at L if any of event handlers triggered at L return 'pause'. If none of event handlers triggered at L return 'pause', the recognizer returns control to the app at L only if the method that triggered the event would ordinarily return control to the app at L.

A return of 'ok' by a parse handler can be thought of as a command to "continue as you ordinarily would", and a 'pause' return as an instruction to pause the recognizer at the current location. From this point of view, a 'pause' return value is seen to override any number of 'ok' return values at the same parse location.

Event handlers are Perl closures, and an event handler has access to those variables in scope at the point where it is defined. When a handler needs to access data visible only in multiple, distant scopes, a "factory" can be used. Below are several examples of event handlers, including one where the event handlers are instantiated by a factory closure.

An event handler may not call, directly or indirectly, certain methods of the same recognizer. The restricted methods include all those recognizer methods which may themselves trigger parse events. The restricted methods are read(), resume(), lexeme_read_block(), lexeme_read_literal(), lexeme_read_string() and lexeme_complete().

This restriction applies only to methods of the recognizer to which the event handler belongs. Calls to methods of a different recognizer are not restricted.

Types of parse event

Completion events

Completion events are declared in the Marpa::R3 DSL using the named event statement:

    event 'a' = completed A
    event 'b'=off = completed B
    event 'c'=on = completed C
    event 'd' = completed D

A completion event triggers when a non-nulled instance of its symbol is recognized at the current location. Completion events are non-lexeme events. A completion parse event can be specified for any symbol that is not a lexeme.

When a completion event triggers, its event location is set to the current location, which will be the end location of the instance that triggered the event. The event is called a "completion" because, at the event location, the recognition of its symbol is "complete".

A completion event passes two arguments to its event handler: the recognizer, and the event name.

Discard events

    :discard ~ ws event => ws
    ws ~ [\s]+
    :discard ~ [,] event => comma=off
    :discard ~ [;] event => 'semicolon'=on
    :discard ~ [.] event => period

Discard events are specified in discard pseudo-rules. They are non-lexeme events. This may seem counter-intuitive, but a lexeme must be a symbol visible to the G1 grammar -- discarded symbols are discarded before the G1 grammar can see them.

When a discard event triggers, its event location is set to the current location. This will be the end location of the discarded text.

A discard event passes the following arguments to its event handler, in order:

  • The recognizer.

  • The name of the discard event.

  • The index of the block containing the discarded text.

  • The start position of the discarded text, as a zero-based offset from the start of the block.

  • The length, in codepoints, of the discarded text.

  • The G1 location of the last lexeme read.

An intended purpose of the G1 location is to allow the synchronization of data taken from discard events with the parse tree. L0 locations are not sufficient to do this, because Marpa::R3 allows an app to move backward and forward both within and between blocks of input. G1 locations are always in left-to-right order from the point of view of parse tree.

Note that, since discarded text is never actually seen by G1, in the strict sense it cannot have a G1 location. The G1 location reported with the discard event is that of the last lexeme read before the discarded text. All lexemes have G1 locations. If the discarded text is at the beginning of the parse, before any lexemes have been read, the G1 location is reported as zero.

Nulling events

A nulling event is declared in the Marpa::R3 DSL using the named event statement:

    event '!a' = nulled A
    event '!b'=off = nulled B
    event '!c'=on = nulled C
    event '!d' = nulled D

A nulling parse event occurs when a nulled instance of its symbol is recognized at the current location. When a nulling event triggers, its event location is set to the current location, which will be the location where the triggering instance both begins and ends.

A nulling event is a non-lexeme event. A nulling parse event can be specifed for any symbol that is not a lexeme. A nulled symbol may derive other null symbols, producing one or more nulled trees; because a null derivation may be ambiguous, a nulled symbol may derive more than one nulled tree. A set of one or more nulled trees is called a nulled forest.

When a nulling event triggers for a symbol instance, all activated nulling events declared for symbols derived from the triggered symbol instance will also trigger. The triggering of nulling events is recursive, so that when a nulled symbol instance triggers an event, it triggers all the events in the nulled forest derived from the triggering symbol instance. Nulled forests are described in more detail in a separate section.

A nulling event passes two arguments to its event handler: the recognizer, and the event name.

Prediction events

A prediction event is declared in the Marpa::R3 DSL using the named event statement:

    event '^a' = predicted A

A prediction event triggers when a non-nulling symbol is acceptable at the current location. When a prediction event triggers, its event location is set to the current location. A prediction may not result in an actual instance of the symbol, but no actual symbol instance can start at the event location unless a prediction, if properly declared and activated, would trigger at that location.

Prediction parse events may be defined for any symbol, whether it is a lexeme or not. But prediction events are non-lexeme events, even when their symbol is a lexeme.

A prediction event passes two arguments to its event handler: the recognizer, and the event name.

Pre-lexeme events

    :lexeme ~ <insert d> pause => before event => 'insert d'

A pre-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. When a pre-lexeme event triggers, its event location is set to the location where the lexeme starts, which will be before the trigger location.

The recognizer will not have read the lexeme when its pre-lexeme event triggers. In effect, a pre-lexeme event "rewinds" the scanning.

For most events, the trigger location is the event location, but pre-lexeme events are the exception. Pre-lexeme events set the event location to the start of the lexeme, which is consistent with the pre-lexeme event's behavior as a "rewind". An intended use of pre-lexeme events is catching a lexeme which is about to be read, and giving it special treatment. For more on this, see below.

There is a lot of similarity between pre-lexeme events and predictions, but there are also very important differences.

  • A pre-lexeme event does not occur unless the triggering lexeme is actually found in the input. On the other hand, a prediction event is, as the name suggests, only a prediction -- the triggering lexeme may never actually be found in the input.

  • Even though they have the same event location, pre-lexeme and prediction events do not trigger at the same time, because pre-lexeme events require a scan of the lexeme, while prediction events do not. If both are defined for a symbol, the prediction event will trigger first, before the lexeme is scanned. The pre-lexeme event will trigger later, after the lexeme is scanned.

  • Pre-lexeme events can be defined only for lexemes. Prediction events can be defined for any symbol.

A pre-lexeme event passes the following arguments to its event handler, in order:

  • The recognizer.

  • The name of the pre-lexeme event.

  • The symbol ID of the lexeme.

  • The index of the block containing the lexeme.

  • The start position of the lexeme, as a zero-based offset from the start of the block.

  • The length, in codepoints, of the lexeme.

Post-lexeme events

    :lexeme ~ <a> pause => after event => '"a"'

A post-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. The recognizer will have already read the lexeme when its post-lexeme event triggers.

When a post-lexeme event triggers, its event location is set to the current location, which will also be the location where the lexeme ends.

A post-lexeme event passes the following arguments to its event handler, in order:

  • The recognizer.

  • The name of the post-lexeme event.

  • The symbol ID of the lexeme.

  • The index of the block containing the lexeme.

  • The start position of the lexeme, as a zero-based offset from the start of the block.

  • The length, in codepoints, of the lexeme.

Exhaustion events

    my $g = Marpa::R3::Scanless::G->new(
        {
            source     => \$dsl,
            exhaustion => 'event',
            rejection  => 'event',
        }
    );
        my @shortest_span = ();

        my %event_handlers1 = (
              'target' => sub {
                   my ($slr) = @_;
                   my (undef, $pos) = $slr->block_progress();
                   @shortest_span = $slr->last_completed('target');
                   diag(
                       "Preliminary target at $pos: ",
                       $slr->g1_literal(@shortest_span)
                   ) if $verbose;
                  return 'pause';
              },
              q{'exhausted} => sub {
                  return 'pause';
              }
        );

        my $recce =
          Marpa::R3::Scanless::R->new( { grammar => $g,
              event_handlers => \%event_handlers1,
          }, $recce_debug_args );
        my $pos = $recce->read( \$string, $target_start );

Exhaustion events are location events. An exhaustion parse event triggers on asynchronous parse exhaustion, if the grammar's exhaustion setting is "event". The name of the event is "'exhausted" (The initial single quote is part of the event's name, and indicates it is a reserved name, which will not conflict with the name of any user-named event.)

Intuitively, parse exhaustion events are created only when needed for control to return to the application. More formally, a parse exhaustion event is called asynchronous if it occurs in a method, and at a location, where the method would have continued reading under "ordinary circumstances". In this context, "ordinar circumstances" means

  • that parse exhaustion has not occurred, and

  • that no event handler has paused the recognizer.

A parse exhaustion event is called synchronous if it is not asynchronous.

Parse exhaustion in the lexeme_read_block(), lexeme_read_literal(), lexeme_read_string() and the lexeme_complete() methods is always synchronous, because they always return control to the app after every attempt to read input -- they never try to continue reading input. Parse exhaustion in the recognizer's new() constructor is always synchronous, because it can only occur if the grammar is nulling. Parse exhausion in the read() or the resume() methods may be either synchronous or asynchronous.

Exhaustion events are typically used to ensure that the recognizer pauses, instead of throwing a fatal error. For purposes of subsequent processing, an app often cares whether a parse is exhausted or not, but far less often cares whether that exhaustion is synchronous or asynchronous. Since an exhaustion event only occurs on asynchronous exhaustion, apps will often not use the event to determine whether the parse is exhausted, but will instead call the exhausted() method. The exhausted() method reports both asynchronous and synchronous parse exhaustion. Exhaustion is discussed further in a separate document.

Rejection events

    my $g = Marpa::R3::Scanless::G->new(
        {
            source    => \($grammar),
            rejection => 'event'
        }
    );
    my $rejection = 0;
    my $pos;

    my $recce =
      Marpa::R3::Scanless::R->new( { grammar => $g,
         event_handlers => {
             q{'rejected} => sub {
                $rejection = 1;
                diag("You fool! you forget the semi-colon at location $pos!")
                    if $verbose;
                return 'pause'
             }
         },
      }, $recce_debug_args );
    $pos = $recce->read( \$suffixed_string, 0, $original_length );

    READ_LOOP: while (1) {
        (undef, $pos) = $recce->block_progress();
        last READ_LOOP if not $rejection;
        $recce->resume( $original_length, 1 );
        diag("I fixed it for you.  Now you owe me.") if $verbose;
        $rejection = 0;
        $recce->resume( $pos, $original_length - $pos );
    } ## end READ_LOOP: while (1)

Rejection events are location events. A rejection event triggers if all lexemes at a G1 location are rejected, and the grammar's rejection setting is "event". The name of the event is "'rejected" (The initial single quote is part of the event's name, and indicates it is a reserved name, which will not conflict with the name of any user-named event.)

Triggering

Lexeme events

A lexeme event will trigger at the current location if all of the following criteria, applied in order, are true:

  • It is declared in a :lexeme pseudo-rule.

  • Its lexeme has been scanned by the L0 grammar at that location.

  • The G1 grammar would accept its lexeme at that location.

  • The event is activated. An event is activated by default when it is declared. Deactivation and reactivation of events is done with the recognizer's activate() method.

  • Its lexeme priority is higher than, or equal to, that of any other lexeme remaining after the previous criteria have been applied.

  • If it is a post-lexeme event, none of other remaining events are pre-lexeme events. (In other words, a pre-lexeme event prevents post-lexeme events from triggering at the same location.)

Marpa allows ambiguous lexemes and, even after all the above criteria have been applied, there may be more than one lexeme event at a location.

Non-lexeme events

Prediction, completion and nulling events are non-lexeme events. The conditions for a non-lexeme event are simpler than those for a lexeme event, because they do not involve lexical processing.

A non-lexeme event will trigger at the current location if all of the following are true:

  • It is declared in a named event statement.

  • Its triggering condition is true. Specifically,

    • It is a prediction and its symbol is acceptable at the current location; or

    • it is a completion or a nulling event and its symbol is recognized at the current location; or

  • The event is activated. An event is activated by default when it is declared. Deactivation and reactivation of events is done with the recognizer's activate() method.

Location events

A location event will trigger at the current location if all of the following are true:

  • It is declared using the event_handlers named argument of the recognizer's new() constructor.

  • Its triggering condition is true. Specifically,

    • it is an exhaustion event, and asynchronous parse exhaustion, as defined above, occurs at the current location; or

    • it is an rejection event, and all lexeme alternatives are rejected at the current location.

  • The event is activated. An event is activated by default when it is declared. Deactivation and reactivation of events is done with the recognizer's activate() method.

Techniques

External scanning

Switching to external scanning is an intended use case for all events, other than exhaustion events. In particular, the behavior of pre-lexeme events is most intuitive when thought about with external scanning in mind.

The example code for this document contains an artificially simple example of external scanning. The symbol <insert d> has a pre-lexeme event declared:

    :lexeme ~ <insert d> pause => before event => 'insert d'

When this event triggers,

  • Marpa::R3 returns control to the app without reading the lexeme actually found in the physical input;

  • the app reads a <d> symbol externally, and

  • the app resumes internal scanning.

Markers

It is quite reasonable to create "markers" -- nulling symbols whose primary (or sole) purpose is to have nulling events declared for them. Markers are the only way to declare events that trigger in the middle of a rule.

Rules

There are no events explicitly defined in terms of rules, but every rule event that is wanted can be achieved in one or more ways. The most flexible of these, and the best for many purposes, is to use markers.

Another method is to use the LHS of a rule to track rule predictions and completions. This requires that the LHS symbol of the rule be unique to that rule.

Implications

This section describes some implications of the parse events mechanism that may be unexpected at first. These implications are Marpa working as designed and, I hope the reader will agree, as is desirable.

Ambiguity

If a parse is ambiguous, events trigger for all the possible symbols. A user thinking in terms of one of the parses, and unaware of the ambiguity, may find this unexpected. In one of the examples, events for both the symbols <ambig1> and <ambig2>, as well as all their derived symbols, trigger.

Tentative events

Marpa's events are left-eidetic but right-blind. Left of the event location, Marpa's events are 100% accurate. Right of the event location, they are totally unaware of what the actual input will be -- there is no "lookahead". Because events trigger based on input action only up to the event location, events are tentative.

Once the parse is complete, and the actual input to the right of the event location is taken into account, it is quite possible that none of the parse trees will actually contain the symbol instance that triggered an event.

In one of the examples, prediction and completion events are reported for the symbols <start1>, <start2>, <mid1> and <mid2>, but none of these symbols winds up in any of the parse tress. This is because they are derived from <ambig1> or <ambig2>, and <ambig1> or <ambig2> will never be fully recognized in any of the parse trees. The reason that <ambig1> and <ambig2> will not be fully recognized is that their full recognition requires that there be a <z> symbol in the input and the input stream in the example does not contain a <z> symbol.

Exhaustion events are not tentative. All other parse events are tentative.

In the example, the predictions for <mid1> and <mid2> do not match anything in the final parse tree, because the locations where <mid1> and <mid2> would be predicted are not reached in those trees. For similar reasons, nulling events are tentative.

Lexemes can be ambiguous and when they are ambiguous one or more of the lexeme alternatives may not be used in any final parse tree. Because of this, lexeme events are also tentative.

After rejection events, input can be, and typically is, retried at the same G1 location. This is what happens when the Ruby Slippers technique is used. Often, on the second or later attempt, one or more lexemes are found that are acceptable to the grammar. For this reason, rejection events are tentative.

Nulled forests

When a symbol is nulled, any symbol which can be null-derived from it is also nulled. In one of the examples, when the symbol <g> is nulled, derived symbols <g1>, <g2>, <g3>, <g4> are also nulled.

Note that what was said about ambiguity applies here. In the example, the symbols <g1> and <g2> are in one derivation, while <g3> and <g4> are in another, so that not just a parse tree, but an entire parse forest is nulled. (Pedantically, a nulled tree is a forest of a single tree.)

More precisely,

  • If the grammar allows any derivation of the symbol Y from X in which X and Y are both nulled; and

  • a nulling parse event is declared for Y and activated; and

  • a nulled instance of X is encountered in the parse at location L; then

  • a nulling parse event for Y will trigger at location L.

Events and instances

As stated above, only nulling instances generate nulling events, and only non-nulled symbols generate prediction events and completion events. Since lexemes cannot be zero length, this means that, for a given symbol instance, nulling events and all other events, are mutually exclusive. In other words, if a nulling event occurs for an instance, no other event will trigger for that instance.

Some cases may seem to violate this rule. For example at position 23 in the parse in the code below, we have four events of four different types, all for the symbol <e>. In addition to a nulling event, there is a post-lexeme event, a prediction event and a completion event:

    Events at position 23: "e" ^e ^f e$ e[]

The reason for this is that these events are for three different symbol instances, all of which share the same trigger location:

  1. A nulled instance at location 23.

  2. A potential non-nulled instance, which may begin at location 23.

  3. A non-nulled instance, which begins at location 22 and ends at location 23.

The prediction of the second instance is, in fact, fulfilled, as reported at location 25:

    Events at position 25: "e" ^f e$

The second instance is length 1 and predicted at location 23, but its completion is reported at location 25. This is because whitespace delayed its start by one position.

    Events at position 21: ^e ^f d$ e[] mid1$ mid2$

The third instance is reported as predicted at position 21, even though it actually begins at position 22. The delayed start is because of whitespace.

Prediction and completion events exclude nulled symbols, because there is no practical distinction between predicting a nulled symbol, and actually seeing one. This means that the prediction and completion of a nulled symbol would always occur together. This very special nature of nulled symbols motivates their treatment as a special case.

Examples

Grammar 1

This grammar will be used in the next few examples.

    my $dsl1 = <<'END_OF_DSL';
        top ::= A B C
        A ::= 'a'
        B ::= 'b'
        C ::= 'c'
        event A = completed A
        event B = completed B
        event C = completed C
        :discard ~ ws
        ws ~ [\s]+
    END_OF_DSL

    my $grammar1 = Marpa::R3::Scanless::G->new( { source => \$dsl1 } );

Basic example

    @results = ();
    $recce   = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar1,
            event_handlers => {
                A => sub () { push @results, 'A'; 'ok' },
                B => sub () { push @results, 'B'; 'ok' },
                C => sub () { push @results, 'C'; 'ok' },
            }
        }
    );

A default event handler

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar1,
            event_handlers => {
                "'default" => sub () {
                    my ( $slr, $event_name ) = @_;
                    push @results, $event_name;
                    'ok';
                },
            }
        }
    );

Using both default and explicit handlers

The next example processes event "A" with an explicit handler, and leaves the others to a default handler.

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar1,
            event_handlers => {
                A => sub () { push @results, 'A'; 'ok' },
                "'default" => sub () {
                    my ( $slr, $event_name ) = @_;
                    push @results, "!A=$event_name";
                    'ok';
                },
            }
        }
    );

Exhaustion and rejection events

The next example shows how exhaustion and rejection events can be handled. Note that, in this example, the event handler pauses the recognizer so that it can process these events outside the recognizer, perhaps ending or abending the parse.

    my $dsl2 = <<'END_OF_DSL';
            top ::= A B C
            A ::= 'a'
            B ::= 'b'
            C ::= 'c'
            :discard ~ ws
            ws ~ [\s]+
    END_OF_DSL

    my $grammar2 = Marpa::R3::Scanless::G->new(
        {
            source => \$dsl2,
            rejection => 'event',
            exhaustion => 'event',
        },
    );

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar2,
            event_handlers => {
                "'rejected" => sub () { @results = ('rejected'); 'pause' },
                "'exhausted" => sub () { @results = ('exhausted'); 'pause' },
            }
        }
    );

Events with associated data

The next two examples show how event handlers access data. This example shows an event which has event data directly associated with it. This data is passed in the arguments to the handler.

    my $dsl3 = <<'END_OF_DSL';
            top ::= A B C
            A ~ 'a' B ~ 'b' C ~ 'c'
            :lexeme ~ <A> pause => after event => 'A'
            :lexeme ~ <B> pause => after event => 'B'
            :lexeme ~ <C> pause => after event => 'C'
            :discard ~ ws
            ws ~ [\s]+
    END_OF_DSL

    my $grammar3 = Marpa::R3::Scanless::G->new(
        {
            source => \$dsl3,
            rejection => 'event',
            exhaustion => 'event',
        },
    );

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar3,
            event_handlers => {
                "'default" => sub () {
                    my ( $slr, $event_name, $symid, $block_ix, $start, $length ) = @_;
                    my $symbol_name = $grammar3->symbol_name($symid);
                    push @results,
                        "event $event_name for symbol $symbol_name; block $block_ix, location $start, length=$length";
                    'ok';
                },
            }
        }
    );

Handlers requiring local and non-local data.

We've already shown handlers which use global data. We took advantage of the fact that handlers, like all Perl functions, are closures.

The next example shows the use of a "factory" for making handlers. This is the most powerful and general method, capable of taking data that is only available in several widely dispersed scopes, and making it available to the handler at event processing time.

The anonymous handler returned by factory() requires a global, a non-local and a local variable to do its job. Call this anonymous handler, anon. The global $A_global is accessible to anon at event processing time because anon is a closure.

The variable $B_non_local is not a global -- it is local to factory(). But $B_non_local is non-local to anon. Nonetheless, because anon is a closure, $B_non_local is available to anon at event processing time.

Finally, $C_local is not available when anon is defined, but is local in the scope in which anon is instantiated. $C_local is made available to anon by passing it as an argument to factory().

    @results = ();
    my $A_global = 'A';

    sub factory {
        my ($local_arg) = @_;
        my $B_non_local = 'B';
        return sub () {
           my ($slr, $event_name) = @_;
           my $result;
           $result = $A_global if $event_name eq 'A';
           $result = $B_non_local if $event_name eq 'B';
           $result = $local_arg if $event_name eq 'C';
           push @results, $result;
           'ok';
        }
    }


    sub example_closure {
        my $C_local = 'C';
        return Marpa::R3::Scanless::R->new(
            {
                grammar        => $grammar1,
                event_handlers => {
                    "'default" => factory($C_local),
                }
            }
        );
    }

Per-location event processing, using pause

Multiple events can occur at one parse location. Call this parse location pos. Callback handlers easily handle situations where the events are indifferent to whether they occur together at pos or not. But, for many apps, it is very important to be able to handle all the events occuring at pos as a group.

There are two ways to do this.

  • Pause method: Return 'pause' from any of the event handlers at pos. The recognizer will return control to the app once all processing at pos is complete, and the app can do as it wishes.

  • AoA method: Store the data for the event in an AoA (array of arrays) indexed by pos, and process the events later, from the AoA.

The "pause" method is the most general. The next example illustrates it.

    my $dsl4 = <<'END_OF_DSL';
            top ::= A B C
            A ::= 'a' B ::= 'b' C ::= 'c'
            event '^A' = predicted A
            event 'A$' = completed A
            event '^B' = predicted B
            event 'B$' = completed B
            event '^C' = predicted C
            event 'C$' = completed C
            :discard ~ ws
            ws ~ [\s]+
    END_OF_DSL

    my $grammar4 = Marpa::R3::Scanless::G->new(
        {
            source => \$dsl4,
        },
    );

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar4,
            event_handlers => {
                "'default" => sub () {
                    my ( $slr, $event_name ) = @_;
                    push @results, $event_name;
                    'pause';
                },
            }
        }
    );

Per-location event processing, using an AoA

This example illustrates the AoA (Array Of Arrays) method of handling events that need to be processed in sets, grouped by trigger location. The events are gathered into an array of arrays, and processed when the recognizer is finished.

The previous example illustrated the more powerful and general "pause" method. The AoA method can be more elegant, and is powerful enough for many cases.

    @results = ();
    $recce = Marpa::R3::Scanless::R->new(
        {
            grammar        => $grammar4,
            event_handlers => {
                "'default" => sub () {
                    my ( $slr, @event_data ) = @_;
                    my (undef, $pos) = $slr->block_progress();
                    $pos //= 0;
                    push @{$results[$pos]}, \@event_data;
                    'ok';
                },
            }
        }
    );

A messy example

The Marpa::R3 DSL script in this example is designed to include the unusual cases described in this document. It is also a second example of the "pause" method for per-location processing.

These "corner" cases are unlikely to occur all in a single app. Hopefully this grammar will not resemble any grammar that you encounter in practice.

    sub forty_two { return 42; };

    use Marpa::R3;

    my $dsl = <<'END_OF_DSL';

    test ::= a b c d e e f g h action => main::forty_two
        | a ambig1 | a ambig2
    e ::= <real e> | <null e>
    <null e> ::=
    g ::= g1 | g3
    g1 ::= g2
    g2 ::=
    g3 ::= g4
    g4 ::=
    d ::= <real d> | <insert d>
    ambig1 ::= start1 mid1 z
    ambig2 ::= start2 mid2 z
    start1 ::= b  mid1 ::= c d
    start2 ::= b c  mid2 ::= d

    a ~ 'a' b ~ 'b' c ~ 'c'
    <real d> ~ 'd'
    <insert d> ~ ["] 'insert d here' ["]
    <real e> ~ 'e'
    f ~ 'f'
    h ~ 'h'
    z ~ 'z'

    :lexeme ~ <a> pause => after event => '"a"'
    :lexeme ~ <b> pause => after event => '"b"'
    :lexeme ~ <c> pause => after event => '"c"'
    :lexeme ~ <real d> pause => after event => '"d"'
    :lexeme ~ <insert d> pause => before event => 'insert d'
    :lexeme ~ <real e> pause => after event => '"e"'
    :lexeme ~ <f> pause => after event => '"f"'
    :lexeme ~ <h> pause => after event => '"h"'

    event '^test' = predicted test
    event 'test$' = completed test
    event '^start1' = predicted start1
    event 'start1$' = completed start1
    event '^start2' = predicted start2
    event 'start2$' = completed start2
    event '^mid1' = predicted mid1
    event 'mid1$' = completed mid1
    event '^mid2' = predicted mid2
    event 'mid2$' = completed mid2

    event '^a' = predicted a
    event '^b' = predicted b
    event '^c' = predicted c
    event 'd[]' = nulled d
    event 'd$' = completed d
    event '^d' = predicted d
    event '^e' = predicted e
    event 'e[]' = nulled e
    event 'e$' = completed e
    event '^f' = predicted f
    event 'g[]' = nulled g
    event '^g' = predicted g
    event 'g$' = completed g
    event 'g1[]' = nulled g1
    event 'g2[]' = nulled g2
    event 'g3[]' = nulled g3
    event 'g4[]' = nulled g4
    event '^h' = predicted h

    :discard ~ whitespace
    whitespace ~ [\s]+
    END_OF_DSL

    my $grammar = Marpa::R3::Scanless::G->new(
        {
            source            => \$dsl,
            semantics_package => 'My_Actions'
        }
    );

    my @events = ();
    my @event_history = ();
    my $next_lexeme_length = undef;

    my $slr = Marpa::R3::Scanless::R->new( { grammar => $grammar,
        event_handlers => {
            "'default" => sub () {
                my ($slr, $event_name, undef, undef, undef, $length) = @_;
                if ($event_name eq 'insert d') {
                   $next_lexeme_length = $length;
                }
                push @events, $event_name;
                'pause';
            }
        }
    } );

    push @event_history, join q{ }, "Events at position 0:", sort @events
       if @events;
    @events = ();

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    READ: while (1) {

        push @event_history, join q{ }, "Events at position $pos:", sort @events
           if @events;
        @events = ();

        if (defined $next_lexeme_length) {
            $slr->lexeme_read_literal('real d', undef, undef, $next_lexeme_length);
            (undef, $pos) = $slr->block_progress();
            $next_lexeme_length = undef;
            next READ;
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

    my $expected_events = <<'=== EOS ===';
    Events at position 0: ^a ^test
    Events at position 1: "a" ^b ^start1 ^start2
    Events at position 3: "b" ^c ^mid1 start1$
    Events at position 5: "c" ^d ^mid2 start2$
    Events at position 6: insert d
    Events at position 21: ^e ^f d$ e[] mid1$ mid2$
    Events at position 23: "e" ^e ^f e$ e[]
    Events at position 25: "e" ^f e$
    Events at position 27: "f" ^h g1[] g2[] g3[] g4[] g[]
    Events at position 29: "h" test$
    === EOS ===

Details

This section contains additional explanations, not essential to understanding the rest of this document. Often they are formal or mathematical. While some people find these helpful, others find them distracting, which is why they are segregated here.

Terminology

The following are more formal definitions of terms previously defined in intuitive terms.

Actual input: Let A be the actual input to a parse. Let prefix(A, L) be the prefix of A of length L. Let S be an arbitrary string in the same alphabet as A. Let prefix(S, L) be the prefix of S of length L. If prefix(A, L) = prefix(S, L), then we say that string S is consistent with the actual input up to location L. For brevity, we say that S is actual to L.

Acceptable: if an instance of symbol <S> that starts at location L is in a valid parse of some string actual to L, we say that a symbol <S> is acceptable at a location L. To say that a symbol is acceptable at a location L is to say that it is the symbol of a symbol instance acceptable at location L.

Recognized: if an instance of symbol <S> that ends at location L is in a valid parse of some string actual to L, we say that the instance of symbol S is recognized at a location L. We say that a symbol is recognized at a location L if it is the symbol of a symbol instance recognized at location L.

COPYRIGHT AND LICENSE

  Marpa::R3 is Copyright (C) 2017, Jeffrey Kegler.

  This module is free software; you can redistribute it and/or modify it
  under the same terms as Perl 5.10.1. For more details, see the full text
  of the licenses in the directory LICENSES.

  This program is distributed in the hope that it will be
  useful, but without any warranty; without even the implied
  warranty of merchantability or fitness for a particular purpose.