NAME

Marpa::R2::Event - SLIF parse events

Synopsis

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    my $actual_events = q{};

    READ: while (1) {

        my @actual_events = ();

        my $next_lexeme;
        EVENT:
        for my $event ( @{ $slr->events() } ) {
            my ($name) = @{$event};
            push @actual_events, $name;
        }

        if (@actual_events) {
            $actual_events .= join q{ }, "Events at position $pos:", @actual_events;
            $actual_events .= "\n";
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

The synopsis is extracted from an example given in full below.

About this document

This document is an overview of SLIF parse events. SLIF parse events trigger based on conditions declared in the DSL. Typical events are the prediction or recognition of a symbol.

SLIF parse events are often used to allow an application to switch over to its own custom procedural logic. Among other things, an application can do its own "external" scanning of lexemes. An application may ask Marpa to resume internal scanning at any point.

SLIF parse events may be named or unnamed. Use of unnamed events is discouraged, and should be reserved for legacy code. New applications should only use named events. When not otherwise specified, this document is talking about named events. Unnamed events are described below, in a section dedicated to them.

Terminology

SLIF parse events are called parse events or simply events, in contexts where the meaning is clear.

SLIF parse events evolved over time from simpler mechanisms, and the term SLIF parse event was introduced late in the development of Marpa::R2. In previous versions of Marpa::R2, SLIF parse events and their precursors are called "pauses" or simply "events". For historical reasons, some of the method names dealing with SLIF parse events still have the word "pause" as part of their name.

In this document, an instance of a symbol in a parse means an occurrence of the symbol in that parse with a specific start location and length. An instance of a symbol is also called a symbol instance. A consequence of this definition is that every symbol instance has exactly one end location.

In a parse, a nulled symbol instance, or nulled symbol, is a symbol instance with a length of zero. A non-nulled symbol instance, non-nulled instance, or non-nulled symbol, is a symbol instance which is not nulled. A symbol in a grammar is a nullable symbol if it has at least one nulled instance in at least one parse of at least one input. A symbol in a grammar is a nulling symbol if, in all parses of all inputs, all of its instances are nulled instances. A symbol is non-nulling if it is not nulling.

In the following, we say that an input is consistent with the actual input up to location L, or simply, actual to L, if it is the actual input up to location L, and any possible string after location L. We say that a symbol S is acceptable at a location L, if an instance of symbol S starting at location L is in a valid parse, according to the grammar, of some input actual to L. We say that a symbol instance is recognized at a location L if its end location is L, and it is in a valid parse, according to the grammar, of some input actual to L. We say that a symbol is recognized at a location L if it is the symbol of a symbol instance at location L.

The life cycle of events

  • An SLIF parse event must be declared.

  • A declared event may trigger.

  • Once an event triggers, it may be accessed.

Events are declared in the SLIF DSL. A parse event is either a lexeme event or a non-lexeme event. Lexeme events are declared using :lexeme pseudo-rules. Non-lexeme events are declared using named event statements. The various types of parse events are described in detail below. The description of each type of parse event indicates whether it is a lexeme or a non-lexeme event.

Once declared, events may trigger during any event-triggering SLIF recognizer method. The event-triggering SLIF recognizer methods are read(), resume(), lexeme_read() and lexeme_complete().

The location at which an event triggers is the event location. An event may trigger at any location, including location 0. When an event triggers, it causes the event-triggering method to return immediately, with the current location at the trigger location. The trigger location is the same as the event location, except in the case of pre-lexeme events.

Non-lexeme events may trigger during any of the event-triggering methods. Lexeme events will only trigger during calls of the $slr->read() and $slr->resume() methods.

The triggering of events may be controlled with the activate() method. An event will only trigger if activated. All events are automatically activated when declared.
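
As an illustration of activation control, here is a minimal sketch. The grammar, the event name 'top done' and the helper event_names() are invented for this illustration and are not part of the example used elsewhere in this document. The sketch assumes that activate() takes an event name and a boolean, with a false value deactivating the event.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar, invented for this sketch: two 'x' lexemes make a <top>.
my $dsl = <<'END_OF_DSL';
top ::= item item
item ~ 'x'
event 'top done' = completed top
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Hypothetical helper: parse 'x x' with the event switched on or off,
# and return the names of all events that triggered.
sub event_names {
    my ($is_active) = @_;
    my $slr = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $slr->activate( 'top done', $is_active );
    my $input  = 'x x';
    my $length = length $input;
    my @names;
    my $pos = $slr->read( \$input );
    while (1) {
        # Access events after every return, including the one at end of string
        push @names, map { $_->[0] } @{ $slr->events() };
        last if $pos >= $length;
        $pos = $slr->resume();
    }
    return @names;
}

my @when_active   = event_names(1);
my @when_inactive = event_names(0);
print "active: @when_active\n";
print "inactive: @when_inactive\n";
```

A completion event is used here so that nothing needs to trigger at location 0; the sketch makes no claim about deactivating events that would trigger at the very start of the parse.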

Events may be accessed using the Scanless recognizer's events() method. The beginning and end of the lexeme triggering a lexeme event may be found using the Scanless recognizer's pause_span() method.

Types of parse event

Completion events

Completion events are declared in the SLIF DSL using the named event statement:

    event subtext = completed <subtext>

Completion SLIF parse events can be specified for any symbol that is not a lexeme. Completion events are non-lexeme events. A completion event triggers whenever a non-nulled instance of its symbol is recognized at the current location.

When a completion event triggers, its trigger location and its event location are set to the current location, which will be the end location of the instance that triggered the event. The event is called a "completion" because, at the trigger location, the recognition of its symbol is "complete".

In the SLIF parse event descriptor returned by the $slr->events() method, the name of the completion event is the only element.

Nulling events

Nulling events are declared in the SLIF DSL using the named event statement:

    event 'A[]' = nulled <A>

A nulling SLIF parse event occurs whenever a nulled instance of its symbol is recognized at the current location. When a nulling event triggers, its trigger location and its event location are set to the current location, which will be the location where the triggering instance both begins and ends.

Nulling events are non-lexeme events. Nulling SLIF parse events can be specified for any symbol that is not a lexeme. Nulled symbols may derive other null symbols, producing nulled trees; and null derivations may be ambiguous, producing nulled forests. All activated nulling events declared for symbols in nulled trees and forests will trigger. Nulled forests are described in more detail in a separate section.

In the SLIF parse event descriptor returned by the $slr->events() method, the name of the nulling event is the only element.

Prediction events

Prediction events are declared in the SLIF DSL using the named event statement:

    event '^a' = predicted A

A prediction event triggers whenever a non-nulling symbol is acceptable at the current location. When a prediction event triggers, its trigger location and its event location are set to the current location. A prediction may not result in an actual instance of the symbol, but no actual symbol instance can start at the event location unless a prediction, if properly declared and activated, would trigger at that location.

Prediction SLIF parse events may be defined for any symbol, whether it is a lexeme or not. But prediction events are non-lexeme events, even when their symbol is a lexeme.

In the SLIF parse event descriptor returned by the $slr->events() method, the name of the prediction event is the only element.

Post-lexeme events

    :lexeme ~ <a> pause => after event => '"a"'

A post-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. The SLIF recognizer will have already read the lexeme when its post-lexeme event triggers.

When a post-lexeme event triggers, its trigger location and its event location are set to the current location, which will also be the location where the lexeme ends. A post-lexeme event also sets the pause span and pause lexeme. Post-lexeme events which trigger during $slr->lexeme_complete() and $slr->lexeme_read() calls are discarded.

In the SLIF parse event descriptor returned by the $slr->events() method, the name of the post-lexeme event is the only element.

Pre-lexeme events

    :lexeme ~ <insert d> pause => before event => 'insert d'

A pre-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. When a pre-lexeme event triggers, its event location is set to the current location. Its trigger location is set to the location where the lexeme starts, which will be before the event location. A pre-lexeme event also sets the pause span and pause lexeme.

The SLIF recognizer will not have read the lexeme when its pre-lexeme event triggers. In effect, it "rewinds" the scanning.

For most events, the trigger location is the current location, but pre-lexeme events are the exception. Its setting of the trigger location to the start of the lexeme is consistent with the pre-lexeme event's behavior as a "rewind". An intended use of pre-lexeme events is catching a lexeme which is about to be read, and giving it special treatment. For more on this, see below. Pre-lexeme events which trigger during $slr->lexeme_complete() and $slr->lexeme_read() calls are discarded.

There is a lot of similarity between pre-lexeme events and predictions, and the two can trigger together, but there are important differences. A pre-lexeme event does not occur unless the lexeme is actually found in the input. On the other hand, a prediction event is, as the name suggests, only a prediction -- the lexeme may not actually be found in the input.

In the SLIF parse event descriptor returned by the $slr->events() method, the name of the pre-lexeme event is the only element.

Lexeme events

SLIF parse events are divided into lexeme and non-lexeme events, based on their type. The lexeme events are the pre-lexeme event and post-lexeme event.

A lexeme event will trigger at the current location if all of the following criteria, applied in order, are true:

  • It is declared in a :lexeme pseudo-rule.

  • Its lexeme has been scanned by the L0 grammar at that location.

  • The G1 grammar would accept its lexeme at that location.

  • The lexeme is not scanned externally, that is, by a call of the $slr->lexeme_complete() method or of the $slr->lexeme_read() method.

  • The event is activated. An event is activated by default when it is declared. Deactivation and reactivation of events is done with the SLIF recognizer's activate() method.

  • Its lexeme priority is higher than, or equal to, that of any other lexeme remaining after the previous criteria have been applied.

  • If it is a post-lexeme event, none of the other remaining events are pre-lexeme events. (In other words, a pre-lexeme event prevents any post-lexeme events from triggering at the same location.)

Marpa allows ambiguous lexemes and, even after all the above criteria have been applied, there may be more than one lexeme event at a location.

Pause span and pause lexeme

When a lexeme event triggers, it will set the pause lexeme to the lexeme symbol. It will also set the pause span to the triggering lexeme's start location in the physical input stream and its length. The pause span and pause lexeme are originally undefined. Every call to the read() or the resume() methods resets the pause span to undefined.

The pause span may be accessed directly with the $slr->pause_span() method. Accessing the pause lexeme directly is discouraged, because multiple lexeme events may occur at the same G1 location, but only one pause lexeme, arbitrarily chosen, is recorded. This is not a problem with the pause span, because all pause spans at a G1 location will be identical.
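
The following is a minimal sketch of reading the pause span after a post-lexeme event. The grammar and the event name 'word' are invented for this illustration. The sketch assumes that pause_span() returns the start location and the length, in that order, as in the example later in this document.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar, invented for this sketch: a list of whitespace-separated words.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
list ::= word+
word ~ [\w]+
:lexeme ~ <word> pause => after event => 'word'
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input  = 'hello world';
my $length = length $input;
my @spans;

my $pos = $slr->read( \$input );
while (1) {
    for my $event ( @{ $slr->events() } ) {
        # The pause span is the start and length of the triggering lexeme
        my ( $start, $span_length ) = $slr->pause_span();
        push @spans, substr $input, $start, $span_length;
    }
    last if $pos >= $length;
    $pos = $slr->resume();
}

print join( q{|}, @spans ), "\n";
```

The span is read inside the event loop, before the next call to resume(), because read() and resume() reset the pause span.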

Non-lexeme events

Prediction, completion and nulling events are non-lexeme events. The conditions for a non-lexeme event are simpler than those for a lexeme event, because they do not involve lexical processing.

A non-lexeme event will trigger at the current location if all of the following are true:

  • It is declared in a named event statement.

  • It is a prediction and its symbol is acceptable at the current location; or it is a completion or a nulling event and its symbol is recognized at the current location.

  • The event is activated. An event is activated by default when it is declared. Deactivation and reactivation of events is done with the SLIF recognizer's activate() method.

Techniques

External scanning

Switching to external scanning is an intended use case for all events. In particular, the behavior of pre-lexeme events is most intuitive when thought about with external scanning in mind.

The example code for this document contains an artificially simple example of external scanning. The symbol <insert d> has a pre-lexeme event declared:

    :lexeme ~ <insert d> pause => before event => 'insert d'

When this triggers, the code in the example switches to external scanning: It reads a <d> symbol externally, skips over the lexeme actually in the input, and resumes internal scanning.

Markers

It is quite reasonable to create "markers" -- nulling symbols whose primary (or sole) purpose is to have nulling events declared for them. Markers are the only way to declare events that trigger in the middle of a rule.
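
A minimal sketch of a marker follows. The grammar, the marker symbol <key seen> and its event name are invented for this illustration. The marker is an empty rule placed between the two symbols of the <pair> rule, with a nulling event declared for it.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar, invented for this sketch.
# <key seen> is a marker: a nulling symbol in the middle of the <pair> rule.
my $dsl = <<'END_OF_DSL';
pair ::= key <key seen> value
<key seen> ::=
key ~ 'k'
value ~ 'v'
event 'key seen' = nulled <key seen>
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input  = 'k v';
my $length = length $input;
my @log;

my $pos = $slr->read( \$input );
while (1) {
    # Record each event name together with the position at which it was seen
    push @log, map { "$pos:" . $_->[0] } @{ $slr->events() };
    last if $pos >= $length;
    $pos = $slr->resume();
}

print "$_\n" for @log;
```

The marker event is expected to trigger once <key> has been read and before <value> is scanned, which is the "middle of the rule" case described above.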

Rules

There are no events explicitly defined in terms of rules, but every rule event that is wanted can be achieved in one or more ways. The most flexible of these, and the best for many purposes, is to use markers.

Another method is to use the LHS of a rule to track rule predictions and completions. This requires that the LHS symbol of the rule be unique to that rule.
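
The LHS technique can be sketched as follows. The grammar and the event names '^pair' and 'pair$' are invented for this illustration. The symbol <pair> is the LHS of exactly one rule, so prediction and completion events for <pair> act as events for that rule.

```perl
use strict;
use warnings;
use Marpa::R2;

# Toy grammar, invented for this sketch.
# <pair> appears on the LHS of only one rule.
my $dsl = <<'END_OF_DSL';
top ::= pair+
pair ::= key value
key ~ 'k'
value ~ 'v'
event '^pair' = predicted pair
event 'pair$' = completed pair
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $slr     = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input  = 'k v k v';
my $length = length $input;
my @seen;

my $pos = $slr->read( \$input );
while (1) {
    push @seen, map { $_->[0] } @{ $slr->events() };
    last if $pos >= $length;
    $pos = $slr->resume();
}

# Count how many times the rule for <pair> was completed
my $completions = grep { $_ eq 'pair$' } @seen;
print "completions of the pair rule: $completions\n";
```

If <pair> had a second rule, these events could not tell the two rules apart, which is why the technique requires a LHS unique to the rule.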

Implications

This section describes some implications of the SLIF parse events mechanism that may be unexpected at first. These implications are Marpa working as designed, and I hope the reader will come to agree that the design is a desirable one.

Ambiguity

If a parse is ambiguous, events trigger for all the possible symbols. A user thinking in terms of one of the parses, and unaware of the ambiguity, may find this unexpected. In the example, events for both the symbols <ambig1> and <ambig2>, as well as all their derived symbols, trigger.

Tentative events

Marpa's events are left-eidetic but right-blind. Left of the event location, Marpa's events are 100% accurate. Right of the event location, they are totally unaware of what the actual input will be -- there is no "lookahead". Because events trigger based on input action only up to the event location, events are tentative.

Once the parse is complete, and the actual input to the right of the event location is taken into account, it is quite possible that none of the parse trees will match a triggered event.

In the example, prediction and completion events are reported for the symbols <start1>, <start2>, <mid1> and <mid2>, but none of these symbols winds up in any of the parse trees. This is because they are derived from <ambig1> or <ambig2>. But for <ambig1> or <ambig2> to be fully recognized, there must be a <z> symbol in the input and the input stream in the example does not contain a <z> symbol.

All SLIF parse events are tentative, not just completion events. In the example, the predictions for <mid1> and <mid2> do not match anything in the final parse tree, because the locations where <mid1> and <mid2> would be predicted are not reached in those trees. For similar reasons, nulling events are tentative.

Lexemes can be ambiguous and when they are ambiguous one or more of the lexeme alternatives may not be used in any final parse tree. Because of this, lexeme events are also tentative.

Nulled forests

When a symbol is nulled, any symbol which can be null-derived from it is also nulled. In the example, when the symbol <g> is nulled, derived symbols <g1>, <g2>, <g3>, <g4> are also nulled.

Note that what was said about ambiguity applies here. In the example, the symbols <g1> and <g2> are in one derivation, while <g3> and <g4> are in another, so that not just a parse tree, but an entire parse forest is nulled. (This section usually speaks of nulled forests because, pedantically, a nulled tree is a forest of a single tree.)

More precisely,

  • If the grammar allows any derivation of the symbol Y from X in which X and Y are both nulled; and

  • a nulling SLIF parse event is declared for Y and activated; and

  • a nulled instance of X is encountered in the parse at location L; then

  • a nulling SLIF parse event for Y will trigger at location L.

Events and instances

As stated above, only nulled instances generate nulling events, and only non-nulled instances generate prediction events and completion events. Since lexemes cannot be zero length, this means that, for a given symbol instance, nulling events and all other events are mutually exclusive. In other words, if a nulling event occurs for an instance, no other event will trigger for that instance.

Some cases may seem to violate this rule. For example at position 23 in the parse in the code below, we have four events of four different types, all for the symbol <e>. In addition to a nulling event, there is a post-lexeme event, a prediction event and a completion event:

    Events at position 23: "e" e$ e[] ^e ^f

The reason for this is that these events are for three different symbol instances, all of which share the same trigger location:

  1. A nulled instance at location 23.

  2. A potential non-nulled instance, which may begin at location 23.

  3. A non-nulled instance, which begins at location 22 and ends at location 23.

The prediction of the second instance is, in fact, fulfilled, as reported at location 25:

    Events at position 25: "e" e$ ^f

The second instance is length 1 and predicted at location 23, but its completion is reported at location 25. This is because whitespace delayed its start by one position.

    Events at position 21: d$ mid1$ mid2$ e[] ^e ^f

The third instance is reported as predicted at position 21, even though it actually begins at position 22. The delayed start is because of whitespace.

Prediction and completion events exclude nulled symbols, because there is no practical distinction between predicting a nulled symbol, and actually seeing one. This means that the prediction and completion of a nulled symbol would always occur together. The very special nature of nulled symbols motivates their treatment as a special case.

Hidden events

An important aspect of the event mechanism is that it triggers a return from the event-triggering method at the trigger location. It may happen, however, that the method would return at that location in any case, and in this circumstance the triggering can be said to be hidden. An event which causes hidden triggering is called a hidden event.

As one example, the lexeme_complete() and lexeme_read() methods return at every location at which a lexeme is read, so all triggering in those methods is hidden triggering. In the example code in this document, the events at this location were all caused by hidden triggering inside a call to $slr->lexeme_complete():

    Events at position 21: d$ mid1$ mid2$ e[] ^e ^f

As another example, the $slr->read() and $slr->resume() methods return at end of string, but events may also trigger at end of string. The events at this location were caused by hidden triggering inside $slr->resume() at end of string:

    Events at position 29: "h" test$

The example code for this document is programmed with the possibility of hidden triggering in mind. To do this, it is careful to access events after its calls to $slr->lexeme_read(), and to make an additional pass through the event-accessing loop after an end of string is encountered.

Lexeme events and external scanning

During external scanning, lexemes are read using the $slr->lexeme_complete() and $slr->lexeme_read() methods. Non-lexeme events may trigger during these methods, as was discussed in "Hidden events". However, lexeme events that would occur during the $slr->lexeme_complete() and $slr->lexeme_read() methods are ignored, and will never trigger.

This behavior may seem non-orthogonal, but in fact it is the most consistent course of action. A pre-lexeme event occurring during a $slr->lexeme_complete() or $slr->lexeme_read() method call would reverse its effect, a behavior which is at best pointless. A post-lexeme event would be less dangerous, but it would be completely redundant -- its presence or absence would tell the application only what the application already knows from the return of success or failure by the $slr->lexeme_complete() or $slr->lexeme_read() methods.

An example

The SLIF DSL in this example is designed to include the unusual and "corner" cases described in this document. It is not like any grammar that you are likely to encounter in normal practice.

    sub forty_two { return 42; };

    use Marpa::R2;

    my $dsl = <<'END_OF_DSL';
    :default ::= action => [name,values]
    lexeme default = latm => 1

    test ::= a b c d e e f g h action => main::forty_two
        | a ambig1 | a ambig2
    e ::= <real e> | <null e>
    <null e> ::=
    g ::= g1 | g3
    g1 ::= g2
    g2 ::= 
    g3 ::= g4
    g4 ::= 
    d ::= <real d> | <insert d>
    ambig1 ::= start1 mid1 z
    ambig2 ::= start2 mid2 z
    start1 ::= b  mid1 ::= c d
    start2 ::= b c  mid2 ::= d

    a ~ 'a' b ~ 'b' c ~ 'c'
    <real d> ~ 'd'
    <insert d> ~ ["] 'insert d here' ["]
    <real e> ~ 'e'
    f ~ 'f'
    h ~ 'h'
    z ~ 'z'

    :lexeme ~ <a> pause => after event => '"a"'
    :lexeme ~ <b> pause => after event => '"b"'
    :lexeme ~ <c> pause => after event => '"c"'
    :lexeme ~ <real d> pause => after event => '"d"'
    :lexeme ~ <insert d> pause => before event => 'insert d'
    :lexeme ~ <real e> pause => after event => '"e"'
    :lexeme ~ <f> pause => after event => '"f"'
    :lexeme ~ <h> pause => after event => '"h"'

    event '^test' = predicted test
    event 'test$' = completed test
    event '^start1' = predicted start1
    event 'start1$' = completed start1
    event '^start2' = predicted start2
    event 'start2$' = completed start2
    event '^mid1' = predicted mid1
    event 'mid1$' = completed mid1
    event '^mid2' = predicted mid2
    event 'mid2$' = completed mid2

    event '^a' = predicted a
    event '^b' = predicted b
    event '^c' = predicted c
    event 'd[]' = nulled d
    event 'd$' = completed d
    event '^d' = predicted d
    event '^e' = predicted e
    event 'e[]' = nulled e
    event 'e$' = completed e
    event '^f' = predicted f
    event 'g[]' = nulled g
    event '^g' = predicted g
    event 'g$' = completed g
    event 'g1[]' = nulled g1
    event 'g2[]' = nulled g2
    event 'g3[]' = nulled g3
    event 'g4[]' = nulled g4
    event '^h' = predicted h

    :discard ~ whitespace
    whitespace ~ [\s]+
    END_OF_DSL

    my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
    my $slr = Marpa::R2::Scanless::R->new(
        { grammar => $grammar, semantics_package => 'My_Actions' } );

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    my $actual_events = q{};

    READ: while (1) {

        my @actual_events = ();

        my $next_lexeme;
        EVENT:
        for my $event ( @{ $slr->events() } ) {
            my ($name) = @{$event};
            if ($name eq 'insert d') {
               my (undef, $length) = $slr->pause_span();
               $next_lexeme = ['real d', 'd', $length];
            }
            push @actual_events, $name;
        }

        if (@actual_events) {
            $actual_events .= join q{ }, "Events at position $pos:", @actual_events;
            $actual_events .= "\n";
        }

        if ($next_lexeme) {
            $slr->lexeme_read(@{$next_lexeme});
            $pos = $slr->pos();
            next READ;
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

    my $expected_events = <<'=== EOS ===';
    Events at position 0: ^test ^a
    Events at position 1: "a" ^b ^start1 ^start2
    Events at position 3: "b" start1$ ^c ^mid1
    Events at position 5: "c" start2$ ^d ^mid2
    Events at position 6: insert d
    Events at position 21: d$ mid1$ mid2$ e[] ^e ^f
    Events at position 23: "e" e$ e[] ^e ^f
    Events at position 25: "e" e$ ^f
    Events at position 27: "f" g[] g1[] g3[] g2[] g4[] ^h
    Events at position 29: "h" test$
    === EOS ===

Unnamed events

Use of unnamed events is strongly discouraged. However, to support legacy code, unnamed events are still supported.

Unnamed events are declared by :lexeme pseudo-rules, when the pause adverb is used without the event adverb. Since the pause adverb creates a SLIF parse event, but the event adverb provides the event's name, this results in a SLIF parse event without a name -- an unnamed event.

Unnamed events cannot be accessed using the $slr->events() method. The only accessors for unnamed events are the $slr->pause_lexeme() method and the $slr->pause_span() method.

Copyright and License

  Copyright 2014 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.