Jeffrey Kegler > Marpa-R3-4.001_042 > Marpa::R3::Event

Download:
Marpa-R3-4.001_042.tar.gz

Annotate this POD

Source   Latest Release: Marpa-R3-4.001_043

NAME ^

Marpa::R3::Event - SLIF parse events

Synopsis ^

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    my $actual_events = q{};

    READ: while (1) {

        my @actual_events = ();

        my $next_lexeme;
        EVENT:
        for my $event ( @{ $slr->events() } ) {
            my ($name) = @{$event};
            push @actual_events, $name;
        }

        if (@actual_events) {
            $actual_events .= join q{ }, "Events at position $pos:", sort @actual_events;
            $actual_events .= "\n";
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

The synopsis is extracted from an example given in full below.

About this document ^

This document is an overview of SLIF parse events. SLIF parse events trigger based on conditions declared in the DSL. Typical events are the prediction or recognition of a symbol.

SLIF parse events are often used to allow an application to switch over to its own custom procedural logic. Among other things, an application can do its own "external" scanning of lexemes. An application may ask Marpa to resume internal scanning at any point.

Terminology ^

SLIF parse events are called parse events or simply events, in contexts where the meaning is clear.

SLIF parse events evolved over time from simpler mechanisms, and the term SLIF parse event was introduced late in the development of Marpa::R3. In previous versions of Marpa::R3, SLIF parse events and their precursors are called "pauses" or simply "events". For historical reasons, some of the method names dealing with SLIF parse events still have the word "pause" as part of their name.

In this document, an instance of a symbol in a parse means an occurrence of the symbol in that parse with a specific start location and length. An instance of a symbol is also called a symbol instance. A consequence of this definition is that every symbol instance has exactly one end location.

For some string and grammar, we say that a parse is valid if the parse of the string is valid according to the grammar. In a parse, a nulled symbol instance, or nulled symbol, is a symbol instance with a length of zero. A non-nulled symbol instance, non-nulled instance, or non-nulled symbol, is a symbol instance which is not nulled. A symbol in a grammar is a nullable symbol if it has at least one nulled instance in at least one valid parse of at least one string. A symbol in a grammar is a nulling symbol if, for all strings, all of its instances in valid parses are nulled instances. A symbol is non-nulling if it is not nulling.

Intuitively, a string is actual to L if we know the actual input as far as L, and the string is consistent with the input. More formally, let A be the actual input to a parse. Let S be an arbitrary string in the same alphabet. Let prefix(A, L) be the prefix of length L that is of length L, and let prefix(S, L) be defined in the same way. If prefix(A, L) = prefix(S, L), then we say that string S is consistent with the actual input up to location L. For brevity, we will often say that S is actual to L.

If an instance of symbol S that starts at location L is in a valid parse of some string actual to L, we say that a symbol S is acceptable at a location L. We say that a symbol is acceptable at a location L if it is the symbol of a symbol instance acceptable at location L.

If an instance of symbol S that ends at location L is in a valid parse of some string actual to L, we say that a symbol S is recognized at a location L. We say that a symbol is recognized at a location L if it is the symbol of a symbol instance recognized at location L.

The life cycle of events ^

Parse events are declared in the SLIF DSL. A parse event is either a lexeme event or a non-lexeme event. A lexeme event is declared using a :lexeme pseudo-rule. A non-lexeme event is declared using a named event statement. The various types of parse events are described in detail below. The description of each type of parse event will state whether it is a lexeme or a non-lexeme event.

Once declared, a parse event may trigger during any event-triggering SLIF recognizer method. The event-triggering SLIF recognizer methods are read(), resume(), lexeme_read() and lexeme_complete().

The location at which a parse event triggers is the event location. An event may trigger at any location, including location 0. When an event triggers, it causes the event-triggering method to return immediately, with the current location at the trigger location. The trigger location is the same as the event location, except in the case of pre-lexeme events.

A non-lexeme event may trigger during any of the event-triggering methods. A lexeme event will only trigger during calls of the $slr->read() and $slr->resume() methods.

The triggering of events may be controlled with the activate() method. An event will only trigger if activated. All events are automatically activated when declared.

Events may be accessed using the Scanless recognizer's events() method. The beginning and end of the lexeme triggering a lexeme event may be found using the Scanless recognizer's pause_span() method.

Types of parse event ^

Completion events

Completion events are declared in the SLIF DSL using the named event statement:

    event 'a' = completed A
    event 'b'=off = completed B
    event 'c'=on = completed C
    event 'd' = completed D

A completion SLIF parse event can be specified for any symbol that is not a lexeme. Completion events are non-lexeme events. A completion event triggers whenever a non-nulled instance of its symbol is recognized at the current location.

When a completion event triggers, its trigger location and its event location are set to the current location, which will be the end location of the instance that triggered the event. The event is called a "completion" because, at the trigger location, the recognition of its symbol is "complete".

In the SLIF parse event descriptor returned by the the $slr->events() method, the name of completed event is the only element.

Discard events

    :discard ~ ws event => ws
    ws ~ [\s]+
    :discard ~ [,] event => comma=off
    :discard ~ [;] event => 'semicolon'=on
    :discard ~ [.] event => period

Discard events are specified in discard pseudo-rules. They are non-lexeme events. This may seem counter-intuitive, but a lexeme must be a symbol visible to the G1 grammar and discarded symbols are discarded before the G1 grammar can see them.

When a discard event triggers, its trigger location and its event location are set to the current location. This will be the end location of the discarded text.

In the SLIF parse event descriptor returned by the the $slr->events() method, there will be 4 elements:

An intended purpose of the G1 location is to allow the synchronization of data taken from a series of discard events, with data taken from a parse tree. Physical input stream locations can often be used for this purpose, but an application is allowed to move around in the physical input stream. If an application does not move monotonically through the physical input stream, physical input stream locations will not necessarily indicate the order from the point of view of the parse tree, and of the virtual input stream. G1 locations are always in left-to-right order from the point of view of parse tree, and of the virtual input stream on which it is based.

Since discarded text is not seen by G1, it does not really have a G1 location, so the G1 location reported with the event is that of the last lexeme read. All lexemes have G1 locations. If the discarded text is at the beginning of the parse, before any lexemes have been read, the G1 location is reported as zero.

Nulling events

A nulling event is declared in the SLIF DSL using the named event statement:

    event '!a' = nulled A
    event '!b'=off = nulled B
    event '!c'=on = nulled C
    event '!d' = nulled D

A nulling SLIF parse event occurs whenever a nulled instance of its symbol is recognized at the current location. When a completion event triggers, its trigger location and its event location are set to the current location, which will be the location where the triggering instance both begins and ends.

A nulling event is a non-lexeme event. A nulling SLIF parse event can be specifed for any symbol that is not a lexeme. A nulled symbol may derive other null symbols, producing one or more nulled trees; because a null derivation may be ambiguous, a nulled symbol may derive more than one nulled tree. A set of one or more nulled trees is called a nulled forest.

When a nulling event triggers for a symbol instance, all activated nulling events declared for symbols derived from the triggered symbol instance will also trigger. The triggering of nulling events is recursive, so that when a nulled symbol instance triggers an event, it triggers all the events in the nulled forest derived from the triggering symbol instance. Nulled forests are described in more detail in a separate section.

In the SLIF parse event descriptor returned by the the $slr->events() method, the name of nulling event is the only element.

Prediction events

A prediction event is declared in the SLIF DSL using the named event statement:

    event '^a' = predicted A

A prediction event triggers whenever a non-nulling symbol is acceptable at the current location. When a prediction event triggers, its trigger location and its event location are set to the current location. A prediction may not result in an actual instance of the symbol, but no actual symbol instance can start at the event location unless a prediction, if properly declared and activated, would trigger at that location.

Prediction SLIF parse events may be defined for any symbol, whether it is a lexeme or not. But prediction events are non-lexeme events, even when their symbol is a lexeme.

In the SLIF parse event descriptor returned by the the $slr->events() method, the name of prediction event is the only element.

Post-lexeme events

    :lexeme ~ <a> pause => after event => '"a"'

A post-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. The SLIF recognizer will have already read the lexeme when its post-lexeme event triggers.

When a post-lexeme event triggers, its trigger location and its event location are set to the current location, which will also be the location where the lexeme ends. A post-lexeme event also sets the pause span. Post-lexeme events which trigger during $slr->lexeme_complete() and $slr->lexeme_read() calls are discarded.

In the SLIF parse event descriptor returned by the the $slr->events() method, the name of post-lexeme event is the only element.

Pre-lexeme events

    :lexeme ~ <insert d> pause => before event => 'insert d'

A pre-lexeme event is a lexeme event. It triggers if the lexeme is scanned at the current location. When a pre-lexeme event triggers, its event location is set to the current location. Its trigger location is set to the location where the lexeme starts, which will be before the event location. A pre-lexeme event also sets the pause span.

The SLIF recognizer will not have read the lexeme when its pre-lexeme event triggers. In effect, it "rewinds" the scanning.

For most events, the trigger location is the current location, but pre-lexeme events are the exception. Its setting of the trigger location to the start of the lexeme is consistent with the pre-lexeme event's behavior as a "rewind". An intended use of pre-lexeme events is catching a lexeme which is about to be read, and giving it special treatment. For more on this, see below. Pre-lexeme events which trigger during $slr->lexeme_complete() and $slr->lexeme_read() calls are discarded.

There is a lot of similarity between pre-lexeme events and predictions, but there are also very important differences.

In the SLIF parse event descriptor returned by the the $slr->events() method, the name of pre-lexeme event is the only element.

Exhaustion events

my $g = Marpa::R3::Scanless::G->new( { source => \($grammar), exhaustion => 'event', rejection => 'event', } );

        my @shortest_span = ();
        my $recce =
          Marpa::R3::Scanless::R->new( { grammar => $g, }, $recce_debug_args );
        my $pos = $recce->read( \$string, $target_start );

        EVENT:
        for my $event ( @{ $recce->events() } ) {
            my ($name) = @{$event};
            if ( $name eq 'target' ) {
                @shortest_span = $recce->last_completed_span('target');
                diag(
                    "Preliminary target at $pos: ",
                    $recce->literal(@shortest_span)
                ) if $verbose;
                next EVENT;
            } ## end if ( $name eq 'target' )
                # Not all exhaustion has an exhaustion event,
                # so we look for exhaustion explicitly below.
            next EVENT if $name eq q('exhausted);
            die join q{ }, "Spurious event at position $pos: '$name'";
        } ## end EVENT: for my $event ( @{ $recce->events() } )

An exhaustion parse event triggers on asynchronous parse exhaustion, if the grammar's exhaustion setting is "event". The name of the event is "'exhausted" (The initial single quote is part of the event's name, and indicates it is a reserved name, which will not conflict with the name of any user-named event.)

Intuitively, parse exhaustion events are created only when needed for control to return to the application. More precisely, a parse exhaustion event is called synchronous if it occurs at a G1 location where control would return to the application in any case, either due to end of string or another event. A parse exhaustion event is called asynchronous if it is not synchronous.

The event itself is often simply discarded, because an application typically does not care whether exhaustion is synchronous or asynchronous. The exhausted() method can be relied on to report both asynchronous and synchronous parse exhaustion, and it is usually used instead. Exhaustion is discussed further in a separate document.

Rejection events

my $g = Marpa::R3::Scanless::G->new( { source => \($grammar), rejection => 'event' } );

    my $recce =
      Marpa::R3::Scanless::R->new( { grammar => $g, }, $recce_debug_args );
    my $pos = $recce->read( \$suffixed_string, 0, $original_length );

    READ_LOOP: while (1) {
        my $rejection = 0;
        my $pos       = $recce->pos();
        EVENT:
        for my $event ( @{ $recce->events() } ) {
            my ($name) = @{$event};
            if ( $name eq q('rejected) ) {
                $rejection = 1;
                diag("You fool! you forget the semi-colon at location $pos!")
                    if $verbose;
                next EVENT;

            } ## end if ( $name eq q('rejected) )
            die join q{ }, "Spurious event at position $pos: '$name'";
        } ## end EVENT: for my $event ( @{ $recce->events() } )

        last READ_LOOP if not $rejection;

        $recce->resume( $original_length, 1 );
        diag("I fixed it for you.  Now you owe me.") if $verbose;
        $recce->resume( $pos, $original_length - $pos );
    } ## end READ_LOOP: while (1)

A rejection event triggers if all lexemes at a G1 location are rejected, and the grammar's rejection setting is "event". The name of the event is "'rejected" (The initial single quote is part of the event's name, and indicates it is a reserved name, which will not conflict with the name of any user-named event.)

Lexeme events ^

SLIF parse events are divided into lexeme and non-lexeme events, based on their type. The lexeme events are the pre-lexeme event and post-lexeme event.

A lexeme event will trigger at the current location if all of the following criteria, applied in order, are true:

Marpa allows ambiguous lexemes and, even after all the above criteria have been applied, there may be more than one lexeme event at a G1 location.

Pause span

When a lexeme event triggers, it will set the pause span to the start physical input stream location and length of the triggering lexeme. The pause span is originally undefined. Every call to the read() or the resume() methods resets the pause span to undefined. The pause span may be accessed directly with the $slr->pause_span() method.

Non-lexeme events ^

Prediction, completion and nulling events are non-lexeme events. The conditions for a non-lexeme event are simpler than those for a lexeme event, because they do not involve lexical processing.

A non-lexeme event will trigger at the current location if all of the following are true:

Techniques ^

External scanning

Switching to external scanning is an intended use case for all events, other than exhaustion events. In particular, the behavior of pre-lexeme events is most intuitive when thought about with external scanning in mind.

The example code for this document contains an artificially simple example of external scanning. The symbol <insert d> has a pre-lexeme event declared:

    :lexeme ~ <insert d> pause => before event => 'insert d'

When this triggers, the code in the example switches to external scanning: It reads a <d> symbol externally, skips over the lexeme actually found in the physical input, and resumes internal scanning.

Markers

It is quite reasonable to create "markers" -- nulling symbols whose primary (or sole) purpose is to have nulling events declared for them. Markers are the only way to declare events that trigger in the middle of a rule.

Rules

There are no events explicitly defined in terms of rules, but every rule event that is wanted can be achieved in one or more ways. The most flexible of these, and the best for many purposes, is to use markers.

Another method is to use the LHS of a rule to track rule predictions and completions. This requires that the LHS symbol of the rule be unique to that rule.

Implications ^

This section describes some implications of the SLIF parse events mechanism that may be unexpected at first. These implications are Marpa working as designed and, I hope the reader will agree, as is desirable.

Ambiguity

If a parse is ambiguous, events trigger for all the possible symbols. A user thinking in terms of one of the parses, and unaware of the ambiguity, may find this unexpected. In the example, events for both the symbols <ambig1> and <ambig2>, as well as all their derived symbols, trigger.

Tentative events

Marpa's events are left-eidetic but right-blind. Left of the event location, Marpa's events are 100% accurate. Right of the event location, they are totally unaware of what the actual input will be -- there is no "lookahead". Because events trigger based on input action only up to the event location, events are tentative.

Once the parse is complete, and the actual input to the right of the event location is taken into account, it is quite possible that none of the parse trees will actually contain the symbol instance that triggered an event.

In the example, prediction and completion events are reported for the symbols <start1>, <start2>, <mid1> and <mid2>, but none of these symbols winds up in any of the parse tress. This is because they are derived from <ambig1> or <ambig2>, and <ambig1> or <ambig2> will never be fully recognized in any of the parse trees. The reason that <ambig1> and <ambig2> will not be fully recognized is that their full recognition requires that there be a <z> symbol in the input and the input stream in the example does not contain a <z> symbol.

Exhaustion events are not tentative. All other SLIF parse events are tentative.

In the example, the predictions for <mid1> and <mid2> do not match anything in the final parse tree, because the locations where <mid1> and <mid2> would be predicted are not reached in those trees. For similar reasons, nulling events are tentative.

Lexemes can be ambiguous and when they are ambiguous one or more of the lexeme alternatives may not be used in any final parse tree. Because of this, lexeme events are also tentative.

After rejection events, input can be, and typically is, retried at the same G1 location. This is what happens when the Ruby Slippers technique is used. Often, on the second or later attempt, one or more lexemes are found that are acceptable to the grammar. For this reason, rejection events are tentative.

Nulled forests

When a symbol is nulled, any symbol which can be null-derived from it is also nulled. In the example, when the symbol <g> is nulled, derived symbols <g1>, <g2>, <g3>, <g4> are also nulled.

Note that what was said about ambiguity applies here. In the example, the symbols <g1> and <g2> are in one derivation, while <g3> and <g4> are in another, so that not just a parse tree, but an entire parse forest is nulled. (Pedantically, a nulled tree is a forest of a single tree.)

More precisely,

Events and instances

As stated above, only nulling instances generate nulling events, and only non-nulled symbols generate prediction events and completion events. Since lexemes cannot be zero length, this means that, for a given symbol instance, nulling events and all other events, are mutually exclusive. In other words, if a nulling event occurs for an instance, no other event will trigger for that instance.

Some cases may seem to violate this rule. For example at position 23 in the parse in the code below, we have four events of four different types, all for the symbol <e>. In addition to a nulling event, there is a post-lexeme event, a prediction event and a completion event:

    Events at position 23: "e" ^e ^f e$ e[]

The reason for this is that these events are for three different symbol instances, all of which share the same trigger location:

  1. A nulled instance at location 23.
  2. A potential non-nulled instance, which may begin at location 23.
  3. A non-nulled instance, which begins at location 22 and ends at location 23.

The prediction of the second instance is, in fact, fulfilled, as reported at location 25:

    Events at position 25: "e" ^f e$

The second instance is length 1 and predicted at location 23, but its completion is reported at location 25. This is because whitespace delayed its start by one position.

    Events at position 21: ^e ^f d$ e[] mid1$ mid2$ 

The third instance is reported as predicted at position 21, even though it actually begins at position 22. The delayed start is because of whitespace.

Prediction and completion events exclude nulled symbols, because there is no practical distinction between predicting a nulled symbol, and actually seeing one. This means that the prediction and completion of a nulled symbol would always occur together. This very special nature of nulled symbols motivates their treatment as a special case.

Hidden events

An important aspect of the event mechanism is that it triggers a return from the event-triggering method at the trigger location. It may happen, however, that the method would return at that location in any case, and in this circumstance the triggering can be said to be hidden. A event which causes hidden triggering is called a hidden event.

As one example, the lexeme_complete() and lexeme_read() methods return at every lexeme at which a lexeme is read, so all triggering in those methods is hidden triggering. In the example code in this document, the events at this location were all caused by hidden triggering inside a call to $slr->lexeme_complete():

    Events at position 21: ^e ^f d$ e[] mid1$ mid2$ 

As another example, the $slr->read() and $slr->resume() methods return at end of string, but events may also trigger at end of string. The events at this location were caused by hidden triggering inside $slr->resume() at end of string:

    Events at position 29: "h" test$

The example code for this document is programmed with the possibility of hidden triggering in mind. To do this, it is careful to access events after its calls to the $slr->lexeme_read() as well as to make an additional pass through the event-accessing loop after an end of string is encountered.

Lexeme events and external scanning

During external scanning, lexemes are read using the $slr->lexeme_complete() and $slr->lexeme_read() methods. Non-lexeme events may trigger during these methods, as was discussed in "Hidden events". However, lexeme events that would occur during the $slr->lexeme_complete() and $slr->lexeme_read() methods are ignored, and will never trigger.

This behavior may seem non-orthogonal, but in fact it is the most consistent course of action. A pre-lexeme event occuring during a $slr->lexeme_complete() and $slr->lexeme_read() method call would reverse its effect, a behavior which is at best pointless. A post-lexeme event would be less dangerous, but it would be completely redundant -- its presence or absence would tell the application only what the application already knows from the return of success or failure by the $slr->lexeme_complete() or $slr->lexeme_read() methods.

An example ^

The SLIF DSL in this example is designed to include the unusual and "corner" cases described in this document. It is not like any grammar that you are likely to encounter in normal practice.

    sub forty_two { return 42; };

    use Marpa::R3;

    my $dsl = <<'END_OF_DSL';

    test ::= a b c d e e f g h action => main::forty_two
        | a ambig1 | a ambig2
    e ::= <real e> | <null e>
    <null e> ::=
    g ::= g1 | g3
    g1 ::= g2
    g2 ::=
    g3 ::= g4
    g4 ::=
    d ::= <real d> | <insert d>
    ambig1 ::= start1 mid1 z
    ambig2 ::= start2 mid2 z
    start1 ::= b  mid1 ::= c d
    start2 ::= b c  mid2 ::= d

    a ~ 'a' b ~ 'b' c ~ 'c'
    <real d> ~ 'd'
    <insert d> ~ ["] 'insert d here' ["]
    <real e> ~ 'e'
    f ~ 'f'
    h ~ 'h'
    z ~ 'z'

    :lexeme ~ <a> pause => after event => '"a"'
    :lexeme ~ <b> pause => after event => '"b"'
    :lexeme ~ <c> pause => after event => '"c"'
    :lexeme ~ <real d> pause => after event => '"d"'
    :lexeme ~ <insert d> pause => before event => 'insert d'
    :lexeme ~ <real e> pause => after event => '"e"'
    :lexeme ~ <f> pause => after event => '"f"'
    :lexeme ~ <h> pause => after event => '"h"'

    event '^test' = predicted test
    event 'test$' = completed test
    event '^start1' = predicted start1
    event 'start1$' = completed start1
    event '^start2' = predicted start2
    event 'start2$' = completed start2
    event '^mid1' = predicted mid1
    event 'mid1$' = completed mid1
    event '^mid2' = predicted mid2
    event 'mid2$' = completed mid2

    event '^a' = predicted a
    event '^b' = predicted b
    event '^c' = predicted c
    event 'd[]' = nulled d
    event 'd$' = completed d
    event '^d' = predicted d
    event '^e' = predicted e
    event 'e[]' = nulled e
    event 'e$' = completed e
    event '^f' = predicted f
    event 'g[]' = nulled g
    event '^g' = predicted g
    event 'g$' = completed g
    event 'g1[]' = nulled g1
    event 'g2[]' = nulled g2
    event 'g3[]' = nulled g3
    event 'g4[]' = nulled g4
    event '^h' = predicted h

    :discard ~ whitespace
    whitespace ~ [\s]+
    END_OF_DSL


    my $grammar = Marpa::R3::Scanless::G->new(
        {
            source            => \$dsl,
            semantics_package => 'My_Actions'
        }
    );
    my $slr = Marpa::R3::Scanless::R->new( { grammar => $grammar } );

    my $input = q{a b c "insert d here" e e f h};
    my $length = length $input;
    my $pos    = $slr->read( \$input );

    my $actual_events = q{};

    READ: while (1) {

        my @actual_events = ();

        my $next_lexeme;
        EVENT:
        for my $event ( @{ $slr->events() } ) {
            my ($name) = @{$event};
            if ($name eq 'insert d') {
               my (undef, $length) = $slr->pause_span();
               $next_lexeme = ['real d', 'd', $length];
            }
            push @actual_events, $name;
        }

        if (@actual_events) {
            $actual_events .= join q{ }, "Events at position $pos:", sort @actual_events;
            $actual_events .= "\n";
        }

        if ($next_lexeme) {
            $slr->lexeme_read(@{$next_lexeme});
            $pos = $slr->pos();
            next READ;
        }
        if ($pos < $length) {
            $pos = $slr->resume();
            next READ;
        }
        last READ;
    } ## end READ: while (1)

    my $expected_events = <<'=== EOS ===';
    Events at position 0: ^a ^test
    Events at position 1: "a" ^b ^start1 ^start2
    Events at position 3: "b" ^c ^mid1 start1$
    Events at position 5: "c" ^d ^mid2 start2$
    Events at position 6: insert d
    Events at position 21: ^e ^f d$ e[] mid1$ mid2$
    Events at position 23: "e" ^e ^f e$ e[]
    Events at position 25: "e" ^f e$
    Events at position 27: "f" ^h g1[] g2[] g3[] g4[] g[]
    Events at position 29: "h" test$
    === EOS ===

COPYRIGHT AND LICENSE ^

  Marpa::R3 is Copyright (C) 2017, Jeffrey Kegler.

  This module is free software; you can redistribute it and/or modify it
  under the same terms as Perl 5.10.1. For more details, see the full text
  of the licenses in the directory LICENSES.

  This program is distributed in the hope that it will be
  useful, but without any warranty; without even the implied
  warranty of merchantability or fitness for a particular purpose.
syntax highlighting: