# Copyright 2014 Jeffrey Kegler
# This file is part of Marpa::R2.  Marpa::R2 is free software: you can
# redistribute it and/or modify it under the terms of the GNU Lesser
# General Public License as published by the Free Software Foundation,
# either version 3 of the License, or (at your option) any later version.
#
# Marpa::R2 is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser
# General Public License along with Marpa::R2.  If not, see
# http://www.gnu.org/licenses/.

=head1 Name

Marpa::R2::Scanless::R - Scanless interface recognizers

=head1 Synopsis

=for Marpa::R2::Display
name: Scanless recognizer synopsis
partial: 1
normalize-whitespace: 1

    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    my $self = bless { grammar => $grammar }, 'My_Actions';
    $self->{recce} = $recce;

    if ( not defined eval { $recce->read($p_input_string); 1 }
        )
    {
        ## Add last expression found, and rethrow
        my $eval_error = $EVAL_ERROR;
        chomp $eval_error;
        die $self->show_last_expression(), "\n", $eval_error, "\n";
    } ## end if ( not defined eval { $recce->read($p_input_string...})

    my $value_ref = $recce->value( $self );
    if ( not defined $value_ref ) {
        die $self->show_last_expression(), "\n",
            "No parse was found, after reading the entire input\n";
    }

=for Marpa::R2::Display::End

=for Marpa::R2::Display
name: Scanless recognizer semantics
partial: 1
normalize-whitespace: 1

    package My_Actions;
    sub do_parens    { shift; return $_[1] }
    sub do_add       { shift; return $_[0] + $_[2] }
    sub do_subtract  { shift; return $_[0] - $_[2] }
    sub do_multiply  { shift; return $_[0] * $_[2] }
    sub do_divide    { shift; return $_[0] / $_[2] }
    sub do_pow       { shift; return $_[0]**$_[2] }
    sub do_first_arg { shift; return shift; }
    sub do_script    { shift; return join q{ }, @_ }

=for Marpa::R2::Display::End

=head1 About this document

This page is the reference document for the recognizer objects
of Marpa's SLIF (Scanless interface).

=head1 Internal and external scanning

The Scanless interface is so-called because
it does not require the application to supply a scanner (lexer).
The SLIF contains its own lexer,
one whose use is integrated into
its syntax.
In this document, use of the SLIF's
internal scanner is called
B<internal scanning>.

The SLIF allows applications that find it useful to
do their own scanning.
When an application
bypasses the SLIF's internal scanner
and does its own scanning,
this document calls it
B<external scanning>.
An application can use
external scanning to supplement internal 
scanning,
or to replace the SLIF's internal scanner entirely.

=head1 Locations

=head2 Input stream locations

An input stream location is the offset of a codepoint in the
input stream.
When the input stream is being treated as a string,
input stream location corresponds to Perl's C<pos()> location.
In this document, the word "location" refers
to location in the input stream
unless otherwise specified.

=head2 Negative locations

Several methods allow locations and lengths to be specified
as negative numbers.
A negative location is a location counted
from the end, so that -1 means the location before the
last character of the string,
-2 the location before the second to last character of the string, etc.
A negative length indicates a distance to a location
counted from the end.
A length of -1 indicates the distance to the end of the string,
-2 indicates the distance to the location just before
the last character
of the string,
etc.

For example, suppose that we are dealing with input stream locations.
The span (C<0, -1>) is the entire input stream.
The span (C<-1, -1>) is the
last character of the input stream.
The span (C<-2, -1>) is the
last two characters of the input stream.
The span (C<-2, 1>) is the 
second to last character of the input stream.
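
The conversion rules above can be sketched in plain Perl.
C<absolute_span()> is a hypothetical helper, not part of Marpa::R2.
It assumes the length of the input stream is known,
and it reports each span as a start and an end,
the end being the location just after the span's last character.

```perl
    use strict;
    use warnings;

    # Hypothetical helper, not part of Marpa::R2.
    # Converts a span given as ($start, $length), either of which may
    # be negative, into absolute ($start, $end) locations, where $end
    # is the location just after the last character of the span.
    sub absolute_span {
        my ( $input_length, $start, $length ) = @_;

        # A negative location counts back from the end:
        # -1 is the location before the last character.
        $start += $input_length if $start < 0;

        # A negative length is the distance to a location counted
        # from the end: -1 reaches the end of the input,
        # -2 the location just before the last character.
        my $end =
            $length < 0
            ? $input_length + $length + 1
            : $start + $length;
        return ( $start, $end );
    }

    # The four example spans above, for a 10-character input stream.
    for my $span ( [ 0, -1 ], [ -1, -1 ], [ -2, -1 ], [ -2, 1 ] ) {
        my ( $start, $end ) = absolute_span( 10, @{$span} );
        print "($span->[0], $span->[1]) covers locations $start to $end\n";
    }
```

For a 10-character input stream, C<absolute_span(10, -2, 1)>
returns C<(8, 9)>: a span that contains only the second to last character.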

=head2 G1 locations

In addition to input stream location,
the SLIF also tracks G1 location.
G1 location starts at zero,
and increases by exactly one as each lexeme is read.
G1 location is usually not the same
as input stream location.
There is also a concept of G1 length,
which is simply
length calculated in terms of G1 locations.

G1 location can be ignored most of the time,
but it does become relevant to a small degree
when dealing
with ambiguous terminals,
and to a greater degree when tracing the G1 grammar.
(For those more familiar with Marpa's internals,
the G1 location is the G1 Earley set index.)

=head2 Current location

The SLIF tracks
the B<current location in the input stream>, more usually simply
called the B<current location>.
Locations are zero-based, so that location 0 is the start
of the input stream.
A location is said to B<point> to the character after it,
if there is such a character.
For example,
location 0 points to the first character of the input stream,
unless the stream is of zero length,
in which case there is no first character.

A current location equal to the length of the input stream
indicates EOS (end of stream).
In a zero length stream, location 0 is EOS.
The EOS location never points to a character.

In the SLIF, the current input stream location
does not necessarily move forward one character at a time: it
can skip forward,
or it can be positioned to an earlier location.
The application can skip sections of the input stream,
and it is also free to
revisit spans of the input stream as often as it wants.

Here are the guarantees:

=over 4

=item *

Initially, the current location is 0.

=item *

The current location will never be negative.

=item *

The current location will never be greater than EOS.

=back

=head2 Literals and G1 spans

Often it is useful to find the literal substring of the input which corresponds to a span
of G1 locations.
If an application reads the input monotonically within the G1 span
this presents no complications.

"Monotonically" here means that,
for the G1 span C<$g1_start, $g1_length>,
the application reads the G1 locations
in sequence and one-by-one, starting at
C<$g1_start> and ending at C<$g1_start+$g1_length>.
Reading the input monotonically is the default,
and by far the most common
case.
But Marpa applications
are free to skip forward in the stream,
to jump backward,
to reread the same input multiple times, etc., etc.
It is entirely possible that the last input stream location of a G1 span
will be before its first input stream location.

In precise terms,
the substring returned
for a G1 span C<$g1_start, $g1_length>
is determined as follows:
The string
will start at the first input stream location in the span for
G1 location C<$g1_start+1>.
The end of the string will be at
the last input stream location in the span for
G1 location C<$g1_start+$g1_length>.
When an application moves backward in the input,
the end of the string,
as calculated above,
may be before the start of the string.
When the end of a string is before its start,
the substring returned will be the zero-length string.
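
A minimal sketch of this rule follows.
C<g1_span_literal()> and its table of spans are hypothetical, not part of Marpa::R2.
The sketch assumes the application has recorded,
for each G1 location,
the input stream span (start and length) of the lexeme read there,
and it takes the end of a span to be C<$start+$length>.

```perl
    use strict;
    use warnings;

    # Hypothetical sketch, not part of Marpa::R2.
    # $g1_spans->[$i] is [ $start, $length ]: the input stream span
    # recorded for G1 location $i.
    sub g1_span_literal {
        my ( $input, $g1_spans, $g1_start, $g1_length ) = @_;

        # The string starts at the first location in the span
        # for G1 location $g1_start+1 ...
        my $start = $g1_spans->[ $g1_start + 1 ][0];

        # ... and ends at the end of the span for G1 location
        # $g1_start+$g1_length.
        my ( $last_start, $last_length ) =
            @{ $g1_spans->[ $g1_start + $g1_length ] };
        my $end = $last_start + $last_length;

        # After a backward move the end may precede the start;
        # the result is then the zero-length string.
        return q{} if $end <= $start;
        return substr $input, $start, $end - $start;
    }

    my $input = 'abcdef';

    # Nothing is recorded for G1 location 0 in this sketch.
    # Monotonic reading: one two-character lexeme per G1 location.
    my @forward = ( undef, [ 0, 2 ], [ 2, 2 ], [ 4, 2 ] );

    # Backward reading: the second lexeme precedes the first.
    my @backward = ( undef, [ 4, 2 ], [ 0, 2 ] );

    printf "%s\n", g1_span_literal( $input, \@forward, 0, 3 );
    printf "%d\n", length( g1_span_literal( $input, \@backward, 0, 2 ) );
```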

Applications which do not read monotonically,
but which also
want to associate spans of G1 locations with the input stream,
may need to reassemble the input based on their own ideas.
The L</"literal()"> method can assist in this process.

=head1 How internal scanning works

The SLIF always starts scanning using the L<C<read()>|/"read()">
method.
Pedantically, this means scanning always begins with a phase
of internal scanning.
But that first phase may be of zero length,
and after that,
internal scanning does not have to be resumed.

Internal scanning can be resumed with the L<C<resume()>|/"resume()"> method.
Both the L<C<read()>|/"read()"> and L<C<resume()>|/"resume()"> methods require the application
to specify a span in the input stream.
The L<C<read()>|/"read()"> method sets the input stream,
and that input stream is the one used by all L<C<resume()>|/"resume()"> method calls
for that recognizer.

In what follows, the term "internal scanning method"
refers to either the L<C<read()>|/"read()"> or the L<C<resume()>|/"resume()"> method.
After a call to an internal scanning method,
the current location will indicate how far
in the input stream that method actually read.
If the method paused before EOS,
the current location will be the one at which it paused.
If it stopped at EOS,
the current location will be EOS.
The return value of both the L<C<read()>|/"read()"> and the L<C<resume()>|/"resume()"> methods is
the new current location.

=head2 EOS

The location of EOS depends on the
C<$start> and C<$length> arguments
to the last
internal scanning method,
and on the length of the input string.

=over 4

=item *

If the C<$length> argument of the last internal scanning method was
non-negative,
EOS will be at C<$start+$length>.

=item *

If the C<$length> argument was negative,
EOS will be at
C<$length + 1 + length $input_string>.

=item *

The default length for the internal scanning methods is always -1,
so that the default EOS is always at
C<length $input_string>,
the end of the input string.

=back
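
The EOS rules can be expressed as a small function.
C<expected_eos()> is a hypothetical helper, not part of Marpa::R2,
and it assumes that C<$start> is non-negative.
An application can compare its result with the current location
to tell whether a pause happened at EOS.

```perl
    use strict;
    use warnings;

    # Hypothetical helper, not part of Marpa::R2: where EOS falls for
    # an internal scanning call.  $start is assumed to be non-negative.
    sub expected_eos {
        my ( $input_string, $start, $length ) = @_;
        $length //= -1;    # the default length is always -1
        return $length + 1 + length($input_string) if $length < 0;
        return $start + $length;
    }

    my $string = 'x' x 20;
    printf "%d\n", expected_eos( $string, 0 );       # 20: default length
    printf "%d\n", expected_eos( $string, 3, 5 );    # 8: non-negative length
    printf "%d\n", expected_eos( $string, 0, 0 );    # 0: pauses immediately
```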

=head2 Pauses in internal scanning

When the L<C<read()>|/"read()"> or the L<C<resume()>|/"resume()"> method pauses,
one or more of the following has occurred.

=over 4

=item * A named event

One or more named events may have triggered.
Named events are
created by
L<named event statements|Marpa::R2::Scanless::DSL/"Named event statement">.
They can also be created by
L<lexeme pseudo-rules|Marpa::R2::Scanless::DSL/"Lexeme pseudo-rules">.
Named events may be queried using
L<the events() method|/"events()">.

=item * An unnamed lexeme pause event

A lexeme pause that is not a named event may have triggered.
Lexeme pauses are created by
L<lexeme pseudo-rules|Marpa::R2::Scanless::DSL/"Lexeme pseudo-rules">.
Applications can always name lexeme pause events, using the
L<event adverb|Marpa::R2::Scanless::DSL/"event">,
and are strongly encouraged to do so.
If all lexeme pauses are named,
the check for unnamed events can be omitted.
The presence or absence of an unnamed lexeme pause event
may be checked for using
L<the lexeme_pause() method|/"lexeme_pause()">.

=item * EOS

EOS may have been reached.
This may be checked for by comparing the current location
with the expected EOS.

=back

=head1 The input stream

For error messages and other purposes,
even external lexemes are required to correspond to a span of the input stream.
An external scanner
must set up a relationship to the input stream,
even if that relationship is completely artificial.

One way to do this is to put an artificial preamble in front of
the input stream.
For example, the first 7 characters of the input stream could be
a preamble containing the characters "C<NO TEXT>".
This preamble could be immediately followed by what is seen as the text
from a more natural point of view.
In this case, the initial call to the L<C<read()>|/"read()"> method
could take the form
C<< $slr->read(\$input_string, 7) >>.
Lexemes corresponding to the artificial preamble would be read
using a method call similar to
C<< $slr->lexeme_read($symbol_name, 0, 7, $value) >>.
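
A sketch of the preamble approach.
The variable names and C<start_with_preamble()> are hypothetical,
C<$slr> is assumed to be an SLIF recognizer,
and C<$symbol_name> is assumed to name a lexeme that the grammar
allows at that point.
Computing the preamble's length, rather than hard-coding 7,
keeps the two calls consistent if the preamble changes.

```perl
    use strict;
    use warnings;

    # Hypothetical names, for illustration only.
    my $preamble     = 'NO TEXT';
    my $text         = 'the text, from a more natural point of view';
    my $input_string = $preamble . $text;

    # 7 for 'NO TEXT'; computed rather than hard-coded.
    my $preamble_length = length $preamble;

    # Sketch of the two calls described above.
    # $slr is assumed to be an SLIF recognizer.
    sub start_with_preamble {
        my ( $slr, $symbol_name, $value ) = @_;

        # Internal scanning starts just after the preamble.
        $slr->read( \$input_string, $preamble_length );

        # An external lexeme tied to the artificial preamble span.
        return $slr->lexeme_read( $symbol_name, 0, $preamble_length,
            $value );
    }
```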

=head1 Constructor

=for Marpa::R2::Display
name: Scanless recognizer synopsis
partial: 1
normalize-whitespace: 1

    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

=for Marpa::R2::Display::End

The C<new()> method is the constructor for SLIF recognizers.
The C<new()> constructor accepts a hash of named arguments.
The L<C<grammar>|/"grammar"> named argument is required.
All other named arguments are optional.

The following named arguments are allowed:

=head2 end

Most users will want to ignore this argument.
It is an advanced argument, mainly for use in testing.
The L<C<end>|/"end"> named argument
specifies the parse end, as a G1 location.
The default is for the parse to end where the input did,
so that the parse returned is of the entire input.
The L<C<end>|/"end"> named argument is not allowed
once a parse series has begun.

=head2 grammar

The C<new> method is required to have
a C<grammar> named argument.  Its
value must be
an SLIF grammar object.

=head2 max_parses

If non-zero, causes a fatal error when that number
of parse results is exceeded.
C<max_parses> is useful to
limit CPU usage and output length when testing
and debugging.
Stable and production applications may
prefer to count the number of parses,
and take a less Draconian response when the
count is exceeded.

The value must be an integer.
If it is zero, there will be no
limit on the number of parse results returned.
The default is for
there to be no limit.
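
One less Draconian response is for the application to count
parse results itself while calling the C<value()> method,
and to stop with a warning once a limit is reached.
In this sketch C<collect_parses()> is hypothetical,
and C<Mock::ParseSeries> is a stand-in for an SLIF recognizer,
so that the code runs on its own:
its C<value()> returns a reference to each result in turn, then C<undef>.

```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for an SLIF recognizer: value() returns a
    # reference to the next parse result, then undef once the parse
    # series is exhausted.
    package Mock::ParseSeries {
        sub new { my ( $class, @results ) = @_; return bless [@results], $class }
        sub value {
            my ($self) = @_;
            return undef if not @{$self};
            my $result = shift @{$self};
            return \$result;
        }
    }

    # Collects at most $limit parse results.  If more are available,
    # it warns, rather than dying as max_parses would.
    sub collect_parses {
        my ( $recce, $limit ) = @_;
        my @results;
        while ( defined( my $value_ref = $recce->value() ) ) {
            if ( @results >= $limit ) {
                warn "More than $limit parses; ignoring the rest\n";
                last;
            }
            push @results, ${$value_ref};
        }
        return @results;
    }
```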

=head2 ranking_method

The value must be a string:
one of "C<none>",
"C<rule>",
or "C<high_rule_only>".
When the value is "C<none>", Marpa returns the parse results
in arbitrary order.
This is the default.
The C<ranking_method> named argument is not allowed
once evaluation has begun.

The "C<rule>"
and "C<high_rule_only>" ranking methods
allow the user
to control the order
in which parse results are returned by
the C<value> method,
and to exclude some parse results from the parse series.
For details, see L<the document
on parse order|Marpa::R2::Semantics::Order>.

=head2 semantics_package

Sets the semantic package for the recognizer.
The setting of this argument takes precedence
over any package implied by the blessing of the per-parse arguments to
the SLIF recognizer's C<value()> method.
The semantics package is used when resolving action names to
fully qualified Perl names.
For more details on the SLIF semantics,
see the L<document on SLIF
semantics|Marpa::R2::Semantics>.

=head2 too_many_earley_items

The C<too_many_earley_items> argument is optional,
and very few applications will need it.
If specified, it sets the B<Earley item warning threshold> to
a value other than its default.
If an Earley set becomes larger than the
Earley item warning threshold,
a recognizer event is generated,
and
a warning is printed to the trace file handle.

Marpa parses from any BNF,
and can handle grammars and inputs which produce very large
Earley sets.
But parsing that involves very large Earley sets can be slow.
Large Earley sets
are something most applications can,
and will wish to, avoid.

By default, Marpa calculates
an Earley item warning threshold
for the G1 recognizer
based on the size of the
G1 grammar,
and for each L0 recognizer based on the size
of the L0 grammar.
The default thresholds will never be less than 100.
The default is the result of considerable experience
and almost all users will be happy with it.

If the
Earley item warning threshold is changed from its default,
the change applies to both L0 and G1 -- currently
there is no way to set them separately.
If the Earley item warning threshold is set to 0,
no recognizer event is generated,
and
warnings about large Earley sets are turned off.
An Earley item threshold warning almost always
indicates a serious issue,
and turning these warnings off will
rarely be what an application wants.

=head2 trace_terminals

If non-zero, traces the lexemes --
those tokens passed from the L0 parser to
the G1 parser.
This named argument is the best way to follow
what the L0 parser is doing,
and it is also very helpful for tracing the G1 parser.

=head2 trace_values

The C<trace_values> named argument is a numeric trace level.  If the
numeric trace level is 1, Marpa prints tracing information as values
are computed in the evaluation stack.  A trace level of 0 turns
value tracing off, which is the default. Traces are written to the
trace file handle.

=head2 trace_file_handle

The value is a file handle.
Trace output and warning messages
go to the trace file handle.
By default, the trace file handle is inherited from the
grammar.

=head1 Basic mutators

=head2 ambiguous()

=for Marpa::R2::Display
name: Tutorial 2 synopsis
partial: 1
normalize-whitespace: 1

    if ( my $ambiguous_status = $recce->ambiguous() ) {
        die "Parse is ambiguous\n", $ambiguous_status;
    }

=for Marpa::R2::Display::End

This method should be called after the C<read()> method.
If there is exactly one parse, it returns the empty string.
If there is no parse, it returns a non-empty string indicating that fact.
If there are two or more parses,
it returns a non-empty string describing the ambiguity.

Applications should only test the returned string to see if it is
empty or non-empty.
The non-empty strings are intended only for reading by
humans -- their exact format is subject to change.
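
A sketch of that test, which checks only whether the returned string
is empty and passes a non-empty string along for a human to read.
C<parse_is_unique()> is a hypothetical helper,
and C<Mock::AmbiguityStatus> is a stand-in for an SLIF recognizer
that returns a canned status string.

```perl
    use strict;
    use warnings;

    # Hypothetical stand-in whose ambiguous() returns a canned status.
    package Mock::AmbiguityStatus {
        sub new { my ( $class, $status ) = @_; return bless { status => $status }, $class }
        sub ambiguous { return $_[0]{status} }
    }

    # Returns true if there is exactly one parse.  Only emptiness is
    # tested; the text of a non-empty status is meant for humans.
    sub parse_is_unique {
        my ($recce) = @_;
        my $status = $recce->ambiguous();
        return 1 if $status eq q{};
        warn "Parse is not unique:\n", $status;
        return 0;
    }
```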

=head2 read()

=for Marpa::R2::Display
name: Scanless recognizer synopsis
partial: 1
normalize-whitespace: 1

    $recce->read($p_input_string);

=for Marpa::R2::Display::End

=for Marpa::R2::Display
name: SLIF external read example
partial: 1
normalize-whitespace: 1

    $recce->read( \$string, 0, 0 );

=for Marpa::R2::Display::End

Given a reference to a string containing the input stream,
C<read()> parses it according to the grammar.
Only a single call to C<read()>
is allowed for a scanless
recognizer.

C<read()> recognizes
optional second and third arguments.
The second argument is a location in the input stream
at which internal scanning will start.
The third argument is the length of the section
of the input stream to be scanned before pausing.
The default start location is zero.
The default length is -1.
Negative locations and lengths have the standard
interpretation, as L<described above|/"Negative locations">.

Start location and length can both be zero.
This pauses internal scanning immediately and can be
used to hand complete control of scanning
over to an external scanner.

L<Completion named events|Marpa::R2::Scanless::DSL/"Completion event statement">
can occur during the C<read()> method.
When a named event occurs, 
the C<read()> method pauses.
Named events can be queried using
L<the Scanless recognizer's events()
method|/"events()">.
The C<read()> method also
pauses as specified with the
L<Scanless DSL's pause adverb|Marpa::R2::Scanless::DSL/"pause">.

A parse is said to be exhausted if, based on the input
read so far, there is no way for it
to continue successfully.
Exhaustion is not a problem if Marpa has read
all the way to the end of the input,
or if it is pausing for some other reason.
Otherwise,
C<read()> treats an exhausted parse as a failure.

On failure, C<read()> throws an exception.
The call is considered successful
if it ended because a parse was found,
or because internal scanning was paused.
On success, C<read()> returns the location in the input
stream at which internal scanning ended.
This value may be zero.

=head2 series_restart()

=for Marpa::R2::Display
name: SLIF recognizer series_restart() synopsis
normalize-whitespace: 1

    $slr->series_restart( { end => $i } );

=for Marpa::R2::Display::End

The C<series_restart()> method ends the current parse series,
and starts another.
It allows, as optional arguments, hashes of named arguments
for the SLIF recognizer.
These named arguments can be any of those allowed by
L<the C<set()> method|/"set()">.

C<series_restart()> resets all the named arguments to their defaults.
An application that wants a non-default named argument to have effect
in each of its parse
series must respecify it at the beginning of each parse series.
C<series_restart()> is particularly useful for
the 
C<end> and C<semantics_package> named arguments,
which cannot be changed once a parse series is underway.
To change their values,
an application must start a new parse series.

=head2 set()

=for Marpa::R2::Display
name: SLIF recognizer set() synopsis
normalize-whitespace: 1

    $slr->set( { max_parses => 42 } );

=for Marpa::R2::Display::End

This method allows the named arguments to be changed after an SLIF
recognizer is created.
Currently, the arguments that may be changed are L<C<end>|/"end">,
L<C<max_parses>|/"max_parses">,
L<C<semantics_package>|/"semantics_package"> and
L<C<trace_file_handle>|/"trace_file_handle">.

=head2 value()

=for Marpa::R2::Display
name: Scanless recognizer synopsis
partial: 1
normalize-whitespace: 1

    my $value_ref = $recce->value( $self );

=for Marpa::R2::Display::End

The C<value> method call evaluates the next parse tree
in the parse series,
and returns a reference to the parse result for that parse tree.
If there are no more parse trees,
the C<value> method returns C<undef>.

Because Marpa parses ambiguous grammars, every parse
is a series of zero or more parse trees.
This series of zero or more parse trees is called a B<parse series>.
There are zero parse trees if there was no valid parse
of the input according to the grammar.

The C<value()> method allows one optional argument.
This argument can be a Perl scalar of any kind, but the most useful
possibilities are references (blessed or unblessed) to hashes or arrays.
If provided, the argument of the C<value()> method
explicitly specifies the per-parse argument for the
parse tree.
The per-parse argument will be the first argument of all 
Perl semantics closures, and can be used to share data within
the tree,
when that data does not conveniently fit into the bottom-up
flow of parse tree evaluation.
Symbol tables are one example of the kind of data which parses often
require, but which it is not convenient to accumulate bottom-up.

If the L<C<semantics_package>|/"semantics_package"> named argument of the SLIF
recognizer was not specified,
Marpa will use the package into which the per-parse argument was blessed
as the semantics package --
the package in which to look for the parse's Perl semantic closures.
In this case, Marpa will regard the per-parse arguments of all
calls in the same parse series as the source of the semantics package,
and it will require that the calls be consistent --
each call must have a per-parse argument,
and that per-parse argument must be blessed into the semantics package.

=head1 Mutators for external scanning

=head2 activate()

=for Marpa::R2::Display
name: SLIF activate() method synopsis
partial: 1
normalize-whitespace: 1

        $slr->activate($_, 0) for @events;

=for Marpa::R2::Display::End

The C<activate()> method allows the recognizer to deactivate and reactivate named
events.
Named events allow the recognizer to stop for external scanning at conveniently
defined locations.
Named events can be defined for the prediction and completion of non-zero-length symbols,
and nulled events can be defined to trigger when zero-length symbols are
recognized.

The C<activate()> method takes two arguments.
The first is the name of an event, and the second (optional) argument is
0 or 1.
If the argument is 0, the event is deactivated.
If the argument is 1, the event is reactivated.
An argument of 1 is the default but,
since an SLIF recognizer always starts with all defined events
activated,
0 will probably be more common as the second argument to
C<activate()>.

Location 0 events are triggered in the SLIF recognizer's 
constructor,
before the C<activate()> method can be called.
This means that currently there is no way to deactivate
location zero events.

The overhead imposed by events
can be reduced by using the C<activate()> method.
But making many calls to
the C<activate()> method purely for efficiency
purposes will be counter-productive.
Also, deactivated events still impose
some overhead, so if an event is never used
it should be commented out in the SLIF DSL.

=head2 lexeme_alternative()

=for Marpa::R2::Display
name: SLIF lexeme_alternative() example
partial: 1
normalize-whitespace: 1

            if ( not defined $recce->lexeme_alternative($token_name) ) {
                die
                    qq{Parser rejected token "$long_name" at position $start_of_lexeme, before "},
                    substr( $string, $start_of_lexeme, 40 ), q{"};
            }

=for Marpa::R2::Display::End

The C<lexeme_alternative()> method
allows an external scanner to read
ambiguous tokens.
Most applications 
will prefer the simpler L<C<lexeme_read()>|/"lexeme_read()">.

C<lexeme_alternative()> takes one or two arguments.
The first argument,
which is required,
is the name of a symbol to be read
at the current location.
The second argument,
which is optional,
is the value of the symbol.
The value argument is interpreted as described for C<lexeme_read()>.

Any number of tokens may be read using C<lexeme_alternative()>
without advancing the current location.
This allows an application to use ambiguous tokens.
To complete reading at a G1 location,
and advance the current G1 location to the next G1 location,
use the L<C<lexeme_complete()>|/"lexeme_complete()"> method.

On success, returns a non-negative number.
Returns C<undef> if the token was rejected.
Other failures are thrown as exceptions.

=head2 lexeme_complete()

=for Marpa::R2::Display
name: SLIF lexeme_alternative() example
partial: 1
normalize-whitespace: 1

            next TOKEN
                if $recce->lexeme_complete( $start_of_lexeme,
                        ( length $lexeme ) );

=for Marpa::R2::Display::End

The C<lexeme_complete()> method allows an external scanner to read
ambiguous tokens.
Most applications will prefer the simpler C<lexeme_read()>.

The C<lexeme_complete()> method
requires two arguments,
an input stream start location and a length.
These are interpreted as described for the
corresponding second and third
arguments to L<C<lexeme_read()>|/"lexeme_read()">.
The C<lexeme_complete()>
method completes the reading of
alternative tokens at the current G1 location,
and advances the current G1 location by one.
Current location in the input stream is moved
to the location after the new lexeme,
as indicated by the arguments.

L<Completion named events|Marpa::R2::Scanless::DSL/"Completion event statement">
can occur during the C<lexeme_complete()> method.
Named events can be queried using
L<the Scanless recognizer's events()
method|/"events()">.

B<Return value:>
On success, C<lexeme_complete()>
returns the new current location.
This will never be location zero, because a successful
call of C<lexeme_complete()> always advances the location.
On unthrown failure, C<lexeme_complete()> returns 0.

=head2 lexeme_read()

=for Marpa::R2::Display
name: SLIF read/resume example
partial: 1
normalize-whitespace: 1

    $re->lexeme_read( 'lstring', $start, $length, $value ) // die;

=for Marpa::R2::Display::End

The C<lexeme_read()> method reads a single, unambiguous, lexeme.
It takes four arguments, only the first of which is required.
The first argument is the lexeme's symbol name.
The second and third arguments specify the span in the input
stream to be associated with the lexeme.
The last argument indicates its value.

The second and third arguments are, respectively,
the start and length of a span in the input stream.
The start defaults to the current location.
If the pause span is defined,
and the start of the pause lexeme is
the same as the current location,
length defaults to the length of the pause span.
Otherwise length defaults to -1.
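
The defaulting rules for the span can be sketched as a plain function.
C<default_lexeme_span()> is hypothetical, not part of Marpa::R2;
omitted arguments, and an undefined pause span,
are represented by C<undef>.

```perl
    use strict;
    use warnings;

    # Hypothetical helper, not part of Marpa::R2: applies the default
    # rules for lexeme_read()'s start and length arguments.
    sub default_lexeme_span {
        my ( $current, $start, $length, $pause_start, $pause_length ) = @_;

        # The start defaults to the current location.
        $start //= $current;

        # The length defaults to the pause span's length only if the
        # pause span is defined and starts at the current location.
        if ( not defined $length ) {
            $length =
                ( defined $pause_start and $pause_start == $current )
                ? $pause_length
                : -1;
        }
        return ( $start, $length );
    }
```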

Negative values are allowed and are interpreted
as L<described above|/"Negative locations">.
This span will be treated as the section of the input stream
that corresponds to the tokens read at the current location.
This correspondence may be artificial, but a span must
always be specified.

The fourth argument specifies the value of the lexeme.
If the value argument is omitted,
the token's value will be a string
containing the corresponding substring
of the input stream.
Omitting the value argument does not have the same
effect as passing an explicit Perl C<undef>.
If the value argument is an explicit Perl C<undef>,
the value of the lexeme will be a Perl C<undef>.

=for Marpa::R2::Display
ignore: 1

    $slr->lexeme_read($symbol, $start, $length, $value);

=for Marpa::R2::Display::End

is the equivalent of

=for Marpa::R2::Display
ignore: 1

    $slr->lexeme_alternative($symbol, $value);
    $slr->lexeme_complete($start, $length);

=for Marpa::R2::Display::End

Current location in the input stream is moved
to the place where C<read()> paused or,
if it never pauses,
to C<$start+$length>.
Current G1 location is advanced by one.

L<Completion named events|Marpa::R2::Scanless::DSL/"Completion event statement">
can occur during the C<lexeme_read()> method.
Named events can be queried using
L<the Scanless recognizer's events()
method|/"events()">.

B<Return value>:
On success, C<lexeme_read()> returns the new current location.
This will never be location zero, because lexemes cannot be zero length.
If the token was rejected, returns a Perl C<undef>.
On other unthrown failure, returns 0.
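
Because C<undef> and 0 mean different things here,
callers should test definedness before truth.
C<lexeme_read_outcome()> is a hypothetical helper that sorts a return value
into the three cases; it relies on the fact, stated above,
that a successful call never returns 0.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: classifies a lexeme_read() return value.
    sub lexeme_read_outcome {
        my ($result) = @_;
        return 'rejected' if not defined $result;    # token rejected
        return 'failed'   if $result == 0;           # other unthrown failure
        return "read; current location is now $result";
    }
```

Call C<lexeme_read()> in scalar context, for example
C<< my $result = $slr->lexeme_read( 'lstring', $start, $length, $value ); >>,
and pass C<$result> to the helper.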

=head2 resume()

=for Marpa::R2::Display
name: SLIF read/resume example
partial: 1
normalize-whitespace: 1

    my $re = Marpa::R2::Scanless::R->new(
        {   grammar           => $parser->{grammar},
            semantics_package => 'MarpaX::JSON::Actions'
        }
    );
    my $length = length $string;
    for (
        my $pos = $re->read( \$string );
        $pos < $length;
        $pos = $re->resume()
        )
    {
        my ( $start, $length ) = $re->pause_span();
        my $value = substr $string, $start + 1, $length - 2;
        $value = decode_string($value) if -1 != index $value, '\\';
        $re->lexeme_read( 'lstring', $start, $length, $value ) // die;
    } ## end for ( my $pos = $re->read( \$string ); $pos < $length...)
    my $per_parse_arg = bless {}, 'MarpaX::JSON::Actions';
    my $value_ref = $re->value($per_parse_arg);
    return ${$value_ref};

=for Marpa::R2::Display::End

The C<resume()> method takes two optional arguments,
a start location and a length.
The default start location is the current location.
The default length is -1.
Negative arguments are interpreted
as L<described above|/"Negative locations">.

The C<resume()> method resumes
the SLIF's internal scanning,
L<as described
above|/"How internal scanning works">.

L<Completion named events|Marpa::R2::Scanless::DSL/"Completion event statement">
can occur during the C<resume()> method.
When a named event occurs, 
the C<resume()> method pauses.
Named events can be queried using
L<the Scanless recognizer's events()
method|/"events()">.
The C<resume()> method also
pauses as specified with the
L<Scanless DSL's pause adverb|Marpa::R2::Scanless::DSL/"pause">.

On success, C<resume()> moves
the current location to where it paused,
or to the EOS.
The return value is the new current location.
On unthrown failure,
C<resume()> returns a Perl C<undef>.

=head1 Accessors

=head2 ambiguity_metric()

=for Marpa::R2::Display
name: Scanless ambiguity_metric() synopsis

    my $ambiguity_metric = $slr->ambiguity_metric();

=for Marpa::R2::Display::End

Returns 1 if there is an unambiguous parse,
and 2 or greater if there is an ambiguous parse.
Returns 0 if called before parsing.
Returns 0 or less than zero on other unthrown failure.
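
A sketch of how the return value might be interpreted.
C<describe_ambiguity_metric()> is a hypothetical helper
that simply restates the cases above.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: restates the meaning of an
    # ambiguity_metric() return value.
    sub describe_ambiguity_metric {
        my ($metric) = @_;
        return 'unambiguous' if $metric == 1;
        return 'ambiguous'   if $metric >= 2;
        return 'no usable result';    # 0 or negative
    }
```

A typical call would be
C<< my $description = describe_ambiguity_metric( $slr->ambiguity_metric() ); >>.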

=head2 current_g1_location()

=for Marpa::R2::Display
name: Scanless current_g1_location() synopsis

    my $current_g1_location = $slr->current_g1_location();

=for Marpa::R2::Display::End

Returns the current G1 location.

=head2 events()

=for Marpa::R2::Display
name: SLIF events() method synopsis
normalize-whitespace: 1


        EVENT:
        for my $event ( @{ $slr->events() } ) {
            my ($name) = @{$event};
            push @actual_events, $name;
        }

=for Marpa::R2::Display::End

The C<events()> method takes no arguments,
and returns a reference to an array of event descriptors.
The array is empty
if no events occurred.

Each named event descriptor is a reference to an array of one
or more elements.
The first element of every named event descriptor is a string
containing the name of the event,
and this is typically the only element.
In certain cases, a named event descriptor
may contain other elements,
which are described along with
the type of named event that produces them.
Named events are
L<described in the SLIF
DSL|Marpa::R2::Scanless::DSL/"Named event statement">.
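The descriptor layout lends itself to a dispatch table keyed by event name. The following is a minimal sketch: the array reference stands in for the value returned by C<< $slr->events() >>, and the event names, the extra element, and the handlers are all hypothetical.

```perl
use strict;
use warnings;

# Stand-in for the array reference returned by $slr->events();
# the event names and the extra element are made up for this sketch.
my $events = [ ['expression_done'], ['number_predicted'], [ 'custom', 42 ] ];

# Hypothetical handlers, keyed by event name.
my %handler = (
    expression_done  => sub { return 'completed an expression' },
    number_predicted => sub { return 'a number may come next' },
);

my @log;
EVENT:
for my $event ( @{$events} ) {

    # The first element is always the name; any others are
    # specific to the type of named event.
    my ( $name, @extra ) = @{$event};
    my $handler = $handler{$name};
    if ( not defined $handler ) {
        push @log, "unhandled event '$name' (" . scalar(@extra) . " extra)";
        next EVENT;
    }
    push @log, $handler->(@extra);
}
print "$_\n" for @log;
```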

Events occur during
L<the Scanless recognizer's read()|/"read()">,
L<resume()|/"resume()">,
L<lexeme_complete()|/"lexeme_complete()">,
and L<lexeme_read()|/"lexeme_read()">
methods.
Any subsequent call
to an SLIF recognizer mutator may clear the list
of triggered events.
The assumption is that
an application interested in events
will call
the C<events()> method promptly after
control is returned to it.

Named events are returned in order by type.
Completion events are first.
They are followed by the nulled events.
These are in turn followed by prediction events.
Within each type,
the order of events is arbitrary.

Applications may find it convenient to
turn specific events off,
temporarily or permanently.
Events may be activated or deactivated
with the SLIF recognizer's
L<activate() method|/"activate()">.

=head2 exhausted()

=for Marpa::R2::Display
name: $slr->exhausted example

    my $exhausted_status = $slr->exhausted();

=for Marpa::R2::Display::End

The C<exhausted()> method returns a Perl true if parsing in an SLIF
recognizer is
exhausted, and a Perl false otherwise. Parsing is exhausted when the
recognizer will not accept any further input.

An attempt to read input into an
exhausted parser causes an exception to be thrown.
The exception is all that most applications require,
but this method
allows the recognizer's exhaustion status to be discovered directly.

=head2 g1_location_to_span()

=for Marpa::R2::Display
name: Scanless g1_location_to_span() synopsis

        my ( $span_start, $span_length ) =
            $slr->g1_location_to_span($g1_location);

=for Marpa::R2::Display::End

G1 locations do not correspond to a single input stream
location, but to a span of them.
The C<g1_location_to_span()> method
returns an B<array> of two elements, representing a span in the input stream.
The first element of the array is the input stream location
where the span starts.
The second element of the array is the length of the span.
As a special case,
the input stream span for G1 location 0 is always (0,0).

Sometimes it is convenient to think of
a G1 location as corresponding to a single input stream location.
When this is the case,
what is usually intended is the last input stream location
of the span.
The last input stream location
of the span
will always be C<$span_start+$span_length>.
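To make this concrete, here is a minimal sketch that collapses a G1 location to the end of its span. The table of spans is hypothetical, standing in for what C<g1_location_to_span()> might return for the input C<"2 + 3">, where each character other than whitespace is one lexeme; C<g1_to_end_location> is likewise a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical (start, length) pairs, indexed by G1 location, as
# g1_location_to_span() might return them for "2 + 3".
# G1 location 0 is always (0,0).
my @span_of = ( [ 0, 0 ], [ 0, 1 ], [ 2, 1 ], [ 4, 1 ] );

# Hypothetical helper: collapse a G1 location to the single
# input stream location usually intended, the end of its span.
sub g1_to_end_location {
    my ($g1_location) = @_;
    my ( $span_start, $span_length ) = @{ $span_of[$g1_location] };
    return $span_start + $span_length;
}

print g1_to_end_location($_), "\n" for 0 .. $#span_of;    # 0, 1, 3, 5
```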

=head2 input_length()

=for Marpa::R2::Display
name: SLIF input_length() example
partial: 1
normalize-whitespace: 1

    my $input_length = $slr->input_length();

=for Marpa::R2::Display::End

The C<input_length()> method accepts no arguments,
and returns the length of the input stream.

=head2 last_completed()

=for Marpa::R2::Display
name: Scanless recognizer diagnostics
partial: 1
normalize-whitespace: 1

    sub show_last_expression {
        my ($self) = @_;
        my $recce = $self->{recce};
        my ( $g1_start, $g1_length ) = $recce->last_completed('Expression');
        return 'No expression was successfully parsed' if not defined $g1_start;
        my $last_expression = $recce->substring( $g1_start, $g1_length );
        return "Last expression successfully parsed was: $last_expression";
    } ## end sub show_last_expression

=for Marpa::R2::Display::End

=for Marpa::R2::Display
name: Scanless recognizer diagnostics
partial: 1
normalize-whitespace: 1

    my ( $g1_start, $g1_length ) = $recce->last_completed('Expression');

=for Marpa::R2::Display::End

Given the name of a symbol,
returns the start G1 location and
the length in G1 locations of the most recent match.
If there was more than one most recent match, it returns
the longest.
If there was no match, returns the empty list in list context
and a Perl false in scalar context.
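The selection rule ("most recent, then longest") can be illustrated with a short sketch. This is not Marpa's implementation: the list of matches is hypothetical, and C<pick_last_completed> is a made-up helper that simply applies the rule to (G1 start, G1 length) pairs, returning the empty list when there is no match, as C<last_completed()> does.

```perl
use strict;
use warnings;

# Hypothetical completed matches of one symbol, as (g1_start, g1_length).
my @matches = ( [ 0, 1 ], [ 2, 1 ], [ 0, 3 ], [ 1, 2 ] );

# Illustration of the selection rule only: prefer the match that
# ends latest; among those, prefer the longest.
sub pick_last_completed {
    my @candidates = @_;
    return if not @candidates;    # empty list, or false in scalar context
    my ($best) = sort {
        ( $b->[0] + $b->[1] ) <=> ( $a->[0] + $a->[1] )
            or $b->[1] <=> $a->[1]
    } @candidates;
    return @{$best};
}

my ( $g1_start, $g1_length ) = pick_last_completed(@matches);
print "start=$g1_start length=$g1_length\n";    # start=0 length=3
```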

=head2 line_column()

=for Marpa::R2::Display
name: SLIF trace example
partial: 1
normalize-whitespace: 1

    my ( $start, $span_length ) = $re->pause_span();
    my ( $line,  $column )      = $re->line_column($start);

=for Marpa::R2::Display::End

The C<line_column()> method accepts one optional argument:
a location in the input stream.
The location defaults to the current location.
C<line_column()> returns the corresponding line and column position,
as a 2-element array.
The first element of the array is the line position,
and the second element is the column position.

Numbering of lines and columns is 1-based,
following UNIX editor tradition.
Except at EOF, the line and column will be that of an
actual character.
At EOF the line number
will be that of the last line,
and the column number will be that of the last column
plus one.
Applications which want to treat EOF as a special case
can test for it using the L<C<pos()> method|/"pos()">
and the L<C<input_length()> method|/"input_length()">.

A line is considered to end with any newline sequence
as defined in the
Unicode Specification 4.0.0, Section 5.8.
Specifically, a line ends with one of the following:

=over 4

=item *

a LF (line feed, U+000A);

=item * 

a CR (carriage return, U+000D), when it is not followed by a LF;

=item *

a CRLF sequence (U+000D,U+000A);

=item *

a NEL (next line, U+0085);

=item *

a VT (vertical tab, U+000B);

=item *

a FF (form feed, U+000C);

=item *

a LS (line separator, U+2028) or

=item *

a PS (paragraph separator, U+2029).

=back
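These numbering rules can be sketched in plain Perl, whose C<\R> escape matches a CRLF pair as a single unit, or any single LF, VT, FF, CR, NEL, LS or PS character; that is the same set of line endings listed above. C<line_column_sketch> is a hypothetical helper that does a simple linear scan; it is not how Marpa::R2 computes line and column internally.

```perl
use strict;
use warnings;

# Sketch of 1-based line/column numbering.  Each \R match is one
# line ending; CRLF is matched as a unit, so it counts only once.
sub line_column_sketch {
    my ( $input, $pos ) = @_;
    my ( $line, $line_start ) = ( 1, 0 );
    while ( $input =~ /\R/g ) {
        my $end = pos $input;    # offset just past the line ending
        last if $end > $pos;
        $line++;
        $line_start = $end;
    }
    return ( $line, $pos - $line_start + 1 );
}

# "ab", CRLF, "cd", LS (U+2028), "e"
my $text = "ab\r\ncd\x{2028}e";
for my $pos ( 0, 4, 7, length $text ) {
    my ( $line, $column ) = line_column_sketch( $text, $pos );
    print "pos $pos: line $line, column $column\n";
}
```

The last position printed is EOF: it reports the last line, and one column past that line's last character, as described above.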

=head2 literal()

=for Marpa::R2::Display
name: SLIF trace example
partial: 1
normalize-whitespace: 1

    my $literal_string = $re->literal( $start, $span_length );

=for Marpa::R2::Display::End

The C<literal()> method accepts two arguments,
the start location and length of a span in the
input stream.
It returns the substring of the input stream
corresponding to that span.

=head2 pause_lexeme()

=for Marpa::R2::Display
name: SLIF trace example
partial: 1
normalize-whitespace: 1

   my $lexeme = $re->pause_lexeme();

=for Marpa::R2::Display::End

The C<pause_lexeme()> method accepts no arguments,
and returns the name of the lexeme which caused the most
recent pause.
The pause lexeme is initially undefined
and it is reset to undefined at the beginning of
each call to the L<C<read()>|/"read()"> or L<C<resume()>|/"resume()"> methods.


More than one lexeme may cause a pause.
When this is the case,
all the causal lexemes will be
acceptable to the G1 grammar,
and all the causal lexemes will have
the same lexeme priority.
When more than one lexeme causes a pause,
the choice of pause
lexeme is arbitrary.
Applications must not rely on a particular choice,
or on that choice being repeated,
even when the choice is made
in similar or identical circumstances.

Not every pause is caused by a lexeme.
A pause often occurs because
of the length argument of an internal scanning method.
When the most recent pause was not caused by a lexeme,
the pause lexeme is undefined.
C<pause_lexeme()> returns a Perl C<undef> when
the pause lexeme is undefined.

=head2 pause_span()

=for Marpa::R2::Display
name: SLIF read/resume example
partial: 1
normalize-whitespace: 1

    my ( $start, $length ) = $re->pause_span();

=for Marpa::R2::Display::End

The C<pause_span()> method accepts no arguments,
and returns the "pause span" as a 2-element array.
The "pause span" is
the start location and length of the lexeme which caused
the most recent pause.
The pause span is initially undefined
and it is reset to undefined at the beginning of
each call to the L<C<read()>|/"read()"> or L<C<resume()>|/"resume()"> methods.

A pause is not always caused by a lexeme -- internal
scanning may be paused
because of the length argument of an internal scanning method.
When the most recent pause was not caused by a lexeme,
no span can be associated with it,
and the pause span is undefined.
C<pause_span()> returns a Perl C<undef> if
the pause span is undefined.

=head2 pos()

=for Marpa::R2::Display
name: SLIF pos() example
partial: 1
normalize-whitespace: 1

    my $pos = $slr->pos();

=for Marpa::R2::Display::End

The C<pos()> method accepts no arguments,
and returns the current input stream location.

=head2 progress()

=for Marpa::R2::Display
name: Scanless progress() synopsis

    my $progress_output = $slr->progress();

=for Marpa::R2::Display::End

Returns a reference to an array that describes the progress
of a parse
at a location.
With no argument, C<progress()> reports progress at
the current location.
If a G1 location is
given as its argument,
C<progress()> reports progress at that G1 location.
The G1 location may be negative.
An argument of I<-X>
will be interpreted as location I<N-X+1>, where I<N> is
the current G1 location.
In other words, an argument of -1 indicates the current G1 location,
an argument of -2 indicates the G1 location just before
the current one, etc.
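The arithmetic for negative arguments is small enough to show directly. This is a minimal sketch; C<resolve_g1_location> is a hypothetical helper, and the current G1 location is passed in rather than read from a recognizer.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a progress() location argument.
# A negative argument -X means G1 location N-X+1, where N is the
# current G1 location; non-negative arguments are used as-is.
sub resolve_g1_location {
    my ( $arg, $current_g1 ) = @_;
    return $arg >= 0 ? $arg : $current_g1 + $arg + 1;
}

# With the current G1 location at 42:
print join( ' ', map { resolve_g1_location( $_, 42 ) } -1, -2, 7 ), "\n";
# prints "42 41 7"
```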

The progress reports returned by
the C<progress()> method
identify rules by their G1 rule ID.
G1 rule IDs can be converted to a list of the rule's
symbols using the L<C<rule()> method
of the SLIF grammar|Marpa::R2::Scanless::G/"rule()">.
Details on progress reports can be found in
L<their own document|Marpa::R2::Progress>.

=head2 show_progress()

=for Marpa::R2::Display
name: Scanless show_progress() synopsis
partial: 1
normalize-whitespace: 1

    my $show_progress_output = $slr->show_progress();

=for Marpa::R2::Display::End

Shows the progress of the G1 parse.
For a description of its output,
see L<Marpa::R2::Progress>.

With no arguments,
the string contains reports for
the current location.
If locations are specified as arguments to
C<show_progress()>, they need to be 
G1 locations.

With a single integer argument I<N>,
the string contains reports for G1 location I<N>.
With two numeric arguments, I<N> and I<M>, the arguments are interpreted
as the start and end points of a range of G1 locations
and the returned string contains
reports for all locations in the range.

If an argument is negative,
I<-N>,
it indicates
the I<N>th location counting backward
from the furthest location of the parse.
For example, if 42 was the furthest G1 location,
-1 would be G1 location 42 and -2 would be location 41.
Similarly, the method call
C<< $recce->show_progress(-3, -1) >>
returns reports for the last three G1 locations of the parse,
and the method call C<< $recce->show_progress(0, -1) >>
returns progress reports for the entire parse.
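The way one or two arguments expand into a range of G1 locations can be sketched as follows. C<progress_range> is a hypothetical helper that only computes which locations would be covered; it does not produce any progress reports.

```perl
use strict;
use warnings;

# Hypothetical helper: list the G1 locations a show_progress(N, M)
# call would cover, counting negative arguments back from the
# furthest G1 location ($furthest), so that -1 means $furthest.
sub progress_range {
    my ( $furthest, $first, $last ) = @_;
    $last //= $first;    # a single argument means a single location
    my @ends = map { $_ < 0 ? $furthest + $_ + 1 : $_ } $first, $last;
    return $ends[0] .. $ends[1];
}

print join( ' ', progress_range( 42, -3, -1 ) ), "\n";    # 40 41 42
my @all = progress_range( 42, 0, -1 );
print scalar(@all), "\n";                                 # 43
```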

Locations are G1 locations instead of string offsets,
for two reasons.
First, G1 parse state is defined only at the start of parsing
and at the end of each non-discarded lexeme.
Therefore many string offsets have no G1 parse state.
Second, SLIF recognizers using external scanning are allowed
to rescan the same string repeatedly.
Therefore, a single string offset may have many
G1 parse states.

=head2 substring()

=for Marpa::R2::Display
name: Scanless recognizer diagnostics
partial: 1
normalize-whitespace: 1

    my $last_expression = $recce->substring( $g1_start, $g1_length );

=for Marpa::R2::Display::End

Given a G1 span -- that is, a G1 start location and a length in G1 locations --
the C<substring()> method
returns a substring of the input
stream.
A G1 length of zero will produce the zero-length string.

The substring of the input stream is determined on the assumption
that the application reads the input monotonically.
When this is not the case, the substring is determined as
L<described above|/"Literals and G1 spans">.
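For the monotonic case, the mapping from a G1 span to input text can be sketched as below. This assumes, for illustration only, that G1 location I<k> is reached by reading the I<k>th lexeme, so a G1 span with start I<s> and length I<l> covers lexemes I<s+1> through I<s+l>; the span table and C<substring_sketch> are hypothetical and do not reproduce Marpa's code.

```perl
use strict;
use warnings;

my $input = '2 + 3 * 4';

# Hypothetical spans, indexed by G1 location, for the lexemes of
# $input; G1 location 0 is (0,0).  Discarded whitespace has no
# G1 location of its own.
my @span_of = ( [ 0, 0 ], [ 0, 1 ], [ 2, 1 ], [ 4, 1 ], [ 6, 1 ], [ 8, 1 ] );

# Sketch assuming monotonic reading: the text runs from the start of
# the first lexeme in the G1 span to the end of the last one,
# including any discarded whitespace in between.
sub substring_sketch {
    my ( $g1_start, $g1_length ) = @_;
    return '' if $g1_length == 0;
    my ($first_start) = @{ $span_of[ $g1_start + 1 ] };
    my ( $last_start, $last_length ) = @{ $span_of[ $g1_start + $g1_length ] };
    return substr $input, $first_start,
        $last_start + $last_length - $first_start;
}

print substring_sketch( 2, 3 ), "\n";    # 3 * 4
```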

=head2 terminals_expected()

=for Marpa::R2::Display
name: Scanless terminals_expected() synopsis
normalize-whitespace: 1

    my @terminals_expected = @{$slr->terminals_expected()};

=for Marpa::R2::Display::End

Returns a reference to an array of strings, where the strings are the
names of the lexemes acceptable at the current location.
The presence of a lexeme in this list means
that lexeme will be acceptable in the next call to the L<C<resume()>|/"resume()"> method.

This is highly useful for Ruby Slippers parsing.
A more fine-tuned approach is to identify the lexemes of interest
and create "predicted symbol" events for them.

=head1 Discouraged methods

Methods in this section continue to be supported, but their use is
discouraged in favor of other, better solutions.
New applications should avoid using discouraged methods.

=head2 event()

=for Marpa::R2::Display
name: SLR event() method synopsis
normalize-whitespace: 1
partial: 1

            my $event    = $slr->event($event_ix);

=for Marpa::R2::Display::End

Use of this method is discouraged in favor of the more efficient
L<events() method|/"events()">.
The C<event()> method requires one argument,
an event index.
It returns a descriptor of
the named event with that index, or a Perl C<undef>
if there is no such event.
For more details on events, see the 
L<description of the events() method|/"events()">.

=head2 last_completed_range()

Use of this method is discouraged in favor of 
L</"last_completed()">.
Given the name of a symbol,
C<last_completed_range()>
returns the G1 start and G1 end locations of the most recent match.
If there was more than one most recent match,
C<last_completed_range()>
returns the longest.
If there was no match,
C<last_completed_range()>
returns the empty list in list context
and a Perl false in scalar context.

=head2 range_to_string()

Use of this method is discouraged in favor of 
L</"substring()">.
Given a G1 start and a G1 end location,
C<range_to_string()>
returns the substring of the input
stream that is between the two.
The C<range_to_string()> method 
assumes that
the application reads forward monotonically through the input stream
while reading
the sequence of G1 locations.
When that is not the case,
C<range_to_string()> behaves in
much the same way as described above
for L</"substring()">.

=head1 Copyright and License

=for Marpa::R2::Display
ignore: 1

  Copyright 2014 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.

=for Marpa::R2::Display::End

=cut

# Local Variables:
#   mode: cperl
#   cperl-indent-level: 4
#   fill-column: 100
# End:
# vim: expandtab shiftwidth=4: