Name

Marpa::R2::Exhaustion - Parse exhaustion in the SLIF

About this document

This page is part of the reference documents for the recognizer objects of Marpa's SLIF (Scanless interface). It contains a detailed discussion of parse exhaustion.

Exhaustion

At bottom, parse exhaustion is a simple concept. The recognizer may reach a point where there is simply no way to continue successfully. Regardless of what it reads next, the parse will fail. When this happens, the parse is said to be exhausted.

Parse exhaustion is sometimes confused with parse failure. But parse exhaustion is also sometimes confused with parse success. That is because there can be a strong association either way, depending on the kind of application a user is most focused on. The problem is that there are two very different kinds of application: exhaustion-loving and exhaustion-hating. Both are very important.

Hate and love

In an exhaustion-hating application, parse exhaustion is typically parse failure. C programs, Perl scripts and most programming languages are exhaustion-hating applications. If a C program is well-formed, it is always possible to read more input. The same is true of a Perl program that does not have a __DATA__ section.

In an exhaustion-loving application, parse exhaustion means parse success. A toy example of an exhaustion-loving application is the language consisting of balanced parentheses. When the parentheses come into perfect balance the parse is exhausted, because any further input would unbalance the parentheses. And the parse succeeds when the parentheses come into perfect balance. Exhaustion means success.
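The balanced-parentheses example can be made concrete with a small model. The sketch below is a toy written in Python, not Marpa code: the class ParenRecognizer and its methods are hypothetical names chosen for illustration, and the toy assumes the language is a single balanced group such as "(()())".

```python
class ParenRecognizer:
    """Toy recognizer for a single balanced group of parentheses."""

    def __init__(self):
        self.depth = 0        # parentheses currently open
        self.started = False  # True once the first "(" has been read

    def exhausted(self):
        # Once the group is balanced, no further character can be accepted.
        return self.started and self.depth == 0

    def read_char(self, ch):
        if self.exhausted():
            raise ValueError("attempt to read by an exhausted recognizer")
        if ch == "(":
            self.depth += 1
            self.started = True
        elif ch == ")" and self.depth > 0:
            self.depth -= 1
        else:
            raise ValueError(f"unexpected character {ch!r}")


recognizer = ParenRecognizer()
for ch in "(()":
    recognizer.read_char(ch)
still_open = recognizer.exhausted()  # False: a ")" can still be read
recognizer.read_char(")")
balanced = recognizer.exhausted()    # True: balanced, so exhausted and successful
```

In this toy, the moment of exhaustion and the moment of success coincide, which is what makes the language exhaustion-loving.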

Any language which balances start and end indicators will tend to be exhaustion-loving. HTML and XML, with their start and end tags, can be seen as exhaustion-loving languages.

For many languages, it's not strictly love or hate. I mentioned Perl's __DATA__ as a complication in a basically exhaustion-hating language. Most large practical languages will be some mix of exhaustion-loving and exhaustion-hating. We can call them exhaustion-conflicted.

What the SLIF does

The methods that may encounter parse exhaustion are those that read input: read(), resume(), lexeme_complete(), and lexeme_read(). These are also, and not by coincidence, the event-triggering methods. In this document, we will call them the reading methods.

The reading methods need to treat exhaustion-hating and exhaustion-loving languages equally. They also need to handle the exhaustion-conflicted ones smoothly.

In this document, an exhaustion location is a location at which parse exhaustion occurs. A return location is a location at which the current reading method returns. A non-return location is a location that is not a return location.

  • If parse exhaustion occurs at a non-return location, it is considered a failure, because the reading method would go on to attempt to read more input at later locations. The failure is thrown as an exception.

  • If parse exhaustion occurs at a return location, and the return would be successful, parse exhaustion is ignored. The reading methods return success when they trigger a SLIF parse event, or when they are at the end of the current input string.

  • If parse exhaustion occurs at a location where the reading method fails for another reason, the parse exhaustion is considered to be of lesser importance, and the other failure overrides it.

One consequence of this is that the two reading methods for external scanning, lexeme_complete() and lexeme_read(), always ignore parse exhaustion. This is because these methods read input only at a single location, so that every exhaustion location is a return location.

A second consequence is that in most cases it is not obvious whether parse exhaustion occurred or not. Below I will do a case-by-case analysis, showing that the parse exhaustion is usually

  • not important, or

  • about to become obvious in another way.

Applications can directly query the parse exhaustion status with the $slr->exhausted() method.
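The three rules in this section can be modeled with a toy reader. This is a sketch in Python under stated assumptions, not Marpa's implementation: toy_read and ExhaustionError are hypothetical names, the language is a single balanced group of parentheses, parse events are not modeled, and the only return location is the end of the input string.

```python
class ExhaustionError(Exception):
    """Raised when the toy parse is exhausted at a non-return location."""


def toy_read(text):
    """Read all of `text`; return (location, exhausted) on success."""
    depth = 0
    for pos, ch in enumerate(text, start=1):
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        else:
            # A failure for another reason: it is reported as itself.
            raise ValueError(f"rejected {ch!r} at location {pos}")
        exhausted = depth == 0
        at_return_location = pos == len(text)
        if exhausted and not at_return_location:
            # More input would be read at later locations, which must fail.
            raise ExhaustionError(f"parse exhausted at location {pos}")
    # End of input is a return location: exhaustion there is ignored,
    # and the status is only reported for callers that want to query it.
    return len(text), (len(text) > 0 and depth == 0)


complete = toy_read("(())")  # (4, True): exhausted at the return location
partial = toy_read("(()")    # (3, False): success, more input is possible
```

Calling toy_read("()(") raises ExhaustionError, because the parse is exhausted at location 2 while input remains. Both successful calls return normally, so a caller learns about exhaustion only by inspecting the second element of the result, which plays the role of an explicit exhaustion query.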

Cases

To explain how this behavior works for exhaustion-loving and exhaustion-hating applications, we'll consider the various possibilities.

Exhaustion-loving and successful

If the application is exhaustion-loving and it returns success at an exhaustion location, all should be well. A successful return is usually exactly what is wanted. If an application wants to confirm that exhaustion occurred, it can use the $slr->exhausted() method.

Exhaustion-hating and successful

It may happen that an application is exhaustion-hating but exhausts the parse at a return location. This will usually not be what was wanted, but the return value will give no indication of the problem. Typically, an application will continue processing, and either try to evaluate the parse, or to read more input.

In either case the problem will be reported quickly. If the application tries to evaluate the parse, it will discover there is no parse value. If it tries to read more input, this will fail and be reported as an attempt to read by an exhausted recognizer.

Some applications will be unable to rely on later methods to report the problem, will wish to fast fail, or will want finer control of the error reporting. These applications can check the exhaustion status explicitly.

Failures

If an application returns or throws a failure that occurs at an exhaustion location, but for a reason other than exhaustion, the other failure overrides the exhaustion. This is the case regardless of whether the application is exhaustion-loving or exhaustion-hating.

A non-recoverable error makes parse exhaustion irrelevant. If the other error is recoverable, exhaustion status can be checked using the $slr->exhausted() method.

Exhaustion-hating and continuing despite exhaustion

If the application is exhaustion-hating, but it is exhausted and the exhaustion location is not a return location, Marpa's SLIF throws a failure. Since the application would ordinarily go on to read more input at a later location, something which is certain to fail, an exception is appropriate.

Exhaustion-loving and continuing despite exhaustion

If the application is exhaustion-loving, but would continue reading past the exhaustion location, Marpa throws an exception. Exhaustion-loving usually means that when exhaustion occurs, an application succeeds and is at the end of the parse. For attempts to read past the end of parse, throwing an exception is the right thing to do.

But some exhaustion-loving applications do not know where the end of parse is in advance. As a specialized technique, these applications may try to use parse exhaustion to detect where the parse ends. See below on exhaustion-sensitivity. If it is desirable to use exhaustion to help find the end of parse, the exception may be caught.

Exhaustion-sensitive applications

Sometimes an application, rather than read an entire input, wants to find the longest occurrence starting at some location. (Lexers are typically applications of this kind.) Looking for exhaustion is one way to try to implement this kind of "longest acceptable input stream" search.

Exhaustion-sensitivity is not necessarily the best way, or even a good way, to find the "longest parse". Exhaustion may not happen until after the last successful parse, sometimes not until long after it. Completion parse events may be a cleaner way to deal with this. Applications which do want to use parse exhaustion as part of a strategy for finding the end of parse can catch the exception and check the exhaustion status explicitly.
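The gap between the last successful parse and the point where reading stops can be seen in a toy longest-match search. The sketch below is Python, not Marpa code: longest_match is a hypothetical helper, the "language" is just a finite set of words, and the stopping point loosely stands in for the location at which the recognizer can no longer continue. It records the last location at which a complete word was seen, in the spirit of completion events, rather than relying on where reading stops.

```python
def longest_match(words, text):
    """Return (last_complete, stopped_at) for a toy set-of-words language.

    last_complete: length of the longest prefix of `text` that is a word,
                   or None if no prefix is a word.
    stopped_at:    number of characters consumed before no word could
                   match any longer, or before the input ran out.
    """
    last_complete = None
    pos = 0
    while pos < len(text):
        prefix = text[: pos + 1]
        # Stop as soon as no word starts with the extended prefix.
        if not any(word.startswith(prefix) for word in words):
            break
        pos += 1
        if prefix in words:
            last_complete = pos  # remember the latest complete match
    return last_complete, pos


result = longest_match({"ab", "abcd"}, "abcx")  # (2, 3)
```

For the input "abcx" the last complete match ends at location 2, but reading does not stop until location 3. An application that looked only at the stopping point would overshoot the longest match by one character.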

Copyright and License

  Copyright 2014 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.
