The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
#LyX 1.6.4 created this file. For more info see http://www.lyx.org/
\lyxformat 345
\begin_document
\begin_header
\textclass article
\begin_preamble
% Copyright 2012 Jeffrey Kegler
% This document is not part of the Marpa or Marpa::XS source.
% Although it may be included with a Marpa distribution that
% is under an open source license, this document is
% not under that open source license.
% Jeffrey Kegler retains full rights.
\end_preamble
\use_default_options false
\language english
\inputencoding auto
\font_roman default
\font_sans default
\font_typewriter default
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100

\graphics default
\paperfontsize 12
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_amsmath 1
\use_esint 1
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 0.75in
\topmargin 0.75in
\rightmargin 0.75in
\bottommargin 0.75in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\author "" 
\author "" 
\end_header

\begin_body

\begin_layout Title
Leo 1991 and Aycock-Horspool 2002 Work Together: A Proof
\end_layout

\begin_layout Author
Jeffrey Kegler
\end_layout

\begin_layout Abstract
Joop Leo in a 1991 article modified Earley's algorithm to run in O(n) time
 for LR-regular grammars.
 Aycock and Horspool introduced practical innovations to Earley's in their
 2002 article.
 Earley items were defined differently in the two algorithms.
 Leo used Earley items containing a single dotted rule, which was their
 original form.
 Aycock and Horspool modified Earley's items to contain sets of dotted rules
 which were states of a finite automata (AHFA states).
 Proof is given that Marpa's adaptation of the Leo algorithm to use AHFA
 states is correct.
\end_layout

\begin_layout Abstract
\begin_inset CommandInset toc
LatexCommand tableofcontents

\end_inset


\end_layout

\begin_layout Section
Introduction
\end_layout

\begin_layout Standard
Marpa is a new parser based on Earley's algorithm.
 It includes modifications to Earley's algorithm from Joop Leo's 1991 parser
 and from the parser described by Aycock and Horspool in their 2002 paper.
 This document assumes that the reader is familiar with LR(0) automata,
 Earley's algorithm, Leo's 1991 paper
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Leo, Joop, "A General Context-Free Parsing Algorithm Running in Linear Time
 on Every LR(k) Grammar Without Using Lookahead", 
\emph on
Theoretical Computer Science
\emph default
, Vol.
 82, No.
 1, 1991, pp 165-176.
 
\end_layout

\end_inset

 and Aycock and Horspool's 2002 paper
\begin_inset Foot
status collapsed

\begin_layout Plain Layout
Aycock, John and Horspool, R.
 Nigel, "Practical Earley Parsing", 
\emph on
The Computer Journal
\emph default
, Vol.
 45, No.
 6, 2002, pp.
 620-630.
\end_layout

\end_inset

.
\end_layout

\begin_layout Standard
In this article the recognizer described in Joop Leo's 1991 paper will be
 called Leo1991.
 The recognizer described in Aycock and Horspool's 2002 paper will be called
 AH2002.
\end_layout

\begin_layout Standard
The major obstacle to combining the two algorithms was the difference in
 the way each algorithm defined an Earley item.
 Leo1991 followed Earley is having one dotted rule per Earley item.
 AH2002 combined dotted rules into states of a finite automaton.
 (That automaton will be called an AHFA in this document.) AHFA states are
 sets of one or more dotted rules.
\end_layout

\begin_layout Standard
When a grammar was right-recursive, Earley's original algorithm needed to
 add a chain of Earley items to every Earley set where the recursion might
 end.
 This chain was a series of completions, culminating in what Leo 1991 calls
 a 
\begin_inset Quotes eld
\end_inset

topmost
\begin_inset Quotes erd
\end_inset

 item.
 In these chains, most of the Earley items were almost completely useless.
 They were used once when creating the chain and never after that.
 Only one chain, the chain that was final Earley set of the right recursion,
 went on to have any of its non-topmost items included in the parse.
\end_layout

\begin_layout Standard
As the recursion lengthened, the chain of useless Earley items lengthened
 with it.
 The effect was that Earley's time complexity was quadratic for right recursion.
\end_layout

\begin_layout Standard
Leo's idea was to create only topmost items until the semantic phase.
 At that point the chain in the final Earley set of the right recursion
 could be expanded, based on the information in the series of topmost items.
 In Marpa, these topmost items are called Leo completions.
\end_layout

\begin_layout Standard
This paper proves that Leo completions are always Aycock-Horspool singletons
 -- Earley items whose AHFA states contain only a single dotted rule.
 AHFA singletons are in effect identical to the conventional Earley items
 used by Leo, because both correspond to a single dotted rule.
\end_layout

\begin_layout Standard
The proof that Leo completions are singletons is the only non-intuitive
 part of the correctness proof, and is the only part described in this document.
 The proof that non-topmost completions in the chains are also always AHFA
 singletons would simply repeat the arguments of the proof for Leo completions.
 It is clear that Leo's correctness proof in his 1991 can be extended to
 Marpa.
\end_layout

\begin_layout Standard
Most of this paper assumes the context is a context-free grammar, 
\begin_inset Formula $G$
\end_inset

.
 The AFHA is assumed to be constructed from 
\begin_inset Formula $G$
\end_inset

.
 This assumption is only made explicit when properties of the grammar itself
 are important.
\end_layout

\begin_layout Standard
In this document, the term 
\series bold
NFA
\series default
 is often used in its inclusive sense, and can mean either a deterministic
 finite automaton (DFA) or a properly non-deterministic finite automaton.
 The AHFA is an proper NFA, while most of the steps in its construction
 are DFA's.
\end_layout

\begin_layout Section
Marpa's Grammar and the AHFA States
\end_layout

\begin_layout Standard
Marpa follows AH2002 in rewriting its grammars.
 Here are some features of this rewrite that are relevant in this document.
\end_layout

\begin_layout Enumerate
Marpa augments its grammars by adding a special start symbol.
\end_layout

\begin_layout Enumerate
Marpa may have a null parse rule.
 If it exists, the null parse rule is an empty rule with Marpa's special
 start symbol on its LHS.
\end_layout

\begin_layout Enumerate
All symbols in Marpa are either nulling or non-nullable.
 A Marpa grammar contains no proper nullable symbols.
\end_layout

\begin_layout Enumerate
There are no empty rules.
 The null parse rule is the only exception to this.
\end_layout

\begin_layout Enumerate
Every rule contains at least one non-nullable symbol.
 The null parse rule is the only exception to this.
\end_layout

\begin_layout Standard
Marpa also follows AH2002 in its special treatment of dotted rules in the
 AHFA states.
 Here are special features of Marpa's dotted rules which are relevant in
 this document.
\end_layout

\begin_layout Enumerate
As previously noted, Earley items contain AHFA states, which represent sets
 of one or more dotted rules.
\end_layout

\begin_layout Enumerate
No dotted rule has a nulling symbol after the dot.
\end_layout

\begin_layout Enumerate
All transitions between AHFA states are over non-nullable symbols.
\end_layout

\begin_layout Enumerate
Any transition over a non-nullable symbol 
\begin_inset Formula $A$
\end_inset

 in an AHFA state also moves the dot past any nulling symbols after the
 symbol 
\begin_inset Formula $A$
\end_inset

.
\end_layout

\begin_layout Enumerate
A prediction is a dotted rule where the dot is before the first 
\series bold
non-nullable
\series default
 symbol.
 As a consequence, the dot in an AHFA's prediction is not necessarily before
 the first symbol of the rule.
\end_layout

\begin_layout Enumerate
A completion always has the dot after the last symbol of the rule.
 In the terminology of Leo 1991, AHFA states contain no quasi-completions.
 
\end_layout

\begin_layout Enumerate
Predictions are never completions.
 This is because rules must contain at least one non-nullable symbol.
 The null parse rule is the only exception to this.
\end_layout

\begin_layout Section
The Strategy of the Proof
\end_layout

\begin_layout Standard
The proof defines 
\begin_inset Quotes eld
\end_inset

Leo conformance
\begin_inset Quotes erd
\end_inset

.
 This is a property of an NFA state.
 A Leo-conformant NFA state either is an AHFA singleton or else it contains
 no candidates for Leo completions.
 If all the states in a NFA are Leo-conformant, clearly all candidates for
 Leo completions in the NFA will be in AHFA singletons.
\end_layout

\begin_layout Standard
AH2002 Earley items use sets of dotted rules in their Earley items, instead
 of individual dotted rules.
 These sets of dotted rules are states of a NFA which Aycock and Horspool
 call a 
\begin_inset Quotes eld
\end_inset

split 
\begin_inset Formula $\epsilon$
\end_inset

-DFA
\begin_inset Quotes erd
\end_inset

.
 In this document, a split 
\begin_inset Formula $\epsilon$
\end_inset

-DFA is called an Aycock-Horspool Finite Automaton (AHFA).
\end_layout

\begin_layout Standard
The proof contains many inductions on the construction of the AHFA.
 The one which establishes the Leo conformity of the AHFA is the 
\series bold
main induction
\series default
.
 AHFA construction starts with the creation of the LR(0) DFA start states
 from the LR(0) NFA.
 As the basis for the induction, the LR(0) DFA start states are shown to
 be Leo-conformant.
 Each step of the construction after that is a step of the induction.
 It is shown that all new NFA states are Leo-conformant, and that all transforme
d NFA states preserve Leo conformance.
\end_layout

\begin_layout Standard
Most of the work of the proof is done in a series of theorems which preceed
 the main induction.
 Most of these prove 
\begin_inset Quotes eld
\end_inset

conservation
\begin_inset Quotes erd
\end_inset

 properties to be used in the induction.
 Intuitively, a property or object is conserved if, once it comes into existence
, it never changes.
 The conservation properties depend on each other and must be established
 in order.
 They often use induction on the construction on the AHFA.
 
\end_layout

\begin_layout Standard
Key to the proof will be the idea of 
\begin_inset Quotes eld
\end_inset

shadowing
\begin_inset Quotes erd
\end_inset

.
 It is possible for completions to be bound together in the same AFHA state,
 and this would appear to be an obstacle, perhaps a fatal one, to using
 Leo1991 with AHFA states.
 But Leo1991's additional logic is only applied when the postdot symbol
 in the completion's parent (or predecessor) Earley set is unique.
 The concept of shadowing converts this uniqueness requirement into a property
 of dotted rule instances.
 In the form of shadowing, the uniqueness requirement of Leo1991 can be
 applied in an induction on the steps in the construction of a DFA.
\end_layout

\begin_layout Standard
Leo 1991 had its own reasons for requiring that postdot symbols be unique
 -- it enabled Leo's algorithm to have linear time-complexity for LR-regular
 grammars.
 But the result is very fortunate for adapting Leo1991 to AHFA states.
 It is an interesting question whether this is simply a happy coincidence
 or the consequence of a deep property.
\end_layout

\begin_layout Section
Definitions
\end_layout

\begin_layout Subsection
Dotted Rules and Instances
\end_layout

\begin_layout Standard
A 
\series bold
dotted rule
\series default
 is a duple 
\begin_inset Formula $(rule,dot)$
\end_inset

, where 
\begin_inset Formula $rule$
\end_inset

 is a rule of 
\begin_inset Formula $G$
\end_inset

 and 
\begin_inset Formula $dot$
\end_inset

 is a number from 0 to 
\begin_inset Formula $n$
\end_inset

, where 
\begin_inset Formula $n$
\end_inset

 is the number of symbols in 
\begin_inset Formula $rule$
\end_inset

's right hand side.
 A dotted rule is more traditionally represented by showing the rule as
 a production with a raised dot to indicate the dot position: 
\begin_inset Formula $A\rightarrow\alpha\bullet\beta$
\end_inset

.
\end_layout

\begin_layout Standard
A 
\series bold
dotted rule instance
\series default
 is a triple 
\begin_inset Formula $(rule,dot,id)$
\end_inset

.
 
\begin_inset Formula $rule$
\end_inset

 and 
\begin_inset Formula $dot$
\end_inset

 are as in the definition of a dotted rule.
 
\begin_inset Formula $id$
\end_inset

 is an unique identifier.
 More formally, an 
\begin_inset Formula $id$
\end_inset

 is assigned to a dotted rule instance when it is created, such that 
\begin_inset Formula $id$
\end_inset

 will never be equal to the 
\begin_inset Formula $id$
\end_inset

 of any other dotted rule instance.
 Where the meaning is clear, a dotted rule instance is called a 
\series bold
rule instance
\series default
 or an 
\series bold
instance.
\end_layout

\begin_layout Standard
NFA states are defined as sets of dotted rule instances.
 Within an NFA state there is never more than one dotted rule instance for
 any dotted rule.
 But an NFA may contain different instances of the same dotted rule.
 Without the use of instances, confusion can arise when dealing with the
 movement of an element from one NFA state into another NFA state.
\end_layout

\begin_layout Standard
We say that a dotted rule instance is 
\series bold
in an Earley
\series default
 
\series bold
item
\series default
, if that dotted rule instance is an element of the AHFA state in that Earley
 item.
 We say that a dotted rule instance is 
\series bold
in an Earley set
\series default
, if that dotted rule instance is in any Earley item in that Earley set.
\end_layout

\begin_layout Standard
The reader may wish to note that even dotted rule instances are not necessarily
 unique within an Earley set.
 The same AHFA state can occur in multiple Earley items in the same Earley
 set.
 This does not seem to present any problem for the proofs in this document,
 but it was a complication that needed to be watched.
\end_layout

\begin_layout Standard
A s
\series bold
ingleton
\series default
 is an NFA state that contains only one dotted rule instance.
 A 
\series bold
completion
\series default
 is a dotted rule instance with its dot position after its last symbol.
 A 
\series bold
singleton completion
\series default
 is an NFA state whose only element is a completion.
\end_layout

\begin_layout Standard
A 
\series bold
prediction
\series default
 is a dotted rule instance with its dot position before the first non-nulling
 symbol of its rule.
 An 
\series bold
initial
\series default
 instance is a prediction of a start rule.
\end_layout

\begin_layout Standard
The 
\series bold
source
\series default
 of a dotted rule instance is another dotted rule instance that was used
 in its creation.
 Initial instances do not have sources.
 If the instance is a non-prediction, the 
\series bold
source
\series default
 is also called a predecessor.
\end_layout

\begin_layout Subsection
Location Mappings
\end_layout

\begin_layout Standard
Location mappings are properties of dotted rule instances and NFA states.
 Let 
\begin_inset Formula $w$
\end_inset

 be a string in the symbol vocabulary of the current grammar: 
\begin_inset Formula $w\in V^{*}$
\end_inset

.
 Let 
\begin_inset Formula $G$
\end_inset

 be the current grammar.
 Let 
\begin_inset Formula $V$
\end_inset

 be the symbol vocabulary of 
\begin_inset Formula $G$
\end_inset

.
 The location mapping for an AHFA state 
\begin_inset Formula $s$
\end_inset

 is 
\begin_inset Formula $Locate_{s}(w)$
\end_inset

, where 
\begin_inset Formula $Locate_{s}(w)$
\end_inset

 is the set of Earley sets such that 
\begin_inset Formula $s$
\end_inset

 is the AFHA state of some Earley item in the Earley set.
 The location mapping for a dotted rule instance 
\begin_inset Formula $d$
\end_inset

 is 
\begin_inset Formula $Locate_{d}(w)$
\end_inset

, where 
\begin_inset Formula $Locate_{d}(w)$
\end_inset

 is the 
\begin_inset Formula $Locate_{s}(w)$
\end_inset

 such that 
\begin_inset Formula $d\in s$
\end_inset

.
 Note that 
\begin_inset Formula $w$
\end_inset

 is not necessarily in the language of 
\begin_inset Formula $G$
\end_inset

, 
\begin_inset Formula $L(G)$
\end_inset

, that the parse is not necessarily successful, and that the set of Earley
 sets may be infinite.
\end_layout

\begin_layout Standard
Intuitively, the location mapping describes the locations where dotted rule
 instances are found when Marpa parses 
\begin_inset Formula $w$
\end_inset

 according to the AHFA generated from 
\begin_inset Formula $G$
\end_inset

.
 There may be more than one Earley set occurrence of the same dotted rule
 instance, but 
\begin_inset Formula $Locate_{d}(w)$
\end_inset

 is not defined as a multiset.
 Even if the same dotted rule instance 
\begin_inset Formula $d$
\end_inset

 is present in more than Earley item in an Earley set when string 
\begin_inset Formula $w$
\end_inset

 is parsed, that Earley set is only a single element in the range of the
 mapping 
\begin_inset Formula $Locate_{d}(w)$
\end_inset

.
\end_layout

\begin_layout Subsection
Ties
\end_layout

\begin_layout Standard
An Earley set that contains an AHFA state also contains all the dotted rule
 instances in that AHFA state.
 For this reason, dotted rule instances in an AHFA state have the same location
 mapping as that AHFA state.
\end_layout

\begin_layout Standard
Two dotted rule instances are 
\series bold
tied
\series default
 if they have the same location mapping.
 From the above, it is clear that all pairs of dotted rule instances in
 an AHFA state are tied.
\end_layout

\begin_layout Standard
We say that an NFA state is 
\series bold
tied
\series default
 if all the pairs of dotted rule instances in it are tied.
 AHFA states are 
\series bold
tied
\series default
.
 It will be proved (section 
\begin_inset CommandInset ref
LatexCommand vref
reference "NFA States Tied"

\end_inset

) that all NFA states arising in the consruction of an AHFA are tied.
\end_layout

\begin_layout Standard
Two AHFA states are
\series bold
 tied together
\series default
 if they have the same location mapping.
 The tying together of AHFA states will become relevant when dealing with
 the NFA State Splitting operation (section 
\begin_inset CommandInset ref
LatexCommand vref
reference "NFA-State-Splitting"

\end_inset

).
 (The result of NFA State Splitting is always an AHFA state.)
\end_layout

\begin_layout Standard

\series bold
Theorem
\series default
: 
\begin_inset CommandInset label
LatexCommand label
name "States Tied Together Theorem"

\end_inset

If two AHFA states are tied togther, every pair of dotted rule instances
 from either state is tied.
 
\series bold
Proof
\series default
: Because they are tied together, the two AHFA states have the same location
 mapping, call it 
\begin_inset Formula $m$
\end_inset

.
 Because they are AHFA states, all dotted rule instances in both states
 also have mapping 
\begin_inset Formula $m$
\end_inset

.
 Therefore all pairs of these are tied.
 QED.
\end_layout

\begin_layout Subsection
Relevant Ties
\end_layout

\begin_layout Standard
By definition, dotted rule instances are tied if they have the same location
 mapping.
 This property is conserved during AHFA construction, but that proof would
 be complex because a coincidence of location mappings can arise in many
 ways.
 A weaker theorem, which restricts itself to relevant ties, is much easier
 to prove and is all that is necessary to prove the main results in this
 document.
\end_layout

\begin_layout Standard
A tie between dotted rule instances is 
\series bold
relevant
\series default
 if it is between two dotted rule instances in the same NFA state, or if
 it is between dotted rule instances in two AHFA states which are tied together.
 This definition does not require that a relevant tie be a tie, but that
 will be proved in section 
\begin_inset CommandInset ref
LatexCommand vref
reference "Relevant Ties=Ties"

\end_inset

.
\end_layout

\begin_layout Subsection
Shadowing
\end_layout

\begin_layout Standard
The predecessor of a dotted rule instance may be 
\begin_inset Quotes eld
\end_inset

shadowed
\begin_inset Quotes erd
\end_inset

.
 This definition is key to the proof.
 A predecessor p is shadowed by another dotted rule instance 
\begin_inset Formula $q$
\end_inset

, if
\end_layout

\begin_layout Itemize
\begin_inset Formula $q$
\end_inset

 is relevantly tied to 
\begin_inset Formula $p$
\end_inset

,
\end_layout

\begin_layout Itemize
\begin_inset Formula $p\neq q$
\end_inset

, and
\end_layout

\begin_layout Itemize
\begin_inset Formula $p$
\end_inset

 and 
\begin_inset Formula $q$
\end_inset

 have the same postdot symbol.
 
\end_layout

\begin_layout Standard
Where 
\begin_inset Formula $q$
\end_inset

 is not important, we simply say that 
\begin_inset Formula $p$
\end_inset

 is shadowed.
 Intuitively, shadowing captures the notion that, whenever a dotted rule
 instance with 
\begin_inset Formula $p$
\end_inset

 as its predecessor occurs in the recognizer, Leo1991's uniqueness requirement
 fails to be met.
\end_layout

\begin_layout Subsection
Candidates and Rejects
\end_layout

\begin_layout Standard
As a reminder, a 
\series bold
Leo completion
\series default
 is one of Earley items that replace chains of Earley items when the methods
 of Leo1991 are used.
 The Leo completion is a stand-in for the entire chain, and especially for
 its topmost item.
\end_layout

\begin_layout Standard
A dotted rule instance is a 
\series bold
candidate for Leo completion
\series default
, if it is a completion with an unshadowed predecessor.
 A candidate for Leo completion is usually called just a 
\series bold
candidate
\series default
, or sometimes 
\series bold
Leo candidate
\series default
.
 An NFA state is a 
\series bold
candidate
\series default
 if any of its dotted rule instances is a candidate.
 Intuitively, a candidate is an dotted rule instance or an NFA state which
 could be included in a Leo completion.
\end_layout

\begin_layout Standard
A dotted rule instance is a 
\series bold
reject
\series default
 if it is not a candidate.
 An NFA state is a 
\series bold
reject
\series default
 if all of its dotted rule instances are rejects.
 Where confusion might arise, rejects are called Leo rejects.
\end_layout

\begin_layout Subsection
Leo Conformance
\end_layout

\begin_layout Standard

\series bold
Leo conformance
\series default
 is a property of an NFA state.
 A state of a NFA may be 
\series bold
Leo-conformant
\series default
 in one of two ways.
 An NFA state is Leo-conformant if it is a singleton completion.
 A state of an NFA is also Leo-conformant if it is a Leo reject.
 An NFA is 
\series bold
Leo-conformant
\series default
 if all of its states are Leo-conformant.
\end_layout

\begin_layout Subsection
Conservation
\end_layout

\begin_layout Standard
The term 
\series bold
conservation
\series default
 will be used for a series of related concepts in this document.
 The concept will be formalized differently in each case, but the cases
 share a common intuition.
 A property is 
\series bold
conserved
\series default
 if the AHFA construction never changes it.
 An object is 
\series bold
conserved
\series default
 if, once it comes into existence, it never disappears.
\end_layout

\begin_layout Section
Operations in Constructing the AHFA
\end_layout

\begin_layout Standard
In the construction of the AHFA, three operations on NFA states occur: creation,
 expansion, and splitting.
 The pre-operation NFA state which provides data is called an 
\series bold
operand
\series default
.
 There will be only one operand, which may or may not be changed by the
 operation.
 The NFA states which are potentially changed or which are created by an
 operation are its 
\series bold
results
\series default
.
 Each operation has one or two results.
\end_layout

\begin_layout Subsection
NFA State Creation
\end_layout

\begin_layout Standard
When a NFA state is created, the operand is an existing NFA state, 
\begin_inset Formula $s$
\end_inset

, which is not changed.
 A symbol transition from 
\begin_inset Formula $s$
\end_inset

 is used to created dotted rule instances to put into the result, 
\begin_inset Formula $s'$
\end_inset

.
 
\begin_inset Formula $s'$
\end_inset

 is created during the operation.
 All dotted rule instances in 
\begin_inset Formula $s'$
\end_inset

 will be non-predictions, and will have predecessors in 
\begin_inset Formula $s$
\end_inset

.
\end_layout

\begin_layout Subsection
NFA State Expansion
\end_layout

\begin_layout Standard
When a NFA state is created, the operand is an existing NFA state, 
\begin_inset Formula $s$
\end_inset

.
 
\begin_inset Formula $s$
\end_inset

 is also the result.
 NFA state expansion adds a new dotted rule instance, 
\begin_inset Formula $D$
\end_inset

, to 
\begin_inset Formula $s$
\end_inset

.
 
\begin_inset Formula $D$
\end_inset

 is a non-initial prediction.
 
\begin_inset Formula $D$
\end_inset

 will have as its source a dotted rule instance in 
\begin_inset Formula $s$
\end_inset

.
\end_layout

\begin_layout Subsection
\begin_inset CommandInset label
LatexCommand label
name "NFA-State-Splitting"

\end_inset

NFA State Splitting
\end_layout

\begin_layout Standard
When a NFA state is split, the operand is an existing NFA state, 
\begin_inset Formula $k$
\end_inset

.
 The result of the split will be 
\begin_inset Formula $k$
\end_inset

 and optionally, a new state 
\begin_inset Formula $nk$
\end_inset

.
 The result state 
\begin_inset Formula $k$
\end_inset

 is called the kernel state.
 
\begin_inset Formula $nk$
\end_inset

 is the non-kernel state.
\end_layout

\begin_layout Standard
Every non-initial prediction instance, 
\begin_inset Formula $D$
\end_inset

, is moved from 
\begin_inset Formula $s$
\end_inset

 into 
\begin_inset Formula $nk$
\end_inset

.
 
\begin_inset Formula $D$
\end_inset

 is not changed in the move.
 Since by definition 
\begin_inset Formula $D$
\end_inset

 is the triple 
\begin_inset Formula $(rule,dot,id)$
\end_inset

, 
\begin_inset Formula $rule$
\end_inset

, 
\begin_inset Formula $dot$
\end_inset

, and 
\begin_inset Formula $id$
\end_inset

 are also not changed.
 Because there will always be either an initial rule instance or a non-predictio
n instance in 
\begin_inset Formula $s$
\end_inset

, an NFA State Splitting operation always leaves 
\begin_inset Formula $k$
\end_inset

 with at least one dotted rule instance.
\end_layout

\begin_layout Standard
By the definitions of the Marpa and AH2002 algorithms, for every parse,
 whenever 
\begin_inset Formula $k$
\end_inset

 occurs in an Earley set, 
\begin_inset Formula $nk$
\end_inset

 occurs along with it.
 In the terminology of this document, 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 are tied together.
\end_layout

\begin_layout Standard
NFA State Splitting is always the last operation on any operand.
 This means that its result states are AHFA states, and are never the operands
 of any operation.
\end_layout

\begin_layout Section
Instances are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
:
\end_layout

\begin_layout Enumerate
Dotted rule instances are not deleted once created.
\end_layout

\begin_layout Enumerate
Where a dotted rule instance is the triple 
\begin_inset Formula $(rule,dot,id)$
\end_inset

, 
\begin_inset Formula $rule$
\end_inset

, 
\begin_inset Formula $dot$
\end_inset

, and 
\begin_inset Formula $id$
\end_inset

 never change.
\end_layout

\begin_layout Enumerate
For the properties of being a completion rule instance, being a prediction
 rule instance, and their negations, once a dotted rule instance has that
 property, it keeps that property.
\end_layout

\begin_layout Enumerate
For the properties of being an initial rule instance and its negation, once
 a dotted rule instance has that property, it keeps that property.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: 1.
 By the definitions of NFA State Creation, NFA State Expansion and NFA State
 Splitting, no dotted rule instance is ever deleted.
 So by induction on the AHFA construction, no dotted rule instance is ever
 deleted.
 This establishes part 1 of the theorem.
\end_layout

\begin_layout Standard
2.
 A dotted rule instance 
\begin_inset Formula $D$
\end_inset

 is the triple 
\begin_inset Formula $(rule,dot,id)$
\end_inset

 as a matter of definition.
 From part 1, we know that 
\begin_inset Formula $D$
\end_inset

 is not deleted.
 This establishes part 2 of the theorem.
\end_layout

\begin_layout Standard
3.
 The properties listed are all consequences of rule and dot position.
 From part 2, we know that rule and dot position never change.
 Therefore the properties listed cannot change.
 This establishes part 3 of the theorem.
\end_layout

\begin_layout Standard
4.
 An initial rule is a prediction whose rule is a start rule.
 Therefore being an initial rule instance, and its negation, are consequences
 of the rule and the dot position.
 From part 2, we know that rule and dot position never change.
 Therefore the properties of being initial and of being non-initial cannot
 change.
 This establishes part 4 of the theorem.
 QED.
\end_layout

\begin_layout Section
Sources are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: Let 
\begin_inset Formula $D$
\end_inset

 be a dotted rule instance and 
\begin_inset Formula $S=(srule,sdot,sstate)$
\end_inset

 its source.
 If 
\begin_inset Formula $S$
\end_inset

 ever exists, 
\begin_inset Formula $srule$
\end_inset

 and 
\begin_inset Formula $sdot$
\end_inset

 never change.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: When a non-inital rule instance is added, it has a source.
 By the Instance Conservation Theorem, the source will continue to exist
 and its rule and dot position will never change.
 QED.
\end_layout

\begin_layout Section
Singleton Completions are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: Once an NFA state becomes a singleton completion, it will remain one for
 the entire AHFA construction.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: By the Instance Conservation Theorem, completions are never deleted and
 never lose the property of being a completion.
 It remains to show that singletons are conserved when their only dotted
 rule instance is a completion.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Creation
\emph default
: By its definition, NFA State Creation does not add instances to any existing
 NFA states.
 This establishes the case for NFA State Creation.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Expansion
\emph default
: The only NFA state to which NFA State Expansion adds instances is its
 operand.
 However, the operand of an NFA State Expansion will never be a singleton
 completion.
 This is because the added rule instances are predictions, and in Earley's
 algorithm these are made using postdot symbols.
 A singleton completion has only one rule instance, a completion.
 Completions have no postdot symbols.
 Therefore, NFA State Expansion will never affect a singleton completion.
 This establishes the case for NFA State Expansion.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Splitting
\emph default
: The dotted rule instance in a singleton completion is either an initial
 instance (the null parse rule instance) or a non-prediction.
 In either case, it remains in the kernel state.
 No dotted rule instances are ever added to the operand when it becomes
 the kernel state, so the operand of NFA State Splitting remains a singleton.
 Since it remains a singleton and its sole dotted rule instance remains
 a completion, the operand remains a singleton completion when it becomes
 the kernel state.
 This established the case for NFA State Splitting.
\end_layout

\begin_layout Standard

\emph on
Concluding the Proof
\emph default
: We have shown that all of the operations in an AHFA construction preserve
 singleton completions.
 By induction on the AHFA construction, singleton completions, once created,
 are preserved.
 QED.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "NFA States Tied"

\end_inset

All NFA States are Tied
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: At every point in an AHFA construction, every NFA state is a tied state.
\end_layout

\begin_layout Standard

\emph on
Starting the Proof: 
\emph default
The proof that the NFA states in the AHFA Construction are tied proceeds
 by induction on its construction.
 The induction proceeds in reverse from the usual order.
 The basis of the induction is the final product, the AHFA states.
 The steps of the induction will be toward the initial states.
 We show the steps of the induction by cases, one for each operation.
\end_layout

\begin_layout Standard

\emph on
Basis of the
\emph default
 
\emph on
Induction
\emph default
: It was mentioned above that AHFA states are tied.
 That followed immediately from the definitions of the Marpa and AH2002
 algorithms: The Earley items for these algorithms contain AHFA states,
 so that all the dotted rule instances in an AHFA state must always occur
 together in an Earley set.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Splitting
\emph default
: NFA State Splitting is always the last operation in the construction of
 the NFA states involved.
 The one or two NFA states which result (
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

) are AHFA states.
 As AHFA states, 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 are tied.
 We need to show that the operand is tied.
 Call this operand, 
\begin_inset Formula $s$
\end_inset

.
\end_layout

\begin_layout Standard
By the definition of NFA State Splitting, 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 are not only tied states, they are tied together.
 This means that states 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 have the same location mapping.
 Since they are AHFA states, all the dotted rule instances in 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 have the same location mapping as their AHFA states, and as each other.
 The dotted rule instances in 
\begin_inset Formula $s$
\end_inset

 are exactly the union of those in 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

.
 Therefore all the dotted rule instances in 
\begin_inset Formula $s$
\end_inset

 have the same location mapping.
 This means that all pairs of dotted rule instances in 
\begin_inset Formula $s$
\end_inset

 are tied.
 So, by the definition of a tied state, 
\begin_inset Formula $s$
\end_inset

 is tied.
 This establishes the case for NFA State Splitting.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Expansion
\emph default
: We assume, from the reverse induction, that 
\begin_inset Formula $s'$
\end_inset

, the result of NFA State Expansion, is tied.
 We seek to prove that 
\begin_inset Formula $s$
\end_inset

, the operand of NFA State Expansion, is tied.
\end_layout

\begin_layout Standard
Since 
\begin_inset Formula $s'$
\end_inset

 is tied, all pairs of dotted rule instances in 
\begin_inset Formula $s'$
\end_inset

 are tied.
 The dotted rule instances in 
\begin_inset Formula $s$
\end_inset

 are a subset of those in 
\begin_inset Formula $s'$
\end_inset

.
 Therefore, all pairs of dotted rule instances in 
\begin_inset Formula $s$
\end_inset

 are tied.
 From the definition of a tied state, this means that 
\begin_inset Formula $s$
\end_inset

 is tied.
 This establishes the case for NFA State Expansion.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Creation
\emph default
: Because this operation does not alter its operand, this case is true trivially.
\end_layout

\begin_layout Standard

\emph on
Concluding the Proof
\emph default
: We have show that the final product of AHFA construction, the AFHA states,
 are tied.
 We have shown that for every operation in the construction, if the NFA
 states which result are tied, then the operands are tied.
 This completes the induction and shows that all NFA states in an AHFA construct
ion are tied.
 QED.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "Relevant Ties=Ties"

\end_inset

Relevant Ties are Ties
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: All relevant ties are ties.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: The definition of a relevant tie has two cases: relevant ties within an
 NFA state and relevant ties between NFA states.
 The proof is by these cases.
\end_layout

\begin_layout Standard

\emph on
Case for Within
\emph default
: By a previous theorem (section
\begin_inset CommandInset ref
LatexCommand vref
reference "NFA States Tied"

\end_inset

), all NFA states are tied.
 By the definition of a tied NFA state, every pair of dotted rule instances
 which are elements of the same NFA state is tied.
 This establishes the case for relevant ties between dotted rule instances
 within a single NFA state.
\end_layout

\begin_layout Standard

\emph on
Case for Between
\emph default
: By another previous theorem (section
\begin_inset CommandInset ref
LatexCommand vref
reference "States Tied Together Theorem"

\end_inset

) all dotted rule instances are tied if they are in two AHFA states which
 are tied together.
 This establishes the case for relevant ties between dotted rule instances
 in NFA states which are distinct, but tied together.
\end_layout

\begin_layout Standard

\emph on
Concluding the Proof
\emph default
: Since in both cases, every pair of dotted rule instances which is relevantly
 tied is also tied, all relevant ties are ties.
 QED.
\end_layout

\begin_layout Section
Relevant Ties are Conserved
\end_layout

\begin_layout Subsection*
Lemma
\end_layout

\begin_layout Standard

\emph on
Statement of the Lemma
\emph default
: The operand of an operation is never an AHFA state tied to another state.
\end_layout

\begin_layout Standard

\emph on
Proof of the Lemma
\emph default
: An NFA state tied to another state is always the result of NFA State Splitting.
 NFA State Splitting is always the the last operation on any NFA state.
 Therefore, the results of NFA State Splitting never become the operand
 of any operation, including another NFA State Splitting operation.
 QED.
\end_layout

\begin_layout Subsection*
Theorem: Relevant Ties are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: Once two dotted rule instances become relevantly tied in the course of
 AHFA construction, they remain relevantly tied.
\end_layout

\begin_layout Standard

\emph on
Starting the Proof
\emph default
: This proof is by induction on the construction of the AHFA.
\end_layout

\begin_layout Standard

\emph on
Basis of the Induction: 
\emph default
The basis of the induction is the first step of the AHFA construction: the
 initial states of the LR(0) DFA construction.
 The basis is vacuously true, since prior to the initial states are no dotted
 rule instances, no ties between them and therefore no relevant ties between
 them.
\end_layout

\begin_layout Standard

\emph on
Inductive Step
\emph default
:
\end_layout

\begin_layout Standard
We first note that it is vacuously true that all relevant ties between dotted
 rule instances in two distinct NFA states are preserved by the induction
 steps.
 This is because a relevant tie between dotted rule instances in distinct
 NFA states must be due to one NFA state being tied to another.
 By the previous Lemma, an NFA state tied to another NFA state is never
 the operand for any operation.
\end_layout

\begin_layout Standard
It remains to show that relevant ties between dotted rule instances in a
 single NFA state are preserved.
 This is shown by cases, one for each operation.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Creation
\emph default
: NFA State Creation does not change the dotted rule instances in any existing
 NFA states, including its operand.
 So all pre-existing relevant ties will be preserved.
 This established the case for NFA State Creation.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Expansion
\emph default
: The only NFA state whose dotted rule instances are changed by NFA State
 Expansion is its operand.
 All relevant ties not in the operand will be preserved.
\end_layout

\begin_layout Standard
All pairs of dotted rule instances in the operand are preserved in the result.
 The result is a single NFA state, so all such instance pairs are relevantly
 tied.
 This means that any relevant ties pre-existing the operation are preserved.
 This establishes the case for NFA State Expansion.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Splitting
\emph default
: The only NFA state whose dotted rule instances are changed by NFA State
 Splitting is its operand, so it is only necessary to show that any relevant
 ties between dotted rule instances in the operand are preserved.
\end_layout

\begin_layout Standard
Call the two result states, 
\begin_inset Formula $k$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

.
 Consider two arbitrary dotted rule instances from the operand, 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

.
\end_layout

\begin_layout Standard
We need to deal with two subcases.
 First, 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 might be in the same AHFA state.
 Second, they might be in different AHFA states.
\end_layout

\begin_layout Standard

\emph on
First subcase
\emph default
: If 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 are in the same AHFA state, they are relevantly tied, so that any relevant
 tie they had previously is preserved.
 This establishes the first subcase.
\end_layout

\begin_layout Standard

\emph on
Second subcase
\emph default
: Consider the other subcase, where 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 are in different AHFA states.
 Without loss of generality, put 
\begin_inset Formula $D1$
\end_inset

 into 
\begin_inset Formula $n$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 into 
\begin_inset Formula $nk$
\end_inset

.
 By the definition of NFA State Splitting, 
\begin_inset Formula $n$
\end_inset

 and 
\begin_inset Formula $nk$
\end_inset

 are tied together.
 Therefore 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 are relevantly tied, and any relevant tie they had previously is preserved.
 This establishes the second subcase.
\end_layout

\begin_layout Standard

\emph on
Concluding the case
\emph default
: Since the choice of 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 was arbitrary, every pair of dotted rule instances from either 
\begin_inset Formula $k$
\end_inset

 or 
\begin_inset Formula $nk$
\end_inset

 is relevantly tied.
 Therefore any relevant tie which existed previously is conserved.
 This establishes the case for NFA State Splitting.
\end_layout

\begin_layout Standard

\emph on
Concluding the proof
\emph default
: We have shown that relevant ties are vacuously preserved when they are
 due to NFA states tied together.
 We have shown that relevant ties of dotted rule instances in a single NFA
 state are preserved in the basis of the induction and in its steps.
 This establishes the theorem.
\end_layout

\begin_layout Standard
QED.
\end_layout

\begin_layout Section
Shadowing is Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: If a dotted rule instance becomes shadowed during AHFA construction, it
 remains shadowed.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: By the definition of shadowing, it is based on existence of two dotted
 rule instances, their postdot symbols and the relevant tie between them.
 The postdot symbol is a function of the rule and dot position in a dotted
 rule instance and is therefore conserved.
 Since the existence of dotted rule instances, their postdot symbols, and
 relevant ties between them are conserved, shadowing is conserved.
 QED.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "Reject Instances Conserved"

\end_inset

Reject Instances are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: If a dotted rule instance becomes a Leo reject instance during AHFA constructi
on, it remains a Leo reject instance.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: A dotted rule instance may be a Leo reject because it is a non-completion
 or because its predecessor is shadowed.
 Dotted rule instances, predecessor relationships, shadowing and non-completion
 are all conserved objects or properties.
 Therefore, Leo reject instances are conserved.
 QED
\end_layout

\begin_layout Section
Reject States are Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: If an NFA state becomes a Leo reject state during AHFA construction, it
 remains a Leo reject state.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: A Leo reject state is one all of whose dotted rule instances are Leo rejects.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State
\emph default
 
\emph on
Creation
\emph default
: NFA State Creation does not affect existing states, so it conserves all
 Leo reject states.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Expansion
\emph default
: NFA State Expansion adds non-initial predictions to existing states.
 Non-initial predictions are always non-completions and therefore Leo rejects.
 Adding a Leo reject instance to a Leo reject NFA state results in a Leo
 reject NFA state.
\end_layout

\begin_layout Standard

\emph on
Case for NFA State Splitting
\emph default
: NFA State Splitting produces two new states from an existing one.
 Each of these will contain only dotted rule instances from the original
 state.
 If the operand was a Leo reject, then it contained only Leo reject instances.
 It was proved in section 
\begin_inset CommandInset ref
LatexCommand vref
reference "Reject Instances Conserved"

\end_inset

 that Leo reject instances are conserved, including by NFA State Splitting.
 Therefore both new NFA states (kernel and non-kernel) will contain only
 Leo reject instances.
 And therefore, both the kernel and non-kernel NFA states will be Leo rejects.
\end_layout

\begin_layout Standard

\emph on
Concluding the Proof
\emph default
: Since all the operations conserve Leo reject NFA states, Leo reject NFA
 states are conserved.
 QED.
\end_layout

\begin_layout Standard

\emph on
Note
\emph default
: Leo candidates are 
\series bold
not
\series default
 conserved.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Leo-Conformance-is"

\end_inset

Leo Conformance is Conserved
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: If an NFA state becomes Leo conformant during AHFA construction, it remains
 Leo conformant.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: An NFA state is Leo-conformant either because it is singleton completion,
 or because it is a Leo reject state.
 As we have shown, both these properties are conserved.
 Therefore, once an NFA state is Leo-conformant, it remains Leo-conformant.
 QED.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:NFA-State-Creation"

\end_inset

NFA State Creation is Leo-conformant
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: The NFA states produced by the NFA State Creation operation are Leo-conforming.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: NFA State Creation is by transition over a symbol 
\begin_inset Formula $T$
\end_inset

 from its operand, 
\begin_inset Formula $s$
\end_inset

.
 The operand is also the source of the transition.
 Call the new NFA state 
\begin_inset Formula $r$
\end_inset

.
 We distinguish three cases.
\end_layout

\begin_layout Standard

\series medium
\emph on
Singleton Completion Case
\emph default
:
\series default
 
\begin_inset Formula $r$
\end_inset

 contains a single dotted rule instance, and that instance is a completion.
 Then 
\begin_inset Formula $r$
\end_inset

 is a singleton completion, and therefore Leo-conforming.
 This establishes the singleton completion case.
\end_layout

\begin_layout Standard

\emph on
Singleton Non-completion Case
\emph default
: 
\begin_inset Formula $r$
\end_inset

 contains a single dotted rule instance, 
\begin_inset Formula $D$
\end_inset

, and 
\begin_inset Formula $D$
\end_inset

 is a non-completion.
 Therefore, 
\begin_inset Formula $D$
\end_inset

 is a Leo reject.
 Since 
\begin_inset Formula $D$
\end_inset

 is the only dotted rule instance in 
\begin_inset Formula $r$
\end_inset

, all the rule instances in 
\begin_inset Formula $r$
\end_inset

 are Leo reject instances, and 
\begin_inset Formula $r$
\end_inset

 is a Leo reject.
 As a Leo reject, 
\begin_inset Formula $r$
\end_inset

 is Leo conforming.
 This establishes the singleton non-completion case.
\end_layout

\begin_layout Standard

\emph on
Non
\emph default
-
\emph on
singleton Case
\emph default
: There is more than one dotted rule instance in 
\begin_inset Formula $r$
\end_inset

.
 These dotted rule instances were created by transition over a single symbol,
 
\begin_inset Formula $T$
\end_inset

.
 Call one of the dotted rule instances in 
\begin_inset Formula $r$
\end_inset

, 
\begin_inset Formula $D1$
\end_inset

.
 Call another dotted rule instance in 
\begin_inset Formula $r$
\end_inset

, 
\begin_inset Formula $D2$
\end_inset

.
 We are allowed to assume 
\begin_inset Formula $D1\neq D2$
\end_inset

, because the assumption for this case is that there is more than one dotted
 rule instance in 
\begin_inset Formula $r$
\end_inset

.
 Call the predecessor of 
\begin_inset Formula $D1$
\end_inset

, 
\begin_inset Formula $P1$
\end_inset

.
 Call the predecessor of 
\begin_inset Formula $D2$
\end_inset

, 
\begin_inset Formula $P2$
\end_inset

.
\end_layout

\begin_layout Standard
We know that 
\begin_inset Formula $P1\neq P2$
\end_inset

, because they are predecessors of 
\begin_inset Formula $D1$
\end_inset

 and 
\begin_inset Formula $D2$
\end_inset

 through the same transition over symbol 
\begin_inset Formula $T$
\end_inset

, and if 
\family roman
\series medium
\shape up
\size normal
\emph off
\bar no
\noun off
\color none

\begin_inset Formula $P1=P2$
\end_inset

, then 
\begin_inset Formula $D1=D2$
\end_inset

, which we have assumed not to be the case.
\end_layout

\begin_layout Standard
\begin_inset Formula $P1$
\end_inset

 and 
\begin_inset Formula $P2$
\end_inset

 are both in 
\begin_inset Formula $s$
\end_inset

 and therefore are relevantly tied.
 Since they are relevantly tied and share the same postdot symbol, 
\begin_inset Formula $T$
\end_inset

, and since 
\begin_inset Formula $P1\neq P2$
\end_inset

, 
\begin_inset Formula $P1$
\end_inset

 shadows 
\begin_inset Formula $P2$
\end_inset

.
\end_layout

\begin_layout Standard
Because 
\begin_inset Formula $P1$
\end_inset

 shadows 
\begin_inset Formula $P2$
\end_inset

, and 
\begin_inset Formula $P2$
\end_inset

 is the predecessor of 
\begin_inset Formula $D2$
\end_inset

, 
\begin_inset Formula $D2$
\end_inset

 is a Leo reject.
 Since the choice of 
\begin_inset Formula $D2$
\end_inset

 was arbitrary, all dotted rule instances in 
\begin_inset Formula $r$
\end_inset

 are Leo rejects.
 Since all dotted rule instances in 
\begin_inset Formula $r$
\end_inset

 are Leo rejects, 
\begin_inset Formula $r$
\end_inset

 is a Leo reject.
 Since 
\begin_inset Formula $r$
\end_inset

 is a Leo reject, it is Leo conforming.
\end_layout

\begin_layout Standard
This establishes the non-singleton case.
\end_layout

\begin_layout Standard

\emph on
Concluding the Proof
\emph default
: It has been shown in all three cases that the NFA states created by the
 NFA State Creation operation are Leo conforming.
 QED.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:Non-kernel-States-are"

\end_inset

Non-kernel States are Leo-conformant
\end_layout

\begin_layout Standard

\emph on
Statement of the Theorem
\emph default
: When the NFA State Splitting operation produces a non-kernel state as
 a result, that non-kernel state is Leo-conforming.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: By the definiton of NFA State Splitting, only non-initial predictions
 go into the non-kernel state.
 The only prediction which is a completion is the null parse rule instance,
 and that is an initial instance.
 Therefore, all dotted rule instances in the non-kernel state are non-completion
s.
\end_layout

\begin_layout Standard
Since dotted rule instances in the non-kernel state are non-completions,
 they are all Leo rejects.
 Since all rule instances in the non-kernel state are Leo rejects, the non-kerne
l state is a Leo reject.
 Since the non-kernel state is a Leo reject, it is Leo-conformant.
 QED.
\end_layout

\begin_layout Section
Construction of the AHFA
\end_layout

\begin_layout Standard
This section reviews the AHFA construction to show that everything in it
 is accounted for by the operations listed above.
\end_layout

\begin_layout Subsection
Construction of the LR(0) DFA
\end_layout

\begin_layout Standard
It is assumed that the reader is familiar with the procedure for constructing
 an LR(0) DFA.
 After creation of the initial states of the LR(0) DFA, construction of
 its accessible states proceeds by transitions over symbols and by 
\begin_inset Formula $\epsilon$
\end_inset

-transitions.
 These correspond to NFA State Creation and NFA State Expansion operations,
 respectively.
\end_layout

\begin_layout Subsection
Construction of the 
\begin_inset Formula $\epsilon$
\end_inset

-DFA
\end_layout

\begin_layout Standard
Aycock and Horspool, in their presentation of the construction of the AHFA,
 show an intermediate transformation -- conversion of an LR(0) DFA into
 a 
\begin_inset Formula $\epsilon$
\end_inset

-DFA.
 Aycock and Horspool then go on to suggest a grammar rewriting, which they
 call NNF.
 This grammar rewriting accomplishes the same thing more easily.
 Their 
\begin_inset Formula $\epsilon$
\end_inset

-DFA is apparently only for pedagogic purposes.
\end_layout

\begin_layout Standard
In this proof the grammar is assumed to be the result of the Marpa grammar
 rewritings as described above.
 Marpa's grammar rewritings incorporate the NNF rewriting as suggested by
 Aycock and Horspool.
 The grammar rewriting is done before construction of the LR(0) NFA, and
 the result is that the LR(0) DFA is also an 
\begin_inset Formula $\epsilon$
\end_inset

-DFA.
\end_layout

\begin_layout Subsection
Construction of the Split 
\begin_inset Formula $\epsilon$
\end_inset

-DFA
\end_layout

\begin_layout Standard
The final stage splits NFA states into one or two non-empty NFA states.
 This is the operation described above as NFA State Splitting.
 The result of NFA State Splitting is what Aycock and Horspool call a 
\begin_inset Quotes eld
\end_inset

split 
\begin_inset Formula $\epsilon$
\end_inset

-DFA
\begin_inset Quotes erd
\end_inset

.
 It is the final product of the construction.
 The 
\begin_inset Quotes eld
\end_inset

split 
\begin_inset Formula $\epsilon$
\end_inset

-DFA
\begin_inset Quotes erd
\end_inset

 is what this document has been calling an AHFA.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sub:The-Main-Induction"

\end_inset

The Main Induction
\end_layout

\begin_layout Subsection
\begin_inset CommandInset label
LatexCommand label
name "sub:Induction-Basis"

\end_inset

 Basis
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: The construction of the AHFA begins with one or two initial states.
 The null parse start state is Leo conformant because it is a singleton
 completion.
 The non-null parse start state is Leo conformant because all dotted rule
 instances in it are non-completions and therefore Leo rejects.
 Therefore all initial states are Leo conformant.
\end_layout

\begin_layout Subsection
\begin_inset CommandInset label
LatexCommand label
name "sub:Inductive-Step"

\end_inset

Inductive Step
\end_layout

\begin_layout Standard
Most of the work of the induction steps was done in the preceeding theorems.
 All states once created remain Leo-conformant, as proved in the section
 
\begin_inset CommandInset ref
LatexCommand vref
reference "sec:Leo-Conformance-is"

\end_inset

.
\end_layout

\begin_layout Standard
By the definition of the operations, new NFA states are created in three
 ways:
\end_layout

\begin_layout Enumerate
As initial states.
 These were proved Leo-conformant in section 
\begin_inset CommandInset ref
LatexCommand ref
reference "sub:Induction-Basis"

\end_inset

 and do not form part of the inductive step.
\end_layout

\begin_layout Enumerate
In the NFA State Creation operation.
 These were proved Leo-conformant in section 
\begin_inset CommandInset ref
LatexCommand vref
reference "sec:NFA-State-Creation"

\end_inset

.
\end_layout

\begin_layout Enumerate
As non-kernel states, during the NFA State Splitting operations.
 These were proved Leo-conformant in section 
\begin_inset CommandInset ref
LatexCommand vref
reference "sec:Non-kernel-States-are"

\end_inset

.
\end_layout

\begin_layout Section
\begin_inset CommandInset label
LatexCommand label
name "sec:All-Leo-Candidates"

\end_inset

All Leo Candidates are in Singleton AHFA States
\end_layout

\begin_layout Standard

\emph on
Theorem
\emph default
: All Leo candidates are in singleton AHFA states.
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: The main induction (section 
\begin_inset CommandInset ref
LatexCommand vref
reference "sub:The-Main-Induction"

\end_inset

) shows that all the states in the AHFA are Leo conformant.
 Therefore, by the defiinition of Leo-conformant, all AHFA states are either
 contain no Leo candidates, or are singletons.
 This proves the theorem.
 QED.
\end_layout

\begin_layout Section
All Leo Completions are in Singleton AFHA States
\end_layout

\begin_layout Standard

\emph on
Theorem
\emph default
: All Leo completions are in singleton AFHA States
\end_layout

\begin_layout Standard

\emph on
Proof
\emph default
: An Earley item can only be a Leo completion if its AHFA state contains
 a candidate for Leo completion.
 The previous section 
\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:All-Leo-Candidates"

\end_inset

 showed that these these AHFA states are always singletons.
 Therefore Leo completions will always be singletons.
 QED
\end_layout

\end_body
\end_document