The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
# Copyright (C) 2008-2010, Parrot Foundation.

=head1 [DRAFT] PDD 29: Compiler Tools

=head2 Abstract

This PDD specifies the Parrot Compiler Tools (PCT).

=head2 Definitions

=head3 Compiler

In this document, when we speak of a I<compiler>, we mean
PCT-based compilers.

=head3 HLL

A High-Level Language. Examples are: Perl, Ruby, Python, Lua, Tcl, etc.

=head2 Synopsis

Creating a PCT-based compiler can be done as follows:

=begin PIR

 .sub 'onload' :anon :load :init
     load_bytecode 'PCT.pbc'
     $P0 = get_hll_global ['PCT'], 'HLLCompiler'
     $P1 = $P0.'new'()
     $P1.'language'('Foo')
     $P1.'parsegrammar'('Foo::Grammar')
     $P1.'parseactions'('Foo::Grammar::Actions')
 .end

 .sub 'main' :main
     .param pmc args
     $P0 = compreg 'Foo'
     $P1 = $P0.'command_line'(args)
 .end

=end PIR

{{ this is the most important part; is this enough? }}

The Parrot distribution contains a Perl script to generate a compiler
stub, containing all necessary files. This generated compiler will
compile out of the box. It is highly suggested to use this script to
get started with the PCT.
The script is located in C<tools/dev/mk_language_shell.pl>.

{{ Not sure whether the mk_language_shell.pl script should be mentioned long
term.  In a sense, this script can also be considered part of the parrot
compiler "tools", as it is used to create a compiler.  }}

=head3 Parser Synopsis

 grammar Foo is PCT::Grammar;

 rule TOP {
     <statement>*
     {*}
 }

 rule statement {
     <ident> '=' <expression>
     {*}
 }

 rule expression is optable { ... }

 proto infix:<+> is precedence('1') is pirop('n_add') { ... }

 rule 'term:' is tighter(infix:<+>) is parsed(&term) { ... }

 rule term {
     | <ident> {*}       #= ident
     | <number> {*}      #= number
 }

=head3 Actions Synopsis

{{ Is this a good idea? }}

 class Foo::Grammar::Actions;

 method TOP($/) {
     my $past := PAST::Block.new( :blocktype('declaration'), :node($/) );
     for $<statement> {
         $past.push( $( $_ ) );
     }
     make $past;
 }

 method statement($/) {
     make PAST::Op.new( $( $<ident> ),
                        $( $<expression> ),
                        :pasttype('bind'),
                        :node($/) );
 }

 method expression($/, $key) {
     ...
 }

 method term($/, $key) {
     make $( $/{$key} );
 }

=head3 Running the compiler

Running the compiler is then done as follows:

 $ parrot foo.pbc [--target=[parse|past|post|pir]] <file>

{{ other options? Maybe --target=pbc in the future, once PBC can be
   generated? }}

=head2 Description

The Parrot Compiler Tools are specially designed to easily create a
compiler targeting the Parrot Virtual Machine. The tools themselves
run on Parrot, which implies that no other external programs are
needed.

The PCT is a set of libraries and programs, to:

=over 4

=item * create a parser

=item * create an intermediate data structure (Abstract Syntax Tree)

=item * generate executable Parrot code

=back

{{ Maybe just say it's used to create parrot-targeting compilers, not
   list these as =items }}


=head2 Implementation

The PCT is made up of the following libraries and programs:

=over 4

=item * Parrot Grammar Engine (PGE)

=item * Parrot Abstract Syntax Tree (PAST) classes

=item * Parrot Opcode Syntax Tree (POST) classes

=item * PCT::HLLCompiler class

=item * PCT::Grammar class

=back

Although strictly speaking it is not part of the PCT, the
Not Quite Perl (6) (NQP) language is typically used in all PCT-based
compilers. NQP is a subset of the Perl 6 language, and is a
high-level language as an alternative for PIR.

=head3 Compilation phases

A PCT-based compiler has by default four compilation phases, or
I<transformations>. Phases can be removed and added through the API of
the C<HLLCompiler> class. These are:

=over 4

=item * source to parse tree

The source is read, parsed and stored in a parse tree.

=item * parse tree to PAST

The parse tree is converted into a Parrot Abstract Syntax Tree.

=item * PAST to POST

The PAST is converted into a Parrot Opcode Syntax Tree.

=item * POST to PIR

The POST is converted into executable Parrot Intermediate Representation.

=back

=head4 Source to Parse Tree

The first stage of a PCT-based compiler is done by the C<parser>. The
parser is defined as a set of Perl 6 Rules, which is processed by the
Perl 6 Rules compiler. This results in a generated PIR file that
implements the parser.

{{ Doesn't this make the Perl 6 rules compiler part of the PCT? }}

During the first stage, the source (input string) is parsed, resulting
in a C<parse tree>.

=head4 Parse tree to PAST

The second stage converts the parse tree into a Parrot Abstract Syntax Tree
(PAST). PAST is a data structure consisting of PAST nodes, each of which
represents a common HLL construct. While all languages differ in syntax,
many constructs in different HLLs map to the same semantics. This second
transformation is executed during the parse stage. The transformations
of the parse tree nodes into PAST nodes is done by so-called parse actions,
which are methods of a class that is specified through the C<parseactions>
attribute of the HLLCompiler. Such classes are implemented in NQP.

{{ How do we say that this is not obligatory; you could also use PIR,
   and in the future maybe other languages.
}}

=head4 PAST to POST

The third transformation converts the PAST into a Parrot Opcode Syntax Tree
(POST). PAST nodes represent HLL constructs, which are transformed into a
set of low-level POST nodes. A POST node is a low-level node, representing
a single instruction, label, or a subroutine. While a PAST is very close to
a HLL program, a POST is much closer to PIR code.

=head4 POST to PIR

The last transformation generates PIR code from the POST.

The generated PIR is then fed into the Parrot executable, and processed
into Parrot Byte Code (PBC) by the PIR compiler.

=head3 Parrot Grammar Engine

The Parrot Grammar Engine (PGE) is a component that I<executes> regular
expressions. Besides I<classic> regular expressions, it also understands
Perl 6 Rules. Such rules are special regular expressions to define a grammar.

The I<start> symbol in a grammar is named C<TOP>; this is the top-level
rule that is executed when the parser is invoked.

=head4 Operator precedence parsing

{{ insert stuff about using an operator prec. table here }}

=head3 Parrot Abstract Syntax Tree

The PCT includes a set of PAST classes. PAST classes represent common language
constructs, such as a C<while statement>.
These are described extensively in L<docs/pdds/pdd26_ast.pod>.

=head3 Parrot Opcode Syntax Tree

=head4 POST::Node

POST::Node is the base class for all other POST classes.

=head4 POST::Op

=head4 POST::Ops

=head4 POST::Label

=head4 POST::Sub



=head3 PCT::Grammar

The class C<PCT::Grammar> is a built-in grammar class that can be used as
a parent class for a custom grammar. This class defines a number of rules and
tokens that are inherited by child classes. Note that the concept of C<class>
and C<grammar> are equivalent.

The following rules are predefined:

{{ is this necessary, or just a reference to the file? }}

=over 4

=item ident

=item ws

=back

=head3 PCT::HLLCompiler

All PCT-based compilers use a HLLCompiler object as a compiler driver.
It acts as a I<facade> for the compiler. This object invokes the different
compiler phases.

=head4 HLLCompiler API Methods

{{ TODO: complete this }}

=over 4

=item language

=begin PIR_FRAGMENT

  $P0.'language'('Foo')

=end PIR_FRAGMENT

=item parsegrammar

=begin PIR_FRAGMENT

  $P0.'parsegrammar'('Foo::Grammar')

=end PIR_FRAGMENT

=item parseactions

=begin PIR_FRAGMENT

  $P0.'parseactions'('Foo::Grammar::Actions')

=end PIR_FRAGMENT

=item commandline_prompt

=begin PIR_FRAGMENT

  $P0.'commandline_prompt'($S0)

=end PIR_FRAGMENT

sets the string in C<$S0> as a commandline prompt on the compiler in C<$P0>.
The prompt is the text that is shown on the commandline before a command is
entered when the compiler is started in interactive mode.

=item commandline_banner

=begin PIR_FRAGMENT

  $P0.'commandline_banner'($S0)

=end PIR_FRAGMENT

sets the string in C<$S0> as a commandline banner on the compiler in C<$P0>.
The banner is the first text that is shown when the compiler is started in
interactive mode. This can be used for a copyright notice or other
information.

=back

=head2 References

F<docs/pdd26_ast.pod>, L<http://dev.perl.org/perl6/doc/design/syn/S05.html>

=cut

__END__
Local Variables:
  fill-column:78
End: