The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
----
presentation_topic: A Peek into Pugs Internals
presentation_title: 
presentation_place: http://perlcabal.org/~gaal/peek/start.html
presentation_date: 2006-02-28
author_name: Gaal Yahas
author_email: gaal@forum2.org
author_webpage: http://gaal.livejournal.com/
copyright_string: Copyright © 2006 Gaal Yahas
----
{image: ceri-peek2.jpg 495}

A Peek into Pugs Internals


Gaal Yahas

/pugscode.org/

----
== Warmup question

  package MyMod;
  use base 'Exporter';

  @EXPORT = 'foo';

  sub foo {
      print 42, "\n";
  }

----
== Warmup question

  package MyMod;                 package A;
  use base 'Exporter';
                                 use MyMod;
  @EXPORT = 'foo';
                                 foo();
  sub foo {
      print 42, "\n";
  }

----
== Warmup question

  package MyMod;                 package A;      package B;
  use base 'Exporter';
                                 use MyMod;      use MyMod;
  @EXPORT = 'foo';
                                 foo();          foo();
  sub foo {
      print 42, "\n";
  }

----
== Warmup question

  package MyMod;                 package A;      package B;
  use base 'Exporter';
                                 use MyMod;      use MyMod;
  @EXPORT = 'foo';
                                 foo();          foo();
  sub foo {
      print 42, "\n";
  }

                         # main.p5
                         use A;
                         use B;

----
== Warmup question


  module MyMod;                  module A;      module B;

  sub foo is export {            use MyMod;     use MyMod;
      say 42;
  }                              foo();         foo();
  
  
   
   
                         # main.p6
                         use A;
                         use B;

----
== Up to r8288:

  % ./pugs test.pl
+  42
+  *** No such sub: "&foo"
  at B.pm line 5, column 1-6

+* What went wrong?

----
== `use MyMod` in Perl 5

.vim
filetype: perl
BEGIN { require MyMod; MyMod->import }
.vim

* The parser sees "use MyMod"
+** Loads MyMod.pm
+** Compiles it
+** Runs its `import` routine
+* `use` is a /digression/ in the caller's compilation
+* BEGIN is a powerful but /weird/ tool
** The compiler and the evaluator intermix

----
== Perl 6 version

* `is export` is a trait of `&foo` marking it as exportable

* This is much nicer for the programmer than @EXPORT stuff

+* But is more work for the language implementor

----
== Hackaround

* Load module MyMod when it is first used

* At parse time, push `is export` routines into the caller's namespace immediately

* /Obviously/ broken, because MyMod.pm only gets parsed once

----
== Bleargh!

* Don't get the impression that Pugs is a pile of mud because of this "feature"

+* To do it right, we'd need to support lexical import, which is a new feature in Perl 6

+* A deliberate makeshift that let us act as users of Perl would before we had that much of perl available

+* Good enough in many cases 
** for example, writing 8,000 tests for /other/ things!

+* But the fact that it don't work don't mean it don't need no fixing

.c

The way import works in Perl 6 is actually much more clever than Perl 5;
you can say `{ use MyMod }` and not see the the effect of the use outside
the braces. This is known as lexical import (or export -- depending on
whose perspective you take).

.c
----
== Okay, but what's the fix?
.vim
filetype: perl
BEGIN { require MyMod; MyMod->import }
.vim

+* At parse time, make a note about what MyMod is willing to export

+* Perform the export when somebody uses it
----
== `[patch]`
.vim
filetype: haskell
  - unsafeEvalExp $ mkSym nameExported
.vim
+.vim
filetype: haskell
  + -- %*INC<This::Package><exports><&this_sub>
  + --   = expression-binding-&this_sub
  + unsafeEvalExp $
  +   Syn "=" [Syn "{}" [Syn "{}" [Syn "{}"
  +     [Var "%*INC", Val $ VStr pkg], Val (VStr "exports")]
  +     , Val $ VStr name], Val sub]
.vim

+* Don't freak out!

+* We're just running Perl 6 in the parser!

----
== Classic compilation
{html:<img width="900" src="images/compilation_flow_no_snip.png" />}

* This is a typical description of how compilation works in a language like c

----
== Classic compilation
{html:<img width="900" src="images/compilation_flow_with_snip.png" />}

* This is a typical description of how compilation works in a language like c

* Some of that doesn't interest us right now.

* Every language needs a parser
+** (except maybe LISP :-)

----
== Perl 5 model
{html:<img width="900" src="images/interpretation_flow.png" />}

* The code we were looking at is in the parser itself
+* We could have done `eval $some_perl_code`, but that's slow
+* We could have used the internal API, but that's hard
+* Instead, express what you want to do in the same way the parser does it
** Use ASTs
** Whatever those are

.c

This is probably the source of the confusion about calling Perl
an interpreter: there's no storage for the result of the a parse
that is kept outside of the evaluator (unless you're doing very
fancy stuff). Unlike the strict definition of an interpreter,
though, Perl does compile as much as it can straight off when it can. 

...cgi -> fastcgi -> mod_perl
...parser always available to support eval
...btw did you know that the javac compiler is also available via a
programmatic interface? And that you can load new classes at runtime?
also available via a programmatic interface? And that you can load
new classes at runtime?

.c

----
== Pugs AST
.vim
filetype: perl6
if 42 { say "hello" }
 else { say "oh no!" }
.vim
+
{image: gaal_no_ann.png 640}
* An /abstract syntax tree/ is a structure representing the parsed program
+* Each implementation picks the types of nodes it carries
** In practice, the language drives the implementation choices
** together with the implementor's emphasis (speed, education...)
.c
Why does `42` parse as `Val (VInt 42)`, and not just `42`? Because an
`if` can take any expression as the condition; this one just happens
to be a value. And Perl distinguishes between different types of
values, so this is an integer and not, say a sting.

If I had a more complex condition here, we'd just look at a more complex
AST for this.
.c
----
== Pugs AST (with annotations)
.vim
filetype: perl6
if 42 { say "hello" } else { say "oh no!" }
.vim
{html:<img name="img" id="img" width="900" src="images/gaal_with_ann.png" />}
----
== Pugs AST
.vim
filetype: haskell
data Exp
    = Noop
    | FunctionApplication Exp (Maybe Exp) [Exp]

    | Syntax String [Exp]

    | Statements Exp Exp
    | Value Val
    | Variable Var
.vim
.c
This isn't really how write this, because we like to golf.
"Annotation" is one variant of `Exp`, which contains an Ann and an Exp.
`Ann` is another data type defined elsewhere, and haskell does not confuse
a variant (called "constructor") with names of data types, so we abbreviate
and write "Ann" in both places.
.c
----
== Pugs AST
.vim
filetype: haskell
data Exp
    = Noop                      -- ^ No-op
    | App Exp (Maybe Exp) [Exp] -- ^ Function application
                                --     e.g. myfun($invocant: $arg)
    | Syn String [Exp]          -- ^ Syntactic construct that cannot
                                --     be represented by 'App'.
    | Stmts Exp Exp             -- ^ Multiple statements
    | Val Val                   -- ^ Value
    | Var Var                   -- ^ Variable
.vim
.c
This is how it appears in the Pugs tree (this week)
.c

.c
 XXXXX elided
  ----
== Perl 6 model

{image: http://pugs.blogs.com/photos/visiolization/simplecompilation.png 500}
----
== Parsec
* A parser combinator library
+* /Combine/ simple parsing functions into smarter parsers
+* Then combine them into even smarter ones

.c
----
== Simple example
.vim
filetype: haskell
undefLiteral = do
    symbol "undef"
        return $ Val VUndef
.vim
+* `symbol` is a Parsec builtin
+* it asserts the next symbol is a literal ("undef" in this case)
+* eats it up
+* and returns a small chunk of AST representing undef

----
== Another simple example
.vim
filetype: haskell
ruleVerbatimBlock = verbatimRule "block" $ do
    body <- between (symbol "{") (char '}') ruleBlockBody
    return $ Syn "block" [body]
.vim

* "to parse a verbatim block, look for a block body /between/ `{` and `}`"
+** `between A B C` is a Parsec function that looks for C between A and B
+** A, B, and C are themselves parsers/functions
+** `between` returns whatever C returns.

----
== More rules
.vim
filetype: haskell
ruleStatement = do
    exp <- ruleExpression
    f <- option return $ choice
        [ rulePostConditional
        , rulePostLoop
        , rulePostIterate
        ]
    f exp
.vim

+.vim
filetype: perl6
say "OSDC was great" if $cakes.yummy
.vim

+* `choice` means try out a few parsers
+** the first that matches wins, otherwise backtrack
+** if none succeeded, the `choice` fails
+* in this case, it's protected by an `option`, letting you provide a fallback

----
== Parsing `for`
.vim
filetype: haskell
ruleForConstruct = rule "for construct" $ do
    symbol "for"
    list  <- maybeParens ruleExpression
    optional ruleComma
    block <- ruleBlockLiteral
    retSyn "for" [list, block]
.vim
* `optional` means try something out
* if it wasn't found, don't fail the current rule but don't consume anything
* skip what did parse though
+* In Perl 5 regexps: `/(?:,)?/`

----
== Brace yourself
.vim
filetype: perl
my $car = %models{$wanted};
.vim

* Let's look at the code to extract `$wanted` out of this expression into an AST

+
.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

+* Three lines is more than it'd take with a regexp...
** but it does a little more.
* We know some of this already

----
== Parsing with parsec - what goes down
.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim
----
== Parsing with parsec - what goes down
.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between start end subscript
    where
    start       = symbol "{"
    end         = char   '}'
.vim
+
.vim
filetype: haskell
    -- try out subscriptExp, but it's okay if it fails
    subscript   = option id subscriptExp
.vim
+
.vim
filetype: haskell
    subscriptExp = do
        exp <- ruleExpression
        return $ \x -> Syn "{}" [x, exp]
.vim

* `id` *?*
* `\x -> Syn "{}" [x, exp]` /*????*/

----
== Many happy returns
.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

* The return value of ruleHashSubscriptBraces is `(Exp ->Exp)`
+* That is, a function taking an `Exp` and returning an `Exp`
+* Most parser functions return an Exp

+* This one returns a closure
----
== Back to Braces
.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim
+
.vim
filetype: haskell
firstPossibleReturnValue = id
    where
    id x = x    -- actually, this is a standard function
otherPossibleReturnValue = \x -> Syn "{}" [x, exp]
.vim
+
.vim
filetype: perl
return sub { my $x = shift; return $x };
return sub { my $x = shift; return Syn("{}", [$x, $exp]) };
.vim
+
.vim
filetype: perl6
return -> $x { $x };
return -> $x { Syn("{}", [$x, $exp]) };
.vim
----
== Back to Braces
.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

* Why do we need the `option id`, anyway?
.vim
filetype: perl6
%siblings{};       # exactly the same as %siblings, except
.vim
+.vim
filetype: perl6
say "%siblings";   # "%siblings"
say "%siblings{}"; # charlie   => donald etc.
.vim

----
== Breather (whew)

* You now know quite a bit about parsing Perl 6 with Haskell

* Any questions before we go on?

----
== "Why do I need this?"

* Maybe you prefer to keep writing Perl
** You already know Perl 5 regular expressions
** It's Perl 6 you want to use now

+* Whenever you see a Buddha on the road
----
== "Why do I need this?"

* Maybe you prefer to keep writing Perl
** You already know Perl 5 regular expressions
** It's Perl 6 you want to use now

* Whenever you see a Good Idea on the road
+** Steal it!
----
== Perl 6 Rules

* Of course it does!

+* A lot like regexps
+* Lots of reusability improvements
+* Powerful enough to do everything we've seen Parsec do

----
== Use case
.vim
filetype: perl
my $obj;
if (/Car=(?:(Ferrari)|(ModelT)(\d\d\d\d))/) {
    if ($1) {
        $obj = Car->new({color => "red", ... });
        ...
    } elsif ($2) {
        $obj = Car->new({color => "black", year => $3});
        ...
    }
}
.vim

* Well, fine, but can't this be made shorter /and/ more readable?

----
== Rules version
.vim
filetype: perl6
my $obj = m{
    Car =
        [ Ferrari
          : { return Car.new(:color<red>) }
        | ModelT $<year>:=(\d\d\d\d)
          : { return Car.new(:color<black> :$<year>) }
        ]
};

----
== Parsing undef
.vim
filetype: perl
my $undefined = qr/undef/;
.vim

+.vim
filetype: perl6
rule undefined { undef }
.vim

+.vim
filetype: perl6
rule undefined { undef : { return Val(VUndef) } }
.vim

----
== Parsing literals
.vim
filetype: perl6
rule term
    { undef            : { return Val(VUndef) }
    | <digits>
    }
.vim

----
== Parsing literals
.vim
filetype: perl6
rule term
    { undef            : { return Val(VUndef) }
    | \d+
    }
.vim

----
== Parsing literals
.vim
filetype: perl6
rule term
    { undef            : { return Val(VUndef)   }
    | (\d+)            : { return Val(VInt($0)) }
    }
.vim

----
== Parsing literals
.vim
filetype: perl6
rule term
    { undef            : { return Val(VUndef)          }
    | $<digits>:=(\d+) : { return Val(VInt($<digits>)) }
    }
.vim

----
== Parsing literals
.vim
filetype: perl6
rule term
    { undef            : { return Val(VUndef)   }
    | (\d+)            : { return Val(VInt($0)) }
    }
.vim

----
== Parsing terms
.vim
filetype: perl6
rule term
    { undef            : { return VUndef   }
    | (\d+)            : { return VInt($0) }
    }

rule expr
    { <term> ...
.vim

----
== Parsing terms
.vim
filetype: perl6
rule term
    { undef            : { return VUndef    }
    | (\d+)            : { return VInt($0)  }
    }

rule expr
    { <term>           : { return Val($term) }
    |
.vim

----
== Parsing terms
.vim
filetype: perl6
rule term
    { undef            : { return VUndef    }
    | (\d+)            : { return VInt($0)  }
    }

rule expr
    { <term>           : { return Val($term) }
    | (<expr>) (<[+-]>) (<expr>)
                       : { return Op(VStr($1), $0, $2) }
    }
.vim

----
== Parsing terms
.vim
filetype: perl6
rule term
    { undef            : { return VUndef    }
    | (\d+)            : { return VInt($0)  }
    }

rule expr
    { <term>           : { return Val($term) }
    | $<lhs> := <expr>
                <operator>
      $<rhs> := <expr>
                       : { return Op(Val($<operator>), $<lhs>, $<rhs>) }
    }
.vim
+.vim
filetype: perl6
rule operator
    { $<op> := <[+-]>
                       : { return VStr($<op>) }
    }
.vim

----
== Conclusion

* Parsec is flexible
** not as scary as it looks
+* Perl 6 Rules can be as powerful as Parsec
----
== ObMoose
{image: http://forum2.org/gaal/m00se.png}


=== /Thank you!/
----