docs/notes/p6ast_draft - metacpan.org

(Gobby log from collaborative P6AST brainstorming session - this needs major
refactoring to be useful, perhaps ultimately as Perl6::API::AST.)

# Goals of this P6AST
# Easy to manipulate; deparse back into something still resembling Perl6
# Strong normalization - 0x1 and 1 become the same node at this stage
# _no_ BEGIN-time concepts - everything is runtime (compiler will need to emit BEGIN-time objects separately)
# all magical twigils like $?SELF get their own node? (not sure,sounds resonable)
# a lot but simple nodes (not sure)
	- this can be achieved by simple post-processing; maybe it doesn't belong
	in the 'basic' parser?
# Nodes are hashes, or arrays, but not arrays-with-attributes or captures
#  - Easy to represent exactly in p5, js, etc.
# Gives all positioning neccessary for highly detailed error messages
#  - ???
#  - Do we also give the original source text, so the interpreter can show the code?
#  - This gets very big, very fast
#  - Maybe optional?
#  - Each AST node could attach its original parse tree node, or source position range (for error reporting)
#  - normally range is quite sufficient for highlighting errors or recovering text
#  - the main ast node could store the orginal source or an url to it
#	- maybe store a checksum of the source code
# I found e4x a nice language for manipulating trees, it would be nice to be able to emit P6AST code
# as XML to be able to leverage that tool.
# xyx? :-)

= Compilation Unit

Toplevel AST node. Name, checksum of source, time of compilation etc all goes here.

= Literals

== "unboxed" type needs their own Literal node

int: -10
num: 1.5
str: "moose"

== "boxed" objects that are used all the time may need convenience node

Undef: undef

Code: ->$x{$x+1}
 - Pad - mapping from name to storage_class (state, my)
 - Signature (for the incoming args)
 - Signature (for the return values)
 - Body (AST)
 - Class (Pointy, Routine, etc)
 - Multiness (proto/multi/none)
 
Capture: \(x=>1)
 - invocant
 - possibly multiple sequences of:
    - positional
    - named

Signature: :($x?=2)
 - params, sorted by tiebreaking level  (the last level is non-tiebreaking)
 - each param has: type, name, default, storage_class, context

== finally, a "Literal object" for BEGIN-time splicing

Object: Class.new(field1=>..., field2=>...)

= Calls

misc/pX/Common/AST-Design/Perl 6 has two forms of calls, based on the presence of invocant - so we can unify
them into one node based on Capture literal. However, there are still two variants
of calls that have different lookup logic:

== Named call

Call: $x.meth(1)
      meth($x, 1)
 - name: str
 - arguments: Capture
 - context

== Closure object call

Apply: &code(1)
        &code($x: 1)
 - closure: Expression
 - arguments: Capture
 - context

In either case, the context is statically known and made part of the node.

= Variable

Variable lookups are either early (compile-time-bound) or late (runtime-discovered):

    { my $x; $x } # early
    { my $x; MY::{'$'.'x'} } # late

As you can see, the latter form is always desugared into a postcircumfix:<{}>, so
a single node form for the early bound is sufficient.

== Static variable lookup

Var: $x (maybe also $OUTER::x, by adding "depth" to the static var's node)

== Dynamic variable lookup

This is so-called "soft" references:

Sym: $::('Moose');  # postcircumfix:<{ }>(::, '$Moose')

yeah, that's possible, but that also makes Sym->Var analysis _really_ hard :)
need to recover it from a Call node = pain

Also! the :: form is entirely unspecced ;-)
my $root_of_evil = ::;
$root_of_evil.<$Moose>; # wha?
well, a pad does hash...
but :: is more than a pad
it's lex (+outer/outerouter/...) + pkg + magic
sure it can be emulated somehow, but I think it's prolly too expensive to save a node type here
$::('?SELF'); # note that this is actually legal
$::('ENV'); # too, if $*ENV is visible (imported from GLOBAL)

the magic of ?SELF and ENV are in what the .{} returns...

er, sorry, I mean the spec says that if $::('constant') is seen
it must be converted into a Var lookup if it can
this is a static analysis
and as such, without a Sym node, is quite hard
('ordinary' constant folding?) (how do you fold over a .{}? the C<::> is not a constant!)
(if it is, then it must be available as MagicCurrentScope that is treated as constant --
that's what I mean below - also the .{} must be statically closed and not rebindable)

you'll have to statically know that .{} havn't been rebound on :: or some such
(in anycase, if we really get a C<::> form in spec and is available as MagicCurrentScope
node, I can see the Sym node going away - but not until that)
np - i was just pushing your earlier statement forward to see where it went.
Sym node for practicality or need, sure.


== Voodoo variables

$?SELF and the like. They are not (neccessarily) in the pad, so we need
to enumerate them and make a node out of it.

should they have a VoodoVar node?
yeah, I think so - the content can be a simple string like '$?SELF'
or we can bound it to an enumerated set of symbols as subnodetypes
(the latter is arguably cleaner.)
meaning we will have a Self node?
probably MagicSelf or something
nice

Magic: <one of the nodes>
 - MagicSelf | MagicClass | MagicGLOBAL | MagicCOMPILING MagicMushroom(etc)

= Expression

An Expression is either a Literal, a LValue (Variable/Apply/Call), or one of the
special forms (Binding/Assignment):

Operator nodes need to include the 'fixity' in which the operator 
was found, in order to tell ++$a from $a++.
# or would it just store the long name?
# is "sub '++' {}" declaring "prefix:++" by default, or it a separate sub?
# which just happens to clash with prefix:++ in some context?

# Bare block is a call of a literal block?
Aye, it's Apply of a Code literal.

# Are all the loop constructions now normal calls?
They are Call nodes to statement_control:<foo>.

# PG-P6 currently considers a program a list of expressions, connected with infix:<;>
# if-blocks have an implicit ';'

problem with infix:<;> is that if you have nested infix:<;>
the pretty printer can't print them back
also the semantic of infix:<;> is, uhm, "interesting"
can one redefine infix:<;>?
ooh monadic programming here we come
(I hope not. which is why I think Stmts is a node and thus special form)

It's easier to define the syntax this way - but it can be post-processed into the real thing

yeah, as p6 does distinguish between stmt and exp
for example, you can't label an exp
and pragmas operate on stmts not exps
to parse labels, you'd need a real Stmt structure to hang it on
# is this a surface constraint that might be gone by AST time?
(I think it's something deeper - i.e. when you optimize, you can't produce AST that
cannot be prettyprinted back)
so preferably the "hard" surface constraints are reflected in AST
(if not so, we only need Var Call Lit and nothing else - aka PIL^N)

what do you mean by nested ';'?  a;b;(c;d);e ?
yeah.
nested ; is impossible in surface syntax
so I think it shouldn't be possible in AST

PG-P6 generates a circumfix:<( )> in this case, so it can be pretty-printed
the thing though is that
$x = (1;2);
doesn't mean that 1 and 2 are statements
rather, that is merely special Capture syntax that constructs a parallel Capture
mm, I'm going out of battery... :-/
(a sec)

so PG-P6 really needs a semantic analysis step - the precedence parser can't tell
the difference
yup - the semantic analysis can happen as part of the rule for simple cases
but otherwise it needs to defer to another level
for now it's okay just to generate a lot of Call nodes
and pray that all works :)
(but mostly the semantic analysis will turn Sym and Call into Var and Apply --
i.e. dynamic nodes into static ones)

== Binding

Bind: Signature := Capture

== Assignment

Assignment could be a normal "call" to infix:<=>, but currently the LHS dominates
the context in a way that's rather hard to specify. So might warrant two nodes,
one for each context -- that leaves room for listopfying the latter as TimToady has been pondering.

AssignOne: LHS = Expression-non-flattening
  $x = 3

AssignMany: (LHS) = Expression-flattening
  ($x,) = 3

= Statement

Statement: list of:
        - Label
        - Pragmas (#line, fatal, whatnot)
        - Expression

== Issue: How are mini languages interfaced?

  $a = rule { .* }
  
  { op => '=', 
    var => $a, 
    rule => { language => rule, ast => ... }   # eval-like ???
    #or rule => { language => rule, code => ... }? the backend may contain a library 
      which takes code instead of ast ie. for p5 regexp
    
  }
  
  - mini-languages may have their own emitters
  
  - PG-P6 has quoted-string and angle-string 
  - could these be considered mini-languages ???
  - maybe a special node?
  
== Issue: preserving source formatting/whitespace

== Issue: preserving preprocessor history

== Issue: Overcomplexity
- post proccesing

== Parsing phases

- Tokenizer
    - some nodes are not completely qualified
- Precedence parser
    - may ignore signatures
- Semantic analysis
    - constants have actual values, ...
    - allocate pads
- Optimizer
...

  -- should we have specs for intermediate data on each phase?
        - Each parser is bound to have a different ParseTree
        - May make sense to unite parse tree emitted from Rule-based ones
  -- each optimizer might request a custom format (with some nodes turned into others)
        - Optimizer can work on P6AST->P6AST -- this part will be shared
        - But each codegen will have their own optimizer at P6AST->(say PIR) level as well
  -- different backends have different needs
        - low level backends (C,C--,asm) need info for memory managment
        - high level backends (ruby,python) need higher level information to generate better intergrated code 
        
= Examples

say "Hello World"

==> Call( name => 'say', args => Capture(invocant => LitStr("Hello World")))

"Hello World".say()

==> Call( name => 'say', args => Capture(invocant => LitStr("Hello World"))

say "Hello ", "World"

==> Call( name => 'say', args => Capture([positional => [LitStr("Hello "), LitStr("World"]])
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)