The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=head1 TITLE

Perl 6 Theories - Grand unified object model

=head1 SYNOPSIS

    # theory representing a total order
    theory Ordered{::T} {
        multi infix:«<=» (T $x, T $y --> Bool) {...}
        multi infix:«>=» (T $x, T $y --> Bool) { $y <= $x }
        multi infix:«>»  (T $x, T $y --> Bool) { not $x <= $y }
        ...
    }

    # a cons list data type
    union MyList[::T] (
        MyNil,
        MyCons(T $.head, MyList[T] $.tail),
    );
    
    # Lists of ordered things are lexographically ordered
    # (read <= as "if" rather than "less than or equal to")
    model Ordered{MyList[::T]} <= Ordered{::T} {
        multi infix:«<=» (MyNil,  _)     { true }
        multi infix:«<=» (MyCons, MyNil) { false }
        multi infix:«<=» (MyCons ($h1, $t1), MyCons ($h2, $t2)) {
            when $h1 <= $h2 { true }
            when $h2 >  $h1 { false }
            default         { $t1 <= $t2 }
        }

        # the theory automatically fills in <, >, >=, ==, etc.
    }

=head1 DESCRIPTION

If Perl 6 is to have a strongly typed dialect but still maintain its
dynamicity, it must have a theoretical model for its typing.  Strong
typing without a model ends up with kludges and special cases (see C++).
This document presents a model that is precise and general, but still
feels like Perl.

The root of this model is the distinction between interface and
implementation.  In particular, it holds the idea of a I<mock object>
very dearly; that is, if you can create an object that pretends to be of
a particular type, then it can be used anywhere that a real object of
that type can.  In fact, this proposal destroys the concept of a mock
object, by saying that any object that conforms to the interface of a
type I<is> of that type.

You can never refer to a concrete type explicitly, and you cannot create
a concrete type without creating an interface to go along with it.

Before we get to the type theory, there are two pieces of machinery that
need to be built: tuples and pattern constructors.

=head2 Tuples

The first piece of machinery we need is the tuple.  The tuple underlies
all of the semantics here, but you never really see it.  Perl 6's tuple
is much richer than tuples of other languages; in short, it is an object
representing a parameter list.

The tuple has three parts: positional, named, and block, any of which can
be empty.  It is constructed using the comma.

    my $tuple = (1, :foo(2), 3, 4):{ $_ + 1 };

This tuple has three positionals, C<1,3,4>;  one named, C<< foo => 2 >>;
and one block, C<{ $_ }>.  The block parameters are specified as adverbs
on the tuple as a whole.  If you want a block inside a positional, just
put one there.  If you want a pair inside a positional, enclose it in
parentheses (that is, named elements are I<lexically> determined; in the
tuple (1,$x), $x will always be positional, whether or not it contains a
pair):

    my $tuple = ({ $_ + 1 }, (foo => 2));  # two positionals

Unlike in most other languages, a single-argument tuple is not the same
as a value.  To create a single-argument tuple (which ought to be an
uncommon thing), use a null comma at the end:

    my $tuple = ("foo",);                  # a tuple with one positional

Tuples, like lists, may be flattened into other tuples:

    my $a = (1,2,3);    # a tuple with positionals 1,2,3
    my $b = (0, [,]($a), 4);  # a tuple with positionals 0,1,2,3,4

As a special case, function application will automatically (lexically)
tuplize a single argument:

    foo($x);            # equivalent to foo($x,)

And therefore, you can think of all functions as taking only a single
argument: the tuple of the argument list.  This will be important later.

Lists and tuples should interact in a reasonable way so that you can
pretend that they're the same thing (but in a strongly typed dialect,
they are quite different).  XXX handwave

Since a tuple is essentially an argument list, a tuple pattern is
essentially a parameter list.  The meaning of the slurpy scalar C<*$x>
has changed.  Now it is just like a slurpy list, except that it gathers
the rest of the signature into a tuple C<$x> instead of a list.  To
facilitate the technique that the slurpy scalar was originally
introduced for, we can use this:

    sub first (*[ $x, *@xs ]) { $x }

The upshot is that you can easily delegate an arbitrary argument list to
another function:

    sub foo (*$args) { bar([,] $args) }   # call bar with whatever args foo
                                       # was called with

Tuples also have a pattern constructor, comma, for use in signatures and
bindings.  So you can say:

    multi snd (($x, $y)) { $y }

C<snd> takes only one argument (no, I mean that in the usual sense, not
the every-function-takes-exactly-one-argument sense), where that
argument is a two-tuple, and it extracts the second element.  Now, what
the heck is a pattern constructor?

=head2 Pattern Constructors

Before we get to exactly what a pattern constructor is, let's define
what a pattern is.  A pattern is a thing that can extract
"subinformation" out of an object as it's being bound.  It appears in
argument lists like so:

    rule parameter { <type> <zone> <parameter_name>  <pattern>? 
                   |        <zone> <parameter_name>? <pattern>? }

There are several patterns with special syntax: square brackets for
matching inside arrays, curly brackets for matching out of hashes, bare
parentheses for matching a tuple, all numeric and string literals (for
matching themselves under C<===>), perhaps among others.  These either
require a set of syntactic categories (like C<< pattern:term:<...> >>)
or a straight-up grammar munge.  

The patterns that are easy to define are the I<normal patterns>.  These
are given by the rule:

    rule normal_pattern { <pattern_name> <tuple_pattern>? }

(If the C<tuple_pattern> is omitted, the empty tuple C<()> is assumed)

So you'll have things like:

    multi traverse (Leaf ($v), *&code) { &code($v) }
    multi traverse (Branch ($l, $r), *&code) { 
        traverse($l):&code;
        traverse($r):&code;
    }

Where that first line could have also been either of:

    multi traverse ($leaf Leaf ($v), *&code)   {...}
    multi traverse (Tree _ Leaf ($v), *&code)  {...}

But those don't add anything to the readability in this case.

How does one define a normal pattern constructor?  By placing names into
the C<pattern:> grammatical category.  A normal pattern constructor
takes a type and returns a tuple, which is then unified against the
given tuple pattern.

For instance, I could define an identity pattern that would just return
the one-tuple of the object:

    sub pattern:<Id> ($obj) { ($obj,) }

Or the typed version:

    sub pattern:<Id> (::T $obj --> (::T,)) { ($obj,) }

You could use this pattern to give more than one name to an argument:

    sub mul3 ($x Id ($y Id ($z))) { $x + $y + $z }

Other than that, it's not useful, of course.  But that goes to show that
if the pattern constructors are typed, they imply those type constraints
on the corresponding argument:

    sub pattern:<Leaf> (Tree $t --> (Any,)) {...}

    multi traverse (Tree _ Leaf ($value)) {...}

The C<Tree> in the second declaration is redundant; the C<Leaf> pattern
already implies it.

=head2 Classes and Roles

There are two ways to create a concrete type, otherwise known as a
class.  One is with the C<class> declarator, the other is the C<union>
declarator (which we'll get to in a bit).  A class defined by its
I<instance data> (or attributes) and its I<interface> (or methods).

    class Account {
        has $.balance;     # instance data and interface
        has $.name;        # instance data and interface
        has $._password;   # instance data
        
        method withdrawl ($amount, $password) {  # interface
            # ...
        }
    }

This definition does two things: it defines the package C<Account> with
a C<new> constructor (which puts together the instance data), and it
defines the role C<Account> that contains pretty much the body of the
class, specifying its interface with default implementations (when you
say C<has $.balance>, you are specifying the interface method C<balance>
with a default implementation of object state).  Whenever you say
C<Account> in type position (after a C<my> or in a signature), you are
referring to this role.  Whenever you say C<Account> elsewhere, you are
referring to the package.

To subclass a class, you say that a new class C<does> the old one:

    class CreditAccount {
        does Account;
        
        method withdrawl ($amount, $password) {
            # try to borrow some money
        }
    }

Of course, you are specializing the role C<Account>, not the class.  In
fact, the class doesn't exist at all; it is just a notation to define
the package, concrete type, and role all at once. 

=head2 Unions and Factories

Perl 6 supports union types (also called case types or algebraic data
types). Union types are to factores what classes are to roles (more on
that later).  To demonstrate the syntax, let's define a simple binary
tree:

    union Tree (
        Leaf($value),
        Branch(Tree $left, Tree $right),
    );

The case types (C<Leaf> and C<Branch>) are defined as a I<constructor
name> followed by a tuple pattern; i.e. an argument list.  Like
C<class>, there is no such thing as the union C<Tree>; it is just a
notation to define a bunch of related things:

It defines the packages C<Tree>, C<Tree::Leaf>, and C<Tree::Branch>.  It
defines (so far empty) roles C<Tree::Leaf> and C<Tree::Branch>, and the
I<factory> C<Tree>:

    # automatically defined
    factory Tree {
        generator Leaf ($value) {...}
        generator Branch (Tree $left, Tree $right) {...}
    }

A factory is the dual of a role: it defines a bunch of functions that
I<return> C<Tree>s (where roles define functions that I<accept> trees).
C<generator> is the corresponding dual to C<method> (it returns the
thing in question).  We'll see why there's all this vocabulary later.

Unions can be parameterized over one or more types:

    union Tree[::T] (
        Leaf(T $value),
        Branch(Tree[T] $left, Tree[T] $right),
    );

And the return type of the constructor functions can be stated
explicitly to form a GADT:

    union AST[::] (    # :: is a siglet meaning "a type"
        Const(::T                          --> AST[::T]),
        Cond(AST[Bool], AST[::T], AST[::T] --> AST[::T]),
        Succ(AST[Int]                      --> AST[Int]),
        ...
    );

If the argument names to the constructors are given C<$.attribute> form,
then attribute accessors are defined on the type:

    union Tree (
        Leaf($.value),
        Branch(Tree $.left, Tree $.right),
    );
    Leaf(4).value;                         # 4
    Branch(Leaf(2), Leaf(3)).right.value;  # 3
    Leaf(4).left;   # method found but pattern match failed

Unions can be extended:

    union AnnotatedTree does Tree (
        Annotation($.str, Tree $.tree),
    );

When you extend a union, you create a superset (but still a subtype).
That is, everything that is a C<Tree> is also an C<AnnotatedTree>
(superset), but an C<AnnotatedTree> may be used anywhere a C<Tree> can.

=head2 Theories

Roles and factories are special cases of a much more general structure
called a I<theory>.   A theory declares a general algebraic structure
over several types.  A good example of a theory that is not a role or a
factory is the definition of a ring (roughly an integer-like thing):

    theory Ring{::R} {
        multi infix:<+>   (R, R --> R) {...}
        multi prefix:<->  (R --> R)    {...}
        multi infix:<->   (R $x, R $y --> R) { $x + (-$y) }
        multi infix:<*>   (R, R --> R) {...}
        # only technically required to handle 0 and 1
        multi coerce:<as> (Int --> R)  {...}
    }

This says that in order for a type R to be a ring, it must supply these
operations.  The operations are necessary but not sufficient to be a
ring; you also have to obey some algebraic laws (which are, in general,
unverifiable programmatically), for instance, associativity over
addition: C<(a + b) + c == a + (b + c)>.

To I<instantiate> the theory, that is, declare that some type obeys
these laws, you I<model> it:

    model Ring{Int} {
        # define all needed operations
    }

C<model> is really just a shorthand for an anonymous theory that models
another:

    theory {
        models Ring{Int};
        # definitions
    }

Saying that a theory C<models> another theory says that if you obey the
laws of the former, then you automatically obey the laws of the latter.

How do we actually use theories?  We can specify constraints over type
variables with them:

    multi pow (::T $x, Int $power --> ::T <= Ring{::T}) {
        my $result = $x;
        for (2..$power) { $result *= $x }
        return $result;
    }


This says that C<pow> works on any first argument, as long as it belongs
to a type that models C<Ring>.  C<< <= >> is to be read as "where" or
"requires" or "is implied by".  It needs a better spelling. 

Roles and factories reduce to theories using a I<topic type>.  So the
role:

    role Foo {
        method bar () {...}
    }

Is equivalent to the theory:

    theory Foo{::T} {
        multi bar (T) {...}
    }

Because the role specifies a topic type (which we named C<::T> when we
converted to a theory), and C<method> defines a multi with its first
argument being the topic type.  It is an error to define a C<method>
when there is no topic type.

Conversely, a factory just reduces to a theory with a topic type.
C<generator> defines a multi with its I<return> type being the topic
type.

But roles and factories are a bit more special than theories.  They have
one additional property each:

=over

=item *

If every A is also a B, and R is a role, then R{B} implies R{A}.

=item *

If every A is also a B, and F is a factory, then F{A} implies F{B}.

=back

That is, every subset of something that does a role also does that role,
and every superset of something that does a factory also does that
factory.  This is self-evident when you notice that roles can only have
methods and factories can only have generators.

When you specify a type in a type signature, you're really specifying a
type variable with a constraint:

    multi foo (Bar $x) {...}

Is equivalent to:

    multi foo (::T $x <= Bar{::T}) {...}

And so a role that returns the type the role defines is still a role,
even though it appears to depend on the type contravariantly:

    role Foo {
        method bar (--> Foo) {...}
    }

Is equivalent to:

    theory Foo{::T} {
        multi bar (T --> Foo) {...}
        # which is
        multi bar (T --> ::U <= Foo{U}) {...}
    }

That is, you don't return the I<same> type, you just return one that
conforms to the same interface.

Finally, let's look at the full reduction of our C<Tree> union type and
try to understand it:

    union Tree (
        Leaf($value),
        Branch(Tree $left, Tree $right),
    );

This turns into:

    theory Leaf{::T} {   # expanded from "role Leaf"
        multi pattern:<Leaf> (T --> (Any,)) {...}
    }

    theory Branch{::T} {
        multi pattern:<Branch> (T --> (::U, ::V) 
                                <= Tree{::U} && Tree{::V}) {...}
    }

    theory Tree{::T} <= Leaf{::T} || Branch{::T} {   # note the or
        multi Leaf ($value --> T) {...}
        multi Branch (::A $left, ::B $right --> T 
                        <= Tree{::A} && Tree{::B}) 
        {...}
    }

Whew, good thing we don't have to write that.  As you can see, even
though it looks like C<Branch> is behaving covariantly by looking at the
union, it isn't.  A branch accepts any two things that behave like this
C<Tree>, not the tree itself.

=head2 Vocabulary

Here is a summary of the object-oriented vocabulary in Perl:

=over

=item * C<theory>

The big one in the structures department.  Specifies a general algebra
over one or more types.

=item * C<role>

A I<covariant>, I<unary> theory, that is, a theory over a single type
where the functions defined within only take the type as a parameter,
never return it.

=item * C<factory>

A I<contravariant>, I<unary> theory, that is, a theory over a single
type where the functions defined within only return the type, never take
it as a parameter.

=item * C<class>

A notation that simultaneously defines a role, a concrete type, and a
package of the given name.

=item * C<union>

A notation that simultaneously defines a factory together with a
concrete type, a role, and a package for each case in the factory.

=item * C<model>

A notation for a zero-parameter theory that models another theory.

=item * C<models>

Declares implication over structures.  If one structure models another,
that means that if a type obeys the former, then it obeys the latter.

=item * C<does>

A unary shorthand for C<models>.  C<A does B> is a shorthand for
C<A{::T} models B{::T}>.

=item * C<multi>

The big one in the functions department.  Specifies a
multiply-implemented function dispatched based on the involved types.

=item * C<method>

Goes with roles.  Defines a multi that implicitly takes the topic type
as its first parameter.

=item * C<generator>

Goes with factories.  Defines a multi that implicitly returns the topic
type.

=back

=head2 Notes

There's a little problem with the GADT form:
    
    union AST[::] (
        Const(::T --> AST[::T]),
    );

This convolves to:

    theory AST{::A, ::B} {
        multi Const(::T --> ::U <= AST[::T]{::U}) {...}
    }

Which doesn't even say anything about C<::B>.  It should be:

    theory AST{V} {
        multi Const(::T --> ::U <= V{::T, ::U}) {...}
    }

That is, it is a theory-valued theory.  I guess this makes sense, as all
types are really theories.  But so far type variables can only assume
concrete types; that's the cool thing about them.  You can never
explicitly mention a concrete type, but that's the only thing a variable
can be, so you have to go through interfaces.

I suppose theory-valued theories are necessary if we want GADT (and
there are probably some other uses as well), but I fear what it will do
to the type inferencer.  Maybe there is another way.

UPDATE: There is, and here it is.  Each case gets its own theory, which
lifts the return value from the constructor.  Let's consider a
two-constructor union:

    union AST[::] (
        Const(::T         --> AST[::T]),
        TestZero(AST[Int] --> AST[Bool]),
    );
    
This convolves to:

    theory AST::Const{::T, ::ASTT} {
        multi Const(T --> ASTT) {...}
    }

    theory AST::TestZero{::ASTBool} {
        multi TestZero(::T --> ASTBool 
                    <= AST{::TInt, ::T} && Int{::TInt}) {...}
    }

    # I'll use => as a shorthand for "implies"; i.e. a => b is
    # equivalent to !a || b
    theory AST{::Param, ::Tree}
            <= AST::Const{::Param, ::Tree}
            && (Bool{::Param} => AST::TestZero{::Tree})
    { }

So it all reduces to complex logic between concrete (non-variable)
theories C<T> and type variables C<::V> in the form C<T{::V}>.