NAME

Synopsis_02 - Bits and Pieces

AUTHOR

Larry Wall <larry@wall.org>

VERSION

  Maintainer: Larry Wall <larry@wall.org>
  Date: 10 Aug 2004
  Last Modified: 15 Feb 2008
  Number: 2
  Version: 129

This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)

One-pass parsing

To the extent allowed by sublanguages' parsers, Perl is parsed using a one-pass, predictive parser. That is, lookahead of more than one "longest token" is discouraged. The currently known exceptions to this are where the parser must:

Lexical Conventions

Whitespace and Comments

Built-In Data Types

Native types

Values with these types autobox to their uppercase counterparts when you treat them as objects:

    bit         single native bit
    int         native signed integer
    uint        native unsigned integer (autoboxes to Int)
    buf         native buffer (finite seq of native ints or uints, no Unicode)
    num         native floating point
    complex     native complex number
    bool        native boolean

Since native types cannot represent Perl's concept of undefined values, in the absence of explicit initialization, native floating-point types default to NaN, while integer types (including bit) default to 0. The complex type defaults to NaN + NaN.i. A buf type of known size defaults to a sequence of 0 values. If any native type is explicitly initialized to * (the Whatever type), no initialization is attempted and you'll get whatever was already there when the memory was allocated.
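As a rough illustration of the defaulting rules above, here is a small Python model (not Perl 6 code). The names `NATIVE_DEFAULTS` and `default_for` are invented for this sketch. The `bool` type is left out because the text does not state its default, and explicit initialization to `*` is not modeled.

```python
import math

# Hypothetical lookup table modeling the defaults described above.
NATIVE_DEFAULTS = {
    "bit": 0,
    "int": 0,
    "uint": 0,
    "num": math.nan,                         # floating point defaults to NaN
    "complex": complex(math.nan, math.nan),  # NaN in both components
}

def default_for(native_type, size=None):
    """Return the modeled default for an uninitialized native variable."""
    if native_type == "buf":
        if size is None:
            raise ValueError("only a buf of known size has a default")
        return [0] * size                    # a sequence of 0 values
    return NATIVE_DEFAULTS[native_type]
```

For example, `default_for("buf", 3)` gives a list of three zeros, while `default_for("num")` gives a NaN.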

If a buf type is initialized with a Unicode string value, the string is decomposed into Unicode codepoints, and each codepoint is stored in an integer element. If the size of the buf type is not specified, it takes its length from the initializing string. If the size is specified, the initializing string is truncated or 0-padded as necessary. If a codepoint doesn't fit into a buf's integer type, a parse error is issued if this can be detected at compile time; otherwise a warning is issued at run time and the overflowed buffer element is filled with an appropriate replacement character, either U+FFFD (REPLACEMENT CHARACTER) if the element's integer type is at least 16 bits, or U+007F (DELETE) if the larger value would not fit. If any other conversion is desired, it must be specified explicitly. In particular, no conversion to UTF-8 or UTF-16 is attempted; that must be specified explicitly. (As it happens, conversion to a buf type based on 32-bit integers produces valid UTF-32 in the native endianness.)
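The initialization rules for buf can be sketched in Python as follows. This is only a model under stated assumptions: the function name `init_buf` is hypothetical, elements are assumed to be unsigned integers of `elem_bits` bits, and the compile-time parse error and run-time warning are omitted so that only the resulting element values are shown.

```python
def init_buf(text, elem_bits, size=None):
    """Model filling a buf of elem_bits-wide unsigned elements from a string."""
    limit = 1 << elem_bits
    # Replacement character: U+FFFD if it fits (16 bits or more), else U+007F.
    fallback = 0xFFFD if elem_bits >= 16 else 0x7F
    # One element per Unicode codepoint; no UTF-8 or UTF-16 encoding happens.
    elems = [cp if cp < limit else fallback for cp in map(ord, text)]
    if size is not None:
        # Truncate or 0-pad to the declared size.
        elems = elems[:size] + [0] * (size - len(elems))
    return elems
```

With 8-bit elements, `init_buf("ab", 8, size=4)` yields `[97, 98, 0, 0]`, and a codepoint such as U+20AC is replaced by `0x7F` because it does not fit.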

Undefined types

These can behave as values or objects of any class, except that defined always returns false. One can create them with the built-in undef and fail functions. (See S04 for how failures are handled.)

    Object      Uninitialized (derivatives serve as protoobjects of classes)
    Whatever    Wildcard (like undef, but subject to do-what-I-mean via MMD)
    Failure     Failure (lazy exceptions, thrown if not handled properly)

Whenever you declare any kind of type, class, module, or package, you're automatically declaring an undefined prototype value with the same name.

Whenever a Failure value is put into a typed container, it takes on the type specified by the container but continues to carry the Failure role. (The undef function merely returns the most generic Failure object. Use fail to return more specific failures. Use Object for the most generic non-failure undefined value. The Any type is also undefined, but excludes Junctions so that autothreading may be dispatched using normal multiple dispatch rules.)

Immutable types

Objects with these types behave like values, i.e. $x === $y is true if and only if their types and contents are identical (that is, if $x.WHICH eqv $y.WHICH).

    Bit         Perl single bit (allows traits, aliasing, undef, etc.)
    Int         Perl integer (allows Inf/NaN, arbitrary precision, etc.)
    Str         Perl string (finite sequence of Unicode characters)
    Num         Perl number
    Complex     Perl complex number
    Bool        Perl boolean
    Exception   Perl exception
    Code        Base class for all executable objects
    Block       Executable objects that have lexical scopes
    List        Lazy Perl list (composed of immutables and iterators)
    Seq         Completely evaluated (hence immutable) sequence
    Range       A pair of Ordered endpoints; gens immutables when iterated
    Set         Unordered collection of values that allows no duplicates
    Bag         Unordered collection of values that allows duplicates
    Junction    Set with additional behaviors
    Pair        A single key-to-value association
    Mapping     Set of Pairs with no duplicate keys
    Signature   Function parameters (left-hand side of a binding)
    Capture     Function call arguments (right-hand side of a binding)
    Blob        An undifferentiated mass of bits

Mutable types

Objects with these types have distinct .WHICH values that do not change even if the object's contents change. (Routines are considered mutable because they can be wrapped in place.)

    Scalar      Perl scalar
    Array       Perl array
    Hash        Perl hash
    KeyHash     Perl hash that autodeletes values matching default
    KeySet      KeyHash of Bool (does Set in list/array context)
    KeyBag      KeyHash of UInt (does Bag in list/array context)
    Buf         Perl buffer (a stringish array of memory locations)
    IO          Perl filehandle
    Routine     Base class for all wrappable executable objects
    Sub         Perl subroutine
    Method      Perl method
    Submethod   Perl subroutine acting like a method
    Macro       Perl compile-time subroutine
    Regex       Perl pattern
    Match       Perl match, usually produced by applying a pattern
    Package     Perl 5 compatible namespace
    Module      Perl 6 standard namespace
    Class       Perl 6 standard class namespace
    Role        Perl 6 standard generic interface/implementation
    Grammar     Perl 6 pattern matching namespace
    Any         Perl 6 object (default parameter type, excludes Junction)
    Object      Perl 6 object (either Any or Junction)
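The difference between the immutable and mutable types listed above can be approximated in Python. This is an analogy, not the Perl 6 mechanism: the helper names `which` and `same` are hypothetical, Python's built-in immutable types stand in for the value types, and `id()` stands in for a stable object identity.

```python
# Python types treated as value-like for the purposes of this sketch.
VALUE_TYPES = (bool, int, float, complex, str, tuple, frozenset)

def which(obj):
    """Model .WHICH: type plus contents for values, identity for containers."""
    if isinstance(obj, VALUE_TYPES):
        return (type(obj).__name__, obj)
    return ("object", id(obj))

def same(x, y):
    """Model $x === $y as a comparison of the two .WHICH results."""
    return which(x) == which(y)
```

Two equal strings compare as the same value, two separately built lists do not, and a list keeps its `which` result even after it is modified.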

A KeyHash differs from a normal Hash in how it handles default values. If the value of a KeyHash element is set to the default value for the KeyHash, the element is deleted. If undeclared, the default default for a KeyHash is 0 for numeric types, False for boolean types, and the null string for string and buffer types. A KeyHash of an Object type defaults to the undefined prototype for that type. More generally, the default default is whatever defined value an undef would convert to for that value type. A KeyHash of Scalar deletes elements that go to either 0 or the null string. A KeyHash also autodeletes keys for normal undef values (that is, those undefined values that do not contain an unthrown exception).
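A minimal Python model of this autodeletion rule might look like the following. The class is a sketch, not an implementation of the Perl 6 type: `None` stands in for a normal undef, the default is passed explicitly rather than derived from a value type, and the special case for a KeyHash of Scalar is not modeled.

```python
class KeyHash:
    """Model of a hash that deletes elements set to its default value."""

    def __init__(self, default=0):
        self._default = default
        self._items = {}

    def __setitem__(self, key, value):
        # Storing the default (or a plain undef, modeled as None) deletes the key.
        if value is None or value == self._default:
            self._items.pop(key, None)
        else:
            self._items[key] = value

    def __getitem__(self, key):
        # Missing keys read back as the default.
        return self._items.get(key, self._default)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

Setting `kh["a"] = 3` creates the element, and a later `kh["a"] = 0` removes it again when the default is 0.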

A KeySet is a KeyHash of booleans with a default of False. If you use the Hash interface and increment an element of a KeySet its value becomes true (creating the element if it doesn't exist already). If you decrement the element it becomes false and is automatically deleted. Decrementing a non-existing value results in a False value. Incrementing an existing value results in True. When not used as a Hash (that is, when used as an Array or list or Set object) a KeySet behaves as a Set of its keys. (Since the only possible value of a KeySet is the True value, it need not be represented in the actual implementation with any bits at all.)

A KeyBag is a KeyHash of UInt with default of 0. If you use the Hash interface and increment an element of a KeyBag its value is increased by one (creating the element if it doesn't exist already). If you decrement the element the value is decreased by one; if the value goes to 0 the element is automatically deleted. An attempt to decrement a non-existing value results in a Failure value. When not used as a Hash (that is, when used as an Array or list or Bag object) a KeyBag behaves as a Bag of its keys, with each key replicated the number of times specified by its corresponding value. (Use .kv or .pairs to suppress this behavior in list context.)
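The KeyBag behavior described above can be sketched in Python. The method names are hypothetical, and `None` is used here as a stand-in for the Failure value returned when decrementing a non-existing element.

```python
class KeyBag:
    """Model of a KeyHash of unsigned counts with a default of 0."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        # Creates the element if it does not exist yet.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def decrement(self, key):
        if key not in self._counts:
            return None  # stands in for a Failure value
        self._counts[key] -= 1
        remaining = self._counts[key]
        if remaining == 0:
            del self._counts[key]  # autodelete when the count reaches 0
        return remaining

    def __contains__(self, key):
        return key in self._counts

    def as_bag(self):
        # Each key replicated as many times as its count.
        return [key for key, n in self._counts.items() for _ in range(n)]

    def pairs(self):
        # Key/count pairs, without replication.
        return list(self._counts.items())
```

Here `as_bag` models the list or Bag view, while `pairs` models what `.pairs` would give when the replication is not wanted.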

Value types

Explicit types are optional. Perl variables have two associated types: their "value type" and their "implementation type". (More generally, any container has an implementation type, including subroutines and modules.) The value type is stored as its of property, while the implementation type of the container is just the object type of the container itself. The word returns is allowed as an alias for of.

The value type specifies what kinds of values may be stored in the variable. A value type is given as a prefix or with the of keyword:

    my Dog $spot;
    my $spot of Dog;

In either case this sets the of property of the container to Dog.

Subroutines also accept a variant of of, spelled as, which sets the as property instead. The as property specifies a constraint (or perhaps coercion) to be enforced on the return value (either by explicit call to return or by implicit fall-off-the-end return). This constraint, unlike the of property, is not advertised as the type of the routine. You can think of it as the implicit type signature of the (possibly implicit) return statement. It's therefore available for type inferencing within the routine but not outside it. If no as type is declared, it is assumed to be the same as the of type, if declared.

    sub get_pet() of Animal {...}       # of type, obviously
    sub get_pet() returns Animal {...}  # of type
    our Animal sub get_pet() {...}      # of type
    sub get_pet() as Animal {...}       # as type

A value type on an array or hash specifies the type stored by each element:

    my Dog @pound;  # each element of the array stores a Dog

    my Rat %ship;   # the value of each entry stores a Rat
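To illustrate what a value type on an array means, here is a small Python model of a container that checks each stored element. `TypedArray` and `Dog` are hypothetical names for this sketch, and the check is done at run time, which says nothing about how a Perl 6 implementation would enforce it.

```python
class Dog:
    """Placeholder element type for the example."""

class TypedArray:
    """Model of an array whose 'of' property constrains every element."""

    def __init__(self, of):
        self.of = of          # the value type of each element
        self._elems = []

    def push(self, value):
        if not isinstance(value, self.of):
            raise TypeError(f"expected {self.of.__name__}, got {type(value).__name__}")
        self._elems.append(value)

    def __len__(self):
        return len(self._elems)
```

A `TypedArray(Dog)` accepts `Dog()` instances and rejects anything else with a `TypeError`.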

The key type of a hash may be specified as a shape trait--see S09.

Implementation types

The implementation type specifies how the variable itself is implemented. It is given as a trait of the variable:

    my $spot is Scalar;             # this is the default
    my $spot is PersistentScalar;
    my $spot is DataBase;

Defining an implementation type is the Perl 6 equivalent to tying a variable in Perl 5. But Perl 6 variables are tied directly at declaration time, and for performance reasons may not be tied with a run-time tie statement unless the variable is explicitly declared with an implementation type that does the Tieable role.

However, package variables are always considered Tieable by default. As a consequence, all named packages are also Tieable by default. Classes and modules may be viewed as differently tied packages. Looking at it from the other direction, classes and modules that wish to be bound to a global package name must be able to do the Package role.

Hierarchical types

A non-scalar type may be qualified, in order to specify what type of value each of its elements stores:

    my Egg $cup;                       # the value is an Egg
    my Egg @carton;                    # each elem is an Egg
    my Array of Egg @box;              # each elem is an array of Eggs
    my Array of Array of Egg @crate;   # each elem is an array of arrays of Eggs
    my Hash of Array of Recipe %book;  # each value is a hash of arrays of Recipes

Each successive of makes the type on its right a parameter of the type on its left. Parametric types are named using square brackets, so:

    my Hash of Array of Recipe %book;

actually means:

    my Hash[of => Array[of => Recipe]] %book; 
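The rewriting of successive of clauses into bracketed parametric types is a right-to-left nesting, which a short Python sketch can make concrete. The function `desugar_of` is hypothetical and works on plain strings only. It is not a parser for Perl 6 type expressions.

```python
import re

def desugar_of(type_expr):
    """Turn 'A of B of C' into 'A[of => B[of => C]]'."""
    parts = re.split(r"\s+of\s+", type_expr.strip())
    result = parts[-1]                      # innermost type
    for outer in reversed(parts[:-1]):      # wrap from the inside out
        result = f"{outer}[of => {result}]"
    return result
```

Applied to `"Hash of Array of Recipe"`, this produces `"Hash[of => Array[of => Recipe]]"`, matching the expansion shown above.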

Because the actual variable can be hard to find when complex types are specified, there is a postfix form as well:

    my Hash of Array of Recipe %book;           # HoHoAoRecipe
    my %book of Hash of Array of Recipe;        # same thing

The as form may be used in subroutines:

    my sub get_book ($key) as Hash of Array of Recipe {...}

Alternately, the return type may be specified within the signature:

    my sub get_book ($key --> Hash of Array of Recipe) {...}

There is a slight difference, insofar as the type inferencer will ignore an as but pay attention to --> or prefix type declarations, also known as the of type. Only the inside of the subroutine pays attention to as, and essentially coerces the return value to the indicated type, just as if you'd coerced each return expression.

You may also specify the of type as the of trait (with returns allowed as a synonym):

    my Hash of Array of Recipe sub get_book ($key) {...}
    my sub get_book ($key) of Hash of Array of Recipe {...}
    my sub get_book ($key) returns Hash of Array of Recipe {...}

Polymorphic types

Anywhere you can use a single type you can use a set of types, for convenience specifiable as if it were an "or" junction:

    my Int|Str $error = $val;              # can assign if $val~~Int or $val~~Str

Fancier type constraints may be expressed through a subtype:

    subset Shinola of Any where {.does(DessertWax) and .does(FloorTopping)};
    if $shimmer ~~ Shinola {...}  # $shimmer must do both interfaces

Since the terms in a parameter could be viewed as a set of constraints that are implicitly "anded" together (the variable itself supplies type constraints, and where clauses or tree matching just add more constraints), we relax this to allow juxtaposition of types to act like an "and" junction:

    # Anything assigned to the variable $mitsy must conform
    # to the type Fish and either the Cat or Dog type...
    my Cat|Dog Fish $mitsy = new Fish but { int rand 2 ?? .does Cat
                                                       !! .does Dog };
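The way "or" junctions, juxtaposed types, and where clauses combine can be modeled in Python as a single check. The function `conforms` and the classes below are hypothetical, and `isinstance` is only a rough stand-in for smartmatching against a type.

```python
def conforms(value, any_of=(), all_of=(), where=()):
    """Model a constraint: one of any_of, every one of all_of, every where test."""
    if any_of and not isinstance(value, tuple(any_of)):
        return False                                   # "or" junction of types
    if not all(isinstance(value, t) for t in all_of):
        return False                                   # juxtaposed types are "anded"
    return all(test(value) for test in where)          # extra where constraints

class Fish: pass
class Cat: pass
class Dog: pass
class CatFish(Cat, Fish): pass
```

Under this model, `conforms(CatFish(), any_of=(Cat, Dog), all_of=(Fish,))` is true, while a plain `Fish()` fails the `Cat|Dog` part of the constraint.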

Parameter types

Parameters may be given types, just like any other variable:

    sub max (int @array is rw) {...}
    sub max (@array of int is rw) {...}

Generic types

Within a declaration, a class variable (either by itself or following an existing type name) declares a new type name and takes its parametric value from the actual type of the parameter it is associated with. It declares the new type name in the same scope as the associated declaration.

    sub max (Num ::X @array) {
        push @array, X.new();
    }

The new type name is introduced immediately, so two such types in the same signature must unify compatibly if they have the same name:

    sub compare (Any ::T $x, T $y) {
        return $x eqv $y;
    }
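The unification of a captured type name across two parameters can be approximated in Python. In this sketch the type of the first argument plays the role of `T`, `isinstance` stands in for the type check on the second parameter, and `==` is only a loose substitute for `eqv`.

```python
def compare(x, y):
    """Model 'sub compare (Any ::T $x, T $y)': y must fit the type captured from x."""
    T = type(x)                      # ::T takes its value from the actual argument
    if not isinstance(y, T):
        raise TypeError(f"second argument must be {T.__name__}")
    return x == y
```

Calling `compare(1, 2)` returns `False`, while `compare(1, "a")` is rejected because the second argument does not unify with the captured type.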

Return types

On a scoped subroutine, a return type can be specified before or after the name. We call all return types "return types", but distinguish two kinds of return types, the as type and the of type, because the of type is normally an "official" named type and declares the official interface to the routine, while the as type is merely a constraint on what may be returned by the routine from the routine's point of view.

    our sub lay as Egg {...}            # as type
    our Egg sub lay {...}               # of type
    our sub lay of Egg {...}            # of type
    our sub lay (--> Egg) {...}         # of type

    my sub hat as Rabbit {...}          # as type
    my Rabbit sub hat {...}             # of type
    my sub hat of Rabbit {...}          # of type
    my sub hat (--> Rabbit) {...}       # of type

If a subroutine is not explicitly scoped, it belongs to the current namespace (module, class, grammar, or package), as if it were scoped with the our scope modifier. Any return type must go after the name:

    sub lay as Egg {...}                # as type
    sub lay of Egg {...}                # of type
    sub lay (--> Egg) {...}             # of type

On an anonymous subroutine, any return type can only go after the sub keyword:

    $lay = sub as Egg {...};            # as type
    $lay = sub of Egg {...};            # of type
    $lay = sub (--> Egg) {...};         # of type

but you can use a scope modifier to introduce an of prefix type:

    $lay = my Egg sub {...};            # of type
    $hat = my Rabbit sub {...};         # of type

Because they are anonymous, you can change the my modifier to our without affecting the meaning.

The return type may also be specified after a --> token within the signature. This doesn't mean exactly the same thing as as. The of type is the "official" return type, and may therefore be used to do type inferencing outside the sub. The as type only makes the return type available to the internals of the sub so that the return statement can know its context, but outside the sub we don't know anything about the return value, as if no return type had been declared. The prefix form specifies the of type rather than the as type, so the return type of

    my Fish sub wanda ($x) { ... }

is known to return an object of type Fish, as if you'd said:

    my sub wanda ($x --> Fish) { ... }

not as if you'd said

    my sub wanda ($x) as Fish { ... }

It is possible for the of type to disagree with the as type:

    my Squid sub wanda ($x) as Fish { ... }

or equivalently,

    my sub wanda ($x --> Squid) as Fish { ... }

This is not lying to yourself--it's lying to the world. Having a different inner type is useful if you wish to hold your routine to a stricter standard than you let on to the outside world, for instance.
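The split between the advertised of type and the internal as type can be modeled with a Python decorator. Everything here is a sketch: the decorator `returns`, the attribute `of_type`, and the classes are hypothetical, and the as type is modeled as a run-time check rather than a coercion.

```python
import functools

def returns(of=None, as_=None):
    """Advertise 'of' to callers; enforce 'as_' (or 'of' if absent) on return."""
    inner = as_ if as_ is not None else of   # as defaults to the of type

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if inner is not None and not isinstance(result, inner):
                raise TypeError(f"return value is not a {inner.__name__}")
            return result
        wrapper.of_type = of                 # the only type visible from outside
        return wrapper
    return decorate

class Animal: pass
class Fish(Animal): pass

@returns(of=Animal, as_=Fish)
def wanda(x):
    return Fish()

@returns(of=Animal, as_=Fish)
def broken(x):
    return Animal()                          # allowed by 'of', rejected by 'as_'
```

Here `wanda` advertises `Animal` but holds itself to returning a `Fish`, so `broken` fails even though its result satisfies the advertised type.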

Names and Variables

Names

Literals

Context

Lists

Files

Properties

Grammatical Categories

Lexing in Perl 6 is controlled by a system of grammatical categories. At each point in the parse, the lexer knows which subset of the grammatical categories are possible at that point, and follows the longest-token rule across all the active grammatical categories. The grammatical categories that are active at any point are specified using a regex construct involving a set of magical hashes. For example, the matcher for the beginning of a statement might look like:

    <%statement_control
    | %scope_declarator
    | %prefix
    | %prefix_circumfix_meta_operator
    | %circumfix
    | %quote
    | %term
    >

(Ordering of grammatical categories within such a construct matters only in case of a "tie", in which case the grammatical category that is notionally "first" wins. For instance, given the example above, a statement_control is always going to win out over a prefix operator of the same name. And the reason you can't call a function named "if" directly as a list operator is because it would be hidden either by the statement_control category at the beginning of a statement or by the statement_modifier category elsewhere in the statement. Only the if(...) form unambiguously calls an "if" function, and even that works only because statement controls and statement modifiers require subsequent whitespace, as do list operators.)
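The longest-token rule with ties broken by category order can be sketched in Python. This is a toy model: the function `next_token` is hypothetical, categories are plain lists of literal tokens rather than the magical hashes mentioned above, and the whitespace requirements of statement controls are ignored.

```python
def next_token(text, categories):
    """Pick the longest matching token; on a tie the earlier category wins."""
    best = None  # (category name, token)
    for name, tokens in categories:          # categories in their notional order
        for tok in tokens:
            # Strictly longer is required, so an earlier category keeps a tie.
            if text.startswith(tok) and (best is None or len(tok) > len(best[1])):
                best = (name, tok)
    return best

# Hypothetical active categories at the start of a statement.
categories = [
    ("statement_control", ["if", "for"]),
    ("prefix", ["if", "+", "++"]),
]
```

With these categories, `next_token("if $x", categories)` selects the statement_control entry over the prefix operator of the same name, and `next_token("++$x", categories)` prefers `++` over `+` because it is longer.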

Here are the current grammatical categories:

    category:<prefix>                           prefix:<+>
    circumfix:<[ ]>                             [ @x ]
    dotty:<.=>                                  $obj.=method
    infix_circumfix_meta_operator:{'»','«'}     @a »+« @b
    infix_postfix_meta_operator:<=>             $x += 2;
    infix_prefix_meta_operator:<!>              $x !~~ 2;
    infix:<+>                                   $x + $y
    package_declarator:<role>                   role Foo;
    postcircumfix:<[ ]>                         $x[$y] or $x.[$y]
    postfix_prefix_meta_operator:{'»'}          @array »++
    postfix:<++>                                $x++
    prefix_circumfix_meta_operator:{'[',']'}    [*]
    prefix_postfix_meta_operator:{'«'}          -« @magnitudes
    prefix:<!>                                  !$x (and $x.'!')
    q_backslash:<\\>                            '\\'
    qq_backslash:<n>                            "\n"
    quote_mod:<x>                               q:x/ ls /
    quote:<qq>                                  qq/foo/
    regex_assertion:<!>                         /<!before \h>/
    regex_backslash:<w>                         /\w/ and /\W/
    regex_metachar:<.>                          /.*/
    regex_mod_internal:<P5>                     m:/ ... :P5 ... /
    routine_declarator:<sub>                    sub foo {...}
    scope_declarator:<has>                      has $.x;
    sigil:<%>                                   %hash
    special_variable:<$!>                       $!
    statement_control:<if>                      if $condition { 1 } else { 2 }
    statement_mod_cond:<if>                     .say if $condition
    statement_mod_loop:<for>                    .say for 1..10
    statement_prefix:<gather>                   gather for @foo { .take }
    term:<!!!>                                  $x = { !!! }
    trait_auxiliary:<does>                      my $x does Freezable
    trait_verb:<handles>                        has $.tail handles <wag>
    twigil:<?>                                  $?LINE
    type_declarator:<subset>                    subset Nybble of Int where ^16
    version:<v>                                 v4.3.*

Any category containing "circumfix" requires two token arguments, supplied in slice notation. Note that many of these names do not represent real operators, and you wouldn't be able to call them even though you can name them.
