# Copyright (C) 2001-2010, Parrot Foundation.

=head1 [DRAFT] PDD 6: Parrot Assembly Language (PASM)

=head2 Abstract

This document describes the format of Parrot's bytecode assembly language.

=head2 Description

Parrot's bytecode can be thought of as a form of machine language for a
virtual super CISC machine. It makes sense, then, to define an assembly
language for it for those people who may need to generate bytecode directly,
rather than indirectly through a high-level language.

{{ NOTE: out-of-date and incomplete. }}

=head2 Questions

=over 4

=item *
    <barney>    Can we get rid of PASM ?
    <spinclad>  conversely, does PASM need to be kept up to date?
    <allison>   PASM is just a text form of PBC, so it should be kept
    <allison>   are there specific PBC features that can't currently be
                represented in PASM?
    <particle>  besides hll and :outer?
    <chromatic> :init
    <mdiep>     lexicals?
    <chromatic> :vtable
    <mdiep>     I'm a bit rusty, but anything that starts with a '.' or ':'
                 is suspect
    <allison>   things that start with '.' are just directives to IMCC,
                equally applicable to PASM and PIR
    <mdiep>     isn't PASM separate from IMCC?
    <allison>   mdiep: it used to be separate
    <mdiep>     so to say that PASM can have directives is a major
                architectural change
    <allison>   perhaps the biggest thing we need is a definition of what PASM
                actually is
    <allison>   the line has grown quite fuzzy over the years
    <barney>    PASM could be defined as stringified PBC
    <particle>  compilable stringified pbc
    <mdiep>     it should be defined that way if we're going to call it
                assembly.
    <allison>   barney: that's the most likely direction, and if so, it has
                some implications for how PASM behaves
    <particle>  allison: which is what we want, anyway, right?
    <allison>   particle: yup
    <barney>    yes
    <particle>  good, looks like we're in agreement and headed in the proper
                direction on that topic.

=back

=head2 Implementation

Parrot opcodes take the format of:

  code destination[dest_key], source1[source1_key], source2[source2_key]

The brackets are literal characters; they are not notation for optional
arguments. A key, together with its brackets, may nevertheless be left out
entirely. If any argument has a key, the assembler substitutes the null key
for the arguments that lack one.

Conditional branches take the format:

  code boolean[bool_key], true_dest

The key parameters are optional, and each may be either an integer or a
string. A key, when present, is associated with the parameter to its left and
is assumed to be either an array/list entry number or a hash key. Any time a
source or destination can be a PMC register, there may be a key.
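
The operand shape described above can be sketched in Python as follows. This
is only an illustration, not the assembler's implementation: the
C<parse_operand> helper and its pattern are hypothetical, string keys are
assumed to be double-quoted without escapes, and the sketch accepts a key on
any register type rather than on PMC registers only.

```python
import re

# Hypothetical operand pattern: a register (P, S, I, or N plus a number),
# optionally followed by a bracketed key that is an integer or a
# double-quoted string. The quoting rules are an assumption.
OPERAND = re.compile(r'^([PSIN])(\d+)(?:\[(?:(-?\d+)|"([^"]*)")\])?$')


def parse_operand(text):
    """Return (register_type, register_number, key); key is None if absent."""
    match = OPERAND.match(text.strip())
    if match is None:
        raise ValueError(f"not a register operand: {text!r}")
    reg_type, number, int_key, str_key = match.groups()
    if int_key is not None:
        key = int(int_key)      # array/list entry number
    else:
        key = str_key           # hash key, or None when no key was written
    return reg_type, int(number), key
```

An integer key is returned as an C<int> and a string key as a C<str>, so a
caller can tell an array index from a hash key by its type.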

Destinations for conditional branches are an integer offset from the current
PC.
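
As a rough sketch of how such an offset could be derived from label
positions, consider the Python below. The C<branch_offset> helper and the
positions are hypothetical, and the sketch assumes the offset is measured
from the branch instruction itself, in whatever unit the PC counts.

```python
def branch_offset(label_positions, branch_pc, label):
    """Return the signed offset from the branch instruction to the label."""
    return label_positions[label] - branch_pc


# Hypothetical positions, counted in whatever unit the PC uses.
labels = {"loop": 4, "done": 12}
forward = branch_offset(labels, 7, "done")    # target lies after the branch
backward = branch_offset(labels, 7, "loop")   # target lies before the branch
```

A forward branch yields a positive offset and a backward branch a negative
one.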

All registers have a type prefix of P, S, I, or N, for PMC, string, integer,
and number respectively. While Parrot bytecode does not have a fixed limit
on the number of registers, PASM has an implementation limit on the number of
addressable registers of each type, currently set at 100 (0-99).
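
The naming rule and the limit can be checked with a few lines of Python. This
is a minimal sketch; C<describe_register> and C<MAX_REGISTERS> are
hypothetical names, and the limit of 100 is simply the figure quoted above.

```python
REGISTER_TYPES = {"P": "PMC", "S": "string", "I": "integer", "N": "number"}
MAX_REGISTERS = 100  # implementation limit quoted above (registers 0-99)


def describe_register(name):
    """Return (type_name, index) for a register such as 'N19'."""
    prefix, digits = name[:1], name[1:]
    if prefix not in REGISTER_TYPES:
        raise ValueError(f"unknown register type: {name!r}")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"malformed register number: {name!r}")
    index = int(digits)
    if index >= MAX_REGISTERS:
        raise ValueError(f"register index out of range: {name!r}")
    return REGISTER_TYPES[prefix], index
```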

=head2 Assembly Syntax

All assembly opcodes contain only ASCII lowercase letters, digits, and the
underscore.

Assembler directives are prefixed with a dot. These directives are
instructions for the assembler and may or may not translate to a PASM
instruction.

Labels all end with a colon. They may contain ASCII letters, digits, and
underscores.
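
These lexical rules translate directly into regular expressions. The Python
below is a sketch under the rules as stated here; the helper names are
hypothetical, and the real assembler may impose further restrictions (for
example on the first character of a label).

```python
import re

# ASCII lowercase letters, digits, and the underscore.
OPCODE = re.compile(r'[a-z0-9_]+')
# ASCII letters, digits, and underscores, followed by the closing colon.
LABEL = re.compile(r'[A-Za-z0-9_]+:')


def is_opcode(text):
    return OPCODE.fullmatch(text) is not None


def is_label(text):
    return LABEL.fullmatch(text) is not None


def is_directive(text):
    # Assembler directives are prefixed with a dot.
    return text.startswith(".")
```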

Namespaces are noted with the C<.namespace> directive. It takes a
single parameter, the name of the namespace, in the form of a
multi-dimensional key.

Constants can be declared with the C<.macro_const> directive. It takes two
parameters: the name of the constant and the value.

Subroutine names are noted with the C<.pcc_sub> directive. It takes a
single parameter, the name of the subroutine, which is added to the
namespace's symbol table. Sub names may contain any valid Unicode
alphanumeric characters and the underscore. The C<.pcc_sub> directive
may take flags to indicate when the sub should be invoked. The following
flags are available: C<:main> to indicate that execution should start
at the specified subroutine; C<:immediate> or C<:postcomp> to indicate
that the sub should be run immediately after compilation; C<:load> to
indicate that the sub should be executed when its bytecode segment is
loaded; C<:init> to indicate that the sub should be run when the file is
run directly.

Constants don't need to be named and put in a separate section of the assembly
source. The assembler will take care of putting them in the appropriate part
of the generated bytecode.

Below is an overview of the grammar of a PASM file.

 pasm_file:
   [ pasm_line '\n' ]*

 pasm_line:
     pasm_instruction
   | constant_directive
   | namespace_directive

 pasm_instruction:
   [ [ sub_directive ]? label ]? instruction

 sub_directive:
   ".pcc_sub" [ sub_flag ]?

 sub_flag:
   ":init" | ":main" | ":load" | ":postcomp" | ":immediate" | ":anon"

 label:
   identifier ":"

 constant_directive:
   ".macro_const" identifier literal

 namespace_directive:
   ".namespace" "[" multi_dimensional_key "]"

 multi_dimensional_key:
   quoted_string [ ";" quoted_string ]*
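
To make the C<namespace_directive> rule concrete, here is a small Python
sketch that recognizes it. The grammar above does not define
C<quoted_string>, so the sketch assumes double-quoted strings without escape
sequences; C<parse_namespace> is a hypothetical helper, not part of any
Parrot tool.

```python
import re

# ".namespace" "[" quoted_string [ ";" quoted_string ]* "]"
NAMESPACE = re.compile(
    r'^\.namespace\s*\[\s*("[^"]*"(?:\s*;\s*"[^"]*")*)\s*\]$'
)


def parse_namespace(line):
    """Return the list of key parts named by a .namespace directive."""
    match = NAMESPACE.match(line.strip())
    if match is None:
        raise ValueError(f"not a namespace directive: {line!r}")
    # Pull the contents out of each quoted string in the key.
    return re.findall(r'"([^"]*)"', match.group(1))
```

For a line such as C<.namespace ["Foo"; "Bar"]> the function returns the two
parts of the multi-dimensional key in order.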



=head2 Opcode List

An opcode may exist in several versions that are not listed individually.
If an opcode takes a register that might be keyed, the keyed version
of the opcode has a _k suffix. If an opcode might take multiple types of
registers for a single parameter, the opcode function really has a _x suffix,
where x is either P, S, I, or N, depending on whether a PMC, string, integer,
or numeric register is involved. The suffix is not required (though writing
it is not an error), as the assembler can infer the information from the code.

In those cases where an opcode can take several types of registers, and more
than one of the sources or destinations is of variable type, the register is
passed in extended format. An extended format register number is of the form:

     register_number | register_type

where register_type is 0x100, 0x200, 0x400, or 0x800 for PMC, string, integer,
or number respectively. So N19 would be 0x813 (0x800 | 0x13).
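
The encoding is easy to check with a short Python sketch. The helper names
are hypothetical; the type values are the ones listed above. With those
values, the number register N19 encodes as 0x800 | 0x13 = 0x813, and the
integer register I19 as 0x400 | 0x13 = 0x413.

```python
# Type values for PMC, string, integer, and number registers.
REGISTER_TYPE_BITS = {"P": 0x100, "S": 0x200, "I": 0x400, "N": 0x800}


def encode_extended(name):
    """Encode a register name such as 'N19' as register_number | register_type."""
    prefix, number = name[0], int(name[1:])
    return number | REGISTER_TYPE_BITS[prefix]


def decode_extended(value):
    """Recover the register name from an extended format register number."""
    type_bits = value & 0xF00
    for prefix, bits in REGISTER_TYPE_BITS.items():
        if bits == type_bits:
            return f"{prefix}{value & 0xFF}"
    raise ValueError(f"no single register type set in {value:#x}")
```

Because register numbers stay below 100, they fit in the low byte and never
collide with the type values.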

B<Note>: Instructions tagged with a * will call a vtable function to handle
the instruction if used on PMC registers.

In all cases, the letters x, y, and z refer to register numbers. The letter t
refers to a generic register (P, S, I, or N). A lowercase p, s, i, or n means
either a register or constant of the appropriate type (PMC, string, integer,
or number).

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End:
vim: expandtab shiftwidth=4: