= Laziness
A draft on the runtime view of how Lazy things (like Arrays) work.
# by fglock
The text currently has 2 main sections:
- a review of the Perl 6 Bible, with some interpretation and guessing
- implementation notes.
Each section has 3 subsections:
- Native data - about the internal representation of data types
- Values - about the basic Perl 6 "OO" things
- Containers - about the glue that hold things together in data structures
== TODO
- include drawings
- comparing with Ruby/Python should help explaining some concepts
- this text is supposed to talk about all these things:
Scalar / Array / Hash Container
Tuple / List / Generator / Iterator / Range
file iterator
Signature / Argument List
Array / Hash slice
enum
tie / proxy / rw / env
Constrained Types (eg: Int / Even / 0..100 where Even)
Piddle
lazy string / int / num
= Part I - Perl 6 Bible Readings
References:
- S02 Bits and Pieces - for "native data"
- S09 Data Structures - for Object types
- A bit of each of the other AES
= Native data
Native data is built with a thin, (possibly non-OO) library over the virtual
machine data types.
For example, you may need a wrapper to implement 1-bit values, if the
virtual machine only provides bytes.
- native "singular" value
Native values are: num, int, uint, bit, complex, ref.
Native values can have a bit-size specification: int1, int16.
A "native container" is whatever a "ref" value points to.
A "num" value must support IEEE floating point conventions - which means there
are native "Inf" and "NaN".
Native values don't need to be able to represent "undef".
- native "plural" value
Native plural values are: array, str, hash.
A "str" is an array of "int".
There may be separate implementations for fixed-size and for
auto-extending allocation.
These are fast, in-memory structures provided by the virtual machine.
These structures are assumed to be one-dimensional.
Native hash is mostly used internally, for holding "namespaces": Classes, variables, labels.
There may be separate implementations for string-indexed and for object-indexed hashes.
= Value Class
Values can be "singular" or "plural".
Singular Values are: Str, Int, Num, Pair, Ref, Junction,
Code, Block, Class, Grammar, Macro ...
Plural Values are things that look like single-dimensional arrays:
List, Generator, Iterator.
-- List, Lazy List, Tuple classes
List is implemented as a native Array, or as a contiguous Stack area if this
is supported by the virtual machine.
List is set of things.
List is a subclass of "Value".
List is one-dimensional.
A List may be built with a mix of Values and Containers.
A List might exist in 4 "states":
@a = ( 7, 20..30 );
- this is the Array that will be used in the examples
@a = (
( (Cell = 7), ),
( Generator 20..30)
);
- this is more-or-less what it looks like internally
- unflattened List
( 1, 5..10, @a )
- a List built from a Scalar, a Range, and an Array
- lazily flattened with "*" operator
( 1, 5..10, (Scalar := @a[0]), (Generator (Scalar := @a[1])..(Scalar := @a[10])) )
- the Array was "flattened", but we still have generators.
- non-lazily flattened (eager) with "**" operator
( 1, 5, 6, 7, 8, 9, 10, (Scalar := @a[0]), (Scalar := @a[1]), (Scalar := @a[2]), ..... )
- everything is flattened
- fully evaluated (Tuple)
( 1, 5, 6, 7, 8, 9, 10, @a[0].value, @a[1].value, @a[2].value, ..... )
- fully evaluated and immutable
List only implements a subset of the Array API.
A List slice is a new List - elements are shallow copied.
List don't support adding, removing or substituting elements.
List has no Cells - thus you can't bind a Container to a List element or slice. ???
$a := (1,2,3)[2]; # compile error - or is this creating a Constant ???
If a List is built with Generators, it is a Lazy List.
-- Generator / Range Classes
Generator is an object that creates Values on demand.
Generator is "functional" rather than "physical memory"
Range is a special case, in which there is a start, an end, and a step.
#At runtime, you only access Generator elements through Containers.
Generator also implements "match":
5 ~~ 1..10
The "*" operator flattens a Generator eagerly.
TODO - detail the behaviour of "Str Range"
-- Iterator Class
TODO
=<>
The "*" operator flattens an Iterator eagerly.
-- List Slice
A List Slice is a shallow copy of a part of a List.
A List Slice is just a List - there is not a separate Class for it.
= Container Class
Containers are the glue that hold Values together in data structures.
The container classes are: Scalar, Array, Hash.
Array, like List, is Lazy by default.
Hash and Scalar can be lazy too, but they are not Lazy by default.
== Scalar Container Class
Scalar is a "singular Value" Container.
One of the reasons to use a container is that you can easily change
it's semantics by replacing what's inside it - that is, a Container implements
a "variable":
value '1' - always means 'one'
container ('1') - it means 'one' too, but you can increment it to mean 'two'
== Array Container Class
The Array class defines all objects that implement the Array API.
Array is a subclass of "Container"
Array contains an ordered set of Cells.
Array is multi-dimensional.
Array is not meant to be subclassed - use "roles" to make different implementations.
The actual implementation depends on the "Eager" or the "Lazy" roles.
Other implementations can be specified.
--- Eager Role
An Array object that wraps a native array.
--- Lazy Role
An Array object that generates Cells on demand.
This is the default implementation.
--- Shape (Role or parameter? trait?)
--- Compact Array Role (piddle)
TODO - piddle is different from Compact
-- Container Slice
A Container Slice is a specialized Array/Hash, in which all the elements
are bound to objects inside other Containers (Array, Hash or Scalar).
A Lazy Container Slice contains pointers to Generators that are inside other
Containers.
Container Slice is multi-dimensional.
== Hash
The Hash class defines all objects that implement the hash API.
Hash is like an Array, in which the Cell indexes can be of any type
(not just numbers).
Hash is multi-dimensional.
= Part II - Internals
== Generator Internals
- TODO
== Generator API
- match
This operator is used in a match statement:
5 ~~ 1..10
- shift / pop
TODO - not sure if "Generator" is an iterator (read-only) or uses destructive
read (shift/pop)
- sum / reverse / map / zip
- flatten
Takes all Values from the Generator, and returns an Array (possibly Native).
- elems
The number of elements
=== This part of the Generator API is only used by the runtime internals.
- head(n)
This operator is used by the Array splice operator.
Takes "n" Values from the Generator, like "shift" does.
The result can be a Generator (when n is big), an Array (possibly Native),
a single Value (when n == 1), or Void (when n <= 0).
TODO - not sure if "Generator" is an iterator (read-only) or uses destructive
read (head/tail)
- tail(n)
This operator is used by the Array splice operator.
Takes "n" Values from the back of the Generator, like "pop" does.
The result can be a Generator (when n is big), an Array (possibly Native),
a single Value (when n == 1), or Void (when n <= 0).
TODO - check if "Generator" is an iterator (read-only) or uses destructive
read (head/tail)
- is_contiguous
This is used for the optimization of Slice, when the sequence is used
as a slice index.
is_contiguous() is True is the generated sequence is like "2 .. 10".
= Array internals
== Array (base Class)
TODO
== Eager (Role)
An Eager Array contains a Native Array.
TODO - explain "Cell"
Contents: 1 2 3
Internals: [ 1, 2, 3 ]
== Lazy (Role)
A Lazy Array contains a "specs" Array, which contains Generators, or Native Arrays.
Contents: 1 2 3
Internals: [
[ 1, 2, 3 ]
]
Contents: 10 20 30 100..10000 50 400..900
Internals: [
[ 10, 20, 30 ],
Generator( 100, 10000 ),
[ 50 ],
Generator( 400, 900 )
]
= Array API
- TODO
Array can do "zip", "map", "grep", splice, push, pop, store, fetch.
Array can be "bound" and "tied".
== map()
map() applies a 'Code' to an Array state:
1: @a = 1..100; # 1, 2, 3, ...
2: @b = @a.map:{ $_ * 2 }; # 2, 4, 6, ...
3:
4: @a[2] = 0; # 1, 0, 3, ... - original has changed
5: say @b[1..3]; # 2, 4, 6, ... - result didn't change
Line 1:
Container: @a
Internals: [
Generator( 1, 100 ),
]
Line 2:
Container: @_TEMP_
Internals: [
Generator( 1, 100 ),
]
Container: @a
Internals: [
RefArray( @_TEMP_ ),
]
Container: @b
Internals: [
MapArray( @_TEMP_, { $_ * 2 } ),
]
Line 4:
Container: @_TEMP_
Internals: [
[ 1, 2 ]
Generator( 3, 100 ),
]
Container: @a
Internals: [
[ 1, 0 ],
SpliceArray( @_TEMP_, 2 ),
]
Container: @b
Internals: [
MapArray( @_TEMP_, { $_ * 2 } ),
]
== splice()
This pseudo-code implements splice().
It assumes that the internal representation of Array can do .head(n) and .tail(n) -
see "Generator" for details.
sub splice ( @array, $offset? = 0, $length? = Inf, @list? = () ) {
my ( @head, @body, @tail );
if ( $offset >= 0 ) {
@head = @array.head( $offset );
if ( $length >= 0 ) {
@body = @array.head( $length );
@tail = @array;
}
else {
@tail = @array.tail( -$length );
@body = @array;
}
}
else {
@tail = @array.tail( -$offset );
@head = @array;
if ( $length >= 0 ) {
@body = $tail.head( $length );
}
else {
@body = @tail;
@tail = $body.tail( -$length );
}
};
@array = ( @head, @list, @tail );
return @body;
}
= Hash internals
A Hash contains a Native Hash.
Hash keys can be objects, but the objects must be immutable
Contents: a:1 b:2 c:3
Internals: { a:1 b:2 c:3 }
A Hash may contain Lazy items.
This structure contains a Native Array, which contains Generators or Native Hashes.
This structure permits near O(1) complexity for most operations.
Contents: a:1 b:2 c:3
Internals: [
{ a:1, b:2, c:3 }
]
Contents: a:10 b:20 c:30 (1:100 .. 10000:10100) d:50
Internals: [
{ a:10 b:20 c:30 },
Generator( 1:100, 10000:10100 ),
{ d:50 },
]
= Hash API
- TODO
= Slice API
- TODO
= Slice Internals
- TODO - hash slice / array slice
__END__