Ilya Zakharevich >
Numeric-LL_Array-0.1504 >
Numeric::LL_Array

Module Version: 0.1504
Numeric::LL_Array - Perl extension for low level operations over numeric arrays.

use Numeric::LL_Array; blah blah blah

One of the principal overheads of using Perl for numeric vector calculations is the need to constantly create and destroy a large amount of Perl values (there is no such overhead in calculations over scalars, since Perl knows how to reuse implicit temporary variables).

Thus, in this package, we provide a way to manipulate vectors "in place" without creation of new Perl values. Additionally, the calculations over slots in a vector are performed in C, thus significantly reducing overhead of Perl over C (which, for *literal* translation from C to Perl, is of order 80..200 times).

One of the design goals is that many vectors should be able to use the same C array (e.g., of type `double[]`

) - possibly with offset and a stride, so performing an operation over a *subarray* does not require a new allocation. The C array is stored in PVX of a Perl variable, so it may be (eventually) refcounted in the usual Perlish way.

A *playground* is just (a region in) a Perl string, with the buffer propertly alighed to keep a vector of a certain C type. This way, one gets, e.g., `unsigned char`

-playground, or `long double`

-playground, etc.; we call this C type the *flavor* of the playground. One describes a certain position in the playground in units of the size of the corresponding C type; e.g., position 3 in `double`

array would typically be at offset of 24 bytes from the start of playground.

A multi-dimensional array of size `SIZE1 x SIZE2 x ... x SIZEn`

may be placed into a playground in many different ways; we allow modification of the *start position* (the position of the element with index (0,0,...,0)), and of *strides*. There is one stride per dimension of array; the stride describes how the position of the element changes when one increments the index by 1 in the particular coordinate in the array.

For example, the array `0..4`

with start 3 and stride 2 occupies the following positions in the playground:

content * * * 0 * 1 * 2 * 3 * 4 ... position 0 1 2 3 4 5 6 7 8 9 .......

Note that when plotting this picture one does not care about the flavor of the playground.

Likewise, the 2-dimensional array with contents

11 12 13 14 21 22 23 24

and start 1, horizontal stride 2, and vertical stride 3 takes these positions;

* 11 * 12 21 13 22 14 23 * 24 ....

and with start 12, horizontal stride -1, and vertical stride -5 takes

* * * * 24 23 22 21 * 14 13 12 11 ... 0 1 2 3 4 5 6 7 8 9 10 11 12 ...

The major advantage of this flexibility is that one can work with subarrays of the given array, with the transposed array, and with the reflected array without reshuffling *the content of the array*, but with only changes in the start position and in the strides. For example, to access the transposed 2-dimensional array, one just accesses the same data with 2 strides interchanged. Likewise, by decreasing *dimension*, one can assess columns and row of the matrix without moving the contents.

Other examples: one can fit `N x M x K`

0-array into playground of size 1 using strides 0,0,0. And one can fit `N x N`

identity matrix into a playground of size `2N-1`

by using strides 1,-1.

For other examples, see "Using extra, "fake" dimensions".

To completely specify an array to the API of this module one needs to provide the C type, the playground of the array, the start offset, its *dimension* (e.g., 1 for vectors, 2 for matrices etc), and its *format*, which is the list of the sizes of the array in each particular direction, and the corresponding strides, in the order `STRIDE1 COUNT1 STRIDE2 COUNT2 ...`

. To avoid confusion, it makes sense to use the name *arity* for the number *dimension*, and use word *dimensions* for the list `COUNT1 COUNT2 ... COUNTn`

(here `n`

is the "arity").

The format may be encoded as a list reference, or as a packed list of C values of type `ptrdiff_t`

. It is *not* required that the list contains exaclty `2*dimension`

elements; it may contain more, and the rest is ignored. This way, one can access rows of matrices without copying the content of the matrix, *and* without touching the format of the matrix - only by decreasing dimension and modifying the starting position.

Note that the flavor of the array is not a part of the format. For this low-level API, each function is designed to work only with a certain flavor an array (e.g., if operating on 3 arrays, it assumes 3 particular flavors, one for each of 3 arguments); so given a particular function, the required flavor of the argument is fully described by its position in the argument list. (There is no error-checking in this regard, it is the caller's responsibility to follow flavors correctly.) So the flavors are, essentially, encoded in the name of the function.

For example, a function for high-level semantic `add_assign($target, $source)`

(implementing operation `$target += $source`

), can take arguments

source_playground, target_playground, dim, start_source, start_target, format_source, format_target

E.g., for $target of type `double`

, and source of type `unsigned short`

, the name of the function may encode letters `"d"`

and `"S"`

(in analogy to Perl `pack`

formats). (The actual name used is `S2d1_add_assign`

, see "Naming conventions for handlers".)

The semantic of this module is that it is the *target* dimensions which are assumed to be used; so from the source format only the strides is used, and the `COUNTn`

positions are ignored. Likewise, the arity dim is assumed to be common for arrays, and both formats should contain at least `2*dimension`

elements.

Accessor handlers take the following arguments (with defaults indicated):

@a = access_T($p, $offset = 0, $dim = 0, $format = undef, $in = undef, $keep = FALSE);

extracts a slice of playground $p into Perlish data structure (using references to arrays of references etc. as deep as specified by $dim).

The playground $p, and offset/dim/format arguments have the same sense as for other handlers. If $in is not defined, the "external" layer of the extracted data is put into elements of array @a (so if $dim is 1, elements of @a are numbers; if it is 2, elements are references to arrays of numbers, etc); if $in is TRUE, the returned value is a scalar containing an array reference (so if dim is 1, $a[0] contains a reference to an array of numbers, etc).

If $in is an array reference, then instead of putting the "external" layer of Perlish array into @a, it is put into the referenced array. The fortune of existing elements of the referenced array is governed by $keep; if FALSE, the existing content is removed; if TRUE; the returned data is appended after the end of existing data.

There are 4 types of handlers: *accessors*, and *modifying handlers* (with 0,1,2 sources). For modifying handlers, the argument which is write-only or read-write is called `target`

, and the read-only arguments are *sources*.

Perl functions for conversion from `Numeric::LL_Array`

arrays to Perl arrays are called `access_T`

; here `T`

is the letter of pack() specifier corresponding to *native* C type (e.g., to access native C `signed long`

one uses pack() specifier `"l!"`

; since `!`

means *native*, we drop it, and use `access_l`

).

Perl functions which modify one array and take no source are named <T0_type>; here `T`

is a letter encoding the flavor, and `type`

is the identifier describing the semantic of the function (the corresponding C or Perl operator is put below, when it is different):

negate flip_sign bit_complement incr decr 0 1 2 m1 ! - ~ ++ -- 0 1 2 -1 abs cos sin tan acos asin atan exp log log10 sqrt cbrt ceil floor trunc rint

(If C operation takes an argument, we do `target = OP(target)`

, otherwise `target = OP`

.) For example, to increment `signed char`

array, one uses the function named `c0_incr`

; to assign -1 to all elements of long double array, use `D0_m1`

. (The operations in the second row (except abs) are implemented only for floating point types.)

Perl functions which use one array ("source") to modify another ("target") are named <S2T1_type>; here `T`

is a letter encoding the target flavor, and `S`

encodes the source flavor. `type`

is the identifier describing the semantic of the function:

cos sin tan acos asin atan exp

(only for floating point types, and only with target type equal to source type), or `assign`

for assignment (possibly with type conversion), or

plus_assign minus_assign mult_assign div_assign remainder_assign += -= *= /= %= lshift_assign rshift_assign pow_assign bitand_assign bit(x)or_assign <<= >>= **= &= |= ^= min_assing max_assign negate flip_sign bit_complement ne0 $t = min($t,$s) $t = max($t,$s) abs ceil floor trunc rint log log10 sqrt cbrt

(The `ceil floor trunc rint`

are supported only for floating-point source types; the last two only if your C compiler defines them [most do].) `ne0`

checks for a value to be non-0. Semantic of function names `abs ceil floor trunc rint log log10 sqrt cbrt frexp modf`

coincides with one of the corresponding C functions.

For example, to convert `unsigned long`

array to a `long double`

array, one uses the function named `L2D1_assign`

.

Likewise, Perl functions which use two arrays ("source1" and "source2") to modify another ("target") are named <sS2T2_type>; here `T`

is a letter encoding the target flavor, `s`

and `S`

encode the source1 and source2 flavors correspondingly. `type`

is the identifier describing the semantic of the function:

plus minus mult div remainder pow lt gt le ge eq ne lshift rshift bitand bitor bitxor min max sproduct

Operations `lt gt le ge eq ne`

work as "in mathematics", not "as in C": a negative number is less than a non-negative, even if one type is signed, another unsigned. (However, currently a comparison of `long`

and `double`

goes through automatic C conversion `long -> double`

. On some machines long has more bits than than mantissa of a double, so this includes rounding a `long`

to the nearest fitting `double`

.)

`sproduct`

performs the operation `target += source1 * source2`

.

The target type must coincide with one of the source types (but see for exceptions in "Extra targets for 2-arg handlers").

If source1 or target are floating point, `lshift`

and `rshift`

operations accept negative arguments; additionally, no truncation is performed (including negative `lshift`

and positive `rshift`

). Likewise for `lshift_assign`

, `rshift_assign`

.

For example, to add `signed short`

array and `unsigned int`

and write the result to a `unsigned long`

array, one uses the function named `sI2L2_add`

. (Read this as *signed short and unsigned integer TO unsigned long: add[ition] [with 2 sources]*.)

In short: the number before underscore is the number of "source" arrays, and the flavors of source arrays preceed the (first) number `2`

in the name.

**Exceptional handlers**: 2 additional handlers are provided (for `frexp modf`

); they take 1 source and 2 targets. They should be called exactly as handlers with 2 sources and 1 target, only the second target takes place of the second source.

When an operation is performed, the cycle is run over the elements of the array; the innermost loop is w.r.t. the first index (i.e., the first of strides and the first of limits), and outermost is w.r.t. the last index.

The start element of the array is processed first, then the order of elements processed is governed by the strides. For example, suppose that $arr with 1 index consists of 0s; let $arr_f be format of $arr but with 1 less element, $off is the offset of the next element of $arr (=stride), and $ones consists of 1s (with format $ones_f), then

sS2s2_add($arr, $ones, $arr, 0, 0, $offset, $dim, $arr_f, $ones_f, $arr_f);

will initialize $arr so that `n`

th element is equal to `n`

. Indeed, the target is $arr with offset 1; so the first addition will assign 0th element + 1 to the 1st element of $arr; after this the first 2 elements of $arr are "correctly" initialized to 0 and 1. The second addition will assign 1st element + 1 to the 2nd element, so it would become 2, etc.

For the comparison handlers `lt gt le ge eq ne`

, in addition to types of the sources, the target can be of any integer type.

For multiplication handlers `mult`

and `sproduct`

, in addition to types of the sources, the target can be of any wider type. In more details: if one of arguments if floating point, `wider`

means the storage size in bytes is larger than of any of sources. If both arguments are of integer types, *wider* means one of the following: either in the sense of storage size, or a floating point type, or an unsigned size of the same storage size as the largest of the sources.

For `lshift`

with integer type sources, wider unsigned integer type targets are allowed.

Usually the generated C code for handlers contains no C type casts. There are exceptions.

For `lshift`

with integer type sources, and wider integer target, the sources are first cast to the target type.

For `mult`

and `sproduct`

, when a wider type is allowed, the sources are first cast to the target type (with an extra exception of integer type sources and a floating point target which is not of larger bitwidth than sources; then, instead of the target type, the sources are first cast to the *next wider* integer type - provided such a type exists).

None by default.

All handlers are exportable. So are constants for Perl pack() "type letter" for each flavor of the array:

packId_format packId_star_format packId_T packId_star_T

here `T`

is a flavor specifier for a type used by this module.

The functions on the left return the letter, on the right return a letter followed by `*`

. For example, packId_L() would return `L!`

on newer Perls, and a suitable substitute on the older ones. The functions on top return the pack() letter(s) for pack()ing the offsets and strides in the array.

To simplify developing code which works with different numeric types, one can create *aliases* for types by putting `:T=t`

into import list. After this word, the imported symbols may have `T`

put instead of `t`

. E.g.,

use Numeric::LL_Array qw( :X=d access_X XX2X2_add packId_X ); ... XX2X2_add(...);

(After this, use only the type `X`

to access doubles. If you want to replace `double`

s by `float`

s, all you need to do is to change one letter: make alias into `:X=f`

.)

Additionally, functions `packId($t)`

, `packId_star($t)`

are available; they take type letter (or `'format'`

) as a parameter.

C functions implementing the handlers of this module are collected into four *dictionaries*: for accessors, and for 0-, 1- and 2-sources modifiers. Each dictionary is compiled from the C code in one minuscule C file, *code_*.h*; this file is loaded multiple times, once per handler, with a handful of macros describing the operation to perform on each array element, and C types of array elements.

Each dictionary corresponds to one *constant wrapper* C file, *driver_*.c*; this wrapper contains no C code, only a few preprocessor directives to include the necessary headers, and the *autogenerated loader*, *driver_*.h*. The loaders are generated by a small Perl script, *write_driver.pl*; for each handler, a loader defines necessary macros, and includes the corresponding file *code_*.h*.

The memory overhead, and the initial slowdown to define thousands of XSUBs wrapping the C handlers would be quite measurable. To avoid this, at start no handler XSUBs are defined. What *is* defined is four *XSUB interfaces*; they are "closure XSUBs" (or "interfaces"): to make them into a real, callable, XSUB, one needs to attach to the interface the corresponding dictionary entry. So one extra XSUB is defined which does such an attachment.

So one can define a handler XSUB for by either calling a convenient Perl routine create_handler() (which wrapps low-level attacher XSUB), or by just import()ing the handler (the import()er would call create_handler() as needed) as in

use Numeric::LL_Array qw( access_i c2S1_assign );

So the build architecture consists of:

an import()er which creates handler XSUBs on the fly; an attacher which converts interface-XSUBs into handler XSUBs; interface-XSUBs which call entries in the dictionary; dictionaries created by F<write_driver.pl> from code in F<code_*.h>.

So the total complexity of this module is about 200 lines of C code, 450 lines of Perl, and 400 lines of XSUB (including some code for testing and developing).

This module deals with large memory buffers. In Perl, passing arguments to subroutines does not involve copying the memory buffers of string arguments. However, the common construction

my $arg1 = shift;

*does* involve copying of memory buffers.

So there are two ways to create Perl subroutines which do not do extraneous copying: first (ugly) way: access arguments as `$_[N]`

. Second way: change the signature of Perl subroutines: they should take playgrounds as references, and dereference them only when passing to handlers.

sub foo ($) { my $pg = shift; ... d0_sin $$pg, ... ... foo \$playground;

Many "typical" operations over arrays can be simplified a lot by introducing new "fake" dimensions to the array. By the same reason why the size of a vector with the stride `0`

does not depend on its dimension, introducing new dimensions with stride `0`

does not chenge the memory region occupied by an array. Hence with such dimensions added, still the same data is accessed. The benefit is that many operations may be reduced to just `sproduct`

using this approach.

Consider multiplication of a matrix `R`

by a vector `v`

. If we want to put the result into array `w`

with initial contents `0`

, one could do

w[k] += R[k,l] * v[l] for k in 0..K-1, l in 0..L-1;

However, if one could add an extra "fake" second dimension to `w`

, and an extra "fake" first dimension to `v`

, this becomes

W[k,l] += R[k,l] * V[k,l] for k in 0..K-1, l in 0..L-1;

(this assumes that W[k,l] accesses the same data as w[k], and V[k,l] accesses the same data as v[l]). Now it *is* an operation of `sproduct`

with target `W`

and sources `R`

and `V`

.

If `w`

is accessed with stride `Sw`

, and dimension `K`

, then `W`

should be accessed with strides `Sw, 0`

, and dimensions `K,L`

. Likewise, if `v`

is accessed with stride `Sv`

, and dimension `L`

, then `V`

should be accessed with strides `0, Sv`

, and dimensions `K,L`

.

Similarly, of one wants to multiply matrices `P, Q`

of dimensions `K,L`

, and `L, M`

into a resulting matrix `R`

of dimensions `K,M`

(initially 0), one should add extra "fake" dimensions to make them all of arity 3 and dimensions `K,L,M`

, and do `sproduct`

. The new strides are `sP1,sP2,0`

for `P`

, `0,sQ1,sQ2`

for `Q`

, and `sR1,0,sR2`

for `R`

(here we assume that omitting 0s gives old strides for `P,Q,R`

).

Consider a problem of convolving two arrays `A`

, `B`

of sizes `a`

and `b`

with `a > b`

; the result is an array of size `a+b-1`

which has `RES[k] = SUM A[k-t] B[t]`

(here k changes between `b-1`

and `a-1`

, and summation runs over t between 0 and `b-1`

). To conform to API of this module, change indices of the result to `0..a-b`

.

Then one can calculate this by

Assign 0 to RES Do sproduct() with target RES', and sources A', B'

Here `RES', A', B'`

are arrays `RES, A, B`

with an added "fake" dimension. Assume that strides of `A, B, RES`

are 1; then the strides of `A'`

are `1,-1`

, of `B'`

are `0,1`

, and of `RES'`

are `1,0`

, and the start position of A' is shifted to correspond to `A[b-1]`

.

One can immediately modify this to handle multi-dimensional convolution. Moreover, sometimes one can modify this to handle some sparse array configurations. For example, suppose that `B`

is the Laplace kernel

0 1 0 1 -4 1 0 1 0

and `A`

is an array of size `l x m`

. Then the result is of size `l-2 x m-2`

, and can be calculated as:

Make an array mFOUR of size 1 with content -4. Make an array mFOUR' of size l x m with the same buffer, and strides 0,0 Make an array ONE of size 1 with content 1. Make an array ONE' of size l x m x 2 x 2 with the same buffer, and strides 0,0,0,0 RES = A * mFOUR (pointwise multiplication) Do sproduct() with target RES', and sources A', ONE'

Here `RES'`

has strides `1,l,0,0`

(we assume that `A`

has strides `1,l`

), size `l-2 x m-2 x 2 x 2`

, while `A'`

has strides `1,l,-l+1,l+1`

, and starts at position of index [l,0] of `A`

.

Essentially, we treat the middle `4`

of `B`

separately, and note that the remaining `1`

s form a square. (In fact, a parallelogram instead of a square would be fine too.) When "positioned" on top of `A`

, this square has one side going up-right (which gives a stride `-l+1`

), another down-right (which gives a stride `l+1`

) - these are strides of the "extra" two dimensions of `A'`

.

Note that to calculate contribution of `1`

s into convolution, one needs to do `4(a-2)(b-2)`

multiplications, and this is exactly the total size of `A'`

. So the calculation above makes no unneeded multiplications (e.g., `0`

s in corners of `B`

are completely ignored). (Of course, by breaking the steps into `a-2 x b-2 x 2 x 2`

, we do some extra store/retrieve operations - comparing to doing straightforward convolution. But these operations are done in C, where they are much cheaper than in Perl.)

NEED: min/max; with signed/vs/unsigned solved ??? min_assign??? argmin() ? NEED: How to find first elt which breaks conditions (as in a[n+1] == a[n]+1??? NEED: more intelligent choice of accessors for q/Q and D (newsvpv?)... NEED: accessor to long double max-aligned (to 16 when size is between 8 and 16) NEED: long vs double comparison? char-vs-quad comparison? cmp? NEED: pseudo-flavor: k-th coordinate of the index (or a linear combination?) NEED: All flavors of FFT NEED: Indirect access (use value of one array as index in another) NEED: use long double C functions if argument is long long (???) MAYBE: finish uquad2double()ization of handlers (for broken M$ compilers)

BSD misses many `long double`

APIs (elementary functions, trunc(), rint() shifts, and `**`

). So when deciding whether one wants to do operations over `long double`

s, one should either check for presence of the needed functions (by doing `eval "use Numeric::LL_Array 'D0_sin'; 1"`

or some such), or check return value of elementary_D_missing().

Some C environments miss trunc() (and maybe rint()? Did not see it missing). In such cases the corresponding handlers are not defined. If you know how to work around, but want to use trunc() if present, one can check for this using eval(), as above, with handlers `d0_trunc`

, `d0_rint`

.

In C Integer Conversion to signed type and floating to integer conversion are implementation-specific unless the (truncated) value can be represented exactly. Hence, e.g., operations with `unsigned int`

and `signed int`

targets need different C implementations (they do not necessarily differ by integer conversion, which is a NOP on in-memory representations). Which means that we need C code to all all the flavors, and our (autogenerated) C code is quite bulky and takes a lot of time to compile.

On some machines (notably "s390 or alpha) `double`

may coincide with `long double`

(at least in size). In such a situation, we assume that these types are completely identical, and do not even try to compile functions with `long double`

arguments. (We use the corresponding functions with `double`

argments instead.)"

Some M$ compilers are broken and do not support conversion of 64bit unsigned integers to double. In such cases, we do not implement handlers mixing `Q`

and floating point types (but transcendental functions applied to `Q`

are supported). For introspection: inspect result of have_uquad2double().

Ilya Zakharevich <ilyaz@cpan.org>

Copyright (C) 2005 by Ilya Zakharevich <ilyaz@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

What is needed is hide-the-flavors higher-level wrapper which automatically chooses handlers basing on flavors of arguments. Yet another hiding level would create an overloaded-operation

Possible layout of an object: RV: playground IV: flavor, dim PV: format

Need also a region of playground one can write when targetting this value.

This may give a significant slowdown in the case of smallish arrays. Then maybe a "precompile" step may be useful: another class; operations over this class do not do calculations, but just name/type resolution, and bound checks. While operations are performed, this information is stored in a "program buffer". Then one can invoke the "program buffer" (assumingly inside a cycle, otherwise there is no point in doing preprocessing in advance); this should suit iterative methods...

Tentative example of how it might be accessed:

my $prog = new Numeric::LL_Program_Array; my $playground = ...; my $p = new Numeric::Array_for_Program $prog, $playground, 'd', $min, $max, $start, $dims = [1, 2*$N+1, 1]; my $one = $p->subarray(0, $dims = [1, $N, 0]); # dim=1, off=0, stride = 0 my $a = $p->subarray(1, $dims = [1, $N, 1]); # dim=1, off=1, stride = 1 my $tmp = $p->subarray($N+1, $dims = [1, $N, 1]); # after $a my $one1 = $one->subarray(0, $dims = [1, 1, 0]); # occupies the same place $one1->assign_1; # assigns $one too my $a00 = $a->subarray(0, $dims = [1, 1, 1]); # The first element $a00->assign_0; my $a0 = $a->subarray(0, $dims = [1, $N-1, 1]); # all but last element my $a1 = $a->subarray(1, $dims = [1, $N-1, 1]); # all but first element $a1->assign_add($one, $a0); # Now $a has elements 0..$N-1 $prog->invoke_and_clean; # Execute once, forget, start accumulate again $tmp->assign_tan($a); $a->subtract_assign($tmp); $prog->invoke for 0..19; # run 20 iterations of x = x - tan(x) print($a);

(need also flavors of subarray which modify $min/$max?). Note that subarray() does not touch the playground, only the dim/offsets/strides data.

syntax highlighting: