The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Tie::CArray - Space-efficient, typed, external C Arrays (Alpha)

SYNOPSIS

    use Tie::CArray;
    $dblarr = new Tie::CDoubleArray(10000);

    @values = (0..10000);
    $dblarr = new Tie::CIntArray(10000,\@values);
    ref $dblarr eq 'Tie::CIntArray' and
      $dblarr->set(0,1) and
      $dblarr->get(0) == 1;

    tie @array, 'Tie::CDoubleArray', 10000, \@values;
    print $array[0], join ', ', @dbl[1..20];

DESCRIPTION

Several XS classes and methods to deal with typed, space-efficient C arrays are provided. Range checked and tieable.

There are hand-optimized, fast XS versions for the three basic C-types array of INT, DOUBLE and STRING and some sequential aggregate types int[2][], int[3][], int[4][], double[2][] and double[3][].

This roughly reflects to:

    CArray
        CIntArray               int[]
            CInt2Array          int[][2]
            CInt3Array          int[][3]
            CInt4Array          int[][4]
        CDoubleArray            double[]
            CDouble2Array       double[][2]
            CDouble3Array       double[][3]
        CStringArray            *char[]

Typed C arrays need about three times less space then untyped perl arrays. Such as various computional geometry modules dealing with 10.000 - 200.000 double[3]. Modification is done in-place and preferably in bulk.

It might also be easier to write XSUBs by converting the data to CArray's before, pass this pointer to the C func, and handle the results in Perl then.

The Fetch/Store operations with tied arrays copy the scalars to perl and back, so it shouldn't be abused for BIG data.

Perl's safemalloc/safefree is used.

EFFICIENT GROW

CArray's are efficiently growable, which is needed for several algorithms, such as placing extra sentinels at the end, adding three points for a super-triangle for Delaunay triangulation, ...

Extra space is always allocated to fit nicely into the page boundary, defined by the system granularity. For now it's 2048, half of the usual 4096, but this can be tweaked (e.g. for many small arrays) in the C function freesize().

CLASS METHODS

new ( size, [ template, [ values ]] )

The new method is provided for all classes, the optional arrayref initarg applies only to the base Array classes, not the aggregate.

The constructor creates a new Tie::CArray object. For the Array classes the second optional argument is used to initialize it with an array. The second argument may also be used by a seperate init call. If the optionally provided values arrayref is shorter that the allocated size, the rest will stay uninitialized.

    $D = new Tie::CDoubleArray( 1000, ,[0..999] );
len ()

The len method returns the length of the array, 1+ the index of the last element. To enlarge the array grow() should be used.

    $D  = new Tie::CDoubleArray(5000);
    for my $j (0 .. $D->len-1) { $D->set($_, 0.0)); }
    $D->len; # => 5000
get ( index )

get returns the value at the given index, which will be scalar or a list. Croaks with "index out of range" on wrong index.

    $I = new Tie::CIntArray(2,[0,1]);
    print $I->get(1); # => 1
    print $I->get(2);
      => croak "index out of range"

    $I2 = new Tie::CInt2Array(2,[[0,1]]);
    print $I->get(0); # => (0 1)
set ( index, value )

The set method is provided for all classes. It changes the value at the given index. The value should be either a scalar or an arrayref. Croaks with "index out of range" on wrong index. Returns nothing.

    $I = new Tie::CIntArray(100);
    map { $I->set($_,$i[$_]) } (0..99);
    $I->set(99,-1);
    $I->set(100);
      => "index out of range"

    $I2 = Tie::CInt2Array->new(2);
    $I2->set(0, [1,0]);
    $I2->set(1, [0,1]);
list ()

Returns the content of the flat array representation as arrayref.

init ( ARRAYREF )

Initializes the array with the values from the arrayref. Returns nothing.

This is the same as the second new argument. If the provided values arrayref is shorter that the allocated size, the rest will stay uninitialized.

  $I = Tie::CIntArray::new(100) ;
  $I->init( [0..99] );
grow ( n )

Adds room for n elements to the array. These elements must be initialized extra with set. To support faster grow() a certain number of already pre-allocated items at the end of the array will be used. (see free) Returns nothing.

delete ( index )

Deletes the item at the given index. free is incremented and the remaining array items are shifted. Returns nothing.

get_grouped_by ( size, index )

Returns a list of subsequent values. It returns a list of size indices starting at size * index. This is useful to abuse the unstructured array as typed array of the same type, such as *double[3] or *int[2].

But this is normally not used since fast get methods are provided for the sequential classes, and those methods can be used on flat arrays as well. (Internally all sequential arrays are flat).

  Tie::CInt3Array::get($I,0) == $I->get_grouped_by(3,0)

$ptr->get_grouped_by(2,4) returns the 4-th pair if the array is seen as list of pairs.

  $ptr->get_grouped_by(3,$i) => (ptr[i*3] ptr[i*3+1] ptr[i*3+2] )
slice ( start, size, [ stride=1 ] )

C++ like slice operator on a flat array. - In contrast to get_grouped_by() which semantics are as on a grouped array.

Returns a list of size items, starting at start, with interim offsets of stride which defaults to 1. This is useful to return columns or rows of a flat matrix.

  $I = new Tie::CIntArray (9, [0..8]);
  $I->slice ( 0, 3, 3 ); # 1st column
    => (0 3 6)
  $I->slice ( 0, 3, 1 ); # 1st row
    => (0 1 2)
  $I->get_grouped_by(3, 0);
    => (0 1 2)
isort ( [ cmpfunc ] )

"Indirect sort" (numerically ascending only for now)

Returns a fresh sorted index list of integers (0 .. len-1) resp. a CIntArray object in scalar context.

The optional cmpfunc argument is not yet implemented.

nreverse ()

"Reverse in place". (The name comes from lisp, where `n' denotes the destructive version). Destructively swaps all array items. Returns nothing.

To perform a copying reverse define

sub reverse { nreverse($_[0]->copy()) }

SOME SEQUENTIAL CLASSES and CONVERSION

To mix and change parallel and sequential data structures, some sequential types (int[2],int[3],int[4],double[2],double[3]) are derived from their base classes with fast, hand-optimized get and set methods to return and accept lists instead of scalars.

The input argument must be an arrayref, the result will be an array in list context and an arrayref in scalar context.

Conversion

The Arrays for Int2, Int3, Int4, Double2 and Double3 can also be converted from and to parallel base arrays with fast XS methods. Parallel arrays are sometimes preferred over structured arrays, but delete/ insert of structures in parallel arrays is costly.

  # three parallel CIntArray's
  $X = new Tie::CIntArray(1000);
  $Y = new Tie::CIntArray(1000);
  $Z = new Tie::CIntArray(1000);

  # copy to one sequential *int[3], new memory
  $I = $X->ToInt3($Y,$Z);

  # or to an existing array
  $I = new Tie::CIntArray(3000);
  $I = $X->ToInt3($Y,$Z,$I);

  # copies back with allocating new memory
  ($X, $Y, $Z) = $I->ToPar();

  # copies back with reusing some existing memory (not checked!)
  ($X, $Y, $Z) = $I->ToPar($X,$Z);  # Note: I3 will be fresh.
ToPar ( SeqArray, [ Tie::CArray,... ] )

This returns a list of Tie::CArray objects, copied from the sequential object to plain parallel CArray objects. This is a fast slice.

  *int[2] => (*int, *int)

  Tie::CInt2Array::ToPar
  Tie::CInt3Array::ToPar
  Tie::CInt4Array::ToPar
  Tie::CDouble2Array::ToPar
  Tie::CDouble3Array::ToPar

If the optional CArray args are given the memory for the returned objects are not new allocated, the space from the given objects is used instead.

To$Type$Num ( CArray, ..., [ CArray ] )

This returns a sequential CArray object copied from the parallel objects given as arguments to one sequential CArray. This is a fast map.

  *int, *int => *int[2]

  Tie::CIntArray::ToInt2
  Tie::CIntArray::ToInt3
  Tie::CIntArray::ToInt4
  Tie::CDoubleArray::ToDouble2
  Tie::CDoubleArray::ToDouble3

If the last optional CArray arg is defined the memory for the returned object is not new allocated, the space from the given object is used instead.

ARBITRARY STRUCTURED ARRAYS, PACK-STYLE TEMPLATES (not yet)

Some special sequential arrays are hand-optimized for speed but can hold only limited data types (int[2] .. double[3]).

To support arbitrary structured arrays a second template argument may be provided which must be a arrayref of a hash, where its keys name the accessor and the values pack-style letters.

This does not work yet!

   tie @A, 'Tie::CArray', 200,
                [ x => 'd',
                  y => 'd',
                  z => 'd',
                  attr => [ age   => 'i',
                            dirty => 'i',
                            owner => 's' ],
                  refcount => 'i' ];
  $A->init ...

  for (my $i = 0; $i < @A; $i++) {
    printf("x,y,z: (%d %d %d),\nattr: (age=%d, dirty=%d, owner=%s)\nrefcount=%d",
           $A[$i]->{x}, $A[$i]->{y}, $A[$i]->{z},
           $A[$i]->{attr}->{age}, $A[$i]->{attr}->{dirty}, $A[$i]->{attr}->{owner},
           $A[$i]->{refcount}
          );
  }

  tie @utmp, 'Tie::CArray', 100,
        [ ut_type => 's',
          ut_pid  => 'i',
          ut_line    => 'a12',
          ut_id      => 'a4',
          ut_user    => 'a32',
          ut_host    => 'a256',
          ut_exit    => [ # struct exit_status
                          e_termination => 's',
                          e_exit        => 's' ],
          ut_session => 'l',
          ut_tv      => [ # struct timeval
                          tv_sec  => 'l'
                          tv_usec => 'l' ],
          ut_addr_v6 => 'l4',
          pad        => 'a20' ];

The following subset of pack() template letters is supported:

i

signed integer (default)

I

unsigned integer

c

signed character (one byte integer)

c

unsigned character (one byte integer)

s

signed short integer

S

unsigned short integer

n

unsigned short integer in network byte order

l

signed long integer

L

unsigned long integer

N

unsigned long integer in network byte order

q

signed long long integer (long long/int64)

(only if the system has quads and perl was compiled for 64 bit)

Q

unsigned long long integer (unsigned long long/uint64)

(only if the system has quads and perl was compiled for 64 bit)

L

unsigned long integer

f

float

d

double

aN

fixed-length, null-padded ASCII string of length N

AN

fixed-length, space-padded ASCII string of length N

ZN

fixed-length, null-terminated ASCII string of length N

INTERNAL METHODS

DESTROY ()

This used to crash on certain DEBUGGING perl's, but seems to be okay now. Returns nothing.

Tie::CArray::itemsize ( )
Tie::CStringArray::itemsize ( [index] )

Returns the size in bytes per item stored in the array. This is only used internally to optimize memory allocation and the free list.

A CStringArray object accepts the optional index argument, which returns the string length at the given index. Without argument it returns the size in bytes of a char * pointer (which is 4 on 32 bit systems).

copy ()

Returns a freshly allocated copy of the array with the same contents.

_freelen ()

Internal only. Returns the number of free elements at the end of the array. If grow() needs less or equal than free elements to be added, no new room will be allocated.

This is primarly for performance measures.

TIEARRAY METHODS

Not tested yet!

tie (var, type, size)

After tying a array variable to an Tie::CArray class the variable can be used just as any normal perl array.

  tie @array, 'Tie::CDoubleArray', 200;
  print $array[200];
    => croak "index out of range"

SEE ALSO

perlxs(1), "tie" in perlfunc, Tie::Array(3), Geometry::Points(3), C::Dynalib::Poke(3), Tie::MmapArray(3)

TODO

Not all pack letters are implemented yet.

AUTHOR

Reini Urban <rurban@x-ray.at>

Andrew Ford wrote the arbitrary structure code.

COPYRIGHT

Copyright (c) 1999 Reini Urban.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

WARNING

The author makes NO WARRANTY, implied or otherwise, about the suitability of this software for safety or security purposes.

CArrays are now always ranged checked which cannot be turned off, so it's not that dangerous anymore to read or write to not-owned memory areas.

The author shall not in any case be liable for special, incidental, consequential, indirect or other similar damages arising from the use of this software.

Your mileage will vary. If in any doubt DO NOT USE IT. You've been warned.

BUGS

There are certainly some. Not fully tested yet. Tests for copy, grow, delete, tie are pending. Also some more conversion tests, esp. with double and degenerate (grow, cut) cases.

  1. realloc() in string_set() with DEBUGGING perl fails sometimes.

  2. An implicit DESTROY invocation sometimes asserts a DEBUGGING perl, regardless if PERL_MALLOC or the WinNT msvcrt.dll malloc is used. (5.00502 - 5.00558) Esp. on perl shutdown, when freeing the extra objects at the second GC.

    This became much better in 0.08 than in previous versions.

This is alpha, not fully tested yet!

VERSION

$Revision 0.13 $ $Date 2008-01-20 $