NAME

Tie::PagedArray - A tieable module for handling large arrays by paging

VERSION

Version 0.01

SYNOPSIS

tie my(@large_array), 'Tie::PagedArray';

tie my(@large_array), 'Tie::PagedArray', page_size => 100, paging_dir => '/tmp';

DESCRIPTION

When processing large volumes of data, a program may run out of memory. The operating system may impose a limit on the amount of memory a process can consume, or the machine may simply lack the required amount of memory.

Tie::PagedArray supports large arrays by paging them so that the program does not run out of memory. The array is broken into pages, and every page except the one in use is kept on disk. Performance depends on the storage device chosen to hold the page files.

This module uses Storable as its backend for serialization and deserialization, so the elements of the paged array can be any value or object that Storable can handle. See the Storable documentation for how to work with code refs.
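As an illustration of the code ref caveat, here is a minimal sketch that uses Storable directly, not Tie::PagedArray. It relies on the $Storable::Deparse and $Storable::Eval package variables described in the Storable documentation. Whether Tie::PagedArray picks up these settings depends on how it calls Storable, so treat that part as an assumption. The $record structure is made up for the example.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Documented Storable switches for code refs:
$Storable::Deparse = 1;    # serialize code refs as source text (via B::Deparse)
$Storable::Eval    = 1;    # eval that source text when deserializing

# Hypothetical element holding a code ref.
my $record = { name => 'doubler', code => sub { return 2 * $_[0] } };

# Round trip through Storable, as a page write followed by a page read would do.
my $copy = thaw(freeze($record));

print $copy->{code}->(21), "\n";    # prints 42
```

Without these switches Storable refuses to serialize a structure that contains a code ref. Only enable $Storable::Eval for data you trust, because deserializing evaluates the stored source.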

When switching pages, the currently active page is written from memory to its page file if the page is marked dirty. The page file of the page being switched to is then deserialized into memory.
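The page switch can be pictured with the following hand-rolled sketch. It is not the module's implementation: the %active hash, the page_file and switch_page helpers, and the page_N.sto file names are all hypothetical. Only the store and retrieve calls are real Storable functions.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Hypothetical layout: one Storable file per page in a scratch directory.
my $dir    = tempdir(CLEANUP => 1);
my %active = (index => 0, data => [1, 2, 3], dirty => 0);

sub page_file {
    my ($index) = @_;
    return File::Spec->catfile($dir, "page_$index.sto");
}

sub switch_page {
    my ($to) = @_;

    # Offload the active page only if it was marked dirty.
    store($active{data}, page_file($active{index})) if $active{dirty};

    # Deserialize the target page, or start an empty one.
    my $file = page_file($to);
    my $data = -e $file ? retrieve($file) : [];
    %active = (index => $to, data => $data, dirty => 0);
    return;
}

$active{data}[0] = 10;    # an assignment to an element ...
$active{dirty}   = 1;     # ... marks the page dirty
switch_page(1);           # page 0 is dirty, so it is written to disk
switch_page(0);           # page 1 is clean, so nothing is written; page 0 is read back
print "@{ $active{data} }\n";    # prints "10 2 3"
```

In this sketch a change made while the dirty flag is unset would be discarded at the next switch, which is why marking a page dirty matters.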

An active page is marked dirty when a value is assigned to any element in the page. To forcibly mark a page dirty, assign an element in the page to itself:

  $large_array[2000] = $large_array[2000];

The defaults are page_size => 2000 and paging_dir => "." (the current working directory).

METHODS

tie

The tie call lets you create a new Tie::PagedArray object.

  tie my(@large_array), 'Tie::PagedArray';
  tie my(@large_array), 'Tie::PagedArray', page_size => 100;
  tie my(@large_array), 'Tie::PagedArray', page_size => 100, paging_dir => '/tmp';

Ties the array @large_array to the Tie::PagedArray class.

page_size is the number of elements in a page. If page_size is omitted, it defaults to 2000 elements. The default page size can be changed by setting the package variable $Tie::PagedArray::ELEMS_PER_PAGE. The change in default only affects future ties.

  $Tie::PagedArray::ELEMS_PER_PAGE = 2000;

paging_dir is the directory in which the page files are stored. Choose a directory on a fast storage device. If omitted, it defaults to the current working directory.

page_files

The page_files method, available on the tied object (the value returned by tie or tied), returns the names of the page files belonging to the array. This can be used to freeze the array and archive it along with its page files!

LIMITATIONS

1) A foreach loop must not be used on a Tie::PagedArray because the array in foreach expands into an in-memory list. Instead, iterate by index.

  while(my($i) = each(@large_array)) {
    # Do something with $large_array[$i]
  }

  OR

  for(my $i = 0; $i < scalar(@large_array); $i++) {
    # Do something with $large_array[$i]
  }

2) When an update is made inside an element's nested data structure, the corresponding page is not marked dirty, because such updates are difficult to track.

Suppose page_size => 1 and hash refs are stored as elements in the array.

  @car_parts = ({name => "wheel", count => 4}, {name => "lamp", count => 8});

Then an update to count will not mark the page dirty. When the page is later switched out, the modification is lost!

  $car_parts[1]->{count} = 6;

The workaround is to assign the element to itself.

  $car_parts[1] = $car_parts[1];

3) When an object is assigned to two elements in different pages, the elements end up pointing to two independent objects, because each page is serialized separately.

Suppose page_size => 2, then

  my $wheel = {name => "wheel", count => 4};

  @car_parts = ($wheel, $wheel, $wheel);

  print($car_parts[0] == $car_parts[1] ? "Same object\n" : "Independent objects\n");
  Same object

  print($car_parts[0] == $car_parts[2] ? "Same object\n" : "Independent objects\n");
  Independent objects
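The effect can be reproduced with Storable alone. The sketch below is not the module's code. It assumes, as a simplification, that each page is serialized as its own array ref. The @pages and @restored names are made up for the example.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my $wheel = { name => 'wheel', count => 4 };

# Three elements with page_size => 2: elements 0 and 1 share a page,
# element 2 lives on the next page.
my @pages = ([$wheel, $wheel], [$wheel]);

# Serialize and deserialize each page on its own.
my @restored = map { thaw(freeze($_)) } @pages;

# Within one page Storable preserves the shared reference.
print($restored[0][0] == $restored[0][1] ? "Same object\n" : "Independent objects\n");

# Across pages each copy is rebuilt separately.
print($restored[0][0] == $restored[1][0] ? "Same object\n" : "Independent objects\n");
```

The first line printed is "Same object" and the second is "Independent objects". A change made through one copy is therefore not visible through a copy on another page.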

BUGS

None known.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Tie::PagedArray

AUTHOR

Kartik Bherin

LICENSE AND COPYRIGHT

Copyright (C) 2013 Kartik Bherin.