NAME

Sort::Key::Merger - Perl extension for merging sorted things

SYNOPSIS

  use Sort::Key::Merger qw(keymerger);
  use Carp qw(croak);

  sub line_key_value {

      # $_[0] is available as a scratchpad that persists
      # between calls for the same $_;
      unless (defined $_[0]) {
          # so we use it to cache the file handle when we
          # open a file on the first read
          open $_[0], "<", $_
              or croak "unable to open $_";
      }

      # don't get confused by this while loop, it's only
      # used to ignore empty lines
      my $fh = $_[0];
      local $_; # break $_ aliasing;
      while (<$fh>) {
          next if /^\s*$/;
          chomp;
          if (my ($key, $value) = /^(\S+)\s+(.*)$/) {
              return ($value, $key)
          }
          warn "bad line $_"
      }

      # signals the end of the data by returning an
      # empty list
      ()
  }

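  # create a merger object; calling the sub as &line_key_value
  # (no parentheses) passes the current @_ through, so the
  # scratchpad in $_[0] reaches it:
  my $merger = keymerger { &line_key_value } @ARGV;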

  # sort and write the values:
  my $value;
  while (defined($value=$merger->())) {
      print "value: $value\n"
  }

WARNING!!!

Several backward incompatible changes were introduced in version 0.10:

    - filekeymerger callbacks are now called in list context
    - order of return values on keymerger callback has changed
    - in list context only the next value is returned by default
      instead of all the remaining ones

DESCRIPTION

Sort::Key::Merger merges presorted collections of data based on some (calculated) keys.

Given

FUNCTIONS

The following functions are available from this module:

keymerger { GENERATE_VALUE_KEY_PAIR($_) } @sources;

creates a merger object for the given @sources collections.

Every item in @sources is aliased by $_ and then the user-defined subroutine GENERATE_VALUE_KEY_PAIR is called. The result from that callback should be a (value, key) pair. Keys are used to determine the order in which the values are sorted.

GENERATE_VALUE_KEY_PAIR can return an empty list to indicate that a source has become exhausted.

The result from keymerger is another subroutine that works as a generator. It can be called as:

  my $next = $merger->();

  my @next = $merger->($n);

In scalar context it returns the next value or undef if all the sources have been exhausted. In list context it returns the next $n values (1 is used as the default value for $n).

If your data can contain undef values, you should iterate over the sorted values as follows:

  my $merger = keymerger ...;

  while (my ($next) = $merger->()) {
     # do whatever with $next
     # ...
  }
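
The difference can be seen in a minimal sketch with in-memory sources. The array-ref sources and their [key, value] layout are assumptions made for illustration only; the module does not require them. One of the values is undef, yet the list-assignment loop still visits every item:

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(keymerger);

# Hypothetical presorted sources: each is a list of [key, value] pairs.
my @sources = (
    [ [ a => 1 ], [ c => undef ] ],
    [ [ b => 2 ] ],
);

my $merger = keymerger {
    my $pair = shift @$_
        or return ();               # source exhausted
    my ($key, $value) = @$pair;
    return ($value, $key);          # (value, key) order
} @sources;

my @seen;
# The list assignment is true as long as one element is returned,
# even when that element is undef.
while (my ($next) = $merger->()) {
    push @seen, $next;
}
printf "%d values merged\n", scalar @seen;
```

A loop written as while (defined(my $next = $merger->())) would be expected to stop early at the undef value.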

Passing -1 makes the function return all the remaining values:

  my @remaining = $merger->(-1);
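
The three calling styles can be combined on the same generator. The following is a sketch only, using two hypothetical in-memory arrays as sources and the value itself as the key:

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(keymerger);

# Hypothetical sources, each already sorted as strings.
my @sources = (
    [qw(apple cherry fig)],
    [qw(banana date grape)],
);

my $merger = keymerger {
    return () unless @$_;           # source exhausted
    my $value = shift @$_;
    return ($value, $value);        # value doubles as its own key
} @sources;

my $first = $merger->();            # scalar context: one value
my @two   = $merger->(2);           # list context: next two values
my @rest  = $merger->(-1);          # everything that is left

print "first: $first\n";
print "two:   @two\n";
print "rest:  @rest\n";
```

Each call continues where the previous one stopped, so the three results together cover all six items exactly once.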

NOTE: an additional argument is passed to the GENERATE_VALUE_KEY_PAIR callback in $_[0]. It is meant to be used as a scratchpad: its value is associated with the current source and persists between calls from the same generator, e.g.:

  my $merger = keymerger {

      # use $_[0] to cache an open file handle:
      $_[0] or open $_[0], '<', $_
          or croak "unable to open $_";

      my $fh = $_[0];
      local $_;
      while (<$fh>) {
          chomp;
          return $_ => $_;
      }
      ();
  } ('/tmp/foo', '/tmp/bar');

This function honours the use locale pragma.

nkeymerger { GENERATE_VALUE_KEY_PAIR($_) } @sources

is like keymerger but compares the keys numerically.

This function honours the use integer pragma.
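
Numeric comparison matters when the sources are sorted by number, because as strings "10" sorts before "2". A small sketch, assuming two hypothetical in-memory arrays as sources:

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(nkeymerger);

# Hypothetical sources, each already sorted numerically.
my @sources = ([2, 10, 33], [1, 9, 100]);

my $merger = nkeymerger {
    return () unless @$_;           # source exhausted
    my $n = shift @$_;
    return ($n, $n);                # the number is its own key
} @sources;

my @merged = $merger->(-1);
print "@merged\n";
```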

ikeymerger

Similar to keymerger but compares the keys as integers.

ukeymerger

Compares the keys as unsigned integers.

rkeymerger
rnkeymerger
rikeymerger
rukeymerger

perform the sorting in reverse order.
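
As a sketch of the reverse variants, the example below uses rnkeymerger. It assumes that the sources must themselves be presorted in descending order for a reverse merge to make sense; the in-memory arrays are hypothetical:

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(rnkeymerger);

# Hypothetical sources, each already sorted in descending numeric order.
my @sources = ([9, 5, 1], [8, 4, 2]);

my $merger = rnkeymerger {
    return () unless @$_;           # source exhausted
    my $n = shift @$_;
    return ($n, $n);
} @sources;

my @merged = $merger->(-1);
print "@merged\n";
```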

filekeymerger { generate_key } @files;

returns a merger subroutine that returns lines read from @files sorted by the keys that generate_key generates.

@files can contain file names or handles for already open files.

generate_key is called with the line just read in $_ and has to return the sorting key for it. If its return value is undef the line is ignored.

The line can be modified inside generate_key by changing $_, e.g.:

  my $merger = filekeymerger {
      chomp($_); #             <== here
      return undef if /^\s*$/;
      substr($_, -1, 10)
  } @ARGV;

Finally, $/ can be changed from its default value to read the files in chunks other than lines.

The return value from this function is a subroutine reference that on successive calls returns the sorted elements in the same fashion as the iterator returned from keymerger.

  my $merger = filekeymerger { (split)[0] } @ARGV;
  while (my ($next) = $merger->(1)) {
    ...
  }

This function honours the use locale pragma.
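
A self-contained sketch of the whole round trip follows. It writes two small presorted files with File::Temp and merges them by their first field; write_tmp is a hypothetical helper and the file contents are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sort::Key::Merger qw(filekeymerger);

# Hypothetical helper: write the given lines to a temporary file
# and return its name.
sub write_tmp {
    my @lines = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} @lines;
    close $fh or die "unable to close $name: $!";
    return $name;
}

# Each file is already sorted by its first field.
my @files = (
    write_tmp("alice 1\n", "carol 3\n"),
    write_tmp("bob 2\n",   "dave 4\n"),
);

my $merger = filekeymerger {
    chomp;              # modifies $_, and so the line that is returned
    (split)[0];         # key: first whitespace-separated field
} @files;

my @lines = $merger->(-1);
print "$_\n" for @lines;
```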

nfilekeymerger { generate_key } @files;

is like filekeymerger but the keys are compared numerically.

This function honours the use integer pragma.

ifilekeymerger

similar to filekeymerger but compares the keys as integers.

ufilekeymerger

similar to filekeymerger but compares the keys as unsigned integers.

rfilekeymerger
rnfilekeymerger
rifilekeymerger
rufilekeymerger

perform the sorting in reverse order.

multikeymerger { GENERATE_VALUE_KEYS_LIST($_) } \@types, @sources

This function generates a multikey merger.

GENERATE_VALUE_KEYS_LIST should return a list with the next value from the source passed in $_ and the sorting keys.

@types is an array with the key sorting types (see Sort::Key multikey sorting documentation for a discussion on the supported types).

For instance:

  my $merger = multikeymerger {
      my $v = shift @$_;
      my $name = $v->name;
      my $age = $v->age;
      ($v, $age, $name)
  } [qw(-integer string)], @data_sources;

  while (my ($next) = $merger->()) {
      print "$next\n";
  }

SEE ALSO

Sort::Key, Sort::Key::External, locale, integer, perl core sort function.

COPYRIGHT AND LICENSE

Copyright (C) 2005, 2007 by Salvador Fandiño, <sfandino@yahoo.com>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.