
NAME

Data::Hash::Transform - Turns array of hashes to hash of hashes in predefined ways

SYNOPSIS

  use Data::Hash::Transform qw(hash_f hash_l hash_m hash_a hash_em);

  my $loh = [ { k => 1, n => 'one' }, { k => 2, n => 'two' }, { k => 1, n => 'ein' } ];
  my $hoh1 = hash_f($loh, 'k'); # keep first
  my $hoh2 = hash_l($loh, 'k'); # keep last
  my $hoh3 = hash_m($loh, 'k'); # keep a list (if needed)
  my $hoh4 = hash_a($loh, 'k'); # always keep a list

  my $hoh = hash_em($loh, 'k', $meth); # $meth is one of 'f', 'l', 'm', or 'a'

DESCRIPTION

This module provides four algorithms to turn an array of hashes into a hash of hashes. The transformation uses the value stored at a given key of each inner hash as that hash's key in the outer hash.

So:

  [ { k => 1, n => 'one' }, { k => 2, n => 'two' } ]

turns to

  { 1 => { k => 1, n => 'one' }, 2 => { k => 2, n => 'two' } }

when 'k' is the key of keys. (As this example shows, "arrays" and "hashes" here mean array refs and hash refs.)

The algorithms differ only when the same key value occurs more than once. For example, how should the following array map to a hash? ('k' is still the key of keys here.)

  [ { k => 1, n => 'one' }, { k => 2, n => 'two' }, { k => 1, n => 'ein' } ]

The following alternatives (among others) are possible:

  • keep the first

      { 1 => { k => 1, n => 'one' }, 2 => { k => 2, n => 'two' } }

  • keep the last

      { 2 => { k => 2, n => 'two' }, 1 => { k => 1, n => 'ein' } }

  • keep a list in the case of collisions

      { 1 => [ { k => 1, n => 'one' }, { k => 1, n => 'ein' } ],
        2 => { k => 2, n => 'two' } }

  • always keep a list (even when there is no collision)

      { 1 => [ { k => 1, n => 'one' }, { k => 1, n => 'ein' } ],
        2 => [ { k => 2, n => 'two' } ] }

That is exactly what we implement here.
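The four alternatives above can be sketched in plain Perl as follows. This is only an illustration of the behaviour described here, not the module's actual code; the `sketch_*` names are hypothetical and were chosen so they cannot be confused with the exported functions.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the four strategies (NOT the module's own code).

# keep the first hash seen for each key value
sub sketch_first {
    my ($loh, $key) = @_;
    my %h;
    for my $item (@$loh) {
        my $k = $item->{$key};
        $h{$k} = $item unless exists $h{$k};
    }
    return \%h;
}

# keep the last hash seen for each key value
sub sketch_last {
    my ($loh, $key) = @_;
    my %h;
    $h{ $_->{$key} } = $_ for @$loh;
    return \%h;
}

# keep a list only when a key value collides
sub sketch_multi {
    my ($loh, $key) = @_;
    my %h;
    for my $item (@$loh) {
        my $k = $item->{$key};
        if (!exists $h{$k}) {
            $h{$k} = $item;                 # first occurrence: plain hash ref
        } elsif (ref $h{$k} eq 'ARRAY') {
            push @{ $h{$k} }, $item;        # third and later occurrences
        } else {
            $h{$k} = [ $h{$k}, $item ];     # second occurrence: promote to a list
        }
    }
    return \%h;
}

# always keep a list, even for a single element
sub sketch_always {
    my ($loh, $key) = @_;
    my %h;
    push @{ $h{ $_->{$key} } }, $_ for @$loh;
    return \%h;
}

my $loh = [ { k => 1, n => 'one' }, { k => 2, n => 'two' }, { k => 1, n => 'ein' } ];
print sketch_first($loh, 'k')->{1}{n}, "\n";   # one
print sketch_last($loh, 'k')->{1}{n}, "\n";    # ein
```

Note that none of these sketches copies the inner hashes: the outer hash holds the same references as the input array.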

EXPORT

None by default. hash_f, hash_l, hash_m, hash_a, hash_em can be exported on demand.

HASH_M VERSUS HASH_A

The difference between hash_m and hash_a matters mainly to the code that consumes the transformed hash. With hash_m, that code must be ready to handle two cases: a single element, which appears as a hash ref, and multiple elements, which appear as an array ref of hash refs. With hash_a, the treatment is uniform: you always get an array ref of hash refs.

Typical code that consumes the return value of hash_m looks like this:

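  my $h = hash_m($loh, 'k');
  while (my ($k, $v) = each %$h) {
          if (ref $v eq 'ARRAY') {
                  # several hashes share this key: do something with each one
                  print "$k: $_->{n}\n" for @$v;
          } else {
                  # a single hash: do something with it
                  print "$k: $v->{n}\n";
          }
  }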

or the shorter:

  my $h = hash_m($loh, 'k');
  while (my ($k, $v) = each %$h) {
          # normalize both cases to a list of hash refs
          my @vs = (ref $v eq 'ARRAY') ? @$v : ($v);
          print "$k: $_->{n}\n" for @vs;   # do something with each hash
  }

With hash_a, it would look like:

  my $h = hash_a($loh, 'k');
  while (my ($k, $v) = each %$h) {
          # $v is always an array ref of hash refs
          print "$k: $_->{n}\n" for @$v;   # do something with each hash
  }

It is a trade-off: the client code can be simple (hash_a) or the overhead of data structures can be reduced (hash_m).

TO DO

If you are familiar with XML::Simple, you have probably recognized some of the transformations it performs between arrays and hashes, mainly the ones represented by hash_m and hash_l (when ForceArray is used).

Other transformations based on typical behavior of XML::Simple are possible. For example,

  • discard the key element

      [ { k => 1, n => 'one' }, { k => 2, n => 'two' } ]

    to

      { 1 => { n => 'one' }, 2 => { n => 'two' } }

    and even (for 'n' defined to be the contents key)

      { 1 => 'one', 2 => 'two' }
  • mark the key element

      [ { k => 1, n => 'one' }, { k => 2, n => 'two' }, { k => 1, n => 'ein' } ]

    to

      { 1 => { -k => 1, n => 'one' }, 2 => { -k => 2, n => 'two' } }

Maybe someday this gets implemented too.
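For illustration, the "discard the key element" idea and its contents-key variant could look roughly like the sketch below. Neither function exists in this module; `sketch_discard_key` and `sketch_contents` are hypothetical names, and the sketch assumes that the last element wins on collisions, which the text above does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Hash::Transform).
# Drops the key element from a copy of each inner hash.
# Assumption: on collisions the last element wins.
sub sketch_discard_key {
    my ($loh, $key) = @_;
    my %h;
    for my $item (@$loh) {
        my %copy = %$item;             # shallow copy, so the input is left untouched
        my $k = delete $copy{$key};    # remove the key element and remember its value
        $h{$k} = \%copy;
    }
    return \%h;
}

# Hypothetical helper: map each key value directly to the value
# stored at the contents key $content.
sub sketch_contents {
    my ($loh, $key, $content) = @_;
    my %h;
    $h{ $_->{$key} } = $_->{$content} for @$loh;
    return \%h;
}

my $loh = [ { k => 1, n => 'one' }, { k => 2, n => 'two' } ];
my $no_key   = sketch_discard_key($loh, 'k');    # { 1 => { n => 'one' }, 2 => { n => 'two' } }
my $contents = sketch_contents($loh, 'k', 'n');  # { 1 => 'one', 2 => 'two' }
print "$_ => $contents->{$_}\n" for sort keys %$contents;
```

Unlike the existing transformations as sketched earlier, discarding the key requires copying each inner hash, otherwise the caller's data would be modified.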

ISSUES

The hash_* functions have been designed to be fast, which is why their code is redundant. One could write a single function with all the bells and whistles that does the work of all of them, by taking options and querying them at runtime. I think that code would be slightly harder to maintain and performance might suffer, but this is just a guess. Soon I will write such an implementation and a benchmark to check whether it is worth keeping this code as it is.
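As a rough idea of what such an all-in-one function might look like, here is a minimal sketch. It is not the implementation promised above; the name `sketch_any` is hypothetical, and the mode letters are simply borrowed from the ones hash_em accepts.

```perl
use strict;
use warnings;

# Hypothetical all-in-one variant (NOT part of the module).
# $mode is one of 'f' (first), 'l' (last), 'm' (list if needed), 'a' (always a list).
sub sketch_any {
    my ($loh, $key, $mode) = @_;
    die "unknown mode '$mode'" unless $mode =~ /\A[flma]\z/;
    my %h;
    for my $item (@$loh) {
        my $k = $item->{$key};
        # The mode is queried again for every element: this is the
        # runtime cost that the separate hash_* functions avoid.
        if ($mode eq 'a') {
            push @{ $h{$k} }, $item;
        } elsif (!exists $h{$k} || $mode eq 'l') {
            $h{$k} = $item;
        } elsif ($mode eq 'm') {
            $h{$k} = [ $h{$k} ] unless ref $h{$k} eq 'ARRAY';
            push @{ $h{$k} }, $item;
        }
        # mode 'f': an existing entry is left alone
    }
    return \%h;
}

my $loh = [ { k => 1, n => 'one' }, { k => 2, n => 'two' }, { k => 1, n => 'ein' } ];
print sketch_any($loh, 'k', 'f')->{1}{n}, "\n";   # one
print sketch_any($loh, 'k', 'l')->{1}{n}, "\n";   # ein
```

Whether the per-element branching costs anything measurable is exactly the open question; the core Benchmark module (for instance its `cmpthese` function) would be one way to compare a variant like this against the dedicated functions.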

BUGS

Please report bugs via CPAN RT http://rt.cpan.org/NoAuth/Bugs.html?Dist=Module-Which.

AUTHOR

Adriano R. Ferreira, <ferreira@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Adriano R. Ferreira

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.