NAME

Array::To::Moose - Build Moose objects from a data array

VERSION

This document describes Array::To::Moose version 0.0.6

SYNOPSIS

  use Array::To::Moose;
  # or
  use Array::To::Moose qw(array_to_moose set_class_ind set_key_ind
                          throw_nonunique_keys throw_multiple_rows);

Array::To::Moose exports the function array_to_moose() by default, and the convenience functions set_class_ind(), set_key_ind(), throw_nonunique_keys() and throw_multiple_rows() on request.

array_to_moose

array_to_moose() builds Moose objects from suitably sorted two-dimensional arrays of data of the type returned by, e.g., DBI::selectall_arrayref(), i.e. a reference to an array containing one array reference for each row of data fetched.

Example 1a

  package Car;
  use Moose;

  has 'make'  => (is => 'ro', isa => 'Str');
  has 'model' => (is => 'ro', isa => 'Str');
  has 'year'  => (is => 'ro', isa => 'Int');

  package CarOwner;
  use Moose;

  has 'last'  => (is => 'ro', isa => 'Str');
  has 'first' => (is => 'ro', isa => 'Str');
  has 'Cars'  => (is => 'ro', isa => 'ArrayRef[Car]');

  ...

  # in package main:

  use Array::To::Moose;

  # In this dataset Alex owns two cars, Jim one, and Alice three
  my $data = [
    [ qw( Green Alex  Ford   Focus 2011 ) ],
    [ qw( Green Alex  VW     Jetta 2009 ) ],
    [ qw( Green Jim   Honda  Civic 2007 ) ],
    [ qw( Smith Alice Buick  Regal 2012 ) ],
    [ qw( Smith Alice Toyota Camry 2008 ) ],
    [ qw( Smith Alice BMW    X5    2010 ) ],
  ];

  my $CarOwners = array_to_moose(
                      data => $data,
                      desc => {
                        class => 'CarOwner',
                        last  => 0,
                        first => 1,
                        Cars  => {
                          class => 'Car',
                          make  => 2,
                          model => 3,
                          year  => 4,
                        } # Cars
                      } # Car Owners
  );

  print $CarOwners->[2]->Cars->[1]->model; # prints "Camry"

Example 1b - Hash(ref) Sub-objects

In the above example, array_to_moose() returns a reference to an array of CarOwner objects, $CarOwners.

If a hash of CarOwner objects is required, a "key => ..." entry must be added to the descriptor hash. For example, to construct a hash of CarOwner objects whose key is the owner's first name (unique for every person in the example data), the call becomes:

  my $CarOwnersH = array_to_moose(
                      data => $data,
                      desc => {
                        class => 'CarOwner',
                        key   => 1,   # note key
                        last  => 0,
                        first => 1,
                        Cars  => {
                          class => 'Car',
                          make  => 2,
                          model => 3,
                          year  => 4,
                        } # Cars
                      } # Car Owners
  );

  print $CarOwnersH->{Alex}->Cars->[0]->make; # prints "Ford"

Similarly, to construct the Cars sub-objects as hash sub-objects (and not an array as above), define CarOwner as:

  package CarOwner;
  use Moose;

  has 'last'  => (is => 'ro', isa => 'Str'         );
  has 'first' => (is => 'ro', isa => 'Str'         );
  has 'Cars'  => (is => 'ro', isa => 'HashRef[Car]'); # Was 'ArrayRef[Car]'

and noting that the car make is unique for each person in the $data dataset, we construct the reference to an array of objects with the call:

  $CarOwners = array_to_moose(
                      data => $data,
                      desc => {
                        class => 'CarOwner',
                        last  => 0,
                        first => 1,
                        Cars  => {
                          class => 'Car',
                          key   => 2,   # note key
                          model => 3,
                          year  => 4,
                        } # Cars
                      } # Car Owners
  );

  print $CarOwners->[2]->Cars->{BMW}->model; # prints 'X5'

Example 1c - "Simple" Reference Attributes

If, instead of the car owner object containing an ArrayRef or HashRef of Car sub-objects, it contains, say, an ArrayRef of strings representing the names of the car makers:

  package SimpleCarOwner;
  use Moose;

  has 'last'      => (is => 'ro', isa => 'Str'          );
  has 'first'     => (is => 'ro', isa => 'Str'          );
  has 'CarMakers' => (is => 'ro', isa => 'ArrayRef[Str]');

Using the same dataset from Example 1a, we construct an arrayref of SimpleCarOwner objects as:

  $SimpleCarOwners = array_to_moose(
                        data => $data,
                        desc => {
                          class     => 'SimpleCarOwner',
                          last      => 0,
                          first     => 1,
                          CarMakers => [2],  # Note the '[...]' brackets
                        }
  );

  print $SimpleCarOwners->[2]->CarMakers->[1];   # prints 'Toyota'

I.e., when the object attribute is an ArrayRef of one of the Moose "simple" types, e.g. 'Str', 'Num', 'Bool', etc. (see Moose::Manual::Types), the column number should appear in square brackets ('CarMakers => [2]' above) to differentiate it from the bare column numbers used for scalar attributes (last => 0 and first => 1 above).

Note that Array::To::Moose doesn't (yet) handle the case of hashrefs of "simple" types, e.g., ( isa => "HashRef[Str]" )

Example 2 - Use with DBI

The main rationale for writing Array::To::Moose is to make it easy to build Moose objects from data extracted from relational databases, especially when the database query involves multiple tables with one-to-many relationships to each other.

As an example, consider a database which models patients making visits to a clinic on multiple occasions, and on each visit, having a doctor run some tests and diagnose the patient's complaint. In this model, the database Patient table would have a one-to-many relationship with the Visit table, which in turn would have a one-to-many relationship with the Test table.

The corresponding Moose model has nested Moose objects which reflect those one-to-many relationships, i.e., multiple Visit objects per Patient object and multiple Test objects per Visit object, declared as:

  package Test;
  use Moose;
  has 'name'        => (is => 'rw', isa => 'Str');
  has 'result'      => (is => 'rw', isa => 'Str');

  package Visit;
  use Moose;
  has 'date'        => (is => 'rw', isa => 'Str'           );
  has 'md'          => (is => 'rw', isa => 'Str'           );
  has 'diagnosis'   => (is => 'rw', isa => 'Str'           );
  has 'Tests'       => (is => 'rw', isa => 'HashRef[Test]' );

  package Patient;
  use Moose;
  has 'last'        => (is => 'rw', isa => 'Str'             );
  has 'first'       => (is => 'rw', isa => 'Str'             );
  has 'Visits'      => (is => 'rw', isa => 'ArrayRef[Visit]' );

In the main program:

  use DBI;
  use Array::To::Moose;

  ...

  my $sql = q{
    SELECT
       P.Last, P.First
      ,V.Date, V.Doctor, V.Diagnosis
      ,T.Name, T.Result
    FROM
       Patient P
      ,Visit   V
      ,Test    T
    WHERE
          -- join clauses
          P.Patient_key = V.Patient_key
      AND V.Visit_key   = T.Visit_key
      ...
    ORDER BY
        P.Last, P.First, V.Date
  };

  my $dbh = DBI->connect(...);

  my $data = $dbh->selectall_arrayref($sql);

  # rows of @$data contain:
  #               Last, First, Date, Doctor, Diagnosis, Name, Result
  # at positions: [0]   [1]    [2]   [3]     [4]        [5]   [6]

  my $patients = array_to_moose(
                      data => $data,
                      desc => {
                        class => 'Patient',
                        last  => 0,
                        first => 1,
                        Visits => {
                          class => 'Visit',
                          date      => 2,
                          md        => 3,
                          diagnosis => 4,
                          Tests => {
                            class  => 'Test',
                            key    => 5,
                            name   => 5,
                            result => 6,
                          } # tests
                        } # visits
                      } # patients
  );

  print $patients->[2]->Visits->[0]->Tests->{BP}->result; # prints '120/80'

Note: We used the Test name as the key for the Visit 'Tests', as the tests have unique names within any one Visit. (See t/5.t)

DESCRIPTION

As shown in the above examples, the general usage is:

  package MyClass;
  use Moose;
  (define Moose object(s))
  ...
  use Array::To::Moose;
  ...
  my $data_ref = $dbh->selectall_arrayref($sql); # for example

  my $object_ref =  array_to_moose(
                        data => $data_ref,
                        desc => {
                          class    => 'MyClass',
                          key      => K,   # only for HashRefs
                          attrib_1 => N1,
                          attrib_2 => N2,
                          ...
                          attrib_m => [ M ],
                          ...
                          SubObject => {
                            class => 'MySubClass',
                            ...
                          }
                        }
  );

Where:

array_to_moose() returns an array- or hash reference of MyClass Moose objects. All Moose classes (MyClass, MySubClass, etc) must already have been defined by the user.

$data_ref is a reference to an array containing references to arrays of scalars of the kind returned by, e.g., DBI::selectall_arrayref()

desc (descriptor) is a reference to a hash which contains several types of data:

class => 'MyObj' is required and defines the Moose class or package which will contain the data. The user should have defined this class already.

key => N is required if the Moose object being constructed is to be a hashref, either at the top-level Moose object returned from array_to_moose() or as an "isa => 'HashRef[...]'" sub-object.

attrib => N where attrib is the name of a Moose attribute ("has 'attrib' => ...")

attrib => [ N ] where attrib is the name of a Moose "simple" sub-attribute ("has 'attrib' => ( isa => 'ArrayRef[Type]' ... )"), where Type is a "simple" Moose type, e.g., 'Str', 'Int', etc.

In the above cases, N is a non-negative integer giving the zero-indexed column number in the data array where that attribute's data is to be found.
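
As a rough illustration of how the two scalar entry forms read the data, the following core-Perl sketch builds constructor-style arguments for one group of rows. It does not use Array::To::Moose: args_from_rows() is a hypothetical helper, and reading "attrib => N" from the first row of the group and collecting "attrib => [ N ]" from every row are assumptions about the behavior, not a description of the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of constructor arguments from one
# group of rows, following the two scalar descriptor forms.
sub args_from_rows {
    my ($desc, $rows) = @_;
    my %args;
    for my $attrib (keys %$desc) {
        my $spec = $desc->{$attrib};
        if (ref $spec eq 'ARRAY') {
            # attrib => [ N ]: collect column N from every row
            my $col = $spec->[0];
            $args{$attrib} = [ map { $_->[$col] } @$rows ];
        }
        else {
            # attrib => N: the value is the same in every row of the
            # group, so read it from the first row (assumption)
            $args{$attrib} = $rows->[0][$spec];
        }
    }
    return \%args;
}

# The three rows for Alice Smith from Example 1a
my $rows = [
    [ qw( Smith Alice Buick  Regal 2012 ) ],
    [ qw( Smith Alice Toyota Camry 2008 ) ],
    [ qw( Smith Alice BMW    X5    2010 ) ],
];

my $args = args_from_rows(
    { last => 0, first => 1, CarMakers => [2] },
    $rows,
);

print "$args->{first} $args->{last}: @{ $args->{CarMakers} }\n";
# prints "Alice Smith: Buick Toyota BMW"
```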

Sub-Objects

array_to_moose() can handle three types of Moose sub-objects, i.e.:

an array of sub-objects:

  has 'Sub_Obj' => ( isa => 'ArrayRef[MyObj]' );

a hash of sub-objects:

  has 'Sub_Obj' => ( isa => 'HashRef[MyObj]'  );

or a single sub-object:

  has 'Sub_Obj' => ( isa => 'MyObj'           );

The descriptor entry for Sub_Obj in each of these cases is (almost) the same:

  desc => {
    class => ...
    ...
    Sub_Obj => {
      class    => 'MyObj',
      key      => <keycol>, # HashRef[...] only
      attrib_a => <N>,
      ...
    } # end Sub_Obj
    ...
  } # end desc

(A HashRef[...] sub-object will also require a key => N entry in the descriptor.)

In addition, array_to_moose() can also handle ArrayRefs of "simple" types:

  has 'Sub_Obj' => ( isa => 'ArrayRef[Type]' );

where Type is a "simple" Moose type, e.g., 'Str', 'Int', 'Bool', etc.

Ordering the data

array_to_moose() does not sort the input data array, and does all processing in a single pass through the data. This means that the data in the array must be sorted properly for the algorithm to work.

For example, in the previous Patient/Visit/Test example, in which there are many Tests per Visit and many Visits per Patient, the data in the Test column(s) must change the fastest, the Visit data slower, and the Patient data the slowest:

  Patient  Visit  Test
  ------   -----  ----
    P1      V1     T1
    P1      V1     T2
    P1      V1     T3
    P1      V2     T4
    P1      V2     T5
    P2      V3     T6
    P2      V3     T7
    P2      V4     T8

In SQL this would be accomplished by an ORDER BY clause, e.g.:

  ORDER BY Patient.Key, Visit.Key, Test.Key
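
When the rows do not come from a SQL query, the same ordering can be produced in Perl before calling array_to_moose(). A minimal sketch in core Perl, using made-up sample rows and assuming that string comparison is appropriate for every column:

```perl
use strict;
use warnings;

# Rows of (patient, visit, test) in arbitrary order
my @unsorted = (
    [ 'P2', 'V3', 'T7' ],
    [ 'P1', 'V2', 'T4' ],
    [ 'P1', 'V1', 'T2' ],
    [ 'P2', 'V3', 'T6' ],
    [ 'P1', 'V1', 'T1' ],
);

# Compare column 0 first, then column 1, then column 2, so that the
# patient column changes slowest and the test column fastest
my @sorted = sort {
       $a->[0] cmp $b->[0]
    or $a->[1] cmp $b->[1]
    or $a->[2] cmp $b->[2]
} @unsorted;

print join(' ', @$_), "\n" for @sorted;
```

A reference to the sorted array (\@sorted) is then in the form expected for the data => ... argument.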

throw_nonunique_keys ()

By default, array_to_moose() does not check the uniqueness of hash key values within the data. If the key values in the data are not unique, existing hash entries will get overwritten, and the sub-object will contain the value from the last data row which contained that key value. For example:

  package Employer;
  use Moose;
  has 'year'    => (is => 'rw', isa => 'Str');
  has 'name'    => (is => 'rw', isa => 'Str');

  package Person;
  use Moose;
  has 'name'        => (is => 'rw', isa => 'Str'              );
  has 'Employers'   => (is => 'rw', isa => 'HashRef[Employer]');

  ...

  my $data = [
    [ 'Anne Miller', '2005', 'Acme Corp'    ],
    [ 'Anne Miller', '2006', 'Acme Corp'    ],
    [ 'Anne Miller', '2007', 'Widgets, Inc' ],
    ...
  ];

The call:

  my $obj = array_to_moose(
                  data => $data,
                  desc => {
                    class     => 'Person',
                    name      => 0,
                    Employers => {
                      class => 'Employer',
                      key   => 2,   # using employer name as key
                      year  => 1,
                    } # Employer
                  } # Person
  );

Because the employer was 'Acme Corp' in years 2005 & 2006, array_to_moose will silently overwrite the 2005 Employer object with the data for the 2006 Employer object:

  print $obj->[0]->Employers->{'Acme Corp'}->year, "\n"; # prints '2006'

Calling throw_nonunique_keys() (either with no argument, or with a non-zero argument) enables reporting of non-unique keys. In the above example, array_to_moose() would exit with the error:

 Non-unique key 'Acme Corp' in 'Employer' class ...

Calling throw_nonunique_keys(0), i.e. with an argument of zero, will disable subsequent reporting of non-unique keys. (See t/8c.t)
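
Duplicate keys can also be looked for before the data is handed to array_to_moose(). The following core-Perl sketch is independent of the module: duplicate_keys() is a hypothetical helper, and it assumes that the parent object is identified by a single column.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "parent / key" string for each hash
# key that appears more than once under the same parent.
sub duplicate_keys {
    my ($data, $parent_col, $key_col) = @_;
    my (%seen, @dups);
    for my $row (@$data) {
        my ($parent, $key) = @{$row}[ $parent_col, $key_col ];
        # report a key on its second appearance only
        push @dups, "$parent / $key" if $seen{$parent}{$key}++ == 1;
    }
    return @dups;
}

my $data = [
    [ 'Anne Miller', '2005', 'Acme Corp'    ],
    [ 'Anne Miller', '2006', 'Acme Corp'    ],
    [ 'Anne Miller', '2007', 'Widgets, Inc' ],
];

# Column 0 identifies the Person, column 2 is the Employers hash key
my @found = duplicate_keys($data, 0, 2);
print "Non-unique key: $_\n" for @found;
# prints "Non-unique key: Anne Miller / Acme Corp"
```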

throw_multiple_rows ()

For single-occurrence sub-objects (i.e. ( isa => 'MyObj' )), if the data contains more than one row of data for the sub-object, only the first row will be used to construct the single sub-object, and array_to_moose() will not report the fact. E.g.:

  package Salary;
  use Moose;
  has 'year'    => (is => 'rw', isa => 'Str');
  has 'amount'  => (is => 'rw', isa => 'Int');

  package Person;
  use Moose;
  has 'name'     => (is => 'rw', isa => 'Str'   );
  has 'Salary'   => (is => 'rw', isa => 'Salary'); # a single object

  ...

  my $data = [
    [ 'John Smith', '2005', 23_350 ],
    [ 'John Smith', '2006', 24_000 ],
    [ 'John Smith', '2007', 26_830 ],
    ...
  ];

The call:

  my $obj = array_to_moose(
                  data => $data,
                  desc => {
                    class  => 'Person',
                    name   => 0,
                    Salary => {
                      class  => 'Salary',
                      year   => 1,
                      amount => 2
                    } # Salary
                  } # Person
  );

would silently assign to Salary, the first row of the three Salary data rows, i.e. for year 2005:

  print $obj->[0]->Salary->year, "\n"; # prints '2005'

Calling throw_multiple_rows() (either with no argument, or with a non-zero argument) enables reporting of this situation. In the above example, array_to_moose() will exit with the error:

  Expected a single 'Salary' object, but got 3 of them ...

Calling throw_multiple_rows(0), i.e. with an argument of zero, will disable subsequent reporting of this error. (See t/8d.t)

set_class_ind (), set_key_ind ()

Problems arise if the Moose objects being constructed contain attributes called class or key, causing ambiguities in the descriptor. (Does key => 5 mean that the attribute named key, or the hash key, is in column 5?)

In these cases, set_class_ind() and set_key_ind() can be used to change the keywords for class => ... and key => ... descriptor entries.

For example:

  package Letter;
  use Moose;

  has 'address' => ( is => 'ro', isa => 'Str'         );
  has 'class'   => ( is => 'ro', isa => 'PostalClass' );
  ...

  set_class_ind('package'); # use "package =>" in place of "class =>"

  my $letters = array_to_moose(
                        data => $data,
                        desc => {
                          package => 'Letter',  # the Moose class
                          address => 0,
                          class   => 1,         # the attribute 'class'
                          ...
                        }
  );

Read-only Attributes

One of the recommendations of Moose::Manual::BestPractices is to make attributes read-only (is => 'ro') wherever possible. Array::To::Moose supports this by evaluating all the attributes given in the descriptor for an object, then including them all in the call to new(...) when constructing the object.

For Moose objects with attributes which are sub-objects, i.e. references to a Moose object, or references to an array or hash of Moose objects, it means that the sub-objects must be evaluated before the new() call. The effect of this for multi-leveled Moose objects is that object evaluations are carried out depth-first.
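
The depth-first order can be illustrated without Moose. In the sketch below, Car and Owner are hand-written stand-ins for read-only classes (they are not part of Array::To::Moose), and the grouping loop is only an approximation of the single-pass processing, written for illustration: the Car sub-objects for an owner are built first, and only then is the owner's new() called with the finished list.

```perl
use strict;
use warnings;

# Stand-ins for read-only classes: attributes are set only via new()
package Car;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub model { $_[0]{model} }

package Owner;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub first { $_[0]{first} }
sub Cars  { $_[0]{Cars} }

package main;

# Sorted rows of (owner first name, car model)
my $data = [
    [ qw( Alex  Focus ) ],
    [ qw( Alex  Jetta ) ],
    [ qw( Alice Regal ) ],
];

my (@owners, @group);
for my $i (0 .. $#$data) {
    push @group, $data->[$i];
    my $next = $data->[ $i + 1 ];

    # the group ends when the owner column changes or the data runs out
    next if $next && $next->[0] eq $group[0][0];

    # depth-first: build the Car sub-objects first ...
    my @cars = map { Car->new(model => $_->[1]) } @group;

    # ... then pass them to the parent's constructor
    push @owners, Owner->new(first => $group[0][0], Cars => \@cars);
    @group = ();
}

print $owners[0]->Cars->[1]->model, "\n"; # prints "Jetta"
```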

Treatment of NULLs

array_to_moose() uses Array::GroupBy::igroup_by to compare the rows in the data given in data => ..., using function Array::GroupBy::str_row_equal() which compares the data as strings.

If the data contains undef values (typically because DBI maps SQL NULL values to undef), str_row_equal() treats two undef elements in the same column position as equal, and treats a defined element and an undef element in the same column position as unequal.

This truth table demonstrates the various combinations:

  -------+------------+--------------+--------------+--------------
  row 1  | ('a', 'b') | ('a', undef) | ('a', undef) | ('a', 'b'  )
  row 2  | ('a', 'b') | ('a', undef) | ('a', 'b'  ) | ('a', undef)
  -------+------------+--------------+--------------+--------------
  equal? |    yes     |     yes      |      no      |      no
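
The rule in the table can be restated as a few lines of Perl. rows_equal() below is a hypothetical function written for illustration, not the Array::GroupBy implementation; treating rows of different lengths as unequal is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical restatement of the comparison rule
sub rows_equal {
    my ($r1, $r2) = @_;
    return 0 unless @$r1 == @$r2;    # assumption: lengths must match
    for my $i (0 .. $#$r1) {
        my ($x, $y) = ($r1->[$i], $r2->[$i]);
        next     if !defined $x && !defined $y;  # undef vs undef: equal
        return 0 if !defined $x || !defined $y;  # undef vs defined: unequal
        return 0 if $x ne $y;                    # both defined: string compare
    }
    return 1;
}

# The four columns of the truth table, left to right
my @cases = (
    [ [ 'a', 'b'   ], [ 'a', 'b'   ] ],
    [ [ 'a', undef ], [ 'a', undef ] ],
    [ [ 'a', undef ], [ 'a', 'b'   ] ],
    [ [ 'a', 'b'   ], [ 'a', undef ] ],
);

my @results = map { rows_equal(@$_) ? 'yes' : 'no' } @cases;
print "@results\n"; # prints "yes yes no no"
```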

EXPORT

array_to_moose by default; throw_nonunique_keys, throw_multiple_rows, set_class_ind and set_key_ind if requested.

DIAGNOSTICS

Errors in the call of array_to_moose() will be caught by Params::Validate::Array, q.v.

array_to_moose() does a lot of error checking, and is probably annoyingly chatty. Most of the errors generated are, of course, self-explanatory :-)

DEPENDENCIES

  Carp
  Params::Validate::Array
  Array::GroupBy

SEE ALSO

DBI, Moose, Array::GroupBy

BUGS

The handling of Moose type constraints is primitive.

AUTHOR

Sam Brain <samb@stanford.edu>

COPYRIGHT AND LICENSE

Copyright (c) Stanford University. June 6th, 2010. All rights reserved. Author: Sam Brain <samb@stanford.edu>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.