lib/DBIx/DataModel/Doc/Design.pod

=encoding ISO8859-1

=head1 NAME

DBIx::DataModel::Doc::Design - Architecture and design principles


=head1 DOCUMENTATION CONTEXT

This chapter is part of the C<DBIx::DataModel> manual.


=over

=item *

L<SYNOPSIS AND DESCRIPTION|DBIx::DataModel>

=item *

DESIGN

=item *

L<QUICKSTART|DBIx::DataModel::Doc::Quickstart>

=item *

L<REFERENCE|DBIx::DataModel::Doc::Reference>

=item *

L<COOKBOOK|DBIx::DataModel::Doc::Cookbook>

=item *

L<INTERNALS|DBIx::DataModel::Doc::Internals>

=item *

L<GLOSSARY|DBIx::DataModel::Doc::Glossary>

=back


This chapter covers the basic architecture of C<DBIx::DataModel>,
and the main motivating principles for proposing yet
another ORM. Read it if you are currently evaluating whether
C<DBIx::DataModel> is suitable for your context, or if you want to
globally understand how it works. Skip it and jump to
the L<QUICKSTART|DBIx::DataModel::Doc::Quickstart> chapter if you want
to directly start using the framework.

=head1 GENERAL ARCHITECTURE

=head2 Schema definitions centralized in one single module

Table and association definitions for C<DBIx::DataModel> are
centralized in one single file; this is in strong contrast with most
other Perl ORM frameworks, where each table has its own C<.pm> file,
and where association declarations are scattered among such files.

Nevertheless, C<DBIx::DataModel> I<does> create a Perl package for each
table; but such packages are directly added into the interpreter's symbol
table. Each
L<Table()|DBIx::DataModel::Doc::Reference/"Table()"> declaration
in the schema definition file creates a new Perl package
under the namespace of the current schema.

Even if packages are created in memory, without being associated
with specific C<.pm> files, it is still possible to 
add supplementary methods into them.,
Details are shown in the 
L<cookbook example|DBix::DataModel::Doc::Cookbook/"Add custom methods into a generated table class">.



=head2 Classes

The following picture shows the main classes in C<DBIx::DataModel>
class hierarchy :

         FRAMEWORK CLASSES
         *****************

   +================+   +============================+  +===============+
   | DBIx::DataModel|   | DBIx::DataModel::Statement |  | ...::ResultAs |
   +================+   +============================+  +=====,=========+
                                                             /
                                                            /
                                         +=================^=====+
                                         | ...::ResultAs::*      +==+
                                         +==+====================+  |==+
                                            +==+====================+  |
                                               +=======================+

   +=========================+   +=========================+
   | DBIx::DataModel::Schema |	 | DBIx::DataModel::Source |
   +==,======================+ 	 +===,=================.===+
      |                             /                   \
      |                            /                     \
      |     +=====================^==+  +=================^=====+
      |     | DBIx::DataModel::Table |  | DBIx::DataModel::Join |
      |     +=================,======+  +====================,==+
      |                       |                             /
  ====|=======================|============================/=======
      |  APPLICATION CLASSES  |                           /
      |  *******************  |                          /
      |                       |                         /
  +===^======+     +==========^========+               /
  | MySchema |     | MySchema::Table_n |==+           /
  +==========+     +==+=========.======+  |==+       /
                      +==+=======\==.=====+  |      /
                         +========\==\==.====+     /
                                   \  \  \        /
                                  +=^==^==^======^========+
                                  | MySchema::AutoJoin::* +==+
                                  +==+====================+  |==+
                                     +==+====================+  |
                                        +=======================+

The top half of the picture represents basic classes distributed with
C<DBIx::DataModel>. The bottom half represents application-specific
subclasses, built through class definition methods (see below).  Most
objects created during the lifetime of the application will be
instances of those application-specific subclasses.

The entry class
L<DBIx::DataModel|DBIx::DataModel>
is just a façade interface to
L<DBIx::DataModel::Schema|DBIx::DataModel::Schema>.

Class
L<DBIx::DataModel::Statement|DBIx::DataModel::Statement>
implements short-lived objects which gather information for issuing
requests to the database. Data received from the database can then
be packaged into various
L<kinds of results|DBIx::DataModel::Doc::Glossary/"result kind">;
each result kind is implemented as a subclass of
L<DBIx::DataModel::Schema::ResultAs|DBIx::DataModel::Schema::ResultAs>.


L<DBIx::DataModel::Schema|DBIx::DataModel::Schema> is the parent class
for schema classes created by 
L<Schema()|DBIx::DataModel::Doc::Reference/"Schema()">
or
L<define_schema()|DBIx::DataModel::Doc::Reference/"define_schema()">
methods.
In most cases only one such subclass will be needed,
unless the application talks to several databases.

Subclasses of
L<DBIx::DataModel::Source::Table|DBIx::DataModel::Source::Table>
represent tables in the database and are created by methods
L<Table()|DBIx::DataModel::Doc::Reference/"Table()">
or
L<define_table()|DBIx::DataModel::Doc::Reference/"define_table()">.
Calls to
L<Association()|DBIx::DataModel::Doc::Reference/"Association()">
or
L<define_association()|DBIx::DataModel::Doc::Reference/"define_association()">
automatically add new methods into these table classes, so that
a database row, represented by an instance of one table class,
can reach related rows in associated tables.

Subclasses of L<DBIx::DataModel::Source::Join|DBIx::DataModel::Source::Join>
represent database queries that join several tables.
They are created indirectly through calls to the
L<define_join()|DBIx::DataModel::Doc::Reference/"define_join()">
method. Subclass cration happens dynamically, every time a 
new join request is met in the application; if a previous join
involving the same tables was already met, the corresponding
subclass is reused.
C<Join> subclasses inherit from
L<DBIx::DataModel::Source::Join|DBIx::DataModel::Source::Join>,
but also from tables involved in the database join (thanks
to Perl support for multiple inheritance). As a result,
instances of joins can invoke all methods of their parent tables.

This short tour of C<DBIx::DataModel> architecture only covered
the most visible classes. Other classes involved in the internal
infrastructure will be introduced later in the documentation.


=head2 Instances

A schema instance holds a connexion to a database. Most often
there is only one schema instance, because by default the framework is in 
L<single-schema mode|DBIx::DataModel::Doc::Glossary/"single-schema mode">; but
if necessary the application can switch to
L<multi-schema mode|DBIx::DataModel::Doc::Glossary/"multi-schema mode">,
where several schema instances can coexist.

Instances of the application-specific C<Table> and C<Join> subclasses
encapsulate data rows.  They possess methods for navigating through
associations in the database and methods for modifying the data.
When in multi-schema mode, each instance representing a data row
also has an internal attribute pointing to the schema from which
it was retrieved.

Instances of
L<DBIx::DataModel::Statement|DBIx::DataModel::Statement>
encapsulate SELECT requests to the database.  Such
instances have methods for preparing the SQL query, binding parameters
to it, executing the query, and retrieving the resulting data rows.
Statement instances are usually short-lived and confined to specific
internal parts of the application, while data rows (instances of
tables or joins) are usually transmitted to the presentation layers of
the application, in order to use the data within reports, forms, etc.
Data row instances have no attribute pointing to the statement
from which they were generated.


The following picture shows relationships between classes
and instances :

             FRAMEWORK CLASSES       +============================+
             *****************       | DBIx::DataModel::Statement |
                                     +========================,===+
                                                              |
  ============================================================|=====
             APPLICATION CLASSES                              |
             *******************                              |
                                                              |
  +==========+     +===================+                      |
  | MySchema |     | MySchema::Table_n |=+                    |
  +==========+     +==+================+ |=+                  |
       |              +==+===============+ |                  |
       |                 +===,=============+                  |
       |                     |                                |
       |                     |    +=====================+     |
       |                     |    | MySchema::AutoJoin  +=+   |
       |                     |    +==+==================+ |   |
       |                     |       +=,==================+   |
       |                     |         |                      |
  =====|=====================|=========|======================|=====
       |     INSTANCES       |         |                      |
       |     =========       |         |                      |
   +--------+            +-----+    +-----+  next()  +-----------+
   | schema |            | row |    | row |<==<==<===| statement |
   +--------+            +-----+    +-----+          +-----------+



=head2 Meta-classes

Each application subclass has a I<metaclass>, i.e. an instance of a
class in the C<DBIx::DataModel::Meta> namespace.  This meta-object is
accessible from the class through the
L<metadm()|DBIx::DataModel::Doc::Reference/"metadm()"> class method.
Conversely, metaclasses have a
L<class()|DBIx::DataModel::Doc::Reference/"Meta-source methods"> method
to retrieve the application subclass to which they are bound.

Metaclasses hold information about application classes,
and implement some of their methods. In most cases,
this is totally transparent to end users; however, users
can interact with metaclasses to get some information
about the available tables, associations, etc., or
even to change their behaviour.
The picture below shows relationships between application classes and
the meta-objects to which they are related (classes start with capital
letters, instances start with lowercase letters).

          FRAMEWORK CLASSES
          *****************
  +==============+  +=============+  +============+  +============+
  | Meta::Schema |  | Meta::Table |  | Meta::Join |  | Meta::Path |
  +====,=========+  +==========,==+  +=====,======+  +,===========+
       |                       |           |          |  +===================+
       |                       |           |          |  | Meta::Association |
       |                       |           |          |  +=======,===========+
   ====|=======================|===========|==========|==========|==
       |  APPLICATION CLASSES  |           |          |          |
       |  AND META-OBJECTS     |           |          |          |
       |  *******************  |           |          |          |
  +----^------+   +========+   |           |          |  +-------^----------+
  |meta_schema|---|MySchema|   |           |          |  | meta_association |
  +---------x-+   +========+   |           |          |  +----x-------------+
             \_________________|___________|__________|_____ /
                          \    |           |          |     /
   +=================+   +-x---^------+    |     +----^----x-+
   | MySchema::Table |---| meta_table |----|-----| meta_path |
   +=================+   +----------x-+    |     +x----------+
                                     \     |     /
                                      \    |    /
         +=======================+   +-x---^---x-+
         | MySchema::AutoJoin::* +---| meta_join |
         +=======================+   +-----------+



=head2 Polymorphic methods

=head3 Principle

Some methods like C<join()> or C<select()> are heavily I<polymorphic>,
in the sense that they can be applied to various kinds of invocants,
with various kinds of arguments, and can return various kinds of
results.  Polymorphism in this way is not exactly common
object-oriented practice, but it has been intentionally designed as such,
in a attempt to "do the right thing" in different situations, while
hiding inner details from users. This is similar in sprit to the
the "do what I mean" (DWIM) principle of Perl design, where similar
constructs may mean different things depending on the context.

Subsections below give more details about how such methods behave in
various contexts.

=head3 Polymorphic C<join()>

The
L<join()|DBIx::DataModel::Doc::Reference/Schema::join()>
method, when applied to a
L<Schema|DBIx::DataModel::Schema>,
calls
L<define_join()|DBIx::DataModel::Doc::Reference/define_join()>
to define or retrieve a join
subclass; then it creates a fresh instance of that class. From that object, data
can then be retrieved through the
L<select()|DBIx::DataModel::Doc::Reference/select()> method.


When applied as a class method to
a L<Source|DBIx::DataModel::Source> subclass,
L<join()|DBIx::DataModel::Doc::Reference/Source::join()>
creates a new statement to query one or several tables associated with
the source.  The new statement contains a
condition to restrict the results according to the primary key of the
initial source.  That statement cannot be executed yet, because the
values of the primary key are not known until we have an instance of
the initial source; but the statement can already be
L<prepare|DBIx::DataModel::Doc::Reference/prepare()>d.  Later on, we can
L<bind|DBIx::DataModel::Doc::Reference/bind()> the statement
to an instance of the initial source, and then execute it.

When applied to a I<data row> (to an instance of a
L<Table|DBIx::DataModel::Source::Table>
or L<Join|DBIx::DataModel::Source::Join> subclass),
L<join()|DBIx::DataModel::Doc::Reference/Source::join()>
is an instance method that works like the
class method above (i.e. it creates a statement), but in addition,
values of the current object are immediately bound to the appropriated
placeholders within the query; hence the statement can be
L<execute|DBIx::DataModel::Doc::Reference/execute()>d. 
This is illustrated below in the L<section about statements|/"STATEMENT OBJECTS">.

So in all of these cases, C<join()> is meant to produce a
statement from which one can later C<select> in order
to retrieve data rows.


=head3 Polymorphic C<select()>

The L<select()|DBIx::DataModel::Doc::Statement/select()>  method
is most commonly called from
L<Source|DBIx::DataModel::Source> classes, but this
is just a proxy to the implementation in the
L<Statement|DBIx::DataModel::Statement> class.
This method causes the statement object to progress through
its internal lifecycle, until it is ready to produce results.
The kind of result which is produced depends on what was 
requested through the C<-result_as> parameter : it can be
a single row, a list of rows, a file, or other kinds
of datastructures.
More details about statements are provided below in the
L</"STATEMENT OBJECTS"> section.


=head2 Named parameters instead of positional parameters

As a general policy, most method calls use
I<named parameters> instead of I<positional parameters>; that is,
most method calls are of shape

  $obj->method(-arg1 => $val1, -arg2 => $val2);

instead of 
  
  $obj->method($val1, $val2);

Named parameters are more verbose, but they convey more explicitly
the purpose of each parameter, and they are exensible : a subclass
may override a method and accept more named parameters than its
parent class, without introducing perturbations with the original
behaviour.

Like L<DBIx::Class> and possibly other ORMs, 
C<DBIx::DataModel> relies on L<SQL::Abstract> for generating SQL;
but since that module only supports positional parameters, a specific
subclass was designed to support named parameters, plus a number of
other exensions. That subclass is distributed separately under the name
L<SQL::Abstract::More>. Thanks to that extension, all C<select()> calls
from C<DBIx::DataModel> use named parameters C<-columns>, C<-where>,
C<-order_by>, etc., which makes it much easier to read the source code
and understand where each parameter to a query  belongs.


=head1 ASSOCIATIONS

=head2 Syntax for defining associations

=head3 Front-end method : C<Association()>

Associations are expressed in a syntax
designed to closely reflect how they would be pictured
in a Unified Modelling Language (UML) diagram. The general
form, using the I<front-end>
L<Association()|DBIx::DataModel::Doc::Reference/Association()> method,
is :

  $schema->Association([$class1, $role1, $multiplicity1, @columns1],
                       [$class2, $role2, $multiplicity2, @columns2]);

Let's consider a particular example :

  $schema->Association([qw/Department department 1 /],
                       [qw/Activity   activities * /]);


This declaration corresponds to the following UML diagram :

  +------------+                         +------------+
  |            | 1                  0..* |            |
  | Department +-------------------------+ Activities |
  |            | department   activities |            |
  +------------+                         +------------+


The diagram states that there is an association between C<Department>
and C<Activity>, where each side of the association has a I<role name>
and a I<multiplicity>. In this example, a department may host zero,
one or many activities; in the other direction, an activity is done
within exactly one department.

In a UML diagram, role names and multiplicities are usually optional,
and indeed are often omitted, because they do not add significant
information : often the reader can infer them from the context. By
contrast, roles and multiplicities are mandatory in a
C<DBIx::DataModel> association declaration, because they are needed
for building the internal datastructures to make it work.
In addition, the framework also needs to
know the join column names on both sides : this is what the
C<@columns1> and C<@columns2> arguments stand for in the general
syntax shown above. However, join column names can be omitted
if they have the same names in both tables : in that case,
C<DBIx::DataModel> will automatically guess the join column names
from the primary key of the table with multiplicity 1.
The department-activity example above used that implicit form.

The association declaration is bidirectional, so it will
simultaneously add features in both participating classes.
Role names declared in the association are used for a number of
purposes : 

=over

=item *

implementing path methods for direct navigation;

=item *

implementing C<insert_into_*> methods for inserting new rows
related to a given parent row;

=item *

implementing multi-step navigation paths through several assocations, like in :

   $department->join(qw/activities employee spouse/)
              ->select(-columns => \@some_columns,
                       -where   => \%some_criteria);

=back


Information known by the meta-schema about the associations will be used to
automatically generate appropriate SQL for the database joins. The kinds of
SQL joins (INNER JOIN, LEFT OUTER JOIN) are inferred from multiplicities
declared in the associations, but they can be overridden
by writing for example

   ...->join(qw/activities <=> employee <=> spouse/) # inner joins

   ...->join(qw/activities  => employee  => spouse/) # left joins


Associations in C<DBIx::DataModel> must be declared even if 
the associations between tables are already modelled by 
L<referential integrity rules|DBIx::DataModel::Doc::Glossary/"referential integrity rule">
within the database schema  :
this is because rules in the database do not
contain information about role names and multiplicities, and therefore
are not sufficient for generating the C<DBIx::DataModel> associations.
Nevertheless,
L<DBIx::DataModel::Schema::Generator|DBIx::DataModel::Schema::Generator>
can generate a skeleton for the C<DBIx::DataModel> schema; but that
skeleton usually needs some manual additions to become truly useful.

=head3 Back-end method : C<define_association()>

Associations can also be declared through the I<back-end> method
L<define_association()|DBIx::DataModel::Doc::Reference/"define_association()">.
The principle is exactly the same, but the back-end method
uses named parameters instead of positional parameters : therefore
it is more explicit and more flexible, at the cost of being more verbose.

With C<define_association()>, our example above would be written

  my $meta_schema = $schema->metadm;
  $meta_schema->define_association(
    name => 'Department_activity', # or whatever name you prefer
    kind => 'Association',
    A    => {
      table        => $meta_schema->table('Department'),
      role         => 'department',
      multiplicity => [1, 1],
      join_cols    => [qw/dpt_id/],
    },
    B    => {
      table        => $meta_schema->table('Activity'),
      role         => 'activities',
      multiplicity => [0, '*'],
      join_cols    => [qw/dpt_id/],
    },
  );

=head2 Technical consequences of an association definition

This section describes what happens when a new association is defined.

=head3 Creation of a C<Meta::Association> instance

A new instance of
L<DBIx::DataModel::Meta::Association|DBIx::DataModel::Meta::Association>
is created, mainly for supporting reflection
queries (clients asking what are are the tables and associations in the
current schema).


=head3 Creation of C<Meta::Path> instances

Two instances of
L<DBIx::DataModel::Meta::Path|DBIx::DataModel::Meta::Path>
are created and registered into their corresponding
meta_table objects. Such paths will be queried for resolving
joins, so for example a method call like

  $schema->join(qw/Table path1 path2 .../)

will walk through the paths to find out which tables and which join conditions
are involved.

=head3 Insertion of path methods within table classes

Path methods are added into classes on both sides of the association; these
methods are named according to the role names.
The result of a path method depends on the multiplicity : if the maximum
multiplicit is 1, the result is a single row; if the maximum
multiplicity is C<'*'>, the result is an arrayref of rows.


The UML diagram should be read crosswise to understand where path methods
are generated. For example the UML picture

  +--------+                     +--------+
  |        | *              0..1 |        |
  | Table1 +---------------------+ Table2 |
  |        | role1         role2 |        |
  +--------+                     +--------+

shows that from an object of C<Table1>, you need a method C<role2> to access
the associated object of C<Table2>; therefore the path method C<role2>
will be added into the C<Table1> class, not C<Table2>.
Sometimes UML diagrams get this picture wrong, with role names on the
other side; this can happen in particular when modelers have a
background in Entity-Relationship or Merise methods, where the
convention for role names is the reverse from UML. So be sure
that your association declarations are compliant with the UML
convention; otherwise C<DBIx::Model> will not behave according
to your expectations.

To illustrate how this works, the
path method C<activities()> corresponding to the example above
will generate the following SQL query :

  SELECT * FROM Activity WHERE dpt_id = '$a_department->{dpt_id}'

The method can also accept additional parameters
in L<SQL::Abstract::More> format, exactly like the
L<select()|DBIx::DataModel::Doc::Reference/"select()"> method. So for example

  my $activities = $a_department->activities(-columns => [qw/act_name salary/],
                                             -where   => {is_active => 'Y'});

would perform the following SQL request :

  SELECT act_name, salary FROM Activity WHERE
    dpt_id = '$a_department->{dpt_id}' AND
    is_active = 'Y'

A primitive form of caching is supported : if the method is called
without any parameters, and if that path was previously expanded (see
L<expand()|/"expand">), i.e. if the object hash contains an entry
C<< $a_department->{activities} >>, then this data is reused instead of
calling the database again.  To force a new call to the database,
it suffices to pass some parameters :

  $dpt->expand('activities');  # stores result in $dpt->{activities}
  $list = $dpt->activities;    # returns cached $dpt->{activities}
  $list = $dpt->activities(-columns => '*');  # does not change the default
                                              # columns, but forces a new call
                                              # to the database

Sometimes associations are unidirectional (it does not make sense to
traverse the association in both directions).  In such cases, use an
I<anonymous> role, expressed by an empty string, or by strings
C<"0">, C<'""'>, C<"--"> or C<"none"> : then the corresponding
path method is not generated.


=head3 Insertion of C<insert_into_$path> methods

When a role has multiplicity '*', a method
named C<insert_into_...> is added to the table class.
That method will create new objects of the associated class, taking
care of the linking automatically :

  $a_department->insert_into_activities({d_begin => $today,
                                         emp_id  => $emp_id});

This is equivalent to

  $schema->table('Activity')->insert({d_begin => $today,
                                      emp_id  => $emp_id,
                                      dpt_id  => $a_department->{dpt_id}});

Observe how the C<dpt_id> column was automatically added
into the hash to be inserted.

=head2 Many-to-many associations

UML conceptual models may contain associations where the multiplicity
is '*' on both sides (these are called
L<many-to-many|DBIx::DataModel::Doc::Glossary/"many-to-many">
associations). However, when it comes to actual database
implementation, such associations need an intermediate linking table
to collect pairs of identifiers from both tables.
Therefore a many-to-many association in C<DBIx::DataModel>
is declared by specifying how both sides relate to the
linking table. 

The linking table needs to be declared first :

  $schema->Table(qw/LinkTable link_table prim_key1 prim_key2/);

  $schema->Association([qw/Table1     role1       0..1/],
                       [qw/LinkTable  link_table  *   /]);

  $schema->Association([qw/Table2     role2       0..1/],
                       [qw/LinkTable  link_table  *   /]);

This describes a diagram like this :

  +--------+                   +-------+                   +--------+
  |        | 0..1            * | Link  | *            0..1 |        |
  | Table1 +-------------------+  --   +-------------------+ Table2 |
  |        | role1  link_table | Table | link_table  role2 |        |
  +--------+                   +-------+                   +--------+

Then we can declare the  many-to-many association, very much like
ordinary associations, except that the last items in the argument
lists are names of paths to follow, instead of names of columns to join.
In the diagram above, we must follow paths C<link_table> and C<role2>
in order to obtain the rows of C<Table2> related to an instance
of C<Table1>; so we write

  $schema->Association([qw/Table1  roles1  *  link_table role1/],
                       [qw/Table2  roles2  *  link_table role2/]);

which describes a diagram like this :

              +--------+                    +--------+
              |        | *                * |        |
              | Table1 +--------------------+ Table2 |
              |        | roles1      roles2 |        |
              +--------+                    +--------+

The declaration has created a new method C<roles2> in
C<Table1>; that method is implemented by following paths
C<linksA> and C<role2>. So for an object C<obj1> of C<Table1>,
the call

  my $obj2_arrayref = $obj1->roles2();

will generate the following SQL :

  SELECT * FROM link_table INNER JOIN table2
            ON link_table.prim_key2=table2.prim_key2
    WHERE link_table.prim_key1 = $obj->{prim_key1}


Observe that C<roles2()> returns rows from a I<join>,
so these rows will belong both to C<Table2> I<and> to
C<Link_Table>.

Many-to-many associations do not have an
automatic C<insert_into_*> method : you must
explicitly insert into the link table.


=head2 Navigating through multiple associations

In the previous section we were following two roles at once
in order to implement a many-to-many association. More generally,
it may be useful to follow several roles at once, joining
the tables in a single SQL query. This can be done through the
following methods :

=over

=item *

L<Schema::join()|DBIx::DataModel::Doc::Reference/Schema::join()> :
create a new C<Join> that selects
from several tables, filling the joins automatically


=item *

L<Table::join()|DBIx::DataModel::Doc::Reference/Table::join()>  :
from a given row object, follow a list of paths to get information
from associated tables.

=item *

L<Table::join()|DBIx::DataModel::Doc::Reference/Table::join()>  :
from a given class, follow a list of roles to prepare getting information
from associated tables; the result is a L<Statement|DBIx::DataModel::Statement>
object that can be bound to specific members of the initial table
and then can be selected.


=item *

L<define_navigation_method()|DBIx::DataModel::Doc::Reference/Table::define_navigation_method()>  :
add a new method in a table, that will follow a list of roles; this is
like compiling a C<join()> through several associated tables into
a method.

=back



=head1 STATEMENT OBJECTS

=head2 Purpose

A I<statement> object encapsulates a SELECT request to the database.

Often, the client code does not even see that a statement object
is doing the work : for example in a query like

  my $rows = $schema->table($name)->select(-columns => \@columns
                                           -where   => \%condition);

the results are directly retrieved as an arrayref of rows.  Behind the
scene, an intermediate statement object was created to assemble 
the SQL request, issue that request to the database and
package the results; after that work was finished, the statement
object was destroyed automatically.

However, in some situations it is useful to explicitly interact with
the statement object. The rest of this chapter explains the statement
lifecycle and how parameters are assembled in several steps until
the SQL query can be issued.

=head2 Lifecycle

=head3 Principle

The statement object goes through a sequence of I<states> before
delivering results. Some methods are only available in a given
state. At any time, the
L<status()|DBIx::DataModel::Doc::Reference/status()>
method tells which is the current state of the statement object.
The state-transition diagram is pictured below; then each state
is described in more detail in the following section.


=head3 State diagram

  +--------+     select()      +--------------------+
  | source |------------------>|    NEW_statement   |
  +--------+                   +--------------------+
                    refine()      |
                   ,----<---------'
                   |
                   |           +--------------------+
                   '---->----->| REFINED_statement  |<----<-------,
                               +--------------------+     |       |
                    sqlize()      |    |   |    refine()  |       |
                   ,----<---------'    |   '------>-------'       |
                   |                   |                 bind(..) |
                   |                   '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| SQLIZED_statement  |<----<-------,
                               +--------------------+             |
                    prepare()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| PREPARED_statement |<----<-------,
                               +--------------------+             |
                    execute()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| EXECUTED_statement |<----<-------,
                               +--------------------+     |       |
                                  |    |   |     bind(..) |       |
                                  |    |   '------>-------'       |
                                  |    |                execute() |
                                  |    '------------------>-------'
  +-------------+  next()/all()   |
  | data row(s) |-------<---------'
  +-------------+

=head3 Statement states

=over

=item NEW

  my $statement = $source->select(..., -result_as => 'statement);
  # or
  my $statement = DBIx::DataModel::Statement->new($source);
  

The statement has just been created,
and already knows its I<datasource> (an instance of
L<DBIx::DataModel::Meta::Source|DBIx::DataModel::Meta::Source>).

Statements are rarely created by a direct call to the C<new()> method;
instead, they are created indirectly, by calling method C<select()>
on a datasource.


=item REFINED

  $statement->refine(%arguments);

The
L<refine()|DBIx::DataModel::Doc::Reference/refine()>
method stores parameters within the statement object, like
C<-columns> to retrieve, C<-where> clauses, etc.

The statement is then bumped into C<REFINED> state.  While in this
state, C<refine()> can be called again several times, accumulating
parameters in several steps. This is useful for example when one
application module knows only one part of the request,
while other criteria are known by another module; this situation is
quite frequent in web application, where some criteria come from
back-office constraints, while other criteria come from forms filled
by users.

Early binding of values to placeholders may also occur through the
L<bind()|DBIx::DataModel::Doc::Reference/bind()>
method. This useful when some pairs C<< name => $value >> are
already accessible, but without knowing yet where those names will
be used in the SQL query. Such situations can happen for example when
fetching rows associated to an initial data row : the values
of the initial row may become criteria for the new SQL request.


=item SQLIZED

The
L<sqlize()|DBIx::DataModel::Doc::Reference/sqlize()>
method gathers all parameters accumulated so far within the statement,
generates the SQL, and bumps the statement into C<SQLIZED> state.

At this point, it is no longer possible to call the
L<refine()|DBIx::DataModel::Doc::Reference/refine()>
method to add new clauses. However, it is still possible to call the
L<bind()|DBIx::DataModel::Doc::Reference/bind()>
method to bind values to the placeholders.

=item PREPARED

The
L<prepare()|DBIx::DataModel::Doc::Reference/prepare()>
method calls L<DBI/prepare> to get a DBI sth handle, and
bumps the statement into C<PREPARED> state.

=item EXECUTED

The
L<execute()|DBIx::DataModel::Doc::Reference/execute()>
method calls L<DBI/execute> to execute the statement on the database side,
and bumps the statement into C<EXECUTED> state.

Ath this point, the statement is ready to extract data rows
through the L<next()|DBIx::DataModel::Doc::Reference/next()>
or L<all()|DBIx::DataModel::Doc::Reference/all()> methods.

=back



=head2 Stepwise building of the SQL query

=head3 Principle

A statement object can accumulate requirements in several
steps, before generating the actual database query.
Therefore various independent client components can
contribute to various parts of the final SQL.

=head3 Example

  # create a statement with initial conditions on a department object
  my $statement = $department->join(qw/activities employee/);

  # add a date condition (from config file or CGI params or whatever)
  my $date = get_initial_date_from_some_external_source();
  $statement->refine(-where => {d_begin => {">" => $date}});

  # now issue the SQL query
  my $rows = $statement->select(-columns => [qw/d_begin lastname firstname/]);

This code generates the following SQL :

  SELECT d_begin, lastname, firstname
  FROM   activity INNER JOIN employee
                  ON activity.emp_id=employee.emp_id
  WHERE  dpt_id  = $departement->{dpt_id}
    AND  d_begin > $date

The C<join> method applied to an instance of C<Department> first
created a view representing the database join between C<activity> and
C<employee>; then it created a statement object that would query that
view with an initial condition on C<dpt_id>.  The C<refine> call added
a second condition on C<d_begin>.  Finally the C<select> method
specified which columns to retrieve.


=head3 Stepwise parameter binding through named placeholders

C<DBIx::DataModel::Statement> objects have their own mechanism of
placeholders, i.e. places within a SQL statement where values need to
be inserted, but where the actual values are supplied in a separate
step.  Ultimately these placeholders get translated into usual
placeholders at the C<DBI> and database layers; but an additional
layer was needed here in order to allow for stepwise building of SQL
conditions, as just demonstrated above.

Stepwise binding of values to placeholders requires
I<named placeholders>, as opposed to usual positional placeholders.
Named placeholders are recognized according to a I<placeholder prefix>,
which by default is C<?:>. Here is an example :

  $statement->refine(-where => {col1 => '?:foo',
                                col2 => '?:bar',
                                col3 => '?:foo'});

A call to the L</bind()> method associates values with named placeholders :

  $statement->bind(foo => 123, bar => 456);

This binding can happen either I<before> or I<after> the C<refine()>
step.  In other words, collaboration scenarios can vary : placeholders
can be inserted into the statement before knowing which values they
will take, or on the contrary values can be supplied before knowing
where they will be used (or even if they will be used at all).

If the default placeholder prefix C<?:> is inconvenient, another
placeholder prefix may be specified as an option to the
L<schema creation method|DBIx::DataModel::Doc::Reference/Schema>.


=head1 COLUMN HANDLERS AND TYPES

=head2 Column handlers

A I<column handler> is a subroutine associated to a column under a given
name. Given any C<$row> object, a call to

  $row->apply_column_handler($handler_name)

will iterate over all columns present in that row, check
if these columns have a handler of the corresponding name, and if so,
execute the associated code.

Column handlers are registered
either at table definition time through the
L<define_table()|DBIx::DataModel::Doc::Reference/define_table()> method,
or later through the
L<define_column_handlers()|DBIx::DataModel::Doc::Reference/define_column_handlers()>
method. The signature for handler subroutines is detailed in 
L<DBIx::DataModel::Doc::Reference/define_type()>.


The concept of handlers is generic, but some specific handler names
play particular roles : handler B<from_DB> is called automatically
just after reading data from the database, and handler B<to_DB> is
called automatically just before writing into the database. Handler
name B<validate> is used by the method
L<DBIx::DataModel::Doc::Reference/"has_invalid_columns()">. Other
usage of handlers is up to the client code. Handlers may be used
for example for 


=over

=item *

converting dates between internal database format and user presentation format

=item *

converting empty strings into null values

=item *

inflating scalar values into objects

=item *

validating column data

=back


=head2 Types as collections of handlers

A I<Type> is just a collection of handlers, registered under a type
name : it is a convenience for associating the same collection of handlers
to various columns in various tables.
Here is an example from the L<DBIx::DataModel/SYNOPSIS> :

  My::Schema->metadm->define_type(
    name     => 'Percent',
    handlers => {
      from_DB  => sub {$_[0] *= 100 if $_[0]},
      to_DB    => sub {$_[0] /= 100 if $_[0]},
      validate => sub {$_[0] =~ /1?\d?\d/}),
    });

Note that this notion of "type" is independent from the actual
datatypes defined within the database (integer, varchar, etc.).
From the Perl side, these are all seen as scalar values; 
a column type as defined here is just a way to specify some
operations, programmed in Perl, that can be performed on the
scalar values.



=head1 GENERAL DESIGN PRINCIPLES


Material in the previous sections presented the general architecture
of C<DBIx::DataModel>; this should be enough to easily follow the
L<QUICKSTART|DBIx::DataModel::Doc::Quickstart> chapter,
or investigate more details in the
L<REFERENCE|DBIx::DataModel::Doc::Reference> chapter.


The present section will discuss the motivation for some design
features of C<DBIx::DataModel>, in order to explain not only I<how> it
works, but also I<why> it was designed that way : this may be of
interest to readers who want to compare various ORMs, or to architects
who must decide whether or not they will include C<DBIx::DataModel>
in their collection of components.



=head2 Collaborate with lower-level layers, do not hide them

C<DBIx::DataModel> provides high-level abstractions that help client
applications to automate some common tasks; however, access to
lower-level layers remains open, for situations where detailed control
is needed.  Here are some ways of collaborating with lower layers :

=over

=item *

Any call to the C<select()> method (including indirect calls through
path methods in row objects) can use the C<-return_as> parameter to
ask for the underlying DBI statement, or even just the generated SQL
code, instead of getting the usual list of data rows as result.
From there, the application code can interact directly with the
DBI layer.

=item *

Conversely, the application code can start by getting raw results from
the DBI layer, and then bring those results to the
C<DBIx::DataModel> layer through the
L<bless_from_DB()|DBIx::DataModel::Doc::Reference/"bless_from_DB()">
method. Objects blessed in that way can then benefit from all methods
provided by C<DBIx::DataModel>.


=item *

Hooks can be inserted at various stages of the statement
lifecycle : see parameters C<-post_SQL>, C<-pre_exec>, etc.
to the
L<select()|DBIx::DataModel::Doc::Reference/"select()">
method. This provides an opportunity for running driver-specific
or application-specific code at a particular point in the
lifecycle. If the same hook is needed in every statement,
another possibility is to subclass
L<DBIx::DataModel::Statement> and override the C<prepare()>,
C<execute()> or C<select()> methods.


=item *

The internal representation of a row object is just a plain Perl
hashref. Application code can take advantage of usual Perl idioms for
dealing with such hashrefs, for example for extracting keys, values or
slices of data, or for passing the whole datastructure to external
helper modules such as XML generators, Perl dumps, javascript JSON,
templates of the Template Toolkit, etc. Such modules need to walk on
the data tree, exploring keys, values and subtrees; so they cannot
work if data columns are implemented as object-oriented methods.

=back


=head2 Let the database do the work

In the spirit of collaborating with the database instead of hiding its
functionalities under an object-oriented cover, several tasks are
deliberately not included within the C<DBIx::DataModel> framework,
under the assumption that such tasks will be better handled by the
database directly.


=head3 Use RDBMS tools to create the schema

Besides basic SQL data definition statements,
RDBMS often come with their own helper tools for creating or modifying
a database schema (interactive editors for tables,
columns, datatypes, etc.). Therefore
C<DBIx::DataModel> provides no support in this area,
and assumes that the database schema is pre-existent.

To communicate with the database, the framework only needs to know a
bare minimum about the schema: table names, primary keys
and UML associations. No details are required about column names
or their datatypes.


=head3 Let the RDBMS check data integrity

Most RDBMS have facilities for checking or ensuring integrity rules :
foreign key constraints, restricted ranges for values, cascaded
deletes, etc. C<DBIx::DataModel> can also do some validation
tasks, by setting up column types with a C<validate> handler;
however, it is recommended  to rather use the RDBMS for
performing data integrity checks, whenever possible.

=head3 Take advantage of database projections through variable-size objects

In many ORMs, columns in a table are in 1-to-1 correspondence
with attributes in the associated class; so any transfer between
database and memory systematically includes all the columns, both
for selects and for updates. Of course this has the advantage
of simplicity for the programmer; however, it may be very inefficient
if the client program only wants to read two columns from
a very big table.

Furthermore, unexpected concurrency problems may occur : in a scenario such as

  client1                            client2
  =======			     =======
  my $obj = My::Table->fetch($key);  my $obj = My::Table->fetch($key);
  $obj->set(column1 => $val1);	     $obj->set(column2 => $val2);
  $obj->update;                	     $obj->update;

the final state of the row should theoretically
be consistent for any concurrent execution of C<client1> and C<client2>.
However, if the ORM layer blindly updates I<all> columns, instead of just
the changed columns, then the final value of C<column1> or
C<column2> is unpredictable.

To diminish the efficiency problem, some ORMs offer the possibility
to partition columns into several I<column groups>. The ORM layer
then transparently fetches the appropriate groups in several steps,
depending on which columns are requested from the client. However,
this might be another source of inefficiency, if the client
frequently needs one column from the first group and one from the
second group.

With C<DBIx::DataModel>, the client code has precise control over
which columns to transfer, because these can be specified separately at
each method call. Whenever efficiency is not an issue, one
can be lazy and specify nothing, in which case the SELECT columns will
default to "*". Actually, the schema
I<does not know about column names>, except for primary and
foreign keys, and therefore would be unable to transparently
decide which columns to retrieve. Consequently, objects from a
given class may be of I<variable size> :

  my $objs_A = My::Table->select(-columns => [qw/c1 c2/],
		 	         -where   => {name => {-like => "A%"}};

  my $objs_B = My::Table->select(-columns => [qw/c3 c4 c5/],
			         -where   => {name => {-like => "B%"}};

  my $objs_C = My::Table->select(# nothing specified : defaults to '*'
                                 -where   => {name => {-like => "C%"}};

Therefore the programmer has much more freedom and control, but of
course also more responsability : in this example, attempts to access
column C<c1> in members of C<@$objs_B> would yield an error.


=head3 Exploit database products (joins) through multiple inheritance

ORMs often have difficulties to exploit database joins, because
joins contain columns from several tables at once.
If tables are mapped to classes, and rows are mapped
to objects of those classes, then what should be the
class of a joined row ? Three approaches can be taken :

=over

=item *

ignore database joins altogether : all joins are performed
within the ORM, on the client side. This is of course the
simplest way, but also the less efficient, because many
database queries are needed in order to gather all the data.

=item *

ask a join from the database, then perform some reverse
engineering to split each resulting row into several objects
(partitioning the columns).

=item *

create a new subclass on the fly that inherits from all joined tables :
data rows then simply become objects of that new subclass.

=back


C<DBIx::DataModel> takes the third approach, and seems to be the
sole ORM dealing with database joins in that way.



=head2 Efficiency concerns

Great care has been taken to interact with the database in
the most efficient way, and to leave an open access to
L<DBI|DBI> fine-tuning options for achieving even better
results. In particular :

=over

=item *

C<DBIx::DataModel> can take advantage of DBI's
L<bind_columns|DBI/bind_columns> method, which is the
fastest way to get a large number of data rows from DBI.
With this technique, each row is retrieved into the same memory
location, in order to spare the cost of allocating and free-ing memory 
at each row. This is done in C<DBIx::DataModel>
through a I<fast statement> :

  my $statement = My::Table->select(-columns   => ...,
                                    -where     => ...,
                                    -result_as => 'fast_statement');
  while (my $row = $statement->next) {
    work_with($row);
  }

This code creates a single memory location for storing data rows;
at each call to the C<next> method, that location is updated
with fresh values from the database.

While being very fast, this approach also has some limitations :
for example it is not possible to put such rows into an array
(because the array would merely contain multiple references
to the last row). So fast statements are not activated by default;
regular statements create a fresh hashref for each row.

=item *

The client code can have fine control on statement preparation and
execution, which is useful for writing efficient loops.
For example, instead of writing

  my $list = My::Table->select(...);
  foreach my $obj (@$list) {
    my $related_rows = $obj->join(qw/role1 role2/)->select;
    ...
  }

we can prepare a statement before the loop, and then
just execute that statement at each iteration :

  my $statement = My::Table->join(qw/role1 role2/)->prepare;
  my $list = My::Table->select(...);
  foreach my $obj (@$list) {
    my $related_rows = $statement->execute($obj)->all;
    ...
  }


=item *

C<DBI> has a L<prepare_cached|DBI/prepare_cached> method, that works
like C<prepare> except that the statement handle returned is stored in
a hash associated with the C<$dbh>. This can be exploited from
C<DBIx::DataModel> by stating

  $schema->dbi_prepare_method('prepare_cached');


=item *

The dependencies of C<DBIx::DataModel> to other CPAN modules
are modest; as a result, the number of internal method calls necessary
to answer a query is quite low.


=back



=head1 DESIGN FAQ

Here are answers to some design choices, in the form
of a FAQ.


=head2 Why no accessor methods for columns ?

Columns within a row object are accessed, not with object-oriented
method calls, but as entries within a hash. This is a strong
design choice, which differs from many other ORMs; its motivations
are the following :

=over

=item *

as already stated above, row objects are of variable size, because
each C<select()> call decides which columns will be retrieved from the
database.  An object-oriented API would not be well suited for
handling this situation, because in Perl object-oriented programming
the set of methods is the same for every member of a given class.
Other ORMs usually solve the problem by providing additional methods
for asking which columns are present in the object. By contrast, the
hash API makes it very direct to find out which keys are present
in the hash.

=item *

all common Perl idioms for working with hashes can be used, like :

  # inspect hash keys
  my @column_names = keys @$row;

  # remove leading spaces in all columns
  s/^\s+// foreach values @$row;

  # print a slice
  print @{$row}{qw/col1 col2 col3/};

  # swap values
  ($row->{col1}, $row->{col2}) = ($row->{col2}, $row->{col1});

  # other way to swap values
  @{$row}{qw/col1 col2/} = @{$row}{qw/col2 col1/};

As demonstrated by these examples, the hash API
is especially powerful for dealing with several columns
in one single line of Perl code; an object-oriented API would
be more verbose.

=item *

the hash API is faster and uses less memory than an
object-oriented encapsulation.

=item *

row hashrefs can be converted directly to import/export formats
such as JSON, YAML, XML, etc.

=back

Of course the hash API also has some disadvantages : it is not
possible to intercept method calls for modifying or computing column
data on the fly, nor to implement a mechanism of "dirty columns" to
keep track of which columns where accessed or modified.

Nevertheless, conversion of column values is fully possible through
the mechanisms of column types and C<to_DB> / C<from_DB> handlers.
This is simply done at a different moment : 
instead of doing conversions when columns are I<accessed>,
C<DBIx::DataModel> does conversions just before or just after having
I<communicated with the database>; this is similar in spirit
to a kind of L<PerlIO> layer for databases. 


=head2 Why this C<localize_state> method ?

When C<DBIx::DataModel> is in
L<single-schema mode|DBIx::DataModel::Doc::Glossary/"single-schema mode">,
the schema is a global resource within the application, very much
like other Perl global resources (for example
C<STDIN>, C<%ENV>, or special variables C<$/>, C<$,>, etc.).

When used with care, interaction of several components through a
global resource can be quite handy : for example C<STDIN> does not
need to be explicitly passed to every component, it is always
available; furthermore, C<STDIN> can be redirected at one specific
place and then all collaborating components will immediately change
behaviour accordingly. However, this must be done with care, because
there is also a risk of undesired "action at distance" --- maybe the
other components wanted to continue reading from the real C<STDIN>,
not the redirected one !

To avoid undesired interactions through the global state, Perl offers
the C<local> construct, also known as I<dynamic scoping> (see
L<perlsub>). Using that construct, a component can temporarily
redirect C<STDIN> for the duration of one specific computation, and
then automatically restore it to its previous state.

The
L<localize_state|DBIx::DataModel::Doc::Reference/localize_state>
method of C<DBIx::DataModel> uses a similar approach : it
supports a temporary change to the global schema state (for example
for changing the database handle), and then restores
the schema to its previous state.
That method is also called implicitly by 
L<do_transaction|DBIx::DataModel::Doc::Reference/do_transaction>.


=head2 Serialization

C<DBIx::DataModel> includes support for the standard
L<Storable|Storable> serialization / deserialization
methods C<freeze> and C<thaw> : so records and record trees
can be written into files or sent to other processes.
Dynamic subclasses for database joins are re-created on the fly
during deserialization through C<thaw>. However, there is no support
for serializing database connections (this would be hazardous, and also
insecure because serialization data would contain database passwords).
Therefore the process which performs deserialization is responsible
for opening the database connection by its own means, before
calling the C<thaw> method. 

=head1 TO DO


Here is a list of points to improve in future versions C<DBIx::DataModel> :

  - 'has_invalid_columns' : should be called automatically before insert/update ?
  - 'validate' record handler (at record level, not only column handlers)
  - walk through WHERE queries and apply 'to_DB' handler (not obvious!)
  - add PKEYS keyword in -columns, to be automatically replaced by
    names of primary key columns of the touched tables
  - pre/post callbacks: support arrays of handlers, refine(..) should add
    to the array
  - refine(-order_by => ..) should add to the ordering
  - update with subtrees (insert/update on dependent records. Quid: delete?)
  - auto-unjoin (API for partioning columns into subobjects).
  - support DISTINCT ON ...
  - support find_or_create, update_or_create
  - copy idea from DBIC:Storage:DBI:MultiColumnIn
  - Storable
     - hooks for YAML ?
     - think about STORABLE_attach
     - think about freeze/thaw in multi-schema mode
  - readonly tables
  - savepoints
  - explain how to design families of tables with an intermediate superclass
  - tune croak() in Statement.pm so that SQL errors are reported 
    from caller's line
  - $obj->join(path1 ...)->select(...) should automatically add
    -result_as => 'firstrow' when all multiplicities are 1. See TODO
    in ConnnectedSources::join
  - check auto_insert / auto_update columns to prevent intersections
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)