NAME

DBIx::DataModel::Doc::Design - Architecture and design principles

DOCUMENTATION CONTEXT

This chapter is part of the DBIx::DataModel manual.

This chapter covers the basic architecture of DBIx::DataModel, and the main motivating principles for proposing yet another ORM. Read it if you are currently evaluating whether DBIx::DataModel is suitable for your context, or if you want to globally understand how it works. Skip it and jump to the QUICKSTART chapter if you want to directly start using the framework.

GENERAL ARCHITECTURE

Schema definitions centralized in one single module

Table and association definitions for DBIx::DataModel are centralized in one single file; this is in strong contrast with most other Perl ORM frameworks, where each table has its own .pm file, and where association declarations are scattered among such files.

Nevertheless, DBIx::DataModel does create a Perl package for each table; but such packages are directly added into the interpreter's symbol table. Each Table() declaration in the schema definition file creates a new Perl package under the namespace of the current schema.

Even if packages are created in memory, without being associated with specific .pm files, it is still possible to add supplementary methods into them., Details are shown in the cookbook example.

Classes

The following picture shows the main classes in DBIx::DataModel class hierarchy :

         FRAMEWORK CLASSES
         *****************

   +================+   +============================+  +===============+
   | DBIx::DataModel|   | DBIx::DataModel::Statement |  | ...::ResultAs |
   +================+   +============================+  +=====,=========+
                                                             /
                                                            /
                                         +=================^=====+
                                         | ...::ResultAs::*      +==+
                                         +==+====================+  |==+
                                            +==+====================+  |
                                               +=======================+

   +=========================+   +=========================+
   | DBIx::DataModel::Schema |   | DBIx::DataModel::Source |
   +==,======================+   +===,=================.===+
      |                             /                   \
      |                            /                     \
      |     +=====================^==+  +=================^=====+
      |     | DBIx::DataModel::Table |  | DBIx::DataModel::Join |
      |     +=================,======+  +====================,==+
      |                       |                             /
  ====|=======================|============================/=======
      |  APPLICATION CLASSES  |                           /
      |  *******************  |                          /
      |                       |                         /
  +===^======+     +==========^========+               /
  | MySchema |     | MySchema::Table_n |==+           /
  +==========+     +==+=========.======+  |==+       /
                      +==+=======\==.=====+  |      /
                         +========\==\==.====+     /
                                   \  \  \        /
                                  +=^==^==^======^========+
                                  | MySchema::AutoJoin::* +==+
                                  +==+====================+  |==+
                                     +==+====================+  |
                                        +=======================+

The top half of the picture represents basic classes distributed with DBIx::DataModel. The bottom half represents application-specific subclasses, built through class definition methods (see below). Most objects created during the lifetime of the application will be instances of those application-specific subclasses.

The entry class DBIx::DataModel is just a façade interface to DBIx::DataModel::Schema.

Class DBIx::DataModel::Statement implements short-lived objects which gather information for issuing requests to the database. Data received from the database can then be packaged into various kinds of results; each result kind is implemented as a subclass of DBIx::DataModel::Schema::ResultAs.

DBIx::DataModel::Schema is the parent class for schema classes created by Schema() or define_schema() methods. In most cases only one such subclass will be needed, unless the application talks to several databases.

Subclasses of DBIx::DataModel::Source::Table represent tables in the database and are created by methods Table() or define_table(). Calls to Association() or define_association() automatically add new methods into these table classes, so that a database row, represented by an instance of one table class, can reach related rows in associated tables.

Subclasses of DBIx::DataModel::Source::Join represent database queries that join several tables. They are created indirectly through calls to the define_join() method. Subclass cration happens dynamically, every time a new join request is met in the application; if a previous join involving the same tables was already met, the corresponding subclass is reused. Join subclasses inherit from DBIx::DataModel::Source::Join, but also from tables involved in the database join (thanks to Perl support for multiple inheritance). As a result, instances of joins can invoke all methods of their parent tables.

This short tour of DBIx::DataModel architecture only covered the most visible classes. Other classes involved in the internal infrastructure will be introduced later in the documentation.

Instances

A schema instance holds a connexion to a database. Most often there is only one schema instance, because by default the framework is in single-schema mode; but if necessary the application can switch to multi-schema mode, where several schema instances can coexist.

Instances of the application-specific Table and Join subclasses encapsulate data rows. They possess methods for navigating through associations in the database and methods for modifying the data. When in multi-schema mode, each instance representing a data row also has an internal attribute pointing to the schema from which it was retrieved.

Instances of DBIx::DataModel::Statement encapsulate SELECT requests to the database. Such instances have methods for preparing the SQL query, binding parameters to it, executing the query, and retrieving the resulting data rows. Statement instances are usually short-lived and confined to specific internal parts of the application, while data rows (instances of tables or joins) are usually transmitted to the presentation layers of the application, in order to use the data within reports, forms, etc. Data row instances have no attribute pointing to the statement from which they were generated.

The following picture shows relationships between classes and instances :

             FRAMEWORK CLASSES       +============================+
             *****************       | DBIx::DataModel::Statement |
                                     +========================,===+
                                                              |
  ============================================================|=====
             APPLICATION CLASSES                              |
             *******************                              |
                                                              |
  +==========+     +===================+                      |
  | MySchema |     | MySchema::Table_n |=+                    |
  +==========+     +==+================+ |=+                  |
       |              +==+===============+ |                  |
       |                 +===,=============+                  |
       |                     |                                |
       |                     |    +=====================+     |
       |                     |    | MySchema::AutoJoin  +=+   |
       |                     |    +==+==================+ |   |
       |                     |       +=,==================+   |
       |                     |         |                      |
  =====|=====================|=========|======================|=====
       |     INSTANCES       |         |                      |
       |     =========       |         |                      |
   +--------+            +-----+    +-----+  next()  +-----------+
   | schema |            | row |    | row |<==<==<===| statement |
   +--------+            +-----+    +-----+          +-----------+

Meta-classes

Each application subclass has a metaclass, i.e. an instance of a class in the DBIx::DataModel::Meta namespace. This meta-object is accessible from the class through the metadm() class method. Conversely, metaclasses have a class() method to retrieve the application subclass to which they are bound.

Metaclasses hold information about application classes, and implement some of their methods. In most cases, this is totally transparent to end users; however, users can interact with metaclasses to get some information about the available tables, associations, etc., or even to change their behaviour. The picture below shows relationships between application classes and the meta-objects to which they are related (classes start with capital letters, instances start with lowercase letters).

          FRAMEWORK CLASSES
          *****************
  +==============+  +=============+  +============+  +============+
  | Meta::Schema |  | Meta::Table |  | Meta::Join |  | Meta::Path |
  +====,=========+  +==========,==+  +=====,======+  +,===========+
       |                       |           |          |  +===================+
       |                       |           |          |  | Meta::Association |
       |                       |           |          |  +=======,===========+
   ====|=======================|===========|==========|==========|==
       |  APPLICATION CLASSES  |           |          |          |
       |  AND META-OBJECTS     |           |          |          |
       |  *******************  |           |          |          |
  +----^------+   +========+   |           |          |  +-------^----------+
  |meta_schema|---|MySchema|   |           |          |  | meta_association |
  +---------x-+   +========+   |           |          |  +----x-------------+
             \_________________|___________|__________|_____ /
                          \    |           |          |     /
   +=================+   +-x---^------+    |     +----^----x-+
   | MySchema::Table |---| meta_table |----|-----| meta_path |
   +=================+   +----------x-+    |     +x----------+
                                     \     |     /
                                      \    |    /
         +=======================+   +-x---^---x-+
         | MySchema::AutoJoin::* +---| meta_join |
         +=======================+   +-----------+

Polymorphic methods

Principle

Some methods like join() or select() are heavily polymorphic, in the sense that they can be applied to various kinds of invocants, with various kinds of arguments, and can return various kinds of results. Polymorphism in this way is not exactly common object-oriented practice, but it has been intentionally designed as such, in a attempt to "do the right thing" in different situations, while hiding inner details from users. This is similar in sprit to the the "do what I mean" (DWIM) principle of Perl design, where similar constructs may mean different things depending on the context.

Subsections below give more details about how such methods behave in various contexts.

Polymorphic join()

The join() method, when applied to a Schema, calls define_join() to define or retrieve a join subclass; then it creates a fresh instance of that class. From that object, data can then be retrieved through the select() method.

When applied as a class method to a Source subclass, join() creates a new statement to query one or several tables associated with the source. The new statement contains a condition to restrict the results according to the primary key of the initial source. That statement cannot be executed yet, because the values of the primary key are not known until we have an instance of the initial source; but the statement can already be prepared. Later on, we can bind the statement to an instance of the initial source, and then execute it.

When applied to a data row (to an instance of a Table or Join subclass), join() is an instance method that works like the class method above (i.e. it creates a statement), but in addition, values of the current object are immediately bound to the appropriated placeholders within the query; hence the statement can be executed. This is illustrated below in the section about statements.

So in all of these cases, join() is meant to produce a statement from which one can later select in order to retrieve data rows.

Polymorphic select()

The select() method is most commonly called from Source classes, but this is just a proxy to the implementation in the Statement class. This method causes the statement object to progress through its internal lifecycle, until it is ready to produce results. The kind of result which is produced depends on what was requested through the -result_as parameter : it can be a single row, a list of rows, a file, or other kinds of datastructures. More details about statements are provided below in the "STATEMENT OBJECTS" section.

Named parameters instead of positional parameters

As a general policy, most method calls use named parameters instead of positional parameters; that is, most method calls are of shape

  $obj->method(-arg1 => $val1, -arg2 => $val2);

instead of

  $obj->method($val1, $val2);

Named parameters are more verbose, but they convey more explicitly the purpose of each parameter, and they are exensible : a subclass may override a method and accept more named parameters than its parent class, without introducing perturbations with the original behaviour.

Like DBIx::Class and possibly other ORMs, DBIx::DataModel relies on SQL::Abstract for generating SQL; but since that module only supports positional parameters, a specific subclass was designed to support named parameters, plus a number of other exensions. That subclass is distributed separately under the name SQL::Abstract::More. Thanks to that extension, all select() calls from DBIx::DataModel use named parameters -columns, -where, -order_by, etc., which makes it much easier to read the source code and understand where each parameter to a query belongs.

ASSOCIATIONS

Syntax for defining associations

Front-end method : Association()

Associations are expressed in a syntax designed to closely reflect how they would be pictured in a Unified Modelling Language (UML) diagram. The general form, using the front-end Association() method, is :

  $schema->Association([$class1, $role1, $multiplicity1, @columns1],
                       [$class2, $role2, $multiplicity2, @columns2]);

Let's consider a particular example :

  $schema->Association([qw/Department department 1 /],
                       [qw/Activity   activities * /]);

This declaration corresponds to the following UML diagram :

  +------------+                         +------------+
  |            | 1                  0..* |            |
  | Department +-------------------------+ Activities |
  |            | department   activities |            |
  +------------+                         +------------+

The diagram states that there is an association between Department and Activity, where each side of the association has a role name and a multiplicity. In this example, a department may host zero, one or many activities; in the other direction, an activity is done within exactly one department.

In a UML diagram, role names and multiplicities are usually optional, and indeed are often omitted, because they do not add significant information : often the reader can infer them from the context. By contrast, roles and multiplicities are mandatory in a DBIx::DataModel association declaration, because they are needed for building the internal datastructures to make it work. In addition, the framework also needs to know the join column names on both sides : this is what the @columns1 and @columns2 arguments stand for in the general syntax shown above. However, join column names can be omitted if they have the same names in both tables : in that case, DBIx::DataModel will automatically guess the join column names from the primary key of the table with multiplicity 1. The department-activity example above used that implicit form.

The association declaration is bidirectional, so it will simultaneously add features in both participating classes. Role names declared in the association are used for a number of purposes :

  • implementing path methods for direct navigation;

  • implementing insert_into_* methods for inserting new rows related to a given parent row;

  • implementing multi-step navigation paths through several assocations, like in :

       $department->join(qw/activities employee spouse/)
                  ->select(-columns => \@some_columns,
                           -where   => \%some_criteria);

Information known by the meta-schema about the associations will be used to automatically generate appropriate SQL for the database joins. The kinds of SQL joins (INNER JOIN, LEFT OUTER JOIN) are inferred from multiplicities declared in the associations, but they can be overridden by writing for example

   ...->join(qw/activities <=> employee <=> spouse/) # inner joins

   ...->join(qw/activities  => employee  => spouse/) # left joins

Associations in DBIx::DataModel must be declared even if the associations between tables are already modelled by referential integrity rules within the database schema : this is because rules in the database do not contain information about role names and multiplicities, and therefore are not sufficient for generating the DBIx::DataModel associations. Nevertheless, DBIx::DataModel::Schema::Generator can generate a skeleton for the DBIx::DataModel schema; but that skeleton usually needs some manual additions to become truly useful.

Back-end method : define_association()

Associations can also be declared through the back-end method define_association(). The principle is exactly the same, but the back-end method uses named parameters instead of positional parameters : therefore it is more explicit and more flexible, at the cost of being more verbose.

With define_association(), our example above would be written

  my $meta_schema = $schema->metadm;
  $meta_schema->define_association(
    name => 'Department_activity', # or whatever name you prefer
    kind => 'Association',
    A    => {
      table        => $meta_schema->table('Department'),
      role         => 'department',
      multiplicity => [1, 1],
      join_cols    => [qw/dpt_id/],
    },
    B    => {
      table        => $meta_schema->table('Activity'),
      role         => 'activities',
      multiplicity => [0, '*'],
      join_cols    => [qw/dpt_id/],
    },
  );

Technical consequences of an association definition

This section describes what happens when a new association is defined.

Creation of a Meta::Association instance

A new instance of DBIx::DataModel::Meta::Association is created, mainly for supporting reflection queries (clients asking what are are the tables and associations in the current schema).

Creation of Meta::Path instances

Two instances of DBIx::DataModel::Meta::Path are created and registered into their corresponding meta_table objects. Such paths will be queried for resolving joins, so for example a method call like

  $schema->join(qw/Table path1 path2 .../)

will walk through the paths to find out which tables and which join conditions are involved.

Insertion of path methods within table classes

Path methods are added into classes on both sides of the association; these methods are named according to the role names. The result of a path method depends on the multiplicity : if the maximum multiplicit is 1, the result is a single row; if the maximum multiplicity is '*', the result is an arrayref of rows.

The UML diagram should be read crosswise to understand where path methods are generated. For example the UML picture

  +--------+                     +--------+
  |        | *              0..1 |        |
  | Table1 +---------------------+ Table2 |
  |        | role1         role2 |        |
  +--------+                     +--------+

shows that from an object of Table1, you need a method role2 to access the associated object of Table2; therefore the path method role2 will be added into the Table1 class, not Table2. Sometimes UML diagrams get this picture wrong, with role names on the other side; this can happen in particular when modelers have a background in Entity-Relationship or Merise methods, where the convention for role names is the reverse from UML. So be sure that your association declarations are compliant with the UML convention; otherwise DBIx::Model will not behave according to your expectations.

To illustrate how this works, the path method activities() corresponding to the example above will generate the following SQL query :

  SELECT * FROM Activity WHERE dpt_id = '$a_department->{dpt_id}'

The method can also accept additional parameters in SQL::Abstract::More format, exactly like the select() method. So for example

  my $activities = $a_department->activities(-columns => [qw/act_name salary/],
                                             -where   => {is_active => 'Y'});

would perform the following SQL request :

  SELECT act_name, salary FROM Activity WHERE
    dpt_id = '$a_department->{dpt_id}' AND
    is_active = 'Y'

A primitive form of caching is supported : if the method is called without any parameters, and if that path was previously expanded (see expand()), i.e. if the object hash contains an entry $a_department->{activities}, then this data is reused instead of calling the database again. To force a new call to the database, it suffices to pass some parameters :

  $dpt->expand('activities');  # stores result in $dpt->{activities}
  $list = $dpt->activities;    # returns cached $dpt->{activities}
  $list = $dpt->activities(-columns => '*');  # does not change the default
                                              # columns, but forces a new call
                                              # to the database

Sometimes associations are unidirectional (it does not make sense to traverse the association in both directions). In such cases, use an anonymous role, expressed by an empty string, or by strings "0", '""', "--" or "none" : then the corresponding path method is not generated.

Insertion of insert_into_$path methods

When a role has multiplicity '*', a method named insert_into_... is added to the table class. That method will create new objects of the associated class, taking care of the linking automatically :

  $a_department->insert_into_activities({d_begin => $today,
                                         emp_id  => $emp_id});

This is equivalent to

  $schema->table('Activity')->insert({d_begin => $today,
                                      emp_id  => $emp_id,
                                      dpt_id  => $a_department->{dpt_id}});

Observe how the dpt_id column was automatically added into the hash to be inserted.

Many-to-many associations

UML conceptual models may contain associations where the multiplicity is '*' on both sides (these are called many-to-many associations). However, when it comes to actual database implementation, such associations need an intermediate linking table to collect pairs of identifiers from both tables. Therefore a many-to-many association in DBIx::DataModel is declared by specifying how both sides relate to the linking table.

The linking table needs to be declared first :

  $schema->Table(qw/LinkTable link_table prim_key1 prim_key2/);

  $schema->Association([qw/Table1     role1       0..1/],
                       [qw/LinkTable  link_table  *   /]);

  $schema->Association([qw/Table2     role2       0..1/],
                       [qw/LinkTable  link_table  *   /]);

This describes a diagram like this :

  +--------+                   +-------+                   +--------+
  |        | 0..1            * | Link  | *            0..1 |        |
  | Table1 +-------------------+  --   +-------------------+ Table2 |
  |        | role1  link_table | Table | link_table  role2 |        |
  +--------+                   +-------+                   +--------+

Then we can declare the many-to-many association, very much like ordinary associations, except that the last items in the argument lists are names of paths to follow, instead of names of columns to join. In the diagram above, we must follow paths link_table and role2 in order to obtain the rows of Table2 related to an instance of Table1; so we write

  $schema->Association([qw/Table1  roles1  *  link_table role1/],
                       [qw/Table2  roles2  *  link_table role2/]);

which describes a diagram like this :

              +--------+                    +--------+
              |        | *                * |        |
              | Table1 +--------------------+ Table2 |
              |        | roles1      roles2 |        |
              +--------+                    +--------+

The declaration has created a new method roles2 in Table1; that method is implemented by following paths linksA and role2. So for an object obj1 of Table1, the call

  my $obj2_arrayref = $obj1->roles2();

will generate the following SQL :

  SELECT * FROM link_table INNER JOIN table2
            ON link_table.prim_key2=table2.prim_key2
    WHERE link_table.prim_key1 = $obj->{prim_key1}

Observe that roles2() returns rows from a join, so these rows will belong both to Table2 and to Link_Table.

Many-to-many associations do not have an automatic insert_into_* method : you must explicitly insert into the link table.

In the previous section we were following two roles at once in order to implement a many-to-many association. More generally, it may be useful to follow several roles at once, joining the tables in a single SQL query. This can be done through the following methods :

  • Schema::join() : create a new Join that selects from several tables, filling the joins automatically

  • Table::join() : from a given row object, follow a list of paths to get information from associated tables.

  • Table::join() : from a given class, follow a list of roles to prepare getting information from associated tables; the result is a Statement object that can be bound to specific members of the initial table and then can be selected.

  • define_navigation_method() : add a new method in a table, that will follow a list of roles; this is like compiling a join() through several associated tables into a method.

STATEMENT OBJECTS

Purpose

A statement object encapsulates a SELECT request to the database.

Often, the client code does not even see that a statement object is doing the work : for example in a query like

  my $rows = $schema->table($name)->select(-columns => \@columns
                                           -where   => \%condition);

the results are directly retrieved as an arrayref of rows. Behind the scene, an intermediate statement object was created to assemble the SQL request, issue that request to the database and package the results; after that work was finished, the statement object was destroyed automatically.

However, in some situations it is useful to explicitly interact with the statement object. The rest of this chapter explains the statement lifecycle and how parameters are assembled in several steps until the SQL query can be issued.

Lifecycle

Principle

The statement object goes through a sequence of states before delivering results. Some methods are only available in a given state. At any time, the status() method tells which is the current state of the statement object. The state-transition diagram is pictured below; then each state is described in more detail in the following section.

State diagram

  +--------+     select()      +--------------------+
  | source |------------------>|    NEW_statement   |
  +--------+                   +--------------------+
                    refine()      |
                   ,----<---------'
                   |
                   |           +--------------------+
                   '---->----->| REFINED_statement  |<----<-------,
                               +--------------------+     |       |
                    sqlize()      |    |   |    refine()  |       |
                   ,----<---------'    |   '------>-------'       |
                   |                   |                 bind(..) |
                   |                   '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| SQLIZED_statement  |<----<-------,
                               +--------------------+             |
                    prepare()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| PREPARED_statement |<----<-------,
                               +--------------------+             |
                    execute()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| EXECUTED_statement |<----<-------,
                               +--------------------+     |       |
                                  |    |   |     bind(..) |       |
                                  |    |   '------>-------'       |
                                  |    |                execute() |
                                  |    '------------------>-------'
  +-------------+  next()/all()   |
  | data row(s) |-------<---------'
  +-------------+

Statement states

NEW
  my $statement = $source->select(..., -result_as => 'statement);
  # or
  my $statement = DBIx::DataModel::Statement->new($source);
  

The statement has just been created, and already knows its datasource (an instance of DBIx::DataModel::Meta::Source).

Statements are rarely created by a direct call to the new() method; instead, they are created indirectly, by calling method select() on a datasource.

REFINED
  $statement->refine(%arguments);

The refine() method stores parameters within the statement object, like -columns to retrieve, -where clauses, etc.

The statement is then bumped into REFINED state. While in this state, refine() can be called again several times, accumulating parameters in several steps. This is useful for example when one application module knows only one part of the request, while other criteria are known by another module; this situation is quite frequent in web application, where some criteria come from back-office constraints, while other criteria come from forms filled by users.

Early binding of values to placeholders may also occur through the bind() method. This useful when some pairs name => $value are already accessible, but without knowing yet where those names will be used in the SQL query. Such situations can happen for example when fetching rows associated to an initial data row : the values of the initial row may become criteria for the new SQL request.

SQLIZED

The sqlize() method gathers all parameters accumulated so far within the statement, generates the SQL, and bumps the statement into SQLIZED state.

At this point, it is no longer possible to call the refine() method to add new clauses. However, it is still possible to call the bind() method to bind values to the placeholders.

PREPARED

The prepare() method calls "prepare" in DBI to get a DBI sth handle, and bumps the statement into PREPARED state.

EXECUTED

The execute() method calls "execute" in DBI to execute the statement on the database side, and bumps the statement into EXECUTED state.

Ath this point, the statement is ready to extract data rows through the next() or all() methods.

Stepwise building of the SQL query

Principle

A statement object can accumulate requirements in several steps, before generating the actual database query. Therefore various independent client components can contribute to various parts of the final SQL.

Example

  # create a statement with initial conditions on a department object
  my $statement = $department->join(qw/activities employee/);

  # add a date condition (from config file or CGI params or whatever)
  my $date = get_initial_date_from_some_external_source();
  $statement->refine(-where => {d_begin => {">" => $date}});

  # now issue the SQL query
  my $rows = $statement->select(-columns => [qw/d_begin lastname firstname/]);

This code generates the following SQL :

  SELECT d_begin, lastname, firstname
  FROM   activity INNER JOIN employee
                  ON activity.emp_id=employee.emp_id
  WHERE  dpt_id  = $departement->{dpt_id}
    AND  d_begin > $date

The join method applied to an instance of Department first created a view representing the database join between activity and employee; then it created a statement object that would query that view with an initial condition on dpt_id. The refine call added a second condition on d_begin. Finally the select method specified which columns to retrieve.

Stepwise parameter binding through named placeholders

DBIx::DataModel::Statement objects have their own mechanism of placeholders, i.e. places within a SQL statement where values need to be inserted, but where the actual values are supplied in a separate step. Ultimately these placeholders get translated into usual placeholders at the DBI and database layers; but an additional layer was needed here in order to allow for stepwise building of SQL conditions, as just demonstrated above.

Stepwise binding of values to placeholders requires named placeholders, as opposed to usual positional placeholders. Named placeholders are recognized according to a placeholder prefix, which by default is ?:. Here is an example :

  $statement->refine(-where => {col1 => '?:foo',
                                col2 => '?:bar',
                                col3 => '?:foo'});

A call to the "bind()" method associates values with named placeholders :

  $statement->bind(foo => 123, bar => 456);

This binding can happen either before or after the refine() step. In other words, collaboration scenarios can vary : placeholders can be inserted into the statement before knowing which values they will take, or on the contrary values can be supplied before knowing where they will be used (or even if they will be used at all).

If the default placeholder prefix ?: is inconvenient, another placeholder prefix may be specified as an option to the schema creation method.

COLUMN HANDLERS AND TYPES

Column handlers

A column handler is a subroutine associated to a column under a given name. Given any $row object, a call to

  $row->apply_column_handler($handler_name)

will iterate over all columns present in that row, check if these columns have a handler of the corresponding name, and if so, execute the associated code.

Column handlers are registered either at table definition time through the define_table() method, or later through the define_column_handlers() method. The signature for handler subroutines is detailed in "define_type()" in DBIx::DataModel::Doc::Reference.

The concept of handlers is generic, but some specific handler names play particular roles : handler from_DB is called automatically just after reading data from the database, and handler to_DB is called automatically just before writing into the database. Handler name validate is used by the method "has_invalid_columns()" in DBIx::DataModel::Doc::Reference. Other usage of handlers is up to the client code. Handlers may be used for example for

  • converting dates between internal database format and user presentation format

  • converting empty strings into null values

  • inflating scalar values into objects

  • validating column data

Types as collections of handlers

A Type is just a collection of handlers, registered under a type name : it is a convenience for associating the same collection of handlers to various columns in various tables. Here is an example from the "SYNOPSIS" in DBIx::DataModel :

  My::Schema->metadm->define_type(
    name     => 'Percent',
    handlers => {
      from_DB  => sub {$_[0] *= 100 if $_[0]},
      to_DB    => sub {$_[0] /= 100 if $_[0]},
      validate => sub {$_[0] =~ /1?\d?\d/}),
    });

Note that this notion of "type" is independent from the actual datatypes defined within the database (integer, varchar, etc.). From the Perl side, these are all seen as scalar values; a column type as defined here is just a way to specify some operations, programmed in Perl, that can be performed on the scalar values.

GENERAL DESIGN PRINCIPLES

Material in the previous sections presented the general architecture of DBIx::DataModel; this should be enough to easily follow the QUICKSTART chapter, or investigate more details in the REFERENCE chapter.

The present section will discuss the motivation for some design features of DBIx::DataModel, in order to explain not only how it works, but also why it was designed that way : this may be of interest to readers who want to compare various ORMs, or to architects who must decide whether or not they will include DBIx::DataModel in their collection of components.

Collaborate with lower-level layers, do not hide them

DBIx::DataModel provides high-level abstractions that help client applications to automate some common tasks; however, access to lower-level layers remains open, for situations where detailed control is needed. Here are some ways of collaborating with lower layers :

  • Any call to the select() method (including indirect calls through path methods in row objects) can use the -return_as parameter to ask for the underlying DBI statement, or even just the generated SQL code, instead of getting the usual list of data rows as result. From there, the application code can interact directly with the DBI layer.

  • Conversely, the application code can start by getting raw results from the DBI layer, and then bring those results to the DBIx::DataModel layer through the bless_from_DB() method. Objects blessed in that way can then benefit from all methods provided by DBIx::DataModel.

  • Hooks can be inserted at various stages of the statement lifecycle : see parameters -post_SQL, -pre_exec, etc. to the select() method. This provides an opportunity for running driver-specific or application-specific code at a particular point in the lifecycle. If the same hook is needed in every statement, another possibility is to subclass DBIx::DataModel::Statement and override the prepare(), execute() or select() methods.

  • The internal representation of a row object is just a plain Perl hashref. Application code can take advantage of usual Perl idioms for dealing with such hashrefs, for example for extracting keys, values or slices of data, or for passing the whole datastructure to external helper modules such as XML generators, Perl dumps, javascript JSON, templates of the Template Toolkit, etc. Such modules need to walk on the data tree, exploring keys, values and subtrees; so they cannot work if data columns are implemented as object-oriented methods.

Let the database do the work

In the spirit of collaborating with the database instead of hiding its functionalities under an object-oriented cover, several tasks are deliberately not included within the DBIx::DataModel framework, under the assumption that such tasks will be better handled by the database directly.

Use RDBMS tools to create the schema

Besides basic SQL data definition statements, RDBMS often come with their own helper tools for creating or modifying a database schema (interactive editors for tables, columns, datatypes, etc.). Therefore DBIx::DataModel provides no support in this area, and assumes that the database schema is pre-existent.

To communicate with the database, the framework only needs to know a bare minimum about the schema: table names, primary keys and UML associations. No details are required about column names or their datatypes.

Let the RDBMS check data integrity

Most RDBMS have facilities for checking or ensuring integrity rules : foreign key constraints, restricted ranges for values, cascaded deletes, etc. DBIx::DataModel can also do some validation tasks, by setting up column types with a validate handler; however, it is recommended to rather use the RDBMS for performing data integrity checks, whenever possible.

Take advantage of database projections through variable-size objects

In many ORMs, columns in a table are in 1-to-1 correspondence with attributes in the associated class; so any transfer between database and memory systematically includes all the columns, both for selects and for updates. Of course this has the advantage of simplicity for the programmer; however, it may be very inefficient if the client program only wants to read two columns from a very big table.

Furthermore, unexpected concurrency problems may occur : in a scenario such as

  client1                            client2
  =======                            =======
  my $obj = My::Table->fetch($key);  my $obj = My::Table->fetch($key);
  $obj->set(column1 => $val1);       $obj->set(column2 => $val2);
  $obj->update;                      $obj->update;

the final state of the row should theoretically be consistent for any concurrent execution of client1 and client2. However, if the ORM layer blindly updates all columns, instead of just the changed columns, then the final value of column1 or column2 is unpredictable.

To diminish the efficiency problem, some ORMs offer the possibility to partition columns into several column groups. The ORM layer then transparently fetches the appropriate groups in several steps, depending on which columns are requested from the client. However, this might be another source of inefficiency, if the client frequently needs one column from the first group and one from the second group.

With DBIx::DataModel, the client code has precise control over which columns to transfer, because these can be specified separately at each method call. Whenever efficiency is not an issue, one can be lazy and specify nothing, in which case the SELECT columns will default to "*". Actually, the schema does not know about column names, except for primary and foreign keys, and therefore would be unable to transparently decide which columns to retrieve. Consequently, objects from a given class may be of variable size :

  my $objs_A = My::Table->select(-columns => [qw/c1 c2/],
                                 -where   => {name => {-like => "A%"}};

  my $objs_B = My::Table->select(-columns => [qw/c3 c4 c5/],
                                 -where   => {name => {-like => "B%"}};

  my $objs_C = My::Table->select(# nothing specified : defaults to '*'
                                 -where   => {name => {-like => "C%"}};

Therefore the programmer has much more freedom and control, but of course also more responsability : in this example, attempts to access column c1 in members of @$objs_B would yield an error.

Exploit database products (joins) through multiple inheritance

ORMs often have difficulties to exploit database joins, because joins contain columns from several tables at once. If tables are mapped to classes, and rows are mapped to objects of those classes, then what should be the class of a joined row ? Three approaches can be taken :

  • ignore database joins altogether : all joins are performed within the ORM, on the client side. This is of course the simplest way, but also the less efficient, because many database queries are needed in order to gather all the data.

  • ask a join from the database, then perform some reverse engineering to split each resulting row into several objects (partitioning the columns).

  • create a new subclass on the fly that inherits from all joined tables : data rows then simply become objects of that new subclass.

DBIx::DataModel takes the third approach, and seems to be the sole ORM dealing with database joins in that way.

Efficiency concerns

Great care has been taken to interact with the database in the most efficient way, and to leave an open access to DBI fine-tuning options for achieving even better results. In particular :

  • DBIx::DataModel can take advantage of DBI's bind_columns method, which is the fastest way to get a large number of data rows from DBI. With this technique, each row is retrieved into the same memory location, in order to spare the cost of allocating and free-ing memory at each row. This is done in DBIx::DataModel through a fast statement :

      my $statement = My::Table->select(-columns   => ...,
                                        -where     => ...,
                                        -result_as => 'fast_statement');
      while (my $row = $statement->next) {
        work_with($row);
      }

    This code creates a single memory location for storing data rows; at each call to the next method, that location is updated with fresh values from the database.

    While being very fast, this approach also has some limitations : for example it is not possible to put such rows into an array (because the array would merely contain multiple references to the last row). So fast statements are not activated by default; regular statements create a fresh hashref for each row.

  • The client code can have fine control on statement preparation and execution, which is useful for writing efficient loops. For example, instead of writing

      my $list = My::Table->select(...);
      foreach my $obj (@$list) {
        my $related_rows = $obj->join(qw/role1 role2/)->select;
        ...
      }

    we can prepare a statement before the loop, and then just execute that statement at each iteration :

      my $statement = My::Table->join(qw/role1 role2/)->prepare;
      my $list = My::Table->select(...);
      foreach my $obj (@$list) {
        my $related_rows = $statement->execute($obj)->all;
        ...
      }
  • DBI has a prepare_cached method, that works like prepare except that the statement handle returned is stored in a hash associated with the $dbh. This can be exploited from DBIx::DataModel by stating

      $schema->dbi_prepare_method('prepare_cached');
  • The dependencies of DBIx::DataModel to other CPAN modules are modest; as a result, the number of internal method calls necessary to answer a query is quite low.

DESIGN FAQ

Here are answers to some design choices, in the form of a FAQ.

Why no accessor methods for columns ?

Columns within a row object are accessed, not with object-oriented method calls, but as entries within a hash. This is a strong design choice, which differs from many other ORMs; its motivations are the following :

  • as already stated above, row objects are of variable size, because each select() call decides which columns will be retrieved from the database. An object-oriented API would not be well suited for handling this situation, because in Perl object-oriented programming the set of methods is the same for every member of a given class. Other ORMs usually solve the problem by providing additional methods for asking which columns are present in the object. By contrast, the hash API makes it very direct to find out which keys are present in the hash.

  • all common Perl idioms for working with hashes can be used, like :

      # inspect hash keys
      my @column_names = keys @$row;
    
      # remove leading spaces in all columns
      s/^\s+// foreach values @$row;
    
      # print a slice
      print @{$row}{qw/col1 col2 col3/};
    
      # swap values
      ($row->{col1}, $row->{col2}) = ($row->{col2}, $row->{col1});
    
      # other way to swap values
      @{$row}{qw/col1 col2/} = @{$row}{qw/col2 col1/};

    As demonstrated by these examples, the hash API is especially powerful for dealing with several columns in one single line of Perl code; an object-oriented API would be more verbose.

  • the hash API is faster and uses less memory than an object-oriented encapsulation.

  • row hashrefs can be converted directly to import/export formats such as JSON, YAML, XML, etc.

Of course the hash API also has some disadvantages : it is not possible to intercept method calls for modifying or computing column data on the fly, nor to implement a mechanism of "dirty columns" to keep track of which columns where accessed or modified.

Nevertheless, conversion of column values is fully possible through the mechanisms of column types and to_DB / from_DB handlers. This is simply done at a different moment : instead of doing conversions when columns are accessed, DBIx::DataModel does conversions just before or just after having communicated with the database; this is similar in spirit to a kind of PerlIO layer for databases.

Why this localize_state method ?

When DBIx::DataModel is in single-schema mode, the schema is a global resource within the application, very much like other Perl global resources (for example STDIN, %ENV, or special variables $/, $,, etc.).

When used with care, interaction of several components through a global resource can be quite handy : for example STDIN does not need to be explicitly passed to every component, it is always available; furthermore, STDIN can be redirected at one specific place and then all collaborating components will immediately change behaviour accordingly. However, this must be done with care, because there is also a risk of undesired "action at distance" --- maybe the other components wanted to continue reading from the real STDIN, not the redirected one !

To avoid undesired interactions through the global state, Perl offers the local construct, also known as dynamic scoping (see perlsub). Using that construct, a component can temporarily redirect STDIN for the duration of one specific computation, and then automatically restore it to its previous state.

The localize_state method of DBIx::DataModel uses a similar approach : it supports a temporary change to the global schema state (for example for changing the database handle), and then restores the schema to its previous state. That method is also called implicitly by do_transaction.

Serialization

DBIx::DataModel includes support for the standard Storable serialization / deserialization methods freeze and thaw : so records and record trees can be written into files or sent to other processes. Dynamic subclasses for database joins are re-created on the fly during deserialization through thaw. However, there is no support for serializing database connections (this would be hazardous, and also insecure because serialization data would contain database passwords). Therefore the process which performs deserialization is responsible for opening the database connection by its own means, before calling the thaw method.

TO DO

Here is a list of points to improve in future versions DBIx::DataModel :

  - 'has_invalid_columns' : should be called automatically before insert/update ?
  - 'validate' record handler (at record level, not only column handlers)
  - walk through WHERE queries and apply 'to_DB' handler (not obvious!)
  - add PKEYS keyword in -columns, to be automatically replaced by
    names of primary key columns of the touched tables
  - pre/post callbacks: support arrays of handlers, refine(..) should add
    to the array
  - refine(-order_by => ..) should add to the ordering
  - update with subtrees (insert/update on dependent records. Quid: delete?)
  - auto-unjoin (API for partioning columns into subobjects).
  - support DISTINCT ON ...
  - support find_or_create, update_or_create
  - copy idea from DBIC:Storage:DBI:MultiColumnIn
  - Storable
     - hooks for YAML ?
     - think about STORABLE_attach
     - think about freeze/thaw in multi-schema mode
  - readonly tables
  - savepoints
  - explain how to design families of tables with an intermediate superclass
  - tune croak() in Statement.pm so that SQL errors are reported 
    from caller's line
  - $obj->join(path1 ...)->select(...) should automatically add
    -result_as => 'firstrow' when all multiplicities are 1. See TODO
    in ConnnectedSources::join
  - check auto_insert / auto_update columns to prevent intersections