The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DBIx::XMLServer - Serve data as XML in response to HTTP requests

SYNOPSIS

  use XML::LibXML;
  use DBIx::XMLServer;

  my $xml_server = new DBIx::XMLServer($dbh, "template.xml");

  my $doc = $xml_server->process($QUERY_STRING);
  die "Error: $doc" unless ref $doc;

  print "Content-type: application/xml\r\n\r\n";
  print $doc->toString(1);

DESCRIPTION

This module implements the whole process of generating an XML document from a database query, in response to an HTTP request. The mapping from the DBI database to an XML structure is defined in a template file, also in XML; this template is used not only to turn the data into XML, but also to parse the query string. To the user, the format of the query string is very natural in relation to the XML data which they will receive.

All the methods of this object can take a hash of named parameters instead of a list of parameters.

One DBIx::XMLServer object can process several queries. The following steps take place in processing a query:

  1. The query string is parsed. It contains search criteria together with other options about the format of the returned data.

  2. The search criteria from the query string are converted, using the XML template, into an SQL SELECT statement.

  3. The results of the SQL query are translated into XML, again using the template, and returned to the caller.

METHODS

Constructor

  my $xml_server = new DBIx::XMLServer( $dbh, $template_doc 
                                        [, $template_node] );

  my $xml_server = new DBIx::XMLServer( dbh => $dbh,
                                        doc => $template_doc,
                                        template => $template_node,
                                        maxpagesize => $maxpagesize );

The constructor for DBIx::XMLServer takes two mandatory arguments and two optional arguments.

$dbh

This is a handle for the database; see DBI for more information.

$template_doc

This is the XML document containing the template. It may be either an XML::LibXML::Document object or a string, which is taken as a file name.

$template_node

One template file may contain several individual templates; if so, this argument may be used to pass an XML::LibXML::Element object indicating which template should be used. By default the first template in the file is used.

$maxpagesize

This option may be used to limit the number of records than will be returned in a query. The user can choose a page size smaller than this by using the pagesize option on their query (see below), but they will not be allowed to request a page size larger than this maximum.

process()

  my $result = $xml_server->process( $query [, $template_node] );

  my $result = $xml_server->process( query => $query,
                                     template => $template_node,
                                     rowcount => $rowcount,
                                     userformat => $userformat );

This method processes an HTTP query and returns an XML document containing the results of the query. There are one mandatory argument and two optional arguments.

$query

This is the HTTP GET query string to be processed.

$template_node

As above, this may indicate which of several templates is to be used for this query. It is an XML::LibXML::Element object.

$rowcount

It is possible to limit the number of rows returned in one query, either in response to a user request (by using the pagesize option, see below) or by passing the maxpagesize option when creating the DBIx::XMLServer object. In these cases it may be useful to know the total number of rows that would have been returned had no limit been in place. The number of rows can be put into the output XML document using the <sql:meta type="rows"> element in the template (see below). This argument chooses how this information should be obtained from the database.

FOUND_ROWS

Passing rowcount => 'FOUND_ROWS' tells the module to use the SQL_COUNT_FOUND_ROWS option and the FOUND_ROWS function. If your database supports these, use this option.

COUNT

Passing rowcount => 'COUNT' means that a second query will be done after the main database query, of this form:

  SELECT COUNT(*) FROM ... WHERE ...
$userformat

Setting this to a true value allows the user to choose between several templates in the file by specifying the format option in the query string.

The return value of this method is either an XML::LibXML::Document object containing the result, or a string containing an error message. An error string is only returned for errors caused by the HTTP query string and thus the user's fault; other errors, which are the programmer's fault, will croak.

EXAMPLE

This example is taken from the tests included with the module. The database contains two tables.

  Table dbixtest1:

  +----+--------------+---------+------+
  | id | name         | manager | dept |
  +----+--------------+---------+------+
  |  1 | John Smith   |    NULL |    1 |
  |  2 | Fred Bloggs  |       3 |    1 |
  |  3 | Ann Other    |       1 |    1 |
  |  4 | Minnie Mouse |    NULL |    2 |
  |  5 | Mickey Mouse |       4 |    2 |
  +----+--------------+---------+------+

  Table dbixtest2:

  +----+----------------------+
  | id | name                 |
  +----+----------------------+
  |  1 | Widget Manufacturing |
  |  2 | Widget Marketing     |
  +----+----------------------+

The template file (in t/t10.xml) contains the following three table definitions:

  <sql:table name="employees" sqlname="dbixtest1"/>
  <sql:table name="managers" sqlname="dbixtest1"
    join="left" jointo="employees" refcolumn="manager" keycolumn="id"/>
  <sql:table name="departments" sqlname="dbixtest2"
    join="left" jointo="employees" refcolumn="dept" keycolumn="id"/>

The template element is as follows:

  <sql:template table="employees">
    <employees>
      <sql:record>
        <employee id="foo">
          <sql:field type="number" attribute="id" expr="employees.id"/>
          <name>
            <sql:field type="text" expr="employees.name"/>
          </name>
          <manager>
            <sql:field type="text" expr="managers.name" join="managers"
              null='nil'/>
          </manager>
          <department>
            <sql:field type="text" expr="departments.name" join="departments"/>
          </department>
        </employee>
      </sql:record>
    </employees>
  </sql:template>

The query string name=Ann* produces the following output:

  <?xml version="1.0"?>
  <employees>
    <employee id="3">
      <name>Ann Other</name>
      <manager>John Smith</manager>
      <department>Widget Manufacturing</department>
    </employee>
  </employees>

The query string department=Widget%20Marketing&fields=name produces the following output:

  <?xml version="1.0"?>
  <employees>
    <employee id="4">
      <name>Minnie Mouse</name>
    </employee>
    <employee id="5">
      <name>Mickey Mouse</name>
    </employee>
  </employees>

HOW IT WORKS: OVERVIEW

The main part of the template file which controls DBIx::XMLServer is the template element. This element gives a skeleton for the output XML document. Within the template element is an element, the record element, which gives a skeleton for that part of the document which is to be repeated for each row in the SQL query result. The record element is a fragment of XML, mostly not in the sql: namespace, which contains some <sql:field> elements.

Each <sql:field> element corresponds to a point in the record element where data from the database will be inserted. Often, this means that one <sql:field> element corresponds to one column in a table in the database. The field has a type; this determines the mappings both between data in the database and data in the XML document, and between the user's HTTP query string and the SQL WHERE clause.

The HTTP query which the user supplies consists of search criteria, together with other special options which control the format of the XML output document. Each criterion in the HTTP query selects one field in the record and gives some way of limiting data on that field, typically by some comparison operation. The selection of the field is accomplished by an XPath expression, normally very simply consisting just of the name of the field. After the field has been selected, the remainder of the criterion is processed by the Perl object corresponding to that field type. For example, the built-in text field type recognises simple string comparisons as well as regular expression comparisons; and the build-in number field type recognises numeric comparisons.

All these criteria are put together to form the WHERE clause of the SQL query. The user may also use the special fields=... option to select which fields appear in the resulting XML document; the value of this option is again an XPath expression which selects a part of the record template to be returned.

Other special options control how many records are returned on each page, which page of the results should be returned, and may choose one of several templates in the file.

The template to use for a query is chosen as follows:

  1. If the userformat option is set when calling DBIx::XMLServer::process() and the user has chosen a template with the format option in the query string, that template is used.

  2. Otherwise, if a template was specified when calling DBIx::XMLServer::process(), then that template is used.

  3. Otherwise, if a template was specified when constructing the DBIx::XMLServer object, then that template is used.

  4. Otherwise, the first template in the file is used.

THE TEMPLATE FILE

The behaviour of DBIx::XMLServer is determined entirely by the template file, which is an XML document. This section explains the format and meaning of the various elements which can occur in the template file.

Namespace

All the special elements used in the template file are in the namespace associated to the URI http://boojum.org.uk/NS/XMLServer. In this section we will suppose that the prefix sql: is bound to that namespace, though of course any other prefix could be used instead.

The root element

The document element of the template file must be an <sql:spec> element. This element serves only to contain the other elements in the template file.

Contained in the root element are elements of three types:

  • Field type definition elements;

  • Table definition elements;

  • One or more template elements.

We now describe each of these in more detail.

Field type definitions

A field type definition is given by a <sql:type> element. Each field in the template has a type. That type determines: how a criterion from the query string is converted to an SQL WHERE clause for that field; how the SQL SELECT clause to retrieve data for that field is created; and how the resulting SQL data is turned into XML. For example, the standard date field type can interpret simple date comparisons in the query string, and puts the date into a specific format in the XML.

Each field type is represented by a Perl object class, derived from DBIx::XMLServer::Field. For information about the methods which this class must define, see DBIx::XMLServer::Field. The class may be defined in a separate Perl module file, as for the standard field types; or the methods of the class may be included verbatim in the XML file, as follows.

The <sql:type> element has one attribute, name, and four element which may appear as children.

attribute: name

The name attribute defines the name by which this type will be referred to in the templates.

element: <sql:module>

If the Perl code implementing the field type is contained in a Perl module in a separate file, this element is used to give the name of the module. It should contain the Perl name of the module (for example, DBIx::XMLServer::NumberField).

Example

  <sql:type name="number">
    <sql:module>DBIx::XMLOut::NumberField</sql:module>
  </sql:type>

Instead of the <sql:module> element, the <sql:type> element may have separate child elements defining the various facets of the field type.

element: <sql:isa>

This element contains the name of a Perl module from which the field type is derived. The default is DBIx::XMLServer::Field.

element: <sql:select>

This element contains the body of the select method (probably inside a CDATA section).

element: <sql:where>

This element contains the body of the where method (probably inside a CDATA section).

element: <sql:join>

This element contains the body of the join method (probably inside a CDATA section).

element: <sql:value>

This element contains the body of the value method (probably inside a CDATA section).

element: <sql:init>

This element contains the body of the init method (probably inside a CDATA section).

Table definitions

Any SQL table which will be accessed by the template needs a table definition. As a minimum, a table definition associates a local name for a table with the table's SQL name. In addition, the definition can specify how this table is to be joined to the other tables in the database.

Note that one SQL table may often be joined several times in different ways; this can be accomplished by several table definitions, all referring to the same SQL table.

A table definition is represented by the <sql:table> element, which has no content but several attributes.

attribute: name

This mandatory attribute gives the name by which the table will be referred to in the template, and also the alias by which it will be known in the SQL statement.

attribute: sqlname

This mandatory attribute gives the SQL name of the table. In the SELECT statement, the table will be referenced as <sqlname> AS <name>.

attribute: jointo

This attribute specifies the name of another table to which this table is joined. Whenever a query involves a column from this table, this and the following attributes will be used to add an appropriate join to the SQL SELECT statement.

attribute: join

This attribute specifies the type of join, such as LEFT, RIGHT, INNER or OUTER.

attribute: on

This attribute specifies the ON clause used to join the two tables. In the most common case, the following two attributes may be used instead.

attribute: keycolumn

This attribute gives the column in this table used to join to the other table.

attribute: refcolumn

This attribute gives the column in the other table used for the join. Specifying keycolumn and refcolumn is equivalent to giving the on attribute value

  <this table's name>.<keycolumn> = <other table's name>.<refcolumn> .

The template element

A template file must contain at least one <sql:template> element. This element defines the shape of the output document. It may contain arbitrary XML elements, which are copied straight to the output document. It also contains one <sql:record> element, which defines that part of the output document which is repeated for each row returned from the SQL query.

Further, the template element may contain <sql:meta> elements which indicate that certain information about the query should be inserted into the output document.

As the output document is formed from the content of the <sql:template> element, it follows that this element must have exactly one child element.

The <sql:template> may have the following attributes:

attribute: id

This optional attribute gives a unique identifier to the template. The user may, if allowed, choose which template to use by specifying the format option in the query string together with this identifier.

attribute: table

This mandatory attribute specifies the main table for this template, to which any other tables will be joined.

attribute: default-namespace

In the HTTP query string, the user must refer to parts of the template. To avoid them having to specify namespaces for these, this attribute gives a default namespace which will be used for unqualified names in the query string.

The record element

Each template contains precisely one <sql:record> element among its descendants. This record element defines that part of the output XML document which is to be repeated once for each row in the result of the SQL query. The content of the record element consists of a fragment of XML containing some <sql:field> elements; each of these defines a point at which SQL data will be inserted into the record. The <sql:record> must have precisely one child element.

It is also to the structure inside the <sql:record> element that the user's HTTP query refers.

The <sql:record> element has no attributes.

The field element

The record element will contain several <sql:field> elements. Each of these field elements defines what the user will think of as a field; that is, a part of the XML record which changes from one record to the next. Normally this will correspond to one column in an SQL table, though this is not obligatory.

A field has a type, which refers to one of the field type definitions in the template file. This type determines the mappings both between SQL data and XML output data, and between the user's query and the SQL WHERE clause.

The <sql:field> element may have the following attributes:

attribute: type

This mandatory attribute gives the type of the field. It is the name of one of the field types defined in the template file.

attribute: join

This attribute specifies which table needs to be joined to the main table in order for this field to be found. (Note: this attribute is only read by the field type class's join method. If that method is overridden, this attribute may become irrelevant.)

attribute: attribute

If this attribute is set, the contents of the field will not be returned as a text node, but rather as an attribute on the <sql:field> node's parent node. The value of the attribute attribute gives the name of the attribute on the parent node which should be filled in with the value of this field. When this attribute is set, the parent node should always have an attribute of that name defined; the contents are irrelevant.

attribute: expr

This attribute gives the SQL SELECT expression which should be evaluated to find the value of the field. (Note: this attribute is only ever looked at in the field type class's select method. If this method is overridden, this attribute need not necessarily still be present.)

attribute: null

This attribute determines the action when the field value is null. There are three possible values:

empty (default)

The field is omitted from the result, but the parent element remains.

omit

The parent element is omitted from the record

nil

The parent element has the xsi:nil attribute set.

The sql:omit attribute

Any element in the record may have the Boolean attribute sql:omit. If this attribute is set, then this element will be omitted from any record in which the element is empty (because child elements have been omitted).

The meta element

The <sql:meta> element is used for putting information about the query into the output document. The information is selected by the type attribute of the element. The following type attributes are recognised:

type='args'

This gives the original query string passed to DBIx::XMLServer.

type='page'

This gives the page number within the results, as selected by the page= option in the query string.

type='pagesize'

This gives the page size, as selected by the pagesize= option in the query string.

type='query'

This gives the SQL query which was executed to produce the results.

type='rows'

This gives the number of rows which the query would have returned, had it not been for the page= and pagesize= options. To tell the module how to find this information, set the rowcount option when processing the request (see above).

The <sql:meta> element in the template will be replaced by the corresponding string in the output document. Alternatively, it is possible to place the string into the output document as an attribute to the parent element of the <sql:meta> element. To do this, include an attribute attribute="name" on the <sql:meta> element, where name is the local name of the attribute. To add a namespace to the attribute, additionally include an attribute namespace="foo" on the <sql:meta> element, replacing foo with whatever namespace should be used.

SPECIAL OPTIONS IN THE QUERY STRING

The HTTP query string may contain certain special options which are not interpreted as criteria on the records to be returned, but instead have some other effect.

fields = <expression>

This option selects which part of each record is to be returned. In the absence of this option, an entire record is returned for each row in the result of the SQL query. If this option is set, its value should be an XPath expression. The expression will be evaluated in the context of the single child of the <sql:record> element and should evaluate to a set of nodes; the part of the record returned is the smallest subtree containing all the nodes selected by the expression.

pagesize = <number>, page = <number>

These options give control over how many records are returned in one query, and which of several pages is returned. To put a limit on the page size which can be requested, use the maxpagesize option when creating the DBIx::XMLServer object. By default there is no limit.

order = <list>

This option controls how records are ordered in the output document. The <list> should be a comma-separated list of XPath expressions, each optionally followed by a space and the string ascending or descending. Each of these XPath expressions is evaluated within the context of the single child of the <sql:record> element and should select one or more fields; these fields are used to order the result records. Fields are used in the order that they appear in the list; if a single list element selects more than one field, they are used in document order.

HOW IT REALLY WORKS

When a DBIx::XMLServer object is created, the template file is parsed. A new Perl module is compiled for each field type defined.

The process() method performs the following steps.

  1. The HTTP query string is parsed. It is split at each `&' character, and each resulting fragment is un-URL-escaped. Each fragment is then examined, and a leading part removed which matches a grammar very similar to the Pattern production in XSLT (see http://www.w3.org/TR/xslt). This leading part is assumed to be an expression referring to a field in the <sql:record> element of the template, unless it is one of the special options fields, pagesize or page. If the <sql:template> has a default-namespace attribute, then any unqualified name in this expression has that default namespace added to it.

  2. Each criterion in the query string is turned into part of the WHERE clause. The leading part of each fragment of the query string is evaluated as an XPath expression in the context of the single child of the <sql:record> element. The result must be either a nodeset having a unique <sql:field> descendant; or an attribute on an element having a child <sql:field> element whose attribute attribute matches. In either case, a single <sql:field> element is found. That field's type is looked up and the resulting field type class's where method called, being passed the remainder of the fragment of the HTTP query string. The result of the where method is added to the WHERE clause; all criteria are combined by AND.

  3. A new result document is created whose document element is a clone of the <sql:template> element. The <sql:record> in this new document is located. The value of the special fields option is evaluated, as an XPath expression, within the unique child of that element, and the smallest subtree containing the resulting fields is formed. The rest of the record is pruned away. The SQL SELECT clause is now created by calling the select method of each of the <sql:field> elements left after this pruning.

  4. The `tables' part of the SELECT statement is formed by calling the join methods of all the tables which are referred to either in the search criteria, or by any of the field to be returned.

  5. The SELECT statement is executed. For each result row, a copy of the pruned result record is created. Each field in this record is filled in by calling the value method of the corresponding field type.

  6. The resulting document is passed through an XSL transform for tidying up before being returned to the caller.

BUGS

There are quite a lot of stray namespace declarations in the output. They make no difference to the semantic meaning of the markup, but they are ugly. I gather that XML::LibXML will provide the means to remove them in the near future.

The way we build JOIN expressions isn't very clever, and probably doesn't work for anything more than the simplest situations.

The way we add a default prefix to every XPath expression is a bit of a hack. I think the only way to fix this is to wait for XPath 2.0 to be widely available.

This module has been written to use MySQL and has only been tested on that platform. I would be interested to hear from users who have been able to make it work on other platforms.

SEE ALSO

DBIx::XMLServer::Field

AUTHOR

Martin Bright <martin@boojum.org.uk>

COPYRIGHT AND LICENCE

Copyright (C) 2003-4 Martin Bright

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.