The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Rosetta::Overview - Rosetta compared to DBI

DESCRIPTION

CAVEAT: This document's contents are old, and some details out of date or even woefully incorrect, but it will only be replaced in the indetermined future; meanwhile, many parts and details are still just fine.

This document is an overview of Rosetta DBMS framework, mainly done in terms of comparing and contrasting it with the Perl DBI.

Details of the architecture, such as data structures, allowed inputs, and functional APIs, are discussed in other documents.

Rosetta provides a common API for access to relational databases of all shapes and sizes, including those implemented with libraries embedded in your application (such as SQLite) and those implemented with a client-server architecture (such as Postgres, MySQL or Oracle).

The API is rigorously defined, such that an application written to it should easily port to alternative relational database engines with minimal effort.

This might seem strange to somebody who has not tried to port between databases before, especially given that the Perl DBI purports to provide "Database Independence". However, the level of DBI's provided independence is Database Driver Independence, and not Database Language Independence. To further demonstrate the difference, it is useful to compare the DBI and Rosetta.

SIMILARITIES TO DBI

Rosetta and the Perl DBI have a lot in common:

  • Both are highly modular, and see this as a fundamental feature. The core of both just defines a common database access API which multiple interchangable back-ends implement (each of which is usually made by a third party).

    Each back-end would usually map the framework API to a pre-existing native API for a particular database manager (such as with PostgreSQL), or it would sometimes implement the framework API itself (such as for CSV files).

    DBI calls these back-ends Drivers, since they sit on top of and control the database manager (following a hardware device driver analogy), while Rosetta calls its back-ends Engines, to emphasize that they are effectively its implementations, and be more abstract that they may not be controlling something else.

  • Both can use multiple back-ends at once within the same program.

  • Both are conceptually close to the typical underlying database manager, and their native languages are expressed in relational database terms. Terms such as connections, users, permissions, transactions, schemas, tables, queries, views, joins, data manipulations, constraints, columns, rows, nullability, primary and foreign keys, domains, sequence generators, statements, preparations, cursors, LOBs, indexes, etc, are the common language that they share.

  • Both focus on being a transport medium between the database and the application, endeavouring to give applications the widest degree of interaction possible, and access to the most database features as possible, with relatively few transformations between both ends.

    An application using either to access a database should be able to do all the same useful things they can do using the native API for that database.

  • Both provide mechanisms for running data queries, and performing schema manipulation or discovery.

  • Both are designed to be widely applicable and defacto standard tool kits upon which enterprising third parties can build alternate database access APIs that better serve the needs of themselves and others in less generic situations.

    Both take care of lower level transport matters and some cross-database portability issues, so that alternate APIs layered on top of them can focus on their added value instead.

  • Neither cache database records locally, although they can buffer row sets from the database that are fetched piecemeal by an application. They do cache database connections and prepared statements when appropriate/requested.

  • Both frameworks are designed to work with any existing database as it is, and do not store any of their own meta-data in the database or require their own custom schemas, using special tables such as some alternate APIs do.

    With Rosetta, any meta-data is stored using a database's native mechanisms (such as what CREATE TABLE can specify). A point of difference here is that if you prefer the database schema to be the "slave", the object version of the database schema can be stored anywhere else at all - including another database or space in the database.

DIFFERENCES TO DBI

Having described how similar DBI and Rosetta are, we should now talk about the differences.

Rosetta has a much higher emphasis on cross-database portability than DBI does. It achieves this through a more highly abstracted API, which also allows for a greater level of transformations of queries and schemas.

The most prominant example of this is that DBI does transparent pass-thru of SQL statements, such that the database receives the literal SQL string that the application provides, perhaps with a few limited string substitutions as provided for by the Driver. The application must know about and code to each database manager's proprietary SQL dialect, not the least of which includes the names of data types, formats of dates, and the syntax of outer joins.

By contrast, Rosetta will take its inputs as or process its inputs into an abstract syntax tree (AST) that represents the meaning (and has corresponding structure) to one or more standard SQL:2003 statements, and each Rosetta back-end will translate this into the database manager's proprietary SQL version (assuming a SQL engine is being used). In this way, the application only has to know *what* it wants to tell the database, and not how to express that in the database's native dialect.

That being the summary difference, other differences follow. Rosetta:

  • Represents all host parameters, query columns, stored procedure parameters, built-in function calls, and other such things in named format (such as ":foo") rather than positional format (?), so that the right thing will happen despite generated SQL using them in a different order than applications declare them or pass to them.

    These might be converted to positional placeholders when queries are run - it really doesn't matter. This abstraction alone saves a common DBI abstraction implemented by many DBI programmers from being necessary.

  • Uses a native SQL AST that is fully "decomposed" and does not normally contain any fragments of string SQL, although it does give the option for such to be used in esoteric situations.

    This is meant to be a fall-back "circumvention" option for savvy developers that understand the consequences, so that no one actually loses functionality they had with DBI when using Rosetta. This is akin to putting CPU-specific assembly language in your code, and is better as a short term solution rather than a long-term one.

  • Does basic transformations of data that it shuttles to and from the database, presenting a homogenous format.

    For example, strings are passed around in Unicode, and dates are dealt with in the standard ISO-8601 format natively supported by most Perl Date & Time modules.

    Rosetta internally uses native format numbers, such as string coded decimals, which have exact precision and are effectively of unlimited length. They won't lose accuracy in conversion for large values like floating point numbers. Rosetta will return numbers as Perl strings and coerce any input to such as well.

    Rosetta natively supports binary data, which it receives and returns exactly as is.

    By contrast, DBI passes-thru all data in the way the database sent it, which the application has to know about and specify, and doesn't guarantee particular formats between databases such that would help with portability.

  • Returns schema definitions when reverse-engineering an existing database in the same AST format it takes as input, so the result Rosetta can be passed right back in to make a clone if one wishes.

    By contrast, the DBI API has some native functions to return some schema info as hashes and arrays, and requires the application to encode SQL that requests the database send most other details in its native format, such as string SQL or information schema rows; the results are heterogeneous.

  • The DBI is mature and heavily battle tested, while Rosetta is a lot newer and will need more time before it can be considered as stable.

Rosetta can be implemented as an alternate API over DBI (and the reverse is also true) whose main contribution is SQL generation and parsing, and limited data munging, but Rosetta can also be used independently of DBI; the choice is up to the Rosetta Engine implementer.

The choice between using DBI and using Rosetta seems to be analogous to the choice between the C and Ruby programming languages, respectively, where each database product is analogous to a hardware CPU architecture or wider hardware platform.

The DBI is great for people who like working as close to the metal as possible, with much closer access to (or less abstraction from) each database product's native way of doing things, those who want to talk to their database in its native SQL dialect. Rosetta is more high level, for those who want the write-once run-anywhere experience, less of a burden on their creativity, more development time saving features.

Rosetta has several algorithmic differences from the DBI that can impact performance, so the net performance of your program depends on how you use each of them. Assuming it is used properly (which shouldn't be too difficult), Rosetta can deliver a net performance that is similar to the DBI, so your programs should not become slower when using it, and they may even become faster. For example:

  • Rosetta performs better (as does the DBI) with programs that reuse open database connections for multiple transactions, rather than closing and reopening separate connections for each one. It likewise performs better when prepared statements are reused for multiple executions, rather than re-preparing for each execute. Likewise, when queries are optimized to just return the data we actually need, and not extra that will just be ignored. Likewise, when work is pushed onto the database server, such as by using stored procedures.

  • Rosetta performs better with longer running programs (including some mod_perl scripts) where its ASTs or data structures and generated string SQL or Perl closures can be produced once (often at the start of a program) and cached for re-use. In this situation, most of the extra work that Rosetta does relative to DBI is factored out. Also, a good Rosetta Engine usually does this caching work for you, so your program doesn't gain complexity from its use.

  • Rosetta performs worse than the DBI with briefer running programs that talk to non-Perl databases which take string SQL as their native input, and all of the SQL strings a program will issue are entirely pre-written and hard-coded (bind variables exempted) in the database's native dialect. In this situation, DBI has essentially no overhead at all, strictly passing thru the SQL, whereas Rosetta has the SQL generation overhead, and caching statements that are only run once before a program quits is useless.

  • Rosetta performs better than the DBI with programs that talk to databases coded in Perl, which either do not take string SQL as their native input, or those that do but also provide an alternate API where you can directly supply the Perl data structures that it would parse SQL into. In this situation, the round trip of both generating and then parsing string SQL by Perl code is avoided, saving a considerable amount of work. These databases would either read the Rosetta AST natively, or the Rosetta Engine would simply copy between corresponding AST nodes.

  • Programs that already use some other database language abstraction tool in combination with DBI will almost certainly perform better by replacing the combination with Rosetta, since Rosetta is both more optimized towards pushing work to the database when possible, and better than making fast Perl emulations when pushing isn't possible. Likewise,

  • Programs that do work themselves which should conceptually be pushed to a database, even when not for the purpose of abstraction, will perform better if they push such work onto Rosetta, which can in turn either push it onto a capable database, or perform better at emulation if it can't.

  • Rosetta performs better with programs that natively define or generate from data dictionaries Rosetta's AST for their database requests, rather than defining or generating any variant of string SQL which a Rosetta wrapper then has to parse before use. Of course, the latter is what you would have if you want to take a DBI using program and move to Rosetta with the fewest material changes possible; this solution should still be very useable, just sub-optimal.

SEE ALSO

Go to Rosetta for the majority of distribution-internal references, and Rosetta::SeeAlso for the majority of distribution-external references.

AUTHOR

Darren R. Duncan (perl@DarrenDuncan.net)

Some editorial assistance from Sam Vilain (samv@cpan.org).

LICENCE AND COPYRIGHT

This file is part of the Rosetta DBMS framework.

Rosetta is Copyright (c) 2002-2006, Darren R. Duncan.

See the LICENCE AND COPYRIGHT of Rosetta for details.

ACKNOWLEDGEMENTS

The ACKNOWLEDGEMENTS in Rosetta apply to this file too.