=head1 NAME
 
 OOPS - Object Oriented Persistent Store

=head1 SYNOPSIS

 use OOPS;

 transaction(sub {
	$oops = new OOPS 
		dbi_dsn => $DBI_DSN, 
		username => $username, 
		password => $password,
		table_prefix => "MY",
		readonly => 0;

	$oops->commit;

	$oops->{my_1st_bit_of_data} = 'a scalar';
	$oops->{my_2nd_bit_of_data} = { A => 'hash' };
	$oops->{my_3rd_bit_of_data} = [ qw(An Array) ];
	$oops->{my_4th_bit_of_data} = \'a reference to a scalar, ref, hash, or array';

	my $old_value = $oops->{multiple}{level}{dereference};

	my $object = $oops->load_object($integer_object_id);

	my $dbh = $oops->dbh();

	$oops->workaround27555($reference);

	my $was_virtual = $oops->virtual_object(\%{$oops->{some}{hash}}, $new_value);	# $new_value is optional

 });

 my $ref = getref(%hash, 'key');

=head1 DESCRIPTION

The goal of OOPS is to make perl objects easily and transparently persistent.
OOPS handles deeply nested and cross-linked objects -- even object
hierarchies that are too large to fit in memory and (with a hint)
individual hash tables that are too large for memory.  Objects will be
demand-loaded into memory as they are accessed.  All changes to your
object hierarchy will be saved with a single commit().

Full transactional consistency is the only operational mode.  Either all
of your changes are saved or none of them are.	While your program runs, you
will see a consistent view of the data: no other running transactions
will change the data you see.  If another transaction changes data
that you are using then at least one of the transactions must abort.
OOPS will die() to abort the transaction.

OOPS maps all perl objects to the same RDBMS schema.  No advance schema
definition is required on the part of the user of OOPS.  The name of
the package (OOPS) comes from the realization that perl's data model
is much more complicated than I initially understood.  Internally, the
RDBMS schema uses four tables: a table of objects, a table of attributes
(keys and values), a table of large attributes that are too big to fit
in normal (VARCHAR) columns, and a table of counters.

At this time, OOPS is expecting a web-like work flow: 

 create OOPS instance

 access some objects

 modify some objects

 commit

 exit

If you need more than one transaction in a program, create more than
one OOPS instance.

To make your data persistent, make a reference to your data from the
OOPS object.  To later retrieve your data, simply access it through the
OOPS object.

=head1 EXAMPLE PROGRAM

 use OOPS;
 use File::Slurp;	# assumed source of read_file() below

 transaction(sub {
	my $oops = new OOPS 
		dbi_dsn => 'DBI:mysql:database=MY-DATABASE-NAME;host=localhost', 
		username => 'MY-USERNAME', 
		password => 'MY-PASSWORD',
		table_prefix => "MY-TABLE-PREFIX";

	my $p = $oops->{pages}{"/some/path"};

	$p->{next} = $oops->{pages}{"/some/other/path"};
	$p->{jpgs} = [ read_file("x.jpg"), read_file("y.jpg") ];

	$oops->commit;
 });

 exit;

=head1 SUPPORTED DATA TYPES

Perl blessed and unblessed HASHes, REFs, SCALARs, and ARRAYs are supported.
Currently, HASH keys may not be longer than 255 characters.  Class names may
not be more than 128 characters long.  References to hash keys and array
elements are supported.

At the current time, large ARRAYs are not efficient.  Use HASHes instead
if this matters to you.  References to array elements and hash values
are not efficient.

Large HASHes are supported by only loading keys as they are accessed.

HASHes, array elements, and REFs are implemented with tie().  ARRAYs are
not currently tie()d because of bugs in perl.  Multiple references to the
same scalar are supported.  References to array elements and hash values
are supported.	Persistent data is reference counted and cycles must be
manually broken to assure de-allocation.  A persistent data garbage
collector L<OOPS::GC> is supplied if you cannot guarantee to break your
own cycles.

=head1 SUPPORTED PLATFORMS

Support for the following RDBMSs is included:

=over 15

=item PostgreSQL

OOPS 0.2005 has been tested with PostgreSQL version 9.1.3 on Linux.
OOPS 0.2002 has been tested with PostgreSQL version 8.1.8 on Linux.  
OOPS 0.1004 was tested with PostgreSQL version 8.1.4 on Linux.	
OOPS 0.1003 was tested with 7.4.2 and 7.3.5 on Linux and 7.3.5 on
pre-release DragonflyBSD.  Somewhere between 7.4.2 and 8.1.4 PostgreSQL
got more strict about TEXT versus BYTEA types.  OOPS 0.1003 was using
TEXT where it should have been using BYTEA.  Be sure to upgrade OOPS
to 0.1004 or later *before* upgrading PostgreSQL to 8.x or later.

Biggest issue: PostgreSQL runs the test suite more slowly than mysql
or SQLite.

On rare occasions PostgreSQL will fail with an 
C<ERROR:  duplicate key violates unique constraint>
error.  If this happens inside a call to C<transaction()> it
will automatically restart.  You only need to worry about this
if you are not using C<transaction()>.

=item mysql

OOPS 0.2005 has been tested with mysql 5.1.46 on Linux.
OOPS 0.2003 has been tested with mysql 5.0.45 on Linux.
OOPS 0.2002 has been tested with mysql 5.0.22 on Linux.
OOPS 0.1004 has been tested with mysql 4.1.15 on Linux.
OOPS 0.1003 was tested with mysql 4.0.16 and 4.0.18 using InnoDB tables. 

The OOPS schema for mysql does not support large (>256 byte) hash
keys.  

Mysql locking doesn't work correctly.  To get correct behavior, access
to the database must be exclusive L<http://bugs.mysql.com/?id=28029>.
This problem is fixed in 5.0.45.

=item SQLite2

L<DBD::SQLite> versions 0.x use SQLite 2.x.  
L<DBD::SQLite2> is the continuation of DBD::SQLite version 0.x codebase and uses SQLite 2.x.

DBD::SQLite2 is not 8-bit clean with respect to C<\0>.
OOPS uses a DBD::SQLite2 feature to translate binary nulls.  
A side-effect is that backslash will be doubled C<\> -> C<\\> in DBD::SQLite2 (the extras will be removed
transparently by DBD::SQLite2).

OOPS does not use the C<counters> table with DBD::SQLite2. 

=item SQLite

DBD::SQLite versions 1.x use SQLite 3.x.  

DBD::SQLite versions 1.x leak file descriptors and will eventually 
die if you create, use, and destroy too many OOPS instances.
DBD::SQLite versions 1.x cannot pass the regression suite and may
not be reliable in production use.  Use DBD::SQLite2 instead.

OOPS does not use the C<counters> table with DBD::SQLite. 

SQLite is not 8-bit clean with respect to C<\0>.
OOPS uses a DBD::SQLite feature to translate binary nulls.  
A side-effect is that backslash will be doubled C<\> -> C<\\> in DBD::SQLite (the extras will be removed
transparently by DBD::SQLite).
As of 0.2002, OOPS doesn't make use of the 
newer (version 1.x) DBD::SQLite features for handling nulls.

To work around the problem with the leaked file descriptors, on systems
that support C</proc>, file descriptors in excess of 
C<$OOPS::sqlite_v3::enable_fd_leak_workaround> 
(500 by default)
that match the sqlite database file will be closed when new
OOPS instances are created.  This sometimes helps.  Set 
C<$OOPS::sqlite_v3::enable_fd_leak_workaround> 
to 0 to disable.

Related bug reports: 
L<http://rt.cpan.org/Ticket/Display.html?id=20286>,
L<http://rt.cpan.org/Ticket/Display.html?id=20288>.

=back

Perl versions 5.8.2 through 5.12.1 are supported.  Prior to 5.8.2, it wasn't
possible to untie scalars from within a tied scalar access method.
An ugly workaround is possible if there is enough interest.

OOPS 0.2002 and 0.1004 have been tested on Linux 2.6.15 (Ubuntu 6.06LTS -
Dapper Drake).	OOPS 0.1003 was tested on Linux 2.4.23 (Debian unstable
as of April '04); on FreeBSD 4.9; and on DragonflyBSD prerelease.

As far as performance goes, mysql and SQLite are both about twice as
fast as PostgreSQL for applications that only have one transaction at
a time.  SQLite and mysql are particularly slow when there are multiple
transactions as their lock granularity is the entire database.

Each RDBMS uses a somewhat different schema.

=head1 FUNCTIONS

=over 4

=item C<transaction($funcref, @args)>

C<transaction()> is a wrapper for a complete transaction.   Transactions
that fail due to deadlock with other processes will be re-run
automatically.	

The first parameter is a reference to a function.  Any additional
parameters will be passed as parameters to that function.  The return
value of C<transaction()> is the return value of C<&$funcref()>.

It is not necessary to use the C<transaction()> function.  Beware that nearly
any operation on persistent data (even read operations) can cause deadlock.

Any use of persistent data can trigger a deadlock.  The
C<transaction()> function catches this and retries automatically up to
C<$OOPS::transaction_maxtries> times (15 times unless you change it).
If you don't use C<transaction()> you might want to catch the exceptions
that C<transaction()> catches.	To do this, you can regex match C<$@>
against C<$OOPS::transfailrx>.

Basically, transaction is a slightly fancier version of the following:

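 sub transaction
 {
	my ($sub, @args) = @_;
	for (1..15) {
		# run the caller's code; any die() ends up in $@
		my @result = eval {
			&$sub(@args);
		};
		return @result unless $@;
		# retry only when the failure matches the deadlock regex
		next if $@ =~ /$OOPS::transfailrx/;
		die $@;
	}
 }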

The important thing to notice is that your code will be called over and over
until the transaction succeeds.  This means you should write your code so that
it doesn't have any external side effects until after it calls commit().
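
One way to follow that advice is to have the transaction body return a
list of pending actions and to perform them only after C<transaction()>
has returned.  The sketch below is self-contained so that it runs without
a database: C<retry()> is a hypothetical stand-in for C<transaction()>
and the C<deadlock> error message is made up.

 use strict;
 use warnings;

 # Hypothetical stand-in for transaction(): rerun $code for as long as
 # it dies with a (made-up) deadlock message.
 sub retry
 {
	my ($code, @args) = @_;
	for my $try (1..15) {
		my @result = eval { $code->(@args) };
		return @result unless $@;
		die $@ unless $@ =~ /deadlock/;
	}
	die "too many retries\n";
 }

 my $attempts = 0;
 my @pending = retry(sub {
	my @todo;		# rebuilt from scratch on every attempt
	$attempts++;
	push(@todo, "send welcome mail");
	die "deadlock detected\n" if $attempts < 3;	# simulate two failures
	# with OOPS, $oops->commit would be called here
	return @todo;
 });

 # The side effects happen here, once, after the transaction succeeded.
 print "attempts: $attempts, doing: $_\n" for @pending;

Because C<@todo> is declared inside the code reference, work queued by
a failed attempt is discarded when the attempt is rerun.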

The following package variables are available during transactions:

=over 4

=item C<$OOPS::transaction_tries>

The number of attempts at this transaction.  Starts with 1.

=item C<$OOPS::transaction_maxtries>

The maximum number of attempts that will be made.

=item C<$OOPS::transaction_failure_sleep>

The minimum amount of time to C<sleep()> if the transaction fails.

=item C<$OOPS::transaction_failure_maxsleep>

The maximum amount of time to C<sleep()> if the transaction fails.

=item C<@OOPS::transaction_rollback>

A list of functions to call in the event of a transaction rollback
(code refs).

=back

=item C<getref(%hash, $key)>

References to tied hash keys are buggy in all perls through 5.8.7 (and
beyond?).  Use C<getref(%hash, $key)> to create your reference to a tied
hash key.  See: L<http://rt.perl.org/rt3/Ticket/Display.html?id=27555>
and L<http://rt.perl.org/rt3/Ticket/Display.html?id=29224>.

 $ref = getref(%hash, $key);

Alternatively, use C<$oops-E<gt>workaround27555($ref)>.

Getref() and workaround27555() work around all the perl bugs
with tied hash key references.	Failure to use them may result
in unexpected and inconsistent results.

=item C<walk_hash(%hash, $stride, [ $previous_last_key ])>

If you have a large virtual hash, you may want to be able to 
iterate over the keys in the hash as a bunch of transactions
rather than as one transaction.

Walk_hash() can be called on any HASH.  If it isn't an OOPS
virtual hash, it will return keys in batches just as if it
had been called on an OOPS virtual hash.

Here is an example of how to use it:

 my $stride = 20;
 my $done = 0;
 my $lastkey;
 while (not $done) {
	transaction(sub {
		my $oops = OOPS->new(...);
		my @things = walk_hash(%{$oops->{my_table}}, $stride, $lastkey);
		for my $thing (@things) {
			# do stuff
		}
		$lastkey = $things[$#things];
		$done = 1 unless @things == $stride;
	});
 }
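
The stride and last-key protocol used in that loop can be tried out on an
ordinary hash.  The sketch below uses a hypothetical helper, C<next_batch()>,
that is not part of OOPS.  It assumes that keys are handed out in sorted
order, which the real C<walk_hash()> may or may not do.

 use strict;
 use warnings;

 # Hypothetical helper: return up to $stride keys that sort after $last.
 sub next_batch
 {
	my ($hash, $stride, $last) = @_;
	my @keys = sort keys %$hash;
	@keys = grep { $_ gt $last } @keys if defined $last;
	splice(@keys, $stride) if @keys > $stride;
	return @keys;
 }

 my %table = map { sprintf("key%02d", $_) => 1 } 1..45;
 my ($stride, $done, $lastkey, @batches) = (20, 0);
 while (not $done) {
	my @things = next_batch(\%table, $stride, $lastkey);
	push(@batches, scalar(@things));
	$lastkey = $things[$#things] if @things;
	# a short batch means the end of the hash was reached
	$done = 1 unless @things == $stride;
 }
 print "batch sizes: @batches\n";

With 45 keys and a stride of 20 the loop runs three times and the final,
short batch ends it.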

Here is another example of how to use it (with L<OOPS::TxHash>):

 use OOPS::TxHash;

 my $stride = 20;

 my %real_todo = ( stuff => undef );
 while (%real_todo) {
	transaction(sub {
		my $th = tie my %todo, 'OOPS::TxHash', \%real_todo or die;
		my $oops = OOPS->new(...);

		if (exists $todo{stuff}) {
			my @keys = walk_hash(%{$oops->{stuff}}, $stride, $todo{stuff});

			for my $k (@keys) {
				# work on $oops->{stuff}{$k}
			}

			$todo{stuff} = $keys[$#keys];
			delete $todo{stuff} unless @keys == $stride;
		}

		$oops->commit;
		$th->commit;
	});
 }

Here is yet another example (with a wrapper function):

 walk_eval(20, 
	sub { OOPS->new(...) },
	sub { my $oops = shift; $oops->{my_table} },
	sub { 
		my ($oops, $thing) = @_;
		# do stuff
	},
	allatonce => 0
 );

 sub walk_eval
 {
	my ($stride, $get_oops, $get_hash, $agent, %opts) = @_;
	my $done = 0;
	my $last = undef;
	$stride ||= 100;
	while (! $done) {
		transaction(sub {
			my $oops = &$get_oops();
			my $hash = &$get_hash($oops);
			my @items = walk_hash(%$hash, $stride, $last);
			if ($opts{allatonce}) {
				&$agent($oops, @items);
			} else {
				for my $item (@items) {
					&$agent($oops, $item);
				}
			}
			$oops->commit();
			$last = $items[$#items];
			$done = 1 unless @items == $stride;
		});
	}
 }

=back

=head1 PUBLIC CLASS METHODS

=over 4

=item C<OOPS-E<gt>new(%parameters)>

Creates an OOPS object instance.  More than one object instance is allowed at 
the same time.	Instances are unaware of each other.  Making a reference from
a persistent object in one instance to a persistent object in another instance
will cause recursive copying from one instance to the other.  (Untested).

The C<%parameters> are documented in the next section.

=item C<OOPS-E<gt>initial_setup(%parameters)>

Drops and recreates the database tables.  Don't use it too often :-)
The regression suite drops and re-creates the tables many times.  

The C<%parameters> are documented in the next section.

=back

=head1 PARAMETERS FOR NEW

The C<new()> and C<initial_setup()> methods take a hash specification
for their behavior.  Here are the parameters allowed:

=over 4

=item C<dbi_dsn> / C<$ENV{OOPS_DSN}> / C<$ENV{OOPS_DRIVER}> / C<$ENV{DBI_DSN}> / C<$ENV{DBI_DRIVER}>

Many ways to specify the DSN for DBI: as an argument; as an
OOPS environment variable; as a DBI environment variable.  Use
at least one :-)  See the L<DBI> documentation for more details.

=item C<user> / C<$ENV{OOPS_USER}> / C<$ENV{DBI_USER}>

Three ways to specify the user for DBI: as an argument; as an
OOPS environment variable; as a DBI environment variable.  Not
required for all databases.

=item C<password> / C<$ENV{OOPS_PASS}> / C<$ENV{DBI_PASS}>

Three ways to specify the password for DBI: as an argument; as an
OOPS environment variable; as a DBI environment variable.  Not
required for all databases.
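
The fallback chain for the DSN, user, and password can be pictured with
the following sketch.  It is not the code OOPS uses: C<first_defined()>
and C<connection_settings()> are hypothetical helpers, and the precedence
shown (argument, then OOPS variable, then DBI variable) is an assumption
based on the order in which the alternatives are listed above.  The
C<OOPS_DRIVER> and C<DBI_DRIVER> variables are left out.

 use strict;
 use warnings;

 # Hypothetical helper: the first defined value, or undef.
 sub first_defined
 {
	for my $value (@_) {
		return $value if defined $value;
	}
	return undef;
 }

 # Hypothetical helper: resolve each setting from arguments and environment.
 sub connection_settings
 {
	my ($env, %args) = @_;
	return (
		dbi_dsn  => first_defined($args{dbi_dsn},  $env->{OOPS_DSN},  $env->{DBI_DSN}),
		user     => first_defined($args{user},     $env->{OOPS_USER}, $env->{DBI_USER}),
		password => first_defined($args{password}, $env->{OOPS_PASS}, $env->{DBI_PASS}),
	);
 }

 my %settings = connection_settings(
	{ OOPS_DSN => 'DBI:SQLite:dbname=/tmp/example.db', DBI_USER => 'fallback' },
	user => 'alice',
 );
 for my $name (sort keys %settings) {
	my $shown = defined $settings{$name} ? $settings{$name} : '(unset)';
	print "$name = $shown\n";
 }

Here the explicit C<user> argument wins over C<DBI_USER>, the DSN comes
from the environment, and the password stays unset.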

=item C<readonly>

If true, commit() will be disabled and the transaction isolation level
will be set to READ COMMITTED instead of SERIALIZABLE for the databases
where this matters (mysql, PostgreSQL).  This should be used if you
plan to scan across large amounts of data because otherwise everything you read
will end up locked.  This feature is still experimental (not covered by
the test suite).

=item C<less_caching>

If true, iterating over the keys of a virtual hash (see C<virtual_object>)
will not cache the key values in memory.  If your virtual hash is too
big to fit in memory, this is a very good idea.  Alternatively, you can
use the C<clear_cache> method (below).	This feature is still experimental
(not covered by the test suite).

=item C<table_prefix> / C<$ENV{OOPS_PREFIX}>

OOPS allows a prefix to be supplied for its internal table names.
If you set a prefix of C<FOO_> then it will use a C<FOO_object> table
instead of an C<object> table.	This can be set as an argument to C<new()>
or it can be set with the environment variable C<$OOPS_PREFIX>.  This
allows multiple separate object spaces to exist within the same backend
SQL database.  Its intended use is to support testing vs. production
environments but it could also be used to segregate object spaces that 
don't link to each other.

The current tables and indexes are:

=over 15

=item object

Each row is an object.

=item attribute

Each row is an attribute value in an object.

=item big

Each row is a large attribute value.

=item counters

Counters for things like object ids.

=item temp

A temporary table.

=item group_index

Object grouping index.

=item value_index

Lookup by attribute value.

=back

=item C<auto_upgrade> / C<$ENV{OOPS_UPGRADE}>

The relational schema for OOPS is not the same for all versions of OOPS.  If 
this is set, then an older schema will be upgraded to the current schema.
If this is not set, then OOPS will use the older schema unchanged.  In most
cases, this means that OOPS will use a historical version of itself rather
than the current version.   See the SCHEMA VERSIONS section of this document.

=item C<auto_initialize> / C<$ENV{OOPS_INIT}>

The DBMS must be initialized before it can be used: tables created, a few
rows inserted, etc.  This can be handled by calling C<initial_setup()> or
it can be handled automatically by C<new()> if C<auto_initialize> or
C<$OOPS_AUTO_INIT> is set.

=item C<default_synchronous> / C<$ENV{OOPS_SYNC}>

With SQLite, an additional parameter to C<OOPS-E<gt>new()> is recognized:
C<default_synchronous>.  Possible values are:

=over 8

=item FULL

Sync() all transactions to disk before returning.

=item NORMAL

The default: sync() at critical moments only - protects against program
failure, but not all power or OS failures.

=item OFF

Don't sync() at all and go really fast.

=back

=back

=head1 PUBLIC OBJECT METHODS

=over 4

=item C<-E<gt>commit()>

Writes any changed objects back to the database and commits
the transaction.  Currently only one commit() call is allowed.	
Do not access your persistent data after commit() -- it may work
but this is not covered well in the regression suite.

=item C<-E<gt>virtual_object(\%hash [,$new_value])>

Queries [and sets] the load-virtual flag on a persistent hash.	Hashes that
load virtual will do separate queries for each key rather than load the
entire hash.  This is a good thing if your hash has lots of keys.  This flag
takes effect the next time the hash is loaded.	The value is a perl boolean.

This may be handled automatically in the future.

=item C<-E<gt>workaround27555($reference)>

References to tied hash keys are buggy in all perls through 5.12.1 (and
beyond?).  Use workaround27555($reference) to register your new tied
hash key references so that they can be transformed into references that
actually work correctly.

 $ref = \$hash{$key};
 $oops->workaround27555($ref);

C<workaround27555()> is harmless if called on other sorts of
references so it is safe to use indiscriminately.
See L<http://rt.perl.org/rt3/Ticket/Display.html?id=27555>.

Alternatively, use C<getref(%hash, $key)>.

=item C<-E<gt>dbh()>

This returns the main DBI database handle used by OOPS.  This function
is provided for those who want to hand-write queries.  Please note: no
changes are written to the DBMS by OOPS until C<$oops-E<gt>commit()>.

=item C<-E<gt>load_object($id)>

This will load a persistent object by number.  It returns the object
or undef if the object doesn't exist.  This function is provided for
those who want to hand-write queries.

=item C<-E<gt>clear_cache()>

This will clear out some of the cache that oops maintains.   It is useful
if you're iterating over the keys of a large virtual object and don't want to
overflow memory.  This feature is still experimental (not covered by the
test suite).

=item C<-E<gt>lock($ref)>

Lock an object or a scalar.  While this is never needed for transactional
consistency within the OOPS data, it can be useful for transactions
that have external side-effects like sending an email.	In theory, if you
pre-lock everything you might change, your transaction will not need to
be rolled back due to deadlock.  When you lock an object, it is the object
record that is locked, not the contents.  Locking C<\%hash> won't stop another
process from modifying C<$hash-E<gt>{key}>.  Lock C<\$hash-E<gt>{key}>
if you want that.

=back

=head1 LIMITATIONS, ERRATA, DEVELOPMENT STATUS, BUGS

OOPS has been thoroughly tested.  The regression suite is very well
developed: there is twice as much code in the test suite as there is in
the module itself.  The suite does over 1.5 million tests.  
I have so much confidence in my test suite, I'm offering a bounty on bugs!

=head2 Limitations

OOPS caches in memory all data that is pulled in during a transaction.
For C<virtual_object> hashes only, there are ways to prevent absolutely
everything from being cached (the C<less_caching> constructor argument and
the C<clear_cache> method) but for other objects you need to have enough
memory to hold everything you touch.

You can only call -E<gt>commit once per OOPS instance.	The cache is 
emptied when the OOPS instance is destroyed.  This means that
OOPS will be loading up its cache over and over.

=head2 Known bugs in OOPS

=over 4

=item memory leaks

OOPS currently has memory leaks. This may or may not matter to your
application.  The rate of leakage varies depending on which RDBMS is
used.  SQLite seems to have the most significant problems.  Most of
the leaks are not in OOPS itself but in the modules it uses and thus
are not easily addressed by changing OOPS.

=item delayed DESTROY

Additional references to the in-memory copies of persistent data are
kept by OOPS.  These extra references will prevent DESTROY methods from
being called as soon as they otherwise would be.  They'll usually be
delayed until the OOPS object is itself DESTROYed.

=item other magic

Other perl magic attributes are not currently stored
persistently.  Many probably could be supported, but many
could not.  For example, taint does not work on tied hashes:
L<http://rt.perl.org/rt3/Ticket/Display.html?id=6758>.

=item unreferenced blessed scalars

When you bless a reference to a scalar value, the blessing is stored
with the scalar, not the reference.  The blessing remains even if there
is no reference to the scalar.	The following code prints C<true>.

 my $x = 'foobar';
 my $y = \$x;
 bless $y, 'baz';
 $y = 7;
 $y = \$x;
 my $z = ref($y);
 print "true\n" if $z eq 'baz';

At the current time, OOPS does not store such blessings.  OOPS does
not remember blessings when there isn't a reference.

=item re-blessing the OOPS object

Reblessing the OOPS object is not supported.

=item DBD::Pg and ASCII NULL

DBD::Pg does not easily support ASCII NULL.  OOPS has only partial
support for ASCII NULL with PostgreSQL.  Everything works, but don't
have ASCII NULL in your class names.

=item Circular references sometimes break

OOPS mostly allows you to make circular self-references:

 my $x;
 $x = \$x;

In some (rare) cases circular self-references break.  Since
I can't think of any good reason to make a circular self-reference, I'm
not going to fix this bug.

=item Long hash keys

The database schema for OOPS does not support huge hash keys.

=back

=head2 Bugs in perl that affect OOPS

=over 4

=item References to hash keys

Persistent hashes are implemented with tie.  There are bugs with 
perl's implementation of references to tied hash keys.	These bugs will
be triggered in several situations: creating a reference to tied hash
key that doesn't exist yet; deleting a key that has a reference tied to
it; assigning through a reference to a key that has multiple references.

All of the above can either be avoided or you can work around them by
either calling C<workaround27555($YOUR_REFERENCE)> whenever you create 
a tied hash key reference or by using C<getref(%hash, $key)> to 
create your reference.

The perl bugs are documented in:
L<http://rt.perl.org/rt3/Ticket/Display.html?id=27555> and
L<http://rt.perl.org/rt3/Ticket/Display.html?id=29224>.

=item C<local> and tie

C<local(%some_tied_hash)> doesn't work right.  
Thus C<local(%some_persistent_hash)> won't work right either:
L<http://rt.perl.org/rt3/Ticket/Display.html?id=6017>.

=item C<scalar(%hash)>

Tied C<scalar(%hash)> support was added in perl 5.8.3 and does not
exist in 5.8.2.

=item Tied arrays don't work right

There are a couple of bugs with tied arrays that prevent OOPS from
using them: L<http://rt.perl.org/rt3/Ticket/Display.html?id=22570>
and L<http://rt.perl.org/rt3/Ticket/Display.html?id=22571>.  OOPS
fully loads arrays into memory to work around this problem.  This isn't
a big deal unless you've got big arrays.  

This was fixed in perl 5.10, but OOPS does not yet take advantage of
tied arrays.

=item SQLite and perl's malloc().

If SQLite is used with a perl that has been compiled to use perl's
C<malloc()>, it will report LOTS of C<Bad free() ignored (PERL_CORE)>
errors.  It is not currently known if these errors are harmful beyond
generating lots of output to STDERR.  The default perl configuration on
FreeBSD uses perl's C<malloc()>.

=back

=head1 SCHEMA VERSIONS

As OOPS is developed, the database schema that OOPS uses changes.

OOPS notices if the version of the schema in the DBMS is different than
the version the code currently supports.  When this happens there are
two possibilities: either OOPS will upgrade the schema to the current
version or it will use an older version of OOPS to access the data.
See the C<auto_upgrade> notes in the B<PARAMETERS FOR NEW> section.

This behavior allows the OOPS module to be upgraded without disrupting
installed applications.

=head1 DATABASE ADMINISTRATION

Like Perl, OOPS uses reference counting.  If you have circular
data structures and you delete all your references to the circular
data structure, you'll leak persistent memory.

Use L<OOPS::GC> to mop up the leak.
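
Perl's own reference counting shows the same effect in memory, which
makes for a quick demonstration.  The sketch below is plain Perl with a
made-up C<Node> class and does not touch OOPS.  With persistent data the
equivalent step would be deleting the back-reference before dropping your
last reference to the structure.

 use strict;
 use warnings;

 package Node;

 my $destroyed = 0;		# counts DESTROY calls
 sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
 sub DESTROY { $destroyed++ }
 sub destroyed { return $destroyed }

 package main;

 {
	my $x = Node->new('x');
	my $y = Node->new('y');
	$x->{peer} = $y;
	$y->{peer} = $x;		# cycle: x <-> y
 }
 # Both nodes are unreachable but still hold each other.
 print "after leaking a cycle: ", Node::destroyed(), " destroyed\n";

 {
	my $x = Node->new('x');
	my $y = Node->new('y');
	$x->{peer} = $y;
	$y->{peer} = $x;
	delete $x->{peer};		# break the cycle by hand
 }
 print "after breaking a cycle: ", Node::destroyed(), " destroyed\n";

The first block frees nothing; the second block frees both of its nodes.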

Previous versions of OOPS have had bugs that caused the reference
counts on the persistent data to become inaccurate.  Use
L<OOPS::Fsck> to fix such problems.  Do this once, before
using L<OOPS::GC>.

=head1 FUTURE DIRECTIONS

OOPS isn't done.  There are a bunch of things that I am considering
adding to it.  If any of these things is important to you, speak up so
that I know there is interest...

=over 4

=item fix the bugs

There are bugs listed in the DEVELOPMENT STATUS section that
could be fixed.  First up is fixing the memory leaks that are
in OOPS itself.

=item code cleanup and general performance enhancements

The initial releases of OOPS concentrated on correct behavior
and other aspects of the module were somewhat ignored.	The code
could be cleaned up a bunch. 

=item perl-syntax SQL query translator

 SELECT Employee WHERE $Employee->{salary} > 5000

It's possible to translate perl-syntax queries into real SQL
that can be used to query the object store.

=item better grouping

Objects are loaded in groups rather than individually.	There is
much room for improvement in choosing how groups are formed.  This
is largely undeveloped as yet.

=item caching

Many possibilities.  A cache-invalidation daemon to note when objects
have changed.  Re-verification of touched data from the database.  
Ability to call commit() more than once. 

=item weak references

Support for persistent weak references is possible.

=item external references to objects

Currently objects are reference counted internally.  You must have
a reference to something from an already existing object for it
to continue to exist.  

=item contracts

OOPS has to do a lot of scanning of objects to see
if they've changed.  Explicit notification of changes would improve
performance.

OOPS could call functions before saving and after
loading to transform objects for a better or cleaner on-disk 
representation.

=item support for 'base' & accessor methods

This isn't something that I care about but maybe someone else does?

=item schema enforcement

Allow explicit schemas to be defined.  Do not save objects that 
don't conform.	Eg: Hash::Util::lock_keys().

=item RDBMS -> object mapping

Map existing RDBMS schemas into objects. 

=item data viewer

Viewing large datasets of deep and cross-linked data
is difficult.  Perhaps a CGI-based or Tk-based data navigator
would help.

=item support for tied data structures

It is possible to support storing tied data.  The tied object is what
would need to be persistent.  This would only work on some kinds of
ties.

=item support for other base types.

Right now, just HASH, SCALAR, REF, and ARRAY are supported.  Regular
expressions and file handles are candidates.  I don't know if it's possible
to support code references.

=item on-line data migration

By doing double updates, OOPS could support live migration from one
DBMS to another.

=back

=head1 WRITING SQL QUERIES BY HAND

If you want to query your data, then until a translator is written, your
only choice for making queries is to write them by hand.  Using your
data does not require a query: anything you've got a reference to will
be loaded as you access it.  Queries are for performing searches that
don't have a perl-object index.

Each perl HASH, REF/SCALAR, or ARRAY has a row in the C<object> table and
multiple rows in the C<attribute> and C<big> tables.

Here are the columns you'll care about:

=over 12

=item object

There is one row per perl object. 

=over 10

=item id

The object id.

=item class

The blessed class name (limited to 255 characters).

=item otype

The type of object:

=over 5

=item 'H'

A HASH.

=item 'A'

An ARRAY.

=item 'S'

A SCALAR or REF.

=back

=back

=item attribute

This is a table of key/value pairs.  The keys correspond to perl hash keys
and perl array indexes.  The values correspond to perl hash values and
array element values.

=over 10

=item id

The object id.

=item pkey

The hash key or array index.

=item pval

The hash value or array value.  Limited to 255 characters.

=item ptype

Flags the type of the value.  Possible values are:

=over 5

=item '0'

A normal value.  Numeric or string.

=item 'B'

A big value.  C<pval> will be a copy of the start of the
value for the first N characters.  The end of C<pval> will be
an MD5 checksum of the full big value.

=item 'R'

A reference to another object.

=back

=back

=item big

This is a table of values that were too large for the normal columns.
Even with databases that support wide columns, a separate big table
is used so that you don't load large scalars unless you actually
need the value.

=over 10

=item id

The object id.

=item pkey

The hash key or array index.

=item pval

The hash value or array value.	Limited to whatever the
underlying database will support as its largest blob.

=item fragno

Blob fragment number.  This column only exists with SQLite.  SQLite has a
smallish maximum row size and so big values must be split into multiple
rows.

=back

=back
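
The C<'B'> representation described under C<ptype> can be pictured with
the sketch below.  The prefix length (200) and the use of a hexadecimal
digest are assumptions made for illustration; the text above only says
that C<pval> holds the start of the value followed by an MD5 checksum.
C<big_summary()> is a hypothetical helper, not an OOPS function.

 use strict;
 use warnings;
 use Digest::MD5 qw(md5_hex);

 # Hypothetical helper: first $prefix_len characters of the value,
 # followed by the MD5 checksum of the whole value.
 sub big_summary
 {
	my ($value, $prefix_len) = @_;
	return substr($value, 0, $prefix_len) . md5_hex($value);
 }

 my $big = "x" x 10_000;
 my $pval = big_summary($big, 200);

 # 200 prefix characters plus 32 hexadecimal digits
 print length($pval), "\n";
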

REFs are special.  There are several types of 
REFs: references to scalar values; references to
objects; secondary references to scalar values; references
to scalar values that are part of another object 
(references to hash keys and references to array elements).

The representation of references is designed so that you
don't need to care what sort of REF it is when you're doing
a query.

The basic REF is a ref to a value inside another object.
An example:

 OBJECT TABLE
 id		class		otype

 1		OOPS::NamedObj	H
 383		SCALAR		R
 384		SCALAR		R
 385		SCALAR		R
 386		REF		R
 400		HASH		H
 500		ARRAY		A

 ATTRIBUTE TABLE
 id		pkey		pval		ptype

 1		A500		500		R
 1		H400		400		R
 1		R383		383		R
 1		R384		384		R
 1		R385		385		R
 1		R386		386		R

 383		400		'a-key'		0

 384		384		'nopkey'	0
 384		'nopkey'	'a-value'	0

 385		384		'nopkey'	0

 386		386		'nopkey'	0
 386		'nopkey'	500		R

 400		'a-key'		'a-value'	0
 400		'another-key'	'another-value'	0
 400		'A500'		500		R

 500		0		'a-value'	0
 500		1		'another-value'	0

HASH 1 is C<%$oops>.

REF 383 is a reference to the key 'a-key' in object
#400 (a HASH).

REF 384 is a ref to scalar.  It uses two rows to make writing
queries easier.

REF 385 is a duplicate reference to a scalar value.  It duplicates
REF 384.  In behavior, these two REFs should be identical even though
they are represented differently in the database.

REF 386 is a ref to an object: #500 (an ARRAY).

HASH 400 is a normal hash.

ARRAY 500 is a normal array.

This example data is what you would end up with after running code like:

 my $oops = new OOPS 
	dbi_dsn => 'DBI:mysql:database=MY-DATABASE-NAME;host=localhost', 
	username => 'MY-USERNAME', 
	password => 'MY-PASSWORD';

 $oops->{A500} = [ 'a-value', 'another-value' ];

 $oops->{H400} = { 
	'a-key' => 'a-value',
	'another-key' => 'another-value',
	'A500' => $oops->{A500},
 };

 $oops->{R383} = \$oops->{H400}{'a-key'};
 $oops->workaround27555($oops->{R383});

 $oops->{R384} = \'a-value';

 $oops->{R385} = $oops->{R384};

 $oops->{R386} = \$oops->{A500};

 $oops->commit;
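
The indirection in the example tables can be traced in plain Perl.  The
sketch below copies the relevant attribute rows into an array and follows
each REF to its target.  It reflects one reading of the tables (the row
whose C<pkey> is not C<'nopkey'> names the target object and key) and is
not code taken from OOPS; C<deref()> is a hypothetical helper.

 use strict;
 use warnings;

 # Attribute rows from the example: id, pkey, pval, ptype
 my @attribute = (
	[ 383, 400,      'a-key',   '0' ],
	[ 384, 384,      'nopkey',  '0' ],
	[ 384, 'nopkey', 'a-value', '0' ],
	[ 385, 384,      'nopkey',  '0' ],
	[ 386, 386,      'nopkey',  '0' ],
	[ 386, 'nopkey', 500,       'R' ],
	[ 400, 'a-key',  'a-value', '0' ],
 );

 # Hypothetical helper: return (pval, ptype) of whatever the REF points at.
 sub deref
 {
	my ($ref_id) = @_;
	# the pointer row gives the target object id (pkey) and key (pval)
	my ($pointer) = grep { $_->[0] == $ref_id && $_->[1] ne 'nopkey' } @attribute;
	return unless $pointer;
	my ($target_id, $target_key) = @$pointer[1, 2];
	my ($target) = grep { $_->[0] == $target_id && $_->[1] eq $target_key } @attribute;
	return $target ? @$target[2, 3] : ();
 }

 for my $id (383 .. 386) {
	my ($pval, $ptype) = deref($id);
	print "REF $id -> $pval (ptype $ptype)\n";
 }

REFs 383, 384, and 385 all resolve to C<'a-value'> through the same two
lookups, and REF 386 resolves to object 500 with ptype C<R>.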

SQL queries require a bunch of joins to link data structures together.
Here are some examples.

=over 4

=item C<SELECT Foobar WHERE $Foobar-E<gt>{xyz} = 'abc'>

 SELECT	object.id
 FROM	object, attribute
 WHERE	object.class = 'Foobar'
 AND	object.id = attribute.id
 AND	object.otype = 'H'
 AND	attribute.pkey = 'xyz'
 AND	attribute.pval = 'abc'
 AND	attribute.ptype = '0'

=item C<SELECT Foobar WHERE ${$Foobar-E<gt>{xyz}} = 'abc'>

This example should show why an automatic translator would be a good
idea...

 SELECT	ohash.id
 FROM	object AS ohash,
	attribute AS ahash,
	object AS oref,
	attribute AS aref,
	attribute AS target
 WHERE	ohash.class = 'Foobar'
 AND	ohash.otype = 'H'
 AND	ahash.id = ohash.id
 AND	ahash.pkey = 'xyz'
 AND	ahash.ptype = 'R'
 AND	oref.id = ahash.pval
 AND	oref.otype = 'S'		-- this is the outer ref
 AND	oref.id = aref.id
 AND	aref.pval = target.pkey		-- here's the reference indirection
 AND	target.pval = 'abc'
 AND	target.ptype = '0'

=back

If you construct a query like these examples that return object id's, then
use C<$object = $oops-E<gt>load_object($id)> to load them into memory.
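
Put together, running a hand-written query and loading the results might
look like the sketch below.  C<find_by_attribute()> is a hypothetical
helper, not part of OOPS.  It assumes that the table names are the
C<table_prefix> followed by C<object> and C<attribute>, and it uses DBI's
C<selectcol_arrayref()> with placeholders.

 use strict;
 use warnings;

 # Hypothetical helper: load every HASH object of $class whose
 # attribute $key has the plain value $value.
 sub find_by_attribute
 {
	my ($oops, $prefix, $class, $key, $value) = @_;
	my $sql = qq{
		SELECT	o.id
		FROM	${prefix}object o, ${prefix}attribute a
		WHERE	o.class = ?
		AND	o.otype = 'H'
		AND	a.id = o.id
		AND	a.pkey = ?
		AND	a.pval = ?
		AND	a.ptype = '0'
	};
	# read-only query on the handle that OOPS itself uses
	my $ids = $oops->dbh->selectcol_arrayref($sql, undef, $class, $key, $value);
	return map { $oops->load_object($_) } @$ids;
 }

 # Inside a transaction, with $oops created as in the earlier examples:
 # my @foobars = find_by_attribute($oops, 'FOO_', 'Foobar', 'xyz', 'abc');
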

I recommend that hand-written queries be read-only as there are 
additional columns that must be kept consistent.  For example, the
object table includes a reference count column to handle garbage
collection of the persistent data.

=head1 RUNNING THE REGRESSION TEST SUITE

The regression test suite empties and re-creates the persistent store
over and over again.  To prevent the accidental erasure of production
data, all of the tests require a special environment variable to be
set: C<$OOPSTEST_DSN>.  This variable replaces the normal C<$DBI_DSN>
or C<$OOPS_DSN>.  Correspondingly there is a C<$OOPSTEST_USER>,
C<$OOPSTEST_PASS>, and C<$OOPSTEST_PREFIX>.

Set these variables to something different than what you use for
your production data!
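
A small guard at the top of a test wrapper can catch a misconfigured
environment.  The following sketch is not part of OOPS: C<check_test_env()>
is a hypothetical helper that inspects a hash of environment variables and
returns a list of problems.  Treating a shared DSN as acceptable when the
table prefixes differ is an assumption.

 use strict;
 use warnings;

 # Hypothetical helper: list the reasons the test settings look unsafe.
 sub check_test_env
 {
	my ($env) = @_;
	my @problems;
	my $test_dsn = $env->{OOPSTEST_DSN};
	push(@problems, "OOPSTEST_DSN is not set") unless defined $test_dsn;
	my $test_prefix = defined $env->{OOPSTEST_PREFIX} ? $env->{OOPSTEST_PREFIX} : '';
	my $prod_prefix = defined $env->{OOPS_PREFIX} ? $env->{OOPS_PREFIX} : '';
	for my $prod (qw(OOPS_DSN DBI_DSN)) {
		next unless defined $test_dsn && defined $env->{$prod};
		next unless $test_dsn eq $env->{$prod};
		# same database: only safe if the table prefixes differ
		push(@problems, "OOPSTEST_DSN matches $prod with the same table prefix")
			if $test_prefix eq $prod_prefix;
	}
	return @problems;
 }

 my @problems = check_test_env(\%ENV);
 if (@problems) {
	print "problem: $_\n" for @problems;
 } else {
	print "test environment looks separate from production\n";
 }
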

Most of the tests take a long time to run and are disabled by default.
If you can run the full suite in less than six hours please tell me about
your configuration.

Beware mysql logging.  On Debian unstable, the default configuration
for mysql used to log every SQL statement.  Running the test suite to
completion will generate several gigabytes of log file.  Running out of
disk space will cause the tests to fail.  On DragonflyBSD
the default mysql configuration includes making replication master logs.

=head1 THE COMPETITION

There are a number of other modules that make perl objects persistent.

Other object persistence solutions either store data in blobs that
cannot be queried or they're a mapping from a relational database
into perl data that does not allow arbitrary data to be stored.

L<http://poop.sourceforge.net/> has an overview of options.

=head1 EXAMPLE PROGRAMS

The following applications use OOPS:
L<Qpsmtpd::Plugin::Quarantine> 

=head1 LICENSE

Copyright(C) 2004-2006 David Muir Sharnoff <muir@idiom.org>
Copyright(C) 2012 Google, Inc

OOPS may be used, modified and redistributed on the same terms
as Perl itself.