The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

KiokuDB::Tutorial - Getting started with KiokuDB

INSTALLATION

The easiest way to install KiokuDB along with a number of backends is Task::KiokuDB.

KiokuDB depends on Moose and a few other modules out of the box, but no specific storage module.

KiokuDB is a frontend to several backends, much like DBI uses DBDs to connect to actual databases.

For development and testing you can use the KiokuDB::Backend::Hash backend, which is an in memory store, but for production use KiokuDB::Backend::DBI or KiokuDB::Backend::BDB are the recommended backends.

See below for instructions on getting KiokuDB::Backend::BDB installed.

CREATING A DIRECTORY HANDLE

A KiokuDB directory is the main object through which all work is done.

The simplest directory that is ready for use can be created like this:

    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::Hash->new
    );

We will revisit other more interesting backend configuration later in this document, but for now this will do.

You can also use DSN strings to connect to the various backends:

    KiokuDB->connect("hash");

    KiokuDB->connect("dbi:SQLite:dbname=foo", create => 1);

    KiokuDB->connect("bdb:dir=foo", create => 1);

You can also use a configuration file:

    KiokuDB->connect("/path/to/my_db.yml");

Which is just a YAML file:

    ---
    # these are basically the arguments for 'new'
    backend:
      class: KiokuDB::Backend::DBI
      dsn: dbi:SQLite:dbname=/tmp/test.db
      create: 1

USING THE DBI BACKEND

During this tutorial we will be using the DBI backend for two reasons. The first is DBI's ubiquity. The second is the possibility of easily looking behind the scenes, to more clearly demonstrate what KiokuDB is doing.

That said, the examples will work with all backends exactly the same.

First we create $dir:

    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=kiokudb_tutorial.db",
        create => 1, # this causes the tables to be created
    );

Note that if you are connecting with a username and password you need to specify these as named arguments:

    my $dir = KiokuDB->connect(
        $dsn,
        user     => $user,
        password => $password,
    );

INSERTING OBJECTS

Let's start by defining a simple class using Moose:

    package Person;
    use Moose;

    has name => (
        isa => "Str",
        is  => "rw",
    );

We can instantiate it:

    my $obj = Person->new( name => "Homer Simpson" );

and insert the object to the database as follows:

    my $scope = $dir->new_scope;

    my $homer_id = $dir->store($obj);

This is very trivial use of KiokuDB, but it illustrates a few important things.

First, no schema is necessary. KiokuDB uses Moose to introspect your object without needing to predefine anything like tables.

Second, every object in the database has an ID. If you don't choose an ID for an object, KiokuDB will assign a UUID instead.

This ID is like a primary key in a relational database.

You can also specify an ID instead of letting one be generated:

    $dir->store( homer => $obj );

Third, all KiokuDB operations need to be performed within a scope. The scope is not really doing anything important in this simple example, but becomes necessary when cycles and weak references are in use. We will look into that in more detail later.

LOADING OBJECTS

So now that Homer has been inserted into the database, we can fetch him out of there using the ID we got from store.

    my $homer = $dir->lookup($homer_id);

Assuming that $scope and $obj are still in scope, $homer and $obj will actually be the same object:

    # this is true:
    refaddr($homer) == refaddr($obj)

This is because KiokuDB tracks which objects are "live" in the live object set (KiokuDB::LiveObjects).

If the object wasn't already in memory then KiokuDB would have fetched it from the backend instead.

WHAT WAS STORED

Let's peek into the database:

    % sqlite3 kiokudb_tutorial.db
    SQLite version 3.4.0
    Enter ".help" for instructions
    sqlite>

The database schema has two tables, entries and gin_index:

    sqlite> .tables
    entries    gin_index

gin_index is used for more complex queries, and we'll get back to it at the end of the tutorial.

For now let's just have a closer look at entries:

    sqlite> .schema entries
    CREATE TABLE entries (
      id varchar NOT NULL,
      data blob NOT NULL,
      class varchar,
      root boolean NOT NULL,
      tied char(1),
      PRIMARY KEY (id)
    );

The main columns are id and data. In KiokuDB every object has an ID which serves as a primary key and a BLOB of data associated with it.

Since the default serializer for the DBI backend is KiokuDB::Serializer::JSON, we examine the data.

First let's set sqlite's output mode to line. This is easier to read with large columns:

    sqlite> .mode line

And select the data from the table:

    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson"},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}

As you can see the name attribute is stored under the data key inside the blob, as is the object's class.

The data column contains all of the data necessary to recreate the object.

All the other columns are only for searches. Later on you'll also see how to create user defined columns.

When using KiokuDB::Backend::BDB the on-disk format is just a hash of id to data with no additional columns.

OBJECT RELATIONSHIPS

Let's extend the Person class to hold some more interesting data than just a name:

    package Person;

    has spouse => (
        isa => "Person",
        is  => "rw",
        weak_ref => 1,
    );

This new spouse attribute will hold a reference to another person object.

Let's first create and insert another object:

    my $marge_id = $dir->store(
        Person->new( name => "Marge Simpson" ),
    );

Now that we have both objects in the database, let's link them together:

    {
        my $scope = $dir->new_scope;

        my ( $marge, $homer ) = $dir->lookup( $marge_id, $homer_id );

        $marge->spouse($homer);
        $homer->spouse($marge);

        $dir->store( $marge, $homer );
    }

Now we have created a persistent object graph, that is several objects which point to each other.

The reason spouse had the weak_ref option was so that this circular structure will not leak.

When then objects are updated in the database, KiokuDB sees that their spouse attribute contains references, and this relationship will be encoded using their unique ID in storage.

To load the graph, we can do something like this:

    {
        my $scope = $dir->new_scope;

        my $homer = $dir->lookup($homer_id);

        print $homer->spouse->name; # Marge Simpson
    }

    {
        my $scope = $dir->new_scope;

        my $marge = $dir->lookup($marge_id);

        print $marge->spouse->name; # Homer Simpson

        refaddr($marge) == refaddr($marge->spouse->spouse); # true
    }

When KiokuDB is loading the initial object, all the objects the object depends on will also be loaded. The spouse attribute contains a reference to another object (by ID), and this link is resolved at inflation time.

The purpose of new_scope

This is where new_scope becomes important. As objects are inflated from the database, they are pushed onto the live object scope, in order to increase their reference count.

If this was not done, by the time $homer was returned from lookup his spouse attribute would have been cleared because there is no other reference to Marge.

This demonstrates why:

    sub get_homer {
        my $homer = Person->new( name => "Homer Simpson" );
        my $marge = Person->new( name => "Marge Simpson" );

        $homer->spouse($marge);
        $marge->spouse($homer);

        return $homer;

        # at this point $homer and $marge go out of scope
        # $homer has a refcount of 1 because it's the return value
        # $marge has a refcount of 0, and gets destroyed
        # the weak reference in $homer->spouse is cleared
    }

    my $homer = get_homer();

    $homer->spouse; # this returns undef

By using this idiom:

    {
        my $scope = $dir->new_scope;

        # do all KiokuDB work in here
    }

You are ensuring that the objects live at least as long as is necessary.

In a web application context you usually create one new scope per request. In fact, Catalyst::Model::KiokuDB does this automatically.

REFERENCES IN THE DATABASE

Now that we have an object graph in the database let's have another look at what's inside.

    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson","spouse":{"$ref":"05A8D61C-6139-4F51-A748-101010CC8B02.data"}},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}

       id = 05A8D61C-6139-4F51-A748-101010CC8B02
     data = {"__CLASS__":"Person","data":{"name":"Marge Simpson","spouse":{"$ref":"201C5B55-E759-492F-8F20-A529C7C02C8B.data"}},"id":"05A8D61C-6139-4F51-A748-101010CC8B02","root":true}

You'll notice the spouse field has a JSON object with a $ref field inside it holding the UUID of the target object.

When data is loaded KiokuDB queues up references to unloaded objects and then loads them in order to materialize the memory resident object graph.

If you're curious about why the data is represented this way, this format is called JSPON, or JavaScript Persistent Object Notation (http://www.jspon.org/). When using KiokuDB::Backend::Storable the KiokuDB::Entry and KiokuDB::Reference objects are serialized with their storable hooks instead.

OBJECT SETS

More complex relationships (not necessarily 1 to 1) are usually easy to model with Set::Object.

Let's extend the Person class to add such a relationship:

    package Person;

    has children => (
        does => "KiokuDB::Set",
        is   => "rw",
    );

KiokuDB::Set objects are KiokuDB specific wrappers for Set::Object.

    my @kids = map { Person->new( name => $_ ) } qw(maggie lisa bart);

    use KiokuDB::Util qw(set);

    my $set = set(@kids);

    $homer->children($set);

    $dir->store($homer);

The set convenience function creates a new KiokuDB::Set::Transient object. A transient set is one which started its life in memory space (as opposed to a set that was loaded from the database).

The weak_set convenience function also exists, creating a transient set with Set::Object::Weak used internally to help avoid circular structures (for instance if setting a parent attribute in our example).

The set object behaves pretty much like a normal Set::Object:

    my @kids = $dir->lookup($homer_id)->children->members;

The main difference is that sets coming from the database are deferred by default, that is the objects in @kids are not loaded until they are actually needed.

This allows large object graphs to exist in the database, while only being partially loaded, without breaking the encapsulation of user objects. This behavior is implemented in KiokuDB::Set::Deferred and KiokuDB::Set::Loaded.

This set object is optimized to make most operations defer loading. For instance, if you intersect two deferred sets, only the members of the intersection set will need to be loaded.

THE TYPEMAP

Storing an object with KiokuDB involves passing it to KiokuDB::Collapser, the object that "flattens" objects into KiokuDB::Entry before the entries are inserted into the backend.

The collapser uses a KiokuDB::TypeMap object that tells it how objects of each type should be collapsed.

During retrieval of objects the same typemap is used to reinflate objects back into working objects.

Trying to store an object that is not in the typemap is an error. The reason behind this is that it doesn't make sense to store every type of object (for instance DBI handles need a socket, objects based on XS modules have an internal pointer as an integer, whose address won't be valid the next time it's loaded), and even though the majority of objects are safe to serialize, even a small bit of unreported fragility is usually enough to create large, hard to debug problems.

An exception to this rule is Moose based objects, because they have sufficient meta information available through Moose's powerful reflection support in order to be safely serialized.

Additionally, the standard backends provide a default typemap for common objects (DateTime, Path::Class, etc), which by default is merged with any custom typemap you pass to KiokuDB.

So, in order to actually get KiokuDB to store things like Class::Accessor based objects, you can do something like this:

    KiokuDB->new(
        backend => $backend,
        allow_classes => [qw(My::Object)],
    );

Which is shorthand for:

    my $dir = KiokuDB->new(
        backend => $backend,
        typemap => KiokuDB::TypeMap->new(
            entries => {
                "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
            },
        ),
    );

KiokuDB::TypeMap::Entry::Naive is a type map entry that performs naive collapsing of the object, by simply walking it recursively.

When the collapser encounters an object it will ask KiokuDB::TypeMap::Resolver for a collapsing routine based on the class of the object.

This lookup is typically performed by ref $object, not using inheritance, because a typemap entry that is safe to use with a superclass isn't necessarily safe to use with a subclass. If you do want inherited entries, specify isa_entries:

    KiokuDB::TypeMap->new(
        isa_entries => {
            "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
        },
    );

If no normal (ref keyed) entry is found for an object, the isa entries are searched for a superclass of that object. Subclass entries are tried before superclass entries. The result of this lookup is cached, so it only happens once per class.

Typemap Entries

If you want to do custom serialization hooks, you can specify hooks to collapse your object:

    KiokuDB::TypeMap::Entry::Callback->new(
        collapse => sub {
            my $object = shift;

            ...

            return @some_args;
        },
        expand => sub {
            my ( $class, @some_args ) = @_;

            ...

            return $object;
        },
    );

These hooks are called as methods on the object to be collapsed.

For instance the Path::Class related typemap ISA entry is:

    'Path::Class::Entity' => KiokuDB::TypeMap::Entry::Callback->new(
        intrinsic => 1,
        collapse  => "stringify",
        expand    => "new",
    );

The intrinsic flag is discussed in the next section.

Another option for typemap entries is KiokuDB::TypeMap::Entry::Passthrough, which is appropriate when you know the backend's serialization can handle that data type natively.

For example, if your object has a Storable hook which you know is appropriate (e.g. contains no sub objects that need to be collapsible) and your backend uses KiokuDB::Backend::Serialize::Storable. DateTime is an example of a class with such storable hopes:

    'DateTime' => KiokuDB::Backend::Entry::Passthrough->new( intrinsic => 1 )

Intrinsic vs. First Class

In KiokuDB every object is normally assigned an ID, and if the object is shared by several objects this relationship will be preserved.

However, for some objects this is not the desired behavior. These are objects that represent values, like DateTime, Path::Class entries, URI objects, etc.

KiokuDB can be asked to collapse such objects intrinsicly, that is instead of creating a new KiokuDB::Entry with its own ID for the object, the object gets collapsed directly into its parent's structures.

This means that shared references that are collapsed intrinsically will be loaded back from the database as two distinct copies, so updates to one will not affect the other.

For instance, when we run the following code:

    use Path::Class;

    my $path = file(qw(path to foo));

    $obj_1->file($path);

    $obj_2->file($path);

    $dir->store( $obj_1, $obj_2 );

While the following is true when the data is being inserted, it will no longer be true when $obj_1 and $obj_2 are loaded from the database:

    refaddr($obj_1->file) == refaddr($obj_2->file)

This is because both $obj_1 and $obj_2 each got its own copy of $path.

This behavior is usually more appropriate for objects that aren't mutated, but are instead cloned and replaced, and for which creating a first class entry in the backend with its own ID is undesired.

The Default Typemap

Each backend comes with a default typemap, with some built in entries for common CPAN modules' objects. KiokuDB::TypeMap::Default contains more details.

SIMPLE SEARCHES

Most backends support an inefficient but convenient simple search, which scans the entries and matches fields.

If you want to make use of this API we suggest using KiokuDB::Backend::DBI since simple searching is implemented using an SQL where clause, which is much more efficient (you do have to set up the column manually though).

Calling the search method with a hash reference as the only argument invokes the simple search functionality, returning a Data::Stream::Bulk with the results:

    my $stream = $dir->search({ name => "Homer Simpson" });

    while ( my $block = $stream->next ) {
        foreach my $object ( @$block ) {
            # $object->name eq "Homer Simpson"
       }
    }

This exact API is intentionally still underdefined. In the future it will be compatible with DBIx::Class 0.09's syntax.

DBI SEARCH COLUMNS

In order to make use of the simple search API we need to configure columns for our DBI backend.

Let's create a 'name' column to search by:

    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=foo",
        columns => [
            # specify extra columns for the 'entries' table
            # in the same format you pass to DBIC's add_columns

            name => {
                data_type => "varchar",
                is_nullable => 1, # probably important
            },
        ],
    );

You can either alter the schema manually, or use kioku dump to back up your data, delete the database, connect with create => 1 and then use kioku load.

To populate this column we'll need to load Homer and update him:

    {
        my $s = $dir->new_scope;
        $dir->update( $dir->lookup( $homer_id ) );
    }

And this is what it looks in the database:

       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     name = Homer Simpson

GETTING STARTED WITH BDB

The most mature backend for KiokuDB is KiokuDB::Backend::BDB. It performs very well, and supports many features, like Search::GIN integration to provide customized indexing of your objects and transactions.

KiokuDB::Backend::DBI is newer and not as tested, but also supports transactions and Search::GIN based queries. It performs quite well too, but isn't as fast as KiokuDB::Backend::BDB.

Installing KiokuDB::Backend::BDB

KiokuDB::Backend::BDB needs the BerkeleyDB module, and a recent version of Berkeley DB itself, which can be found here: http://www.oracle.com/technology/software/products/berkeley-db/db/index.html.

BerkeleyDB (the library) normally installs into /usr/local/BerkeleyDB.4.7, while BerkeleyDB (the module) looks for it in /usr/local/BerkeleyDB, so adding a symbolic link should make installation easy.

Once you have BerkeleyDB installed, KiokuDB::Backend::BDB should install without problem and you can use it with KiokuDB.

Using KiokuDB::Backend::BDB

To use the BDB backend we must first create the storage. To do this the create flag must be passed:

    my $backend = KiokuDB::Backend::BDB->new(
        manager => {
            home   => Path::Class::Dir->new(qw(path to storage)),
            create => 1,
        },
    );

The BDB backend uses BerkeleyDB::Manager to do a lot of the BerkeleyDB gruntwork. The BerkeleyDB::Manager object will be instantiated using the arguments provided in the manager attribute.

Now that the storage is created we can make use of this backend, much like before:

    my $dir = KiokuDB->new( backend => $backend );

Subsequent opens will not require the create argument to be true, but it doesn't hurt.

This connect call is equivalent to the above:

    my $dir = KiokuDB->connect( "bdb:dir=path/to/storage", create => 1 );

TRANSACTIONS

Some backends (ones which do the KiokuDB::Backend::Role::TXN role) can be used with transactions.

If you are familiar with DBIx::Class this should be very familiar:

    $dir->txn_do(sub {
        $dir->store($obj);
    });

This will create a BerkeleyDB level transaction, and all changes to the database are committed if the block was executed cleanly.

If any error occurred the transaction will be rolled back, and the changes will not be visible to subsequent reads.

Note that KiokuDB does not touch live instances, so if you do something like

    $dir->txn_do(sub {
        my $scope = $dir->new_scope;

        $obj->name("Dancing Hippy");
        $dir->store($obj);

        die "an error";
    });

the name attribute is not rolled back, it is simply the store operation that gets reverted.

Transactions will nest properly, and with most backends they generally increase write performance as well.

QUERIES

KiokuDB::Backend::BDB::GIN is a subclass of KiokuDB::Backend::BDB that provides Search::GIN integration.

Search::GIN is a framework to index and query objects, inspired by Postgres' internal GIN api. GIN stands for Generalized Inverted Indexes.

Using Search::GIN arbitrary search keys can be indexed for your objects, and these objects can then be looked up using queries.

For instance, one of the pre canned searches Search::GIN supports out of the box is class indexing. Let's use Search::GIN::Extract::Callback to do custom indexing of our objects:

    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::BDB::GIN->new(
            extract => Search::GIN::Extract::Callback->new(
                extract => sub {
                    my ( $obj, $extractor, @args ) = @_;

                    if ( $obj->isa("Person") ) {
                        return {
                            type => "user",
                            name => $obj->name,
                        };
                    }

                    return;
                },
            ),
        ),
    );

    $dir->store( @random_objects );

To look up the objects, we use the a manual key lookup query:

    my $query = Search::GIN::Query::Manual->new(
        values => {
            type => "person",
        },
    );

    my $stream = $dir->search($query);

The result is Data::Stream::Bulk object that represents the search results. It can be iterated as follows:

    while ( my $block = $stream->next ) {
        foreach my $person ( @$block ) {
            print "found a person: ", $person->name;
        }
    }

Or even more simply, if you don't mind loading the whole resultset into memory:

    my @people = $stream->all;

Search::GIN is very much in its infancy, and is very under documented. However it does work for simple searches such as this and contains pre canned solutions like Search::GIN::Extract::Class.

In short, it works today, but watch this space for new developments.