The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Ref::Store - Store objects, index by object, tag by objects - all without leaking.

SYNOPSIS

        my $table = Ref::Store->new();
        

Store a value under a simple string key, maintain the value as a weak reference. The string key will be deleted when the value is destroyed:

        $table->store("key", $object);

Store $object under a second index ($fh), which is a globref; $fh will automatically be garbage collected when $object is destroyed.

        {
                open my $fh, ">", "/foo/bar";
                $table->store($fh, $object, StrongKey => 1);
        }
        # $fh still exists with a sole reference remaining in the table
        

Register an attribute type (foo_files), and tag $fh as being one of $foo_files, $fh is still dependent on $object

        # assume $fh is still in scope
        
        $table->register_kt("foo_files");
        $table->store_a(1, "foo_files", $fh);

Store another foo_file

        open my $fh2, ">", "/foo/baz"
        $table->store_a(1, "foo_files", $fh);
        # $fh2 will automatically be deleted from the table when it goes out of scope
        # because we did not specify StrongKey
        

Get all foo_files

        my @foo_files = $table->fetch_a(1, "foo_files");
        
        # @foo_files contains ($fh, $fh2);
        

Get rid of $object. This can be done in one of the following ways:

        # Implicit garbage collection
        undef $object;
        
        # Delete by value
        $table->purge($object);
        
        # Delete by key ($fh is still stored under the foo_keys attribute)
        $table->purgeby($fh);

        # remove each key for the $object value
        $table->unlink("key");
        $table->unlink($fh); #fh still exists under "foo" files
        

Get rid of foo_file entries

        # delete, by attribute
        $table->purgeby_a(1, "foo_files");
        
        # delete a single attribute from all entries
        $table->unlink_a(1, "foo_files");
        
        # dissociate the 'foo_files' attribtue from each entry
        $table->dissoc_a(1, "foo_files", $fh);
        $table->dissoc_a(1, "foo_files", $fh2);
        
        # implicit garbage collection:
        undef $fh;
        undef $fh2;

For a more detailed walkthrough, see Ref::Store::Walkthrough

DESCRIPTION

Ref::Store provides an efficient and worry-free way to index objects by arbitrary data - possibly other objects, simple scalars, or whatever.

It relies on magic and such to ensure that objects you put in the lookup table are not maintained there unless you want them to be. In other words, you can store objects in the table, and delete them without having to worry about what other possible indices/references may be holding down the object.

If you are looking for something to store Data, then direct your attention to KiokuDB, Tangram or Pixie - these modules will store your data.

However, if you are specifically wanting to maintain garbage-collected and reference counted perl objects, then this module is for you. continue reading.

FEATURES

The problem

I've had quite the difficult task of explaining what this module actually does.

Specifically, Ref::Store is intended for managing and establishing dependencies and lookups between perl objects, which at their most basic levels are opaque entities in memory that are reference counted.

The lifetime of an object ends when its reference count hits zero, meaning that no data structure or code is referring to it.

When a new object is created (normally done through bless), it has a reference count of 1. As more copies of the object reference are made, the reference count of the object increases.

What this also means is that each time an object is inserted into a hash as a value, its reference count increases - and as a result, the object (and all other objects which it contains) will not be destroyed until it is removed from all those hashes.

Perl core offers the concept of a weak reference, and is exposed to perl code via Scalar::Util's weaken. Weak references allow the maintaining of an object reference without actually increasing the object's reference count.

Internally (in the Perl core), the object maintains a list of all weak references referring to it, an when the object's own reference count hits zero, those weak references are changed to undef.

As this relates to hash entries, the value of the entry becomes undef, but the entry itself is not deleted.

        use Scalar::Util qw(weaken);
        my %hash;
        my $object = \do { my $o };
        weaken($hash{some_key} = $object);
        undef $object;
        
        #The following will be true:
        exists$hash{some_key} && $hash{some_key} == undef;
        

When iterating over the hash keys, one must then check to see if the value is undefined - which is often a messy solution.

If the hash's values are constantly being changed and updated, but not frequently iterated through, this can cause a significant memory leak.

Additionally, weakref semantics are cumbersome to deal with in pure perl. The weakness of a reference only applies to that specific variable; therefore, the general semantics of my $foo = $bar, where $foo is understood to be an exact replica of $bar are broken. If $bar is a weak reference, $foo does not retain such a property, and will increase the reference count of whatever foo and bar refer to.

Dealing with collection items, especially tied hashes and arrays becomes even more cumbersome. In some perls, doing the following with a tied hash will not work:

        weaken($foo);
        $tied_hash{some_key} = $foo;
        
        # $foo is copied to the tied hash's STORE method.
        

This bug has since been fixed, but not everyone has the luxury of using the newest and shiniest Perl.

Other modules and their featuresets

In the event that weak reference keys are needed, there are several modules that implement solutions:

Tie::RefHash::Weak is a pure-perl tied hash which maintains weak keys. However it will not necessarily do the same for values, even when weaken is used explicitly, because of aforementioned bugs. It is also significantly slow due to its TIE interface.

Hash::Util::FieldHash is part of core since perl 5.10, and depends on something known as Hash uvar magic, a new feature introduced in 5.10. It does not suffer from the slowness of Tie::RefHash::Weak, and allows you to (manually) create weak values. However, keys used for FieldHash are permanently associated with the hash they have been keyed to, even if the entry or hash have been deleted.

Hash::FieldHash attempts to eliminate the version dependencies and caveats of Hash::Util::FieldHash, but restricts the keys to only being references.

Ref::Store's featureset

Ref::Store supports all features in the above mentioned modules, and the following

By-Value garbage collection

When a value is stored as a weak reference (see the next section) and itsreference count hits zero, then the entire hash entry is deleted; the table does not collect stale entries

By-Value deletion

Unlike normal hashes, Ref::Store can quickly and efficiently delete a value and all its dependent keys given just the value. No need to remember any lookup key under which the value has been stored

Multi-Value keys (Attributes)

Traditional hashes only allow storing of a single value under a single key. For multi-value storage under a single key, one must fiddle with nested data structures.

Like single-value keys, attributes may be reference objects, and will be discarded from the hash when the last remaining value's reference count hits zero.

Additionally, the same reference attribute object can serve as a key for multiple independent value sets, allowing to logically decouple collections which are dependent on a single object.

User defined retention policy

For each key and value, it is possible to define whether either the key, the value, or both should be strong or weak references. Ref::Store by default stores keys and values as weak references, but can be modified on a per-operation basis.

Values can also be stored multiple times under multiple keys with different retention policies: maintain a single 'primary' index, and multiple 'secondary' indices.

Speed

Most of the features are implemented in C, some because of speed, and others because there was no pure-perl way to do it (See the PP backend, which is fairly buggy)

Works on perl 5.8

This has been tested and developed for perl 5.8.8 (which is the perl provided in el5).

Here are some caveats

No Hash Syntax

Ref::Store is a proper object. You cannot use the table as an actual hash (though it would be easy enough to wrap it in the tie interface, it would be signficantly slower, and suffer from other problems related to tie)

Slow/Untested thread cloning

While attempts have been made to make this module thread-safe, there are strange messages about leaked scalars and unbalanced string tables when dealing with threads.

Values Restricted to References

Values themselves must be reference objects. It's easy enough, however, to do

        $table->store("foo", \"foo");
        ${$table->fetch("foo")} eq "foo";
        
Slower than Hash::Util::FieldHash

This module is about 40% slower than Hash::Util::FieldHash

Must use Attribute API for multi-entry lookup objects

If you wish to use the same key twice, the key must be used with the Attribute API, and not the (faster) key API

API

LOOKUP TYPES

There are three common lookup types by which values can be indexed and mapped to.

A Lookup Type is just an identifier by which one can fetch and store a value. The uniqueness of identifiers is dependent on the lookup type. Performance for various lookup types varies.

Each lookup type has a small tag by which API functions pertaining to it can be identified

Value-specific operations

These functions take a value as their argument, and work regardless of the lookup type

purge($value)

Remove $value from the database. For all lookup types which are linked to $value, they will be removed from the database as well if they do not link to any other values

vexists($value)

Returns true if $value is stored in the database

Simple Key (SK)

This is the quickest and simplest key type. It can use either string or object keys. It support. The functions it supports are

store($key, $value, %options)

Store $value under lookup <$key>. Key can be an object reference or string.

A single value can be stored under multiple keys, but a single key can only be linked to a single value.

Options are two possible hash options:

StrongKey

If the key is an object reference, by default it will be weakened in the databse, and when the last reference outside the database is destroyed, an implicit "unlink" will be called on it. Setting StrongKey to true will disable this behavior and not weaken the key object.

A strong key is still deleted if its underlying value gets deleted

StrongValue

By default the value is weakened before it is inserted into the database, and when the last external reference is destroyed, an implicit "purge" is performed. Setting this to true will disable this behavior and not weaken the value object.

It is important to note the various rules and behaviors with key and value storage options.

There are two conditions under which an entry (key and value) may be deleted from the table. The first condition is when a key or value is a reference type, and its referrent goes out of scope; the second is when either a key or a value is explicitly deleted from the table.

It is helpful to think of entries as a miniature version of implicit reference counting. Each key represents an inherent increment in the value's reference count, and each key has a reference count of one, represented by the amount of values it actually stores.

Based on that principle, when either a key or a value is forced to leave the table (either explicitly, or because its referrant has gone out of scope), its dependent objects decrease in their table-based implicit references.

Consider the simple case of implicit deletion:

        {
                my $key = "string":
                my $value = \my $foo
                $table->store($key, $foo);
        }
        

In which case, the string "string" is deleted from the table as $foo goes out of scope.

The following is slightly more complex

        my $value = \my $foo;
        {
                my $key = \my $baz;
                $table->store($key, $value, StrongValue => 1);
        }
        

In this case, $value is removed from the table, because its key object's referrant ($baz) has gone out of scope. Even though StrongValue was specified, the value is not deleted because its own referrant ($foo) has been destroyed, but rather because its table-implicit reference count has gone down to 0 with the destruction of $baz

The following represents an inverse of the previous block

        my $key = \my $baz;
        {
                my $value = \my $foo;
                $table->store($key, $value, StrongKey => 1);
        }
        

Here $value is removed from the table because naturally, its referrant, $foo has been destroyed. StrongKey only maintains an extra perl reference to $baz.

However, by specifying both StrongKey and StrongValue, we are able to completely disable garbage collection, and nothing gets deleted

        {
                my $key = \my $baz;
                my $value = \my $foo;
                $table->store($key, $value, StrongKey => 1, StrongValue => 1);
        }

This method is also available as store_sk.

It is an error to call this method twice on the same lookup <-> value specification.

fetch($key)

Returns the value object indexed under $key, if any. Also available under fetch_sk

lexists($key)

Returns true if $key exists in the database. Also available as lexists_sk

unlink($key)

Removes $key from the database. If $key is linked to a value, and that value has no other keys linked to it, then the value will also be deleted from the databse. Also available as unlink_sk

        $table->store("key1", $foo);
        $table->store("key2", $foo);
        $table->store("key3", $bar);
        
        $table->unlink("key1"); # $foo is not deleted because it exists under "key2"
        $table->unlink("key3"); # $bar is deleted because it has no remaining lookups
        
purgeby($key)

If $key is linked to a value, then that value is removed from the database via "purge". Also available as purgeby_sk.

These two blocks are equivalent:

        # 1
        my $v = $table->fetch($k);
        $table->purge($v);
        
        # 2
        $table->purgeby($k);
Typed Keys

Typed keys are like simple keys, but with more flexibility. Whereas a simple key can only store associate any string with a specific value, typed keys allow for associating the same string key with different values, so long as the type is different. A scenario when this is useful is associating IDs received from different libraries, which may be identical, to different values.

For instance:

        use Library1;
        use Library1;
        
        my $hash = Ref::Store->new();
        $hash->register_kt('l1_key');
        $hash->register_kt('l2_key');
        
        #later on..
        my $l1_value = Library1->get_handle();
        my $l2_value = Library2->get_handle();
        
        #assume that this is possible:
        
        $l1_value->ID == $l2_value->ID();
        
        $hash->store_kt($l1_value->ID(), 'l1_key', $l1_value);
        $hash->store_kt($l2_value->ID(), 'l2_key', $l2_value);

Note that this will only actually work for string keys. Object keys can still only be unique to a single value at a time.

All functions described for "Simple Keys" are identical to those available for typed keys, except that the $key argument is transformed into two arguments;

thus:

        store_kt($key, $type, $value);
        fetch_kt($key, $type);

and so on.

In addition, there is a function which must be used to register key types:

register_kt($ktype, $id)

Register a keytype. $ktype is a constant string which is the type, and $id is a unique identifier-prefix (which defaults to $ktype itself)

Attributes

Whereas keys map value objects according to their identities, attributes map objects according to arbitrary properties or user defined tags. Hence an attribute allows for a one-to-many relationship between a lookup index and its corresponding value.

The common lookup API still applies. Attributes must be typed, and therefore all attribute functions must have a type as their second argument.

A suffix of _a is appended to all API functions. In addition, the following differences in behavior and options exist

store_a($attr, $type, $value, %options)

Like "store", but option hash takes a StrongAttr option instead of a StrongKey option, which is the same. Attributes will be weakened for all associated values if StrongAttr was not specified during any insertion operation.

fetch_a($attr, $type)

Fetch function returns an array of values, and not a single value.

thus:

        my $value = $hash->fetch($key);
        #but
        my @values = $hash->fetch_a($attr,$type);
        

However, storing an attribute is done only one value at a time.

dissoc_a($attr, $type, $value)

Dissociates an attribute lookup from a single value. This function is special for attributes, where a single attribute can be tied to more than a single value.

Removes the attribtue from the database. Since multiple values can be tied to the same attribute, this can potentially remove many values from the DB. Be sure to use this function with caution

It is possible to use attributes as tags for boolean values or flags, though the process right now is somewhat tedious (eventually this API will be extended to allow less boilerplate)

        use constant ATTR_FREE => "attr_free";
        use constant ATTR_BUSY => "attr_busy";
        
        $hash->register_kt(ATTR_FREE);
        $hash->register_kt(ATTR_BUSY);
        
        $hash->store_a(1, ATTR_FREE, $value); #Value is now tagged as 'free';
        
        #to mark the value as busy, be sure to inclusively mark the busy tag first,
        #and then remove the 'free' mark. otherwise the value will be seen as destroyed
        #and associated references removed:
        
        $hash->store_a(1, ATTR_BUSY, $value);
        $hash->dissoc_a(1, ATTR_FREE, $value);
        
        #mark as free again:
        
        $hash->store_a(1, ATTR_FREE, $value);
        $hash->dissoc_a(1, ATTR_BUSY, $value);
        

The complexities come from dealing with a triadic value for a tag. A tag for a value can either be true, false, or unset. so 0, ATTR_FREE is valid as well.

CONSTRUCTION

new(%options)

Creates a new Ref::Store object. It takes a hash of options:

keyfunc

only in PP backend

This function is responsible for converting a key to something 'unique'. The default implementation checks to see whether the key is a reference, and if so uses its address, otherwise it uses the stringified value. It takes the user key as its argument

Ref::Store will try and select the best implementation (Ref::Store::XS and Ref::Store::PP, in that order). You can override this by seting $Ref::Store::SelectedImpl to a package of your choosing (which must be loaded).

ITERATION

While Ref::Store is not a simple collection object, you can still iterate over values using the following formats

        #Like keys %hash:
        
        my @keyspecs = $refstore->klist();
        my @values       = $refstore->vlist();
        
        #Like each %hash;
        
        $refstore->iterinit(); #Initialize the iterator
        while ( my ($lookup_type, $lookup_prefix, $my_key, $my_value) = $refstore->iter )
        {
                #do stuff.. but don't modify the list!
        }
        #Cleanup internal iterator state
        $refstore->iterdone();

Because of the various types of keys which can be stored, all iteration functions return a 'key specification' - a list of three elements:

Lookup Type

This is one of REF_STORE_KEY for key lookups, and REF_STORE_ATTRIBUTE for attribute lookups (depending on whether this was stored with store or store_a).

The type is necessary to determine which lookup function to call in the event that a more detailed operation is needed.

Prefix/Type

For store functions which use a type (store_kt and store_a), this is the type. Useful to determine the parameter to call for further operations

User Key

This is the actual key which was passed in to the store function. This can be a string or a reference.

A simpler API which copies its return values in lists are provided:

vlist()

Returns a list of all value objects. This list is a copy.

klist()

Returns a list of arrayrefs containing key specifications

    A more complex iterator API is provided. Unlike the simpler API, this does not copy its return values, thus saving memory

    iterinit()

    Initialize the internal iterator state. This must be called each time before anything else happens. Like perl hashes, Ref::Store maintains a global iterator state; calling iterinit resets it.

    It is an error to initialize a possibly active iterator without explicitly destroying the previous one. Also, modifying the table during iteration could lead to serious problems. Unlike perl hashes, it is not safe to delete the currently iterated-over element; this is because the iteration is emulated and gathered from multiple hash states, and deletion of one element can trigger subsequent deletions.

    If you happen to clobber an old iterator, this module will throw a warning. To avoid accidentally doing so, call "iterdone" after you have finished iterating.

    iter()

    Compare to each. Returns a 4-element list, the first three of which are the key specification, and the last is a value specification. For attribute lookups, this is an arrayref of values indexed under the attribute, and for key lookups this is the sole value.

    Once this function returns the empty list, there are no more pairs left to traverse.

    iterdone()

    Call this function if you break out of a loop, or to explicitly de-initialize the iterator state.

DEBUGGING/INFORMATIONAL

Often it is helpful to know what the table is holding and indexing, possibly because there is a bug or because you have forgotten to delete something.

The following functions are available for debugging

vexists($value)

Returns true if $value exists in the database. The database internally maintains a hash of values. When functioning properly, a value should never exist without a key lookup, but this is still alpha software

vlookups($value)

not yet implemented

Returns an array of stringified lookups for which this value is registered

lexists(K)

Returns true if the lookup K exists. See the "API" section for lookup-specific parameters for K

is_empty

Returns true if there are no lookups and no values in the database

dump

Prints a tree-like representation of the database. This will recurse the entire database and print information about all values and all lookup types. In addition, for object references, it will print the reference address in decimal and hexadecimal, the actual SV address of the reference, and whether the reference is a weak reference.

THREAD SAFETY

Ref::Store is tested as being threadsafe the XS backend.

Due to tricky limitations and ninja-style coding necessary to properly facilitate threads, thread-safety is not supported in the PP backend, though YMMV (on my machine, PP thread tests segfault).

Thread safety is quite difficult since reference objects are keyed by their memory addresses, which change as those objects are duplicated.

USAGE APPLICATIONS

This module caters to the common, but very narrow scope of opaque perl references. It cares nothing about what kind of objects you are using as keys or values. It will never dereference your object or call any methods on it, thus the only requirement for an object is that it be a perl reference.

Using this module, it is possible to create arbitrarily complex, but completely leak free dependency and relationship graphs between objects.

Sometimes, expressing object lifetime dependencies in terms of encapsulation is not desirable. Circular references can happen, and the hierarchy can become increasingly complex and deep - not just logically, but also syntactically.

This becomes even more unwieldy when the same object has various sets of dependants.

A good example would be a socket server proxy, which accepts requests from clients, opens a second connection to a third-party server to process the client request, forwards the request to the client's intended origin server, and then finally relays the response back to the client.

For most simple applications there is no true need to have multiple dynamically associated and deleted object entries. The benefits of this module become apparent in design and ease of use when larger and more complex, event-oriented systems are in use.

In shorter terms, this module allows you to reliably use a Single Source Of Truth for your object lookups. There is no need to synchronize multiple lookup tables to ensure that there are no dangling references to an object you should have deleted

BUGS AND CAVEATS

Probably many.

The XS backend (which is also the default) is the more stable version. The pure- perl backend is more of a reference implementation which has come to lag behind its XS brother because of magical voodoo not possible in a higher level language like Perl.

Your system will need a working C compiler which speaks C99. I have not tested this on

  • It is not advisable to store one table within another. Doing so may cause strange things to happen. It would make more sense to have a table being module-global anyway, though. Ref::Store is really not a lightweight object.

  • It is currently not possible to store key objects with prefixes/types. If you need prefixed object keys, use the attribute API.

  • On my system, making a circular link between two objects will not work properly in terms of garbage collection; this means the following

            $foo = \do bless { my $o };
            $bar = \do bless { my $o };
            #both objects
            $rs->store($foo, $bar);
            $rs->store($bar, $foo);
            

    Will not work on the PP backend. It is fully supported in the default XS backend, however.

  • The iteration and klist APIs are not yet implemented in the PP version

  • Keyfunc and friends are not fully implemented, either

AUTHOR

Copyright (C) 2011 by M. Nunberg

You may use and distribute this program under the same terms as perl itself

SEE ALSO

Hash::Util::FieldHash

Ref::Store implements a superset of Hash::Util::FieldHash, but the latter is most likely quicker. However, it will only work with perls newer than 5.10

Tie::RefHash::Weak
Variable::Magic

Perl API for magic interface, used by the PP backend

KiokuDB, Tangram, Pixie

2 POD Errors

The following errors were encountered while parsing the POD:

Around line 1485:

You forgot a '=back' before '=head2'

You forgot a '=back' before '=head2'

Around line 1617:

You forgot a '=back' before '=head1'

You forgot a '=back' before '=head1'