Hash::Util::FieldHash - Associate references with data
    use Hash::Util qw(fieldhash fieldhashes);

    # Create a single field hash
    fieldhash my %foo;

    # Create three at once...
    fieldhashes \ my(%foo, %bar, %baz);

    # ...or any number
    fieldhashes @hashrefs;
Two functions generate field hashes:
fieldhash %hash;
Creates a single field hash. The argument must be a hash. Returns a reference to the given hash if successful, otherwise nothing.
fieldhashes @hashrefs;
Creates any number of field hashes. Arguments must be hash references. Returns the converted hashrefs in list context, their number in scalar context.
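The two calls and their return values might be exercised as in the following minimal sketch. The hash names are made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash fieldhashes);

my (%single, %first, %second);   # illustrative names

# fieldhash takes the hash itself and returns a reference to it
my $hashref = fieldhash %single;

# fieldhashes takes hash references; in list context it returns them
my @converted = fieldhashes \%first, \%second;

# in scalar context it returns the number of hashes handled
my $count = fieldhashes \%first, \%second;

print "same hash\n" if $hashref == \%single;
print "converted $count hashes\n";
```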
Field hashes have three basic features:
If a reference is used as a field hash key, it is replaced by the integer value of the reference address.
In a new thread a field hash is updated so that its keys reflect the new reference addresses of the original objects.
When a reference goes stale after having been used as a field hash key, the hash entry will be deleted.
Field hashes are designed to maintain an association of a reference with a value. The association is independent of the bless status of the key, is thread-safe, and is garbage-collected. These properties are desirable in the construction of inside-out classes.
When used with keys that are plain scalars (not references), field hashes behave like normal hashes.
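Key replacement can be observed directly. The sketch below stores one value under a reference and one under a plain string, then inspects the keys. Scalar::Util::refaddr is used only to compare against the key the field hash generated; the variable names are illustrative.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);
use Scalar::Util qw(refaddr);

fieldhash my %field;
my $obj = [];

$field{$obj}    = 'keyed by reference';   # key becomes the address
$field{'plain'} = 'keyed by string';      # ordinary hash behaviour

my @keys       = sort keys %field;
my $by_address = $field{ refaddr($obj) }; # same entry, found by integer

print "keys: @keys\n";
print "address is a key\n" if grep { $_ eq refaddr($obj) } @keys;
```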
The association of a reference (namely an object) with a value is central to the concept of inside-out classes. These classes don't store the values of object variables (fields) inside the object itself, but outside, as it were, in private hashes keyed by the object.
Normal hashes can be used for the purpose, but turn out to have some disadvantages:
The stringification of references depends on the bless status of the reference. A plain hash reference $ref may stringify as HASH(0x1801018), but after being blessed into class foo the same reference will look like foo=HASH(0x1801018), unless class foo overloads stringification, in which case it may show up as wurzelzwerg. In a normal hash, the stringified reference wouldn't be found again after the blessing.
The usual correction is to bypass stringification by using Scalar::Util::refaddr. Field hashes automatically stringify their keys to the reference address in decimal.
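The effect of blessing on the two kinds of hash can be sketched as follows. The class name Foo and the variable names are placeholders.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

my %plain;
fieldhash my %field;

my $ref = {};
$plain{$ref} = 'value';   # key is the string "HASH(0x...)"
$field{$ref} = 'value';   # key is the reference address

bless $ref, 'Foo';        # now stringifies as "Foo=HASH(0x...)"

my $in_plain = exists $plain{$ref} ? 1 : 0;
my $in_field = exists $field{$ref} ? 1 : 0;

print "plain hash finds it: $in_plain\n";
print "field hash finds it: $in_field\n";
```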
When a new thread is created, the Perl interpreter is cloned, which implies that all variables change their reference address. Thus, in a daughter thread, the "same" reference $ref contains a different address, but the cloned hash still holds the key based on the original address. Again, the association is broken.
A CLONE method is required to update the hash on thread creation. Field hashes come with an appropriate CLONE.
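A rough sketch of the thread behaviour, assuming a perl built with ithreads. On a perl without thread support the sketch skips the thread and reports that instead. The variable names are illustrative.

```perl
use strict;
use warnings;
use Config;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %field;
my $obj = {};
$field{$obj} = 'kept';

my $seen_in_thread = 'threads unavailable';
if ($Config{useithreads}) {
    require threads;
    # Inside the thread, %field and $obj are clones with new addresses;
    # the lookup only succeeds if the cloned hash has been updated.
    $seen_in_thread = threads->create(sub { $field{$obj} })->join;
}

print "$seen_in_thread\n";
```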
When a reference (an object) is used as a hash key, the entry stays in the hash when the object eventually goes out of scope. That can result in a memory leak because the data associated with the object is not freed. Worse than that, it can lead to a false association if the reference address of the original object is later re-used. This is not a remote possibility: address re-use happens all the time and is a certainty under many conditions.
If the references in question are indeed objects, a DESTROY method must clean up hashes that the object uses for storage. Special methods are needed when unblessed references can occur.
Field hashes have garbage collection built in. If a reference (blessed or unblessed) goes out of scope, corresponding entries will be deleted from all field hashes.
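The difference from a normal hash can be sketched with an unblessed reference that goes out of scope. The variable names are illustrative.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %field;
my %plain;
my ($field_during, $field_after, $plain_after);

{
    my $obj = {};                    # unblessed reference
    $field{$obj} = 'temporary';
    $plain{$obj} = 'temporary';
    $field_during = keys %field;     # entry present while $obj lives
}

$field_after = keys %field;          # entry collected with $obj
$plain_after = keys %plain;          # stale entry remains

print "field hash: $field_during entry, then $field_after\n";
print "plain hash still holds $plain_after entry\n";
```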
Thus, an inside-out class based on field hashes doesn't need a DESTROY method, nor a CLONE method for thread support. That facilitates the construction considerably.
Traditionally, the definition of an inside-out class contains a bare block inside which a number of lexical hashes are declared and the basic accessor methods defined, usually through Scalar::Util::refaddr. Further methods may be defined outside this block. There has to be a DESTROY method and, for thread support, a CLONE method.
When field hashes are used, the basic structure remains the same. Each lexical hash will be made a field hash. The call to refaddr can be omitted from the accessor methods. DESTROY and CLONE methods are not necessary.
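The two layouts might look like this side by side. Both classes are hypothetical and hold a single field; the count method exists only to show how many entries remain. Thread support (CLONE) is left out of the traditional version for brevity.

```perl
use strict;
use warnings;

# Traditional layout: refaddr keys plus an explicit DESTROY
{
    package Name::Classic;                 # hypothetical class
    use Scalar::Util qw(refaddr);

    my %name;
    sub new {
        my ($class, $value) = @_;
        my $self = bless \(my $anon), $class;
        $name{ refaddr($self) } = $value;
        return $self;
    }
    sub name    { my $self = shift; $name{ refaddr($self) } }
    sub DESTROY { my $self = shift; delete $name{ refaddr($self) } }
    sub count   { scalar keys %name }
}

# Same layout with a field hash: no refaddr, no DESTROY
{
    package Name::Field;                   # hypothetical class
    use Hash::Util::FieldHash ();

    Hash::Util::FieldHash::fieldhash my %name;
    sub new {
        my ($class, $value) = @_;
        my $self = bless \(my $anon), $class;
        $name{$self} = $value;
        return $self;
    }
    sub name  { my $self = shift; $name{$self} }
    sub count { scalar keys %name }
}

my $classic = Name::Classic->new('anna');
my $field   = Name::Field->new('bert');
my @names   = ($classic->name, $field->name);

undef $classic;
undef $field;
my @left = (Name::Classic->count, Name::Field->count);

print "@names; entries left: @left\n";
```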
If you have an existing inside-out class, simply making all hashes field hashes with no other change should make no difference. Through the calls to refaddr or equivalent, the field hashes never get to see a reference and work like normal hashes. Your DESTROY (and CLONE) methods are still needed.
To make the field hashes kick in, it is easiest to redefine refaddr as
sub refaddr { shift }
instead of importing it from Scalar::Util. It should now be possible to disable DESTROY and CLONE. Note that while it isn't disabled, DESTROY will be called before the garbage collection of field hashes, so it will be invoked with a functional object and will continue to function.
It is not desirable to import the functions fieldhash and/or fieldhashes into every class that is going to use them. They are only used once to set up the class. When the class is up and running, these functions serve no more purpose.
If there are only a few field hashes to declare, it is simplest to
use Hash::Util::FieldHash;
early and call the functions qualified:
Hash::Util::FieldHash::fieldhash my %foo;
Otherwise, import the functions into a convenient package like HUF or, more generically, Aux:
    {
        package Aux;
        use Hash::Util::FieldHash ':all';
    }
and call
Aux::fieldhash my %foo;
as needed.
Well... really only one example, and a rather trivial one at that. There isn't much to exemplify.
The following example shows an utterly simple inside-out class TimeStamp, created using field hashes. It has a single field, incorporated as the field hash %time. Besides new it has only two methods: an initializer called stamp that sets the field to the current time, and a read-only accessor when that returns the time in localtime format.
    # The class TimeStamp

    use Hash::Util::FieldHash;
    {
        package TimeStamp;

        Hash::Util::FieldHash::fieldhash my %time;

        sub stamp { $time{ $_[ 0]} = time; shift }         # initializer
        sub when  { scalar localtime $time{ shift()} }     # read accessor
        sub new   { bless( do { \ my $x }, shift)->stamp } # creator
    }

    # See if it works
    my $ts = TimeStamp->new;
    print $ts->when, "\n";
Remarkable about this class definition is what isn't there: there is no DESTROY method, inherited or local, and no CLONE method is needed to make it thread-safe. Not to mention no need to call refaddr or something similar in the accessors.
The outstanding property of inside-out classes is their "inheritability". Like all inside-out classes, TimeStamp is a universal base class. We can put it on the @ISA list of arbitrary classes and its methods will just work, no matter how the host class is constructed. No traditional Perl class allows that. The following program demonstrates the feat:
    # Make a sample of objects to add time stamps to.

    use Math::Complex;
    use IO::Handle;

    my @objects = (
        Math::Complex->new( 12, 13),
        IO::Handle->new(),
        qr/abc/,                         # in class Regexp
        bless( [], 'Boing'),             # made up on the spot
        # add more
    );

    # Prepare for use with TimeStamp

    for ( @objects ) {
        no strict 'refs';
        push @{ ref() . '::ISA' }, 'TimeStamp';
    }

    # Now apply TimeStamp methods to all objects and show the result

    for my $obj ( @objects ) {
        $obj->stamp;
        report( $obj, $obj->when);
    }

    # print a description of the object and the result of ->when

    use Scalar::Util qw( reftype);
    sub report {
        my ( $obj, $when) = @_;
        my $msg = sprintf "This is a %s object(a %s), its time is %s",
            ref $obj,
            reftype $obj,
            $when;
        $msg =~ s/\ba(?= [aeiouAEIOU])/an/g; # grammar matters :)
        print "$msg\n";
    }
Garbage collection in a field hash means that entries will "spontaneously" disappear when the object that created them disappears. That must be borne in mind, especially when looping over a field hash. If anything you do inside the loop could cause an object to go out of scope, a random key may be deleted from the hash you are looping over. That can throw the loop iterator, so it's best to cache a consistent snapshot of the keys and/or values and loop over that. You will still have to check that a cached entry still exists when you get to it.
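One way to loop defensively is sketched below: take a snapshot of the keys first, then check each key for existence before using it. The objects are released in the middle of the loop on purpose to simulate entries disappearing. All names are illustrative.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %field;
my @objects = ([], [], []);
$field{$_} = 'data' for @objects;

my @snapshot = keys %field;      # consistent list of keys to walk
my @seen;

for my $key (@snapshot) {
    # Simulate a side effect that releases the objects mid-loop
    @objects = () if @seen == 1;

    # The entry may have been collected since the snapshot was taken
    next unless exists $field{$key};

    push @seen, $key;
}

printf "snapshot had %d keys, %d were still valid when visited\n",
    scalar @snapshot, scalar @seen;
```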
Garbage collection can be confusing when keys are created in a field hash from normal scalars as well as references. Once a reference is used with a field hash, the entry will be collected, even if it was later overwritten with a plain scalar key (every positive integer is a candidate). This is true even if the original entry was deleted in the meantime. In fact, deletion from a field hash, and also a test for existence constitute use in this sense and create a liability to delete the entry when the reference goes out of scope. If you happen to create an entry with an identical key from a string or integer, that will be collected instead. Thus, mixed use of references and plain scalars as field hash keys is not entirely supported.
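The hazard described above can be sketched as follows, assuming the behaviour is as stated: an entry created with a plain integer key that happens to equal a registered reference address is collected along with the reference. The variable names are illustrative.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);
use Scalar::Util qw(refaddr);

fieldhash my %field;
my $obj  = [];
my $addr = refaddr($obj);

$field{$obj} = 'by reference';   # registers $obj with %field
delete $field{$obj};             # the registration stays in place

$field{$addr} = 'by integer';    # plain scalar key, same value as the address
my $before = exists $field{$addr} ? 1 : 0;

undef $obj;                      # the reference goes away
my $after = exists $field{$addr} ? 1 : 0;

print "entry present before: $before, after: $after\n";
```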
To make Hash::Util::FieldHash work, there were two changes to perl itself. PERL_MAGIC_uvar was made available for hashes, and weak references now call uvar set magic after a weakref has been cleared. The first feature is used to make field hashes intercept their keys upon access. The second one triggers garbage collection.
PERL_MAGIC_uvar get magic is called from hv_fetch_common and hv_delete_common through the function hv_magic_uvar_xkey, which defines the interface. The call happens for hashes with "uvar" magic if the ufuncs structure has equal values in the uf_val and uf_set fields. Hashes are unaffected if (and as long as) these fields hold different values.
Upon the call, the mg_obj field will hold the hash key to be accessed. Upon return, the SV* value in mg_obj will be used in place of the original key in the hash access. The integer index value in the first parameter will be the action value from hv_fetch_common, or -1 if the call is from hv_delete_common.
This is a template for a function suitable for the uf_val field in a ufuncs structure for this call. The uf_set and uf_index fields are irrelevant.
    IV watch_key(pTHX_ IV action, SV* field) {
        MAGIC* mg = mg_find(field, PERL_MAGIC_uvar);
        SV* keysv = mg->mg_obj;
        /* Do whatever you need to.  If you decide to
           supply a different key newkey, return it like this
        */
        sv_2mortal(newkey);
        mg->mg_obj = newkey;
        return 0;
    }
When a weak reference is stored in an SV that has "uvar" magic, set magic is called after the reference has gone stale. This hook can be used to trigger further garbage-collection activities associated with the referenced object.
The three features of field hashes (key replacement, thread support, and garbage collection) are supported by a data structure called the object registry. This is a private hash where every object is stored. An "object" in this sense is any reference (blessed or unblessed) that has been used as a field hash key.
The object registry keeps track of references that have been used as field hash keys. The keys are generated from the reference address like in a field hash (though the registry isn't a field hash). Each value is a weak copy of the original reference, stored in an SV that is itself magical (PERL_MAGIC_uvar again). The magical structure holds a list (another hash, really) of field hashes that the reference has been used with. When the weakref becomes stale, the magic is activated and uses the list to delete the reference from all field hashes it has been used with. After that, the entry is removed from the object registry itself. Implicitly, that frees the magic structure and the storage it has been using.
Whenever a reference is used as a field hash key, the object registry is checked and a new entry is made if necessary. The field hash is then added to the list of fields this reference has used.
The object registry is also used to repair a field hash after thread cloning. Here, the entire object registry is processed. For every reference found there, the field hashes it has used are visited and the entry is updated.
    # test if %hash is a field hash
    my $result = _fieldhash \ %hash, 0;

    # make %hash a field hash
    my $result = _fieldhash \ %hash, 1;
_fieldhash is the internal function used to create field hashes. It takes two arguments, a hashref and a mode. If the mode is boolean false, the hash is not changed but only tested for being a field hash. If the hash isn't a field hash, the return value is boolean false. If it is, the return value indicates the mode of the field hash. When called with a boolean true mode, it turns the given hash into a field hash of this mode and returns the mode of the created field hash. _fieldhash does not erase the given hash.
Currently there is only one type of field hash, and only the boolean value of the mode makes a difference, but that may change.
Anno Siegel, <anno4000@zrz.tu-berlin.de>
Copyright (C) 2006 by (Anno Siegel)
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.