NAME

Data::BFDump - Class for dumping data structures in Breadth First order.

VERSION

Version '0.3'

SYNOPSIS

  use Data::BFDump;

  my $somevar=Some::Class->new();

  Data::BFDump->Dump([$somevar]);
  Data::BFDump->report;

  my $dumper=Data::BFDump->new();
  $dumper->Dump([$somevar]);
  $dumper->report;

DESCRIPTION

Data::BFDump is intended to be used for interpreting and understanding a data structure. Where Data::Dumper and Data::Dump do a depth first traversal of a data structure, Data::BFDump does a breadth first traversal. Often this produces a more intuitive looking structure as it ensures that a given sub element will be first mentioned or displayed at a position where its "path" from the root will be as short as possible.

WHEN TO USE THIS MODULE

This module is primarily designed for dumping data structures so that a developer can read and understand them, in other words for analytical purposes. If you are looking for a dumping module for persistence purposes this module is not ideal (although there is no reason you can't use it), as it will consume more memory and will be slower than everything else out there. On the other hand, for particularly insane data structures it is my intention that this module shall be as accurate (see MUTABILITY) as they come.

MUTABILITY

Currently Data::BFDump will dump the data structure in such a way that it is as mutable as possible. This means that all references to string constants are treated as references to variables equivalent to the string constant. If I can find a useful way to determine whether a reference is indeed to an immutable constant then I may change this behaviour.

NOTE: Data::Dumper seems to behave in the opposite manner. References to variables containing a string constant are treated as references to a string constant, meaning that some parts of an evaled dump end up being immutable. This approach is however more attractive and simpler to understand.
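
The distinction itself can be seen in plain Perl, independently of either module. The following is a minimal sketch using only core Perl (the variable names are illustrative): a reference to a string literal points at a read-only value, whereas a reference to a variable holding the same string can be written through.

```perl
use strict;
use warnings;

my $const_ref = \"foo";     # reference to a string constant
my $value     = "foo";
my $var_ref   = \$value;    # reference to a variable holding the same string

# Writing through the variable reference works.
$$var_ref = "bar";

# Writing through the constant reference dies with a
# "Modification of a read-only value attempted" error.
my $ok    = eval { $$const_ref = "bar"; 1 };
my $error = $ok ? '' : $@;

print "variable is now: $value\n";
print "constant write failed: $error" unless $ok;
```

A dump that treats both references as references to variables will therefore eval back into something fully writable, which is the "as mutable as possible" behaviour described above.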

What's the difference from Data::Dumper?

Data::BFDump was written to make figuring out complicated data structures easier. Where Data::Dumper descends as far into the data structure as possible, Data::BFDump ensures that an object is declared as close to the root as possible. What does this mean? Well, consider a class that models a collection of people and their friends, like this:

        package Person;
        use strict;

        our %People;

        sub population { \%People };
        sub name  { $_[0]->[0] }
        sub named { $People{$_[1]} }

        sub new {
                my $class = shift;
                my $name  = shift;
                # There can only be one person with any given name
                $People{$name}=bless [ $name , {} ],$class
                        unless $People{$name};
                return $People{$name}
        }

        sub made_friend {
                my $self  =shift;
                my $friend=shift;
                $self->[1]->{$friend->name}=$friend;
                return $self;
        }
        1;

The hash %People stores all the people in the population keyed by name. Each person is represented as a blessed array containing their name and a hash mapping each friend's name to a reference to that friend. (Admittedly this example is a bit contrived :-) So if we add in a little code to make some people and relationships, like so:

        my @names=("A".."D");
        for my $name (@names) {
                my $obj=Person->new($name);
        }
        for my $i (1..10) {
                Person->named($names[rand @names])->made_friend(Person->named($names[rand @names]));
        }

So now we want to look at the population, which if we dump it using Data::Dumper will produce something like the following deeply nested and (IMO) confusing example.

        $VAR1 = {
                  'A' => bless( [
                                  'A',
                                  {
                                    'C' => bless( [
                                                    'C',
                                                    {
                                                      'A' => []
                                                    }
                                                  ], 'Person' ),
                                    'D' => bless( [
                                                    'D',
                                                    {
                                                      'A' => [],
                                                      'B' => bless( [
                                                                      'B',
                                                                      {
                                                                        'B' => [],
                                                                        'C' => [],
                                                                        'D' => []
                                                                      }
                                                                    ], 'Person' ),
                                                      'C' => []
                                                    }
                                                  ], 'Person' )
                                  }
                                ], 'Person' ),
                  'B' => [],
                  'C' => [],
                  'D' => []
                };
        $VAR1->{'A'}[1]{'C'}[1]{'A'} = $VAR1->{'A'};
        $VAR1->{'A'}[1]{'D'}[1]{'A'} = $VAR1->{'A'};
        $VAR1->{'A'}[1]{'D'}[1]{'B'}[1]{'B'} = $VAR1->{'A'}[1]{'D'}[1]{'B'};
        $VAR1->{'A'}[1]{'D'}[1]{'B'}[1]{'C'} = $VAR1->{'A'}[1]{'C'};
        $VAR1->{'A'}[1]{'D'}[1]{'B'}[1]{'D'} = $VAR1->{'A'}[1]{'D'};
        $VAR1->{'A'}[1]{'D'}[1]{'C'} = $VAR1->{'A'}[1]{'C'};
        $VAR1->{'B'} = $VAR1->{'A'}[1]{'D'}[1]{'B'};
        $VAR1->{'C'} = $VAR1->{'A'}[1]{'C'};
        $VAR1->{'D'} = $VAR1->{'A'}[1]{'D'};

The statements at the end (which I refer to as 'fix statements') are eye-straining to say the least. Data::BFDump, by contrast, will produce something a little more intuitive, like this:

        do{
                my $HASH1             = {
                                             A => bless([
                                                             'A',
                                                             {
                                                                  C => '$HASH1->{C}',
                                                                  D => '$HASH1->{D}'
                                                             }
                                                   ],'Person'),
                                             B => bless([
                                                             'B',
                                                             {
                                                                  B => '$HASH1->{B}',
                                                                  C => '$HASH1->{C}',
                                                                  D => '$HASH1->{D}'
                                                             }
                                                   ],'Person'),
                                             C => bless([ 'C', { A => '$HASH1->{A}' } ],'Person'),
                                             D => bless([
                                                             'D',
                                                             {
                                                                  A => '$HASH1->{A}',
                                                                  B => '$HASH1->{B}',
                                                                  C => '$HASH1->{C}'
                                                             }
                                                   ],'Person')
                                        };
                $HASH1->{A}->[1]->{C} = $HASH1->{C};
                $HASH1->{A}->[1]->{D} = $HASH1->{D};
                $HASH1->{B}->[1]->{B} = $HASH1->{B};
                $HASH1->{B}->[1]->{C} = $HASH1->{C};
                $HASH1->{B}->[1]->{D} = $HASH1->{D};
                $HASH1->{C}->[1]->{A} = $HASH1->{A};
                $HASH1->{D}->[1]->{A} = $HASH1->{A};
                $HASH1->{D}->[1]->{B} = $HASH1->{B};
                $HASH1->{D}->[1]->{C} = $HASH1->{C};
                $HASH1;
        }

Here objects are printed out at the level at which they are first mentioned. The fact that there is a collection of objects which are themselves interlinked, and the precise nature of that linkage, is much easier to discern.
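
The idea of "first mentioned at the shortest path" can be sketched in a few lines. This is not the code Data::BFDump uses internally; it is an illustrative breadth first walk built on the core Scalar::Util module. The function name shortest_paths and the root label '$HASH1' are assumptions made for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Walk a structure breadth first and record, for every reference found,
# the first (and therefore shortest) path at which it was reached.
sub shortest_paths {
    my ($root, $root_name) = @_;
    my %path_of;                          # refaddr => path string
    my @queue = ([ $root, $root_name ]);
    while (my $item = shift @queue) {
        my ($ref, $path) = @$item;
        next unless ref $ref;             # plain values have no identity to track
        my $id = refaddr $ref;
        next if exists $path_of{$id};     # already seen at a shorter or equal depth
        $path_of{$id} = $path;
        my $type = reftype $ref;
        if ($type eq 'HASH') {
            push @queue, map { [ $ref->{$_}, $path . "->{$_}" ] } sort keys %$ref;
        }
        elsif ($type eq 'ARRAY') {
            push @queue, map { [ $ref->[$_], $path . "->[$_]" ] } 0 .. $#$ref;
        }
    }
    return \%path_of;
}

# A small, fixed version of the people-and-friends structure.
my %people;
$people{$_} = bless [ $_, {} ], 'Person' for qw(A B C);
$people{A}[1]{C} = $people{C};
$people{A}[1]{B} = $people{B};
$people{C}[1]{A} = $people{A};

my $path_of = shortest_paths(\%people, '$HASH1');
print "$_\n" for sort values %$path_of;
```

Because the queue is processed level by level, person C is recorded under the root hash rather than inside A's friends hash, which is where a depth first walk starting at A would meet it first.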

Funky Stuff

Data::BFDump can use the B::Deparse module to dump coderefs if they are present in your data. In fact this is currently the default behaviour.
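
For readers unfamiliar with B::Deparse, here is a minimal sketch of what it does on its own, using its documented coderef2text method. How Data::BFDump calls it internally is not shown here, and the exact text produced varies between Perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

my $double = sub { return $_[0] * 2 };

# coderef2text returns the body of the sub, braces included, as Perl source.
my $deparser = B::Deparse->new;
my $body     = $deparser->coderef2text($double);

print "sub $body\n";
```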

So what's the catch?

Data::Dumper is faster and (currently :-) better tested and more flexible than Data::BFDump. Furthermore, Data::BFDump necessarily has to build a parallel data structure for anything it has to dump. This takes time and memory. On the other hand, this extra pass allows Data::BFDump to be more precise than Data::Dumper in some situations. It also allows much more flexibility in the way the data is presented and offers potential for other analytical tools. Unfortunately, at present a lot of this is unutilized.

However I do intend to keep this module growing and to improve it as much as I can. I welcome feedback, improvements, bug reports, fixes, and especially new tests. ;-)

Future Plans

Very very soon I will be adding code to allow complicated data to be sliced up in such a way as to reduce forward references.

Eventually I want to be able to support the full interface of Data::Dumper as well as the current Data::Dump style output.

CALLING CONVENTIONS

All of the documented methods (except new!) in Data::BFDump can be called as either an object method or a class method. Where a class method call needs access to object-based state information, a singleton is used. Once created, this singleton will not be destroyed until program termination. If you are dumping large data structures using class methods then you may want to call

  Data::BFDump->init();

to release that memory. On the other hand, because the singleton persists between calls, Data::BFDump->report; will work as expected after a class method call to Dump.

If you create new dumper objects using new() and use them for your dumping, you don't need to worry about the singleton.

METHODS

new()

Build a new object. Currently does not support parameters.

init()

Initializes the dumper back to empty. Same as calling _reset('ALL'). Returns the object.

_reset()

Resets the object. Eventually this will be able to reset various subsets of object attributes at once. Currently it should only be used with the parameter 'ALL', whereupon it completely resets the internal state of the object.

capture(LIST)

Captures a set of values inside of an array. The interesting thing is that dumping the following

  Data::BFDump->capture($x,$y,$z,$x)
  [$x,$y,$z,$x]

will produce different results! (The first will show that the same variable has been passed twice, whereas the second will show that two variables with the same value have been passed.) This is because the array returned is actually a reference to @_, which has special magic associated with it: its elements are aliases of the arguments passed. This is the OO equivalent of

  sub capture{\@_};

Only provided for really weird analysis situations. (See merlyn's Data::Dumper bug.)
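
The aliasing can be checked without the module at all. The following minimal sketch uses only core Perl and a plain function rather than the capture method; it compares the addresses of the array elements.

```perl
use strict;
use warnings;

sub capture { \@_ }    # plain function form of the idea

my ($x, $y, $z) = (1, 2, 3);

my $captured = capture($x, $y, $z, $x);    # elements alias the arguments
my $copied   = [ $x, $y, $z, $x ];         # elements are fresh copies

# In the captured array the first and last elements are the same scalar ...
my $same_in_captured = \$captured->[0] == \$captured->[3] ? 1 : 0;
# ... and that scalar is $x itself.
my $is_x = \$captured->[0] == \$x ? 1 : 0;
# In the copied array they are two distinct scalars holding equal values.
my $same_in_copied = \$copied->[0] == \$copied->[3] ? 1 : 0;

print "captured shares elements: $same_in_captured\n";
print "copied shares elements:   $same_in_copied\n";
```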

uber_ref(ITEM)

This is an upmarket ref() that also identifies non-references. In list context it returns a 5 element list

  ($reftype, $rid, $type, $class, $globname)

In scalar context it returns $reftype alone.

$reftype

The underlying type of the object. This may be GLOB, SCALAR, ARRAY, HASH, or an empty string for a non-reference.

$rid

For a reference this is the numeric representation of the reference. For a non-reference it is the numeric value of a reference to the given item. (Thus $rid alone cannot tell you if you have a reference or not.)

$type

This may be one of the following standard types: GLOB, SCALAR, ARRAY, HASH, CODE, REF; or one of the following special types: OBJ, REGEXP. The result of a qr// will be a REGEXP and a blessed object will be an OBJ.

$class

If a reference is blessed then this value will hold the name of the package.

$globname

If the item is a glob (not a reference to a GLOB!) then this value will hold its name.
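
Three of these values have rough counterparts in the core Scalar::Util module. The sketch below is only an analogue, not uber_ref itself: the helper name describe is hypothetical, and it does not compute $type or $globname.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype refaddr blessed);

# Hypothetical helper, not part of Data::BFDump.
# Returns (underlying type, numeric id, class) for any item.
sub describe {
    # For a non-reference, use the address of a reference to the item,
    # mirroring the description of $rid above.
    return ('', refaddr(\$_[0]), undef) unless ref $_[0];
    return (reftype($_[0]), refaddr($_[0]), blessed($_[0]));
}

my $person = bless [ 'A', {} ], 'Person';
my ($reftype, $rid, $class) = describe($person);

my $plain = 'just a string';
my ($plain_type, $plain_rid) = describe($plain);

print "object: $reftype, class $class\n";
print "plain:  '$plain_type'\n";
```

Note that reftype reports the underlying ARRAY even though the array is blessed, which is the distinction between $reftype and $class drawn above.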

glob_data(GLOB)

Takes a glob (not a glob reference!) and returns a list containing the name of the glob followed by name value pairs for each of its 'things'. Things being its SCALAR, HASH, ARRAY and CODE elements.

  ($name,%things)=glob_data(*Foo);
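
To make 'things' concrete, the following minimal sketch reads the same four slots directly with Perl's built-in *glob{THING} syntax. It does not call glob_data, and the package variables named Foo exist only for the example.

```perl
use strict;
use warnings;

# Populate several slots of the glob *Foo.
our $Foo = 'scalar value';
our @Foo = (1, 2, 3);
our %Foo = (key => 'value');
sub Foo { 'code value' }

my $name   = "" . *Foo;     # a glob stringifies to its name
my %things = (
    SCALAR => *Foo{SCALAR},
    ARRAY  => *Foo{ARRAY},
    HASH   => *Foo{HASH},
    CODE   => *Foo{CODE},
);

print "$name\n";
print "$_ => ", ref($things{$_}), "\n" for sort keys %things;
```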

Dump(ARRAYREF,[ARRAYREF])

Takes an array of values to dump and optionally an array of names to use for them. Currently the second argument is unsupported.

report()

Once an object has been dumped using the Dump method, this method will produce a report about the contents of the data structure. In list and scalar context it returns the report as a string. In void context the report is printed to the currently selected filehandle.
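
Context-sensitive behaviour of this kind is usually written with wantarray. The sketch below shows the general pattern only: report_sketch is a hypothetical stand-in with made-up text, not the module's report method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a report method; the text is a placeholder.
sub report_sketch {
    my $text = "report text\n";
    return $text if defined wantarray;    # list or scalar context: return it
    print $text;                          # void context: selected filehandle
    return;
}

my $string = report_sketch();             # scalar context

# Void context, with the selected filehandle pointed at an in-memory buffer.
my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
my $previous = select $fh;
report_sketch();
select $previous;
close $fh;

print "returned: $string";
print "printed:  $buffer";
```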

EXPORTS

Dumper(LIST)

Simpler version of the Dump() method that can be optionally exported.

DEPENDENCIES

Data::BFDump uses the following pragmas, packages and classes.

Pragmas Used

strict warnings warnings::register vars constant overload

Modules Used

Carp Carp::Assert B::Deparse Text::Quote

TODO

More Pod

More Tests

Accessor methods for attributes.

Variable slicing.

THANKS

Gurusamy Sarathy for Perl 5.6 and Data::Dumper.

Gisle Aas for lots of stuff not least being Data::Dump.

Dan Brook for testing and encouragement.

Perlmonks for being an awesome place to learn perl from.

AUTHOR

Yves Orton <demerphq@hotmail.com>

COPYRIGHT

Yves Orton 2002 -- This program is released under the same terms as perl itself.

SEE ALSO

perl