NAME

Data::Walk::Extracted - An extracted dataref walker

SYNOPSIS

This is a contrived example! For a more functional (complex/useful) example see the roles in this package.

        package Data::Walk::MyRole;
        use Moose::Role;
        requires '_process_the_data';
        use MooseX::Types::Moose qw(
                        Str
                        ArrayRef
                        HashRef
                );
        my $mangle_keys = {
                Hello_ref => 'primary_ref',
                World_ref => 'secondary_ref',
        };

        #########1 Public Method      3#########4#########5#########6#########7#########8

        sub mangle_data{
                my ( $self, $passed_ref ) = @_;
                @$passed_ref{ 'before_method', 'after_method' } =
                        ( '_mangle_data_before_method', '_mangle_data_after_method' );
                ### Start recursive parsing
                $passed_ref = $self->_process_the_data( $passed_ref, $mangle_keys );
                ### End recursive parsing with: $passed_ref
                return $passed_ref->{Hello_ref};
        }

        #########1 Private Methods    3#########4#########5#########6#########7#########8

        ### If you are at the string level merge the two references
        sub _mangle_data_before_method{
                my ( $self, $passed_ref ) = @_;
                if(
                        is_Str( $passed_ref->{primary_ref} ) and
                        is_Str( $passed_ref->{secondary_ref} )          ){
                        $passed_ref->{primary_ref} .= " " . $passed_ref->{secondary_ref};
                }
                return $passed_ref;
        }

        ### Strip the reference layers on the way out
        sub _mangle_data_after_method{
                my ( $self, $passed_ref ) = @_;
                if( is_ArrayRef( $passed_ref->{primary_ref} ) ){
                        $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->[0];
                }elsif( is_HashRef( $passed_ref->{primary_ref} ) ){
                        $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->{level};
                }
                return $passed_ref;
        }

        package main;
        use MooseX::ShortCut::BuildInstance qw( build_instance );
        my      $AT_ST = build_instance(
                        package         => 'Greeting',
                        superclasses    => [ 'Data::Walk::Extracted' ],
                        roles           => [ 'Data::Walk::MyRole' ],
                );
        print $AT_ST->mangle_data( {
                        Hello_ref =>{ level =>[ { level =>[ 'Hello' ] } ] },
                        World_ref =>{ level =>[ { level =>[ 'World' ] } ] },
                } ) . "\n";



        #################################################################################
        #     Output of SYNOPSIS
        # 01:Hello World
        #################################################################################

DESCRIPTION

This module takes a data reference (or two) and recursively travels through it (or them). Where the two references diverge the walker follows the primary data reference. At the beginning and end of each branch or node in the data the code will attempt to call a method on the remaining unparsed data.

Acknowledgement of MJD

This is an implementation of the concept of extracted data walking from Higher-Order Perl, Chapter 1, by Mark Jason Dominus. The book is well worth the money! With that said, I diverged from MJD purity in two ways. First, this is object-oriented code, not functional code. Second, when taking action the code will search for class methods provided by (your) role rather than acting on passed closures. There is clearly some overhead associated with both of these differences. I made those choices consciously, so if they upset you do not hassle MJD!

What is the unique value of this module?

With the recursive part of data walking extracted the various functionalities desired when walking the data can be modularized without copying this code. The Moose framework also allows diverse and targeted data parsing without dragging along a kitchen sink API for every use of this class.

Extending Data::Walk::Extracted

All action taken during the data walking must be initiated by implementing action methods that do not exist in this class. It usually also makes sense to build an initial action method. The initial action method can do any useful data pre-processing as well as provide the necessary setup for the generic walker. All of these elements can be combined with this class by using a Moose role, by extending the class, or by joining them to the class at run time. See MooseX::ShortCut::BuildInstance or Moose::Util for more class building information. See the parsing flow to understand the details of how the methods are used. See methods used to write roles for the available methods to implement the roles.

Then write some tests for your role!

Recursive Parsing Flow

Initial data input and scrubbing

The primary input method added to this class for external use is referred to as the 'action' method (ex. 'mangle_data'). This action method needs to receive data and organize it for sending to the start method for the generic data walker. Remember that if more than one role is added to Data::Walk::Extracted for a given instance then all methods should be named with consideration for other (future?) method names. The '$conversion_ref' allows for multiple uses of the core data walker's generic functions. The $conversion_ref is not passed deeper into the recursion flow.

Assess and implement the before_method

The class next checks for an available 'before_method' using the test:

        exists $passed_ref->{before_method};

If the test passes then the next sequence is run.

        $method = $passed_ref->{before_method};
        $passed_ref = $self->$method( $passed_ref );

If the $passed_ref is modified by the 'before_method' then the recursive parser will parse the new ref and not the old one. The before_method can also set:

        $passed_ref->{skip} = 'YES'

Then the flow checks for the need to investigate deeper.
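
The hook call above can be sketched in plain perl. This is an illustration only and not the code of this class: the package name Sketch::Walker and the methods run_before_hook and _my_before_method are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical class standing in for a composed walker (not this module's code)
package Sketch::Walker;

sub new { return bless {}, shift }

# A hypothetical before_method: ask the walker to skip every HASH node
sub _my_before_method {
    my ( $self, $passed_ref ) = @_;
    $passed_ref->{skip} = 'YES' if ref( $passed_ref->{primary_ref} ) eq 'HASH';
    return $passed_ref;
}

# Call the named hook only when the key exists, then keep the returned ref
sub run_before_hook {
    my ( $self, $passed_ref ) = @_;
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method( $passed_ref );
    }
    return $passed_ref;
}

package main;

my $result = Sketch::Walker->new->run_before_hook( {
    before_method => '_my_before_method',
    primary_ref   => { key => 'value' },
} );
print "skip is: $result->{skip}\n";
```

Because the hook is stored as a method name, any role composed into the instance can supply it without the walker knowing about the role in advance.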

Test for deeper investigation

The code now checks whether deeper investigation is required. If the 'skip' key in the $passed_ref equals 'YES' or the node is a base ref type then the process jumps to the after_method. Otherwise it begins to investigate the next level.
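
A minimal sketch of that decision, assuming the base node types listed in the Definitions section (SCALAR and UNDEF). The helper name needs_deeper_look is hypothetical and is not part of this class.

```perl
use strict;
use warnings;

# Base node types named in this document: there is nothing deeper to walk
my %base_type = ( SCALAR => 1, UNDEF => 1 );

# Hypothetical helper: true when the walker should descend into the node
sub needs_deeper_look {
    my ( $passed_ref ) = @_;
    return 0 if ( $passed_ref->{skip} // 'NO' ) eq 'YES';
    return 0 if $base_type{ $passed_ref->{primary_type} };
    return 1;
}

my $go      = needs_deeper_look( { primary_type => 'HASH' } );
my $message = $go ? 'descend' : 'jump to the after_method';
print "$message\n";
```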

Identify node elements

If the next level is not skipped then a list is generated for all paths in the node. For example a 'HASH' node would generate a list of hash keys for that node. SCALAR nodes will generate a list with only one element containing the scalar contents. UNDEF nodes will generate an empty list.
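
One way to picture the list generation is a small dispatch table keyed on node type. This is a sketch under assumptions: the names %list_builder and node_list are hypothetical, and listing ARRAY nodes by position is a guess, since the text above does not spell out the ARRAY case.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one list builder per node type
my %list_builder = (
    HASH   => sub { return keys %{ $_[0] } },
    ARRAY  => sub { return ( 0 .. $#{ $_[0] } ) },    # assumption: walk by position
    SCALAR => sub { return ( $_[0] ) },               # one element: the contents
    UNDEF  => sub { return () },                      # nothing to walk
);

# Always call this in list context
sub node_list {
    my ( $type, $node ) = @_;
    my $builder = $list_builder{$type}
        or die "No list builder for node type: $type";
    return $builder->( $node );
}

my @keys = node_list( 'HASH', { first => 1, second => 2 } );
print scalar( @keys ), " paths found\n";
```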

Sort the node as required

If the list should be sorted then the list is sorted. ARRAYS are hard sorted. This means that the actual items in the (primary) passed data ref are permanently sorted.

Process each element

For each identified element of the node a new $data_ref is generated containing data that represents just that sub element. The secondary_ref is only constructed if it has a matching type and element to the primary ref. Matching for hashrefs is done by key matching only. Matching for arrayrefs is done by testing that the position exists only. No position content compare is done! Scalars are matched on content. The list of items generated for this element is as follows:

before_method => -->name of before method for this role here<--

after_method => -->name of after method for this role here<--

primary_ref => the piece of the primary data ref below this element

primary_type => the lower primary (walker) ref type

match => YES|NO (This indicates if the secondary ref meets matching criteria)

skip => YES|NO Checks the three skip attributes against the lower primary_ref node. This can also be set in the 'before_method' upon arrival at that node.

secondary_ref => if match eq 'YES' then built like the primary ref

secondary_type => if match eq 'YES' then calculated like the primary type

branch_ref => stack trace
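
The matching rules can be sketched as follows. Both helper names (secondary_matches and hash_element_ref) are hypothetical, and the builder only fills in the primary_ref, match, and secondary_ref keys for a HASH node; it is not the code of this class.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the matching rules described above
sub secondary_matches {
    my ( $type, $primary, $secondary, $item ) = @_;
    my $matched = 0;
    if ( $type eq 'HASH' ) {
        # key matching only
        $matched = ( ref( $secondary ) eq 'HASH' and exists $secondary->{$item} );
    }
    elsif ( $type eq 'ARRAY' ) {
        # the position must exist, the content is not compared
        $matched = ( ref( $secondary ) eq 'ARRAY' and $item <= $#{$secondary} );
    }
    elsif ( $type eq 'SCALAR' ) {
        # scalars are matched on content
        $matched = ( defined $secondary and !ref( $secondary ) and $secondary eq $primary );
    }
    return $matched ? 'YES' : 'NO';
}

# Hypothetical builder for the element ref below one hash key
sub hash_element_ref {
    my ( $passed_ref, $key ) = @_;
    my ( $primary, $secondary ) = @$passed_ref{ qw( primary_ref secondary_ref ) };
    my $element = {
        primary_ref => $primary->{$key},
        match       => secondary_matches( 'HASH', $primary, $secondary, $key ),
    };
    $element->{secondary_ref} = $secondary->{$key} if $element->{match} eq 'YES';
    return $element;
}

my $element = hash_element_ref(
    { primary_ref => { a => 1, b => 2 }, secondary_ref => { a => 9 } }, 'a' );
print "match: $element->{match}\n";
```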

A position trace is generated

The current node list position is then documented and pushed onto the array at $passed_ref->{branch_ref}. The array reference stored in branch_ref can be thought of as the stack trace that documents the node elements directly between the current position and the initial (or zeroth) level of the parsed primary data_ref. Past completed branches and future pending branches are not maintained. Each element of the branch_ref contains four positions used to describe the node and selections used to traverse that node level. The values in each sub position are:

        [
                ref_type, #The node reference type
                the list item value or '' for ARRAYs,
                        #key name for hashes, scalar value for scalars
                element sequence position (from 0),
                        #For hashes this is only relevant if sort_HASH is called
                level of the node (from 0),
                        #The zeroth level is the initial data ref
        ]
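
As an illustration of the layout, here is a hand-written trace for reaching 'b' in { list => [ 'a', 'b' ] }, together with a hypothetical helper (branch_path) that renders a trace as an access path. The trace is an assumption built from the four positions described above, not output captured from this class.

```perl
use strict;
use warnings;

# Hand-written trace (an assumption) for reaching 'b' in { list => [ 'a', 'b' ] }
my $branch_ref = [
    [ 'HASH',  'list', 0, 0 ],    # took key 'list' in the zeroth node
    [ 'ARRAY', '',     1, 1 ],    # took position 1 in the level one array
];

# Hypothetical helper: render a trace as a perl style access path
sub branch_path {
    my ( @branch ) = @_;
    my $path = '';
    for my $step ( @branch ) {
        my ( $type, $item, $position, $level ) = @$step;
        if    ( $type eq 'HASH' )  { $path .= '{' . $item . '}' }
        elsif ( $type eq 'ARRAY' ) { $path .= '[' . $position . ']' }
    }
    return $path;
}

my $path = branch_path( @$branch_ref );
print "$path\n";    # prints {list}[1]
```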

Going deeper in the data

The down level ref is then passed as a new data set to be parsed and it starts at the before_method again.

Actions on return from recursion

When the values are returned from the recursion call the last branch_ref element is popped off and the returned data ref is used to replace the sub elements of the primary_ref and secondary_ref associated with that list element in the current level of the $passed_ref. If there are still pending items in the node element list then the program processes them too.

Assess and implement the after_method

After the node elements have all been processed the class checks for an available 'after_method' using the test:

        exists $passed_ref->{after_method};

If the test passes then the following sequence is run.

        $method = $passed_ref->{after_method};
        $passed_ref = $self->$method( $passed_ref );

If the $passed_ref is modified by the 'after_method' then the modified ref, not the original, is what gets passed back up.

Go up

The updated $passed_ref is passed back up to the next level.
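
The whole flow can be condensed into a toy walker. This is a simplified sketch and not the code of this class: Sketch::MiniWalker, its walk method, and the two logging hooks are hypothetical, there is no secondary_ref or attribute handling, and hash keys are always sorted so that the call order is repeatable.

```perl
use strict;
use warnings;

package Sketch::MiniWalker;

sub new { return bless { log => [] }, shift }

# Classify a node with this document's type names
sub _type {
    my ( $node ) = @_;
    return 'UNDEF' if !defined $node;
    return ref( $node ) || 'SCALAR';
}

# Hypothetical hooks that only record when they were called
sub _note_before {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, "before $passed_ref->{primary_type}";
    return $passed_ref;
}
sub _note_after {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, "after $passed_ref->{primary_type}";
    return $passed_ref;
}

sub walk {
    my ( $self, $passed_ref ) = @_;
    $passed_ref->{primary_type} = _type( $passed_ref->{primary_ref} );
    $passed_ref->{branch_ref} //= [];
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method( $passed_ref );
    }
    my $type = $passed_ref->{primary_type};
    my $skip = ( $passed_ref->{skip} // 'NO' ) eq 'YES';
    if ( !$skip and ( $type eq 'HASH' or $type eq 'ARRAY' ) ) {
        my $node  = $passed_ref->{primary_ref};
        my @list  = $type eq 'HASH' ? ( sort keys %$node ) : ( 0 .. $#$node );
        my $level = scalar @{ $passed_ref->{branch_ref} };
        for my $position ( 0 .. $#list ) {
            my $item = $list[$position];
            # document the position, go deeper, then pop the trace on return
            push @{ $passed_ref->{branch_ref} },
                [ $type, ( $type eq 'HASH' ? $item : '' ), $position, $level ];
            my $lower = $self->walk( {
                %$passed_ref,
                primary_ref => ( $type eq 'HASH' ? $node->{$item} : $node->[$item] ),
            } );
            pop @{ $passed_ref->{branch_ref} };
            # replace the sub element with whatever came back up
            if ( $type eq 'HASH' ) { $node->{$item} = $lower->{primary_ref} }
            else                   { $node->[$item] = $lower->{primary_ref} }
        }
    }
    if ( exists $passed_ref->{after_method} ) {
        my $method = $passed_ref->{after_method};
        $passed_ref = $self->$method( $passed_ref );
    }
    return $passed_ref;
}

package main;

my $walker = Sketch::MiniWalker->new;
$walker->walk( {
    primary_ref   => { a => [ 'x' ] },
    before_method => '_note_before',
    after_method  => '_note_after',
} );
print "$_\n" for @{ $walker->{log} };
```

The log shows each before hook firing on the way down and each after hook firing on the way back up, which is the 'vertically first' order described in the Definitions section.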

Attributes

Data passed to ->new when creating an instance. For modification of these attributes see Public Methods. The ->new function will accept either fat comma lists or a complete hash ref that has the possible attributes as the top keys. Additionally, some attributes that have the prefixed methods get_$name, set_$name, clear_$name, and has_$name can be passed to _process_the_data and will be adjusted for just the run of that method call. These are called one shot attributes. Nested calls to _process_the_data will be tracked and the attribute will remain in force until the parser returns to the calling 'one shot' level. Previous attribute values are restored after the 'one shot' attribute value expires.
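
The save and restore behaviour of one shot attributes can be pictured with a plain hash based object. This is a sketch under assumptions: Sketch::OneShot and its process method are hypothetical, the action is passed as a code ref purely to keep the example short, and only two attribute names are handled.

```perl
use strict;
use warnings;

package Sketch::OneShot;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Hypothetical stand-in for _process_the_data: apply, run, then restore
sub process {
    my ( $self, $passed_ref, $action ) = @_;
    my %saved;
    for my $name ( qw( skip_level fixed_primary ) ) {
        next if !exists $passed_ref->{$name};
        # remember whether the attribute was set at all, and its old value
        $saved{$name} = [ ( exists $self->{$name} ? 1 : 0 ), $self->{$name} ];
        $self->{$name} = delete $passed_ref->{$name};
    }
    my $result = $action->( $self, $passed_ref );
    for my $name ( keys %saved ) {
        my ( $existed, $value ) = @{ $saved{$name} };
        if ( $existed ) { $self->{$name} = $value }
        else            { delete $self->{$name} }
    }
    return $result;
}

package main;

my $object = Sketch::OneShot->new( skip_level => 2 );
my $during = $object->process(
    { primary_ref => [], skip_level => 5 },
    sub { my ( $self ) = @_; return $self->{skip_level} },
);
print "during: $during, after: $object->{skip_level}\n";
```

Because each call keeps its own %saved, a nested call restores to the value set by the enclosing one shot level and the outer call then restores the original.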

sorted_nodes

Definition: If the primary_type of the $element_ref is a key in this attribute hash ref then the node list is sorted. If the value of that key is a CODEREF then the sort function will be called as follows.

        @node_list = sort $coderef @node_list

For the type 'ARRAY' the node is sorted (permanently) by the element values. This means that if the array contains a list of references it will effectively sort against the ASCII of the memory pointers. Additionally the 'secondary_ref' node is not sorted, so prior alignment may break. In general ARRAY sorts are not recommended.

Default {} #Nothing is sorted

Range This accepts a HashRef.

Example:

        sorted_nodes =>{
                ARRAY   => 1,#Will sort the primary_ref only
                HASH    => sub{ $b cmp $a }, #reverse sort the keys
        }
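
A short sketch of how such a setting could be applied to a node list. The helper sort_node_list is hypothetical; only the call form 'sort $coderef @node_list' comes from the definition above.

```perl
use strict;
use warnings;

my $sorted_nodes = {
    ARRAY => 1,                      # plain default sort
    HASH  => sub { $b cmp $a },      # reverse sort the keys
};

# Hypothetical helper: order a node list the way the attribute describes
sub sort_node_list {
    my ( $type, @node_list ) = @_;
    my $setting = $sorted_nodes->{$type};
    return @node_list if !$setting;    # this type is not sorted
    my @sorted;
    if ( ref( $setting ) eq 'CODE' ) {
        @sorted = sort $setting @node_list;
    }
    else {
        @sorted = sort @node_list;
    }
    return @sorted;
}

my @keys = sort_node_list( 'HASH', qw( apple cherry banana ) );
print join( ', ', @keys ), "\n";

# A 'hard' ARRAY sort rewrites the data itself, not just the visiting order
my $array_ref = [ qw( pear fig ) ];
@$array_ref = sort_node_list( 'ARRAY', @$array_ref );
```

Note that the code ref relies on the package variables $a and $b, so it must be compiled in the same package that calls sort.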

skipped_nodes

Definition: If the primary_type of the $element_ref is a key in this attribute hash ref then the 'before_method' and 'after_method' are run at that node but no parsing is done.

Default {} #Nothing is skipped

Range This accepts a HashRef.

Example:

        skipped_nodes =>{
                OBJECT => 1,#skips all object nodes
        }

skip_level

Definition: This attribute is set to skip (or not) node parsing at the set level. Because the process doesn't start checking until after it enters the data ref it effectively ignores a skip_level set to 0 (the base node level). The test checks against the value in the last position of the prior trace array ref + 1.

Default undef = Nothing is skipped

Range This accepts an integer
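
The level test described in the definition can be sketched like this. The helper skip_for_level is hypothetical; it reads the level from the last position of the latest branch_ref element and adds one.

```perl
use strict;
use warnings;

# Hypothetical helper: should the node below the latest trace entry be skipped?
sub skip_for_level {
    my ( $skip_level, $branch_ref ) = @_;
    return 'NO' if !defined $skip_level;    # default: nothing is skipped
    return 'NO' if !@$branch_ref;           # no prior trace entry at the base node
    my $node_level = $branch_ref->[-1][-1] + 1;
    return $node_level == $skip_level ? 'YES' : 'NO';
}

my $trace = [ [ 'HASH', 'list', 0, 0 ], [ 'ARRAY', '', 1, 1 ] ];
my $answer = skip_for_level( 2, $trace );
print "skip the level two node: $answer\n";
```

Since the computed level is always at least 1, a skip_level of 0 can never match, which lines up with the note above about the base node.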

skip_node_tests

Definition: This attribute contains a list of test conditions used to skip certain targeted nodes. The test can target an array position, match a hash key, or even be restricted to only one level. The test is run against the latest branch_ref element so it skips the node below the matching conditions, not the node at the matching conditions. Matching is done with '=~' and so will accept a regex or a string. The attribute contains an ArrayRef of ArrayRefs. Each sub_ref contains the following:

$type - This is any of the identified reference node types

$key - This is either a scalar or regex to use for matching a hash key

$position - This is used to match an array position. It can be an integer or 'ANY'

$level - This restricts the skipping test usage to a specific level only or 'ANY'

Example:

        [
                [ 'HASH', 'KeyWord', 'ANY', 'ANY' ],
                # Skip the node below the value of any hash key matching 'KeyWord'
                [ 'ARRAY', 'ANY', '3', '4' ],
                # Skip the node stored in arrays at position three on level four
        ]

Range An infinite number of skip tests added to an array

Default [] = no nodes are skipped
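
A sketch of how one branch_ref element might be checked against the test list. The helper node_test_skips is hypothetical. Comparing the type with 'ne' and the position and level numerically are assumptions; only the '=~' match on the key and the 'ANY' wildcard come from the definition above.

```perl
use strict;
use warnings;

# Hypothetical matcher for one branch_ref element against the test list
sub node_test_skips {
    my ( $tests, $branch_element ) = @_;
    my ( $type, $item, $position, $level ) = @$branch_element;
    for my $test ( @$tests ) {
        my ( $t_type, $t_key, $t_position, $t_level ) = @$test;
        next if $type ne $t_type;
        next if $t_key      ne 'ANY' and $item     !~ $t_key;
        next if $t_position ne 'ANY' and $position != $t_position;
        next if $t_level    ne 'ANY' and $level    != $t_level;
        return 'YES';    # every condition of this test matched
    }
    return 'NO';
}

my $tests = [
    [ 'HASH',  'KeyWord', 'ANY', 'ANY' ],
    [ 'ARRAY', 'ANY',     '3',   '4' ],
];
my $answer = node_test_skips( $tests, [ 'HASH', 'KeyWord', 0, 2 ] );
print "skip below KeyWord: $answer\n";
```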

change_array_size

Definition: This attribute will not be used by this class directly. However the Data::Walk::Prune role may share it with other roles in the future so it is placed here so there will be no conflicts. This is usually used to define whether an array size shrinks when an element is removed.

Default 1 (This probably means that the array will shrink when a position is removed)

Range Boolean values.

fixed_primary

Definition: This means that no changes made at lower levels will be passed upwards into the final ref.

Default 0 = The primary ref is not fixed (and can be changed). A value of 0 effectively deep clones the portions of the primary ref that are traversed.

Range Boolean values.

Methods

Methods used to write roles

These are methods that are not meant to be exposed to the final user of a composed role and class but are used by the role to exercise the class.

_process_the_data( $passed_ref, $conversion_ref )

Definition: This method is the gatekeeper to the recursive parsing of Data::Walk::Extracted. This method ensures that the minimum requirements for the recursive data parser are met. If needed it will use a conversion ref (also provided by the caller) to change input hash keys to the generic hash keys used by this class. This function then calls the actual recursive function. For an overview of the recursive steps see the flow outline.

Accepts: ( $passed_ref, $conversion_ref )

$passed_ref this ref contains key value pairs as follows;

primary_ref - a dataref that the walker will walk - required

review the $conversion_ref functionality in this function for renaming of this key.

secondary_ref - a dataref that is used for comparison while walking. - optional

review the $conversion_ref functionality in this function for renaming of this key.

before_method - a method name that will perform some action at the beginning of each node - optional

after_method - a method name that will perform some action at the end of each node - optional

[attribute name] - supported attribute names are accepted with temporary attribute settings here. These settings are temporarily set for a single "_process_the_data" call and then the original attribute values are restored.

$conversion_ref This allows a public method to accept different key names for the various keys listed above and then convert them later to the generic terms used by this class. - optional

Example

        $passed_ref ={
                print_ref =>{
                        First_key => [
                                'first_value',
                                'second_value'
                        ],
                },
                match_ref =>{
                        First_key       => 'second_value',
                },
                before_method   => '_print_before_method',
                after_method    => '_print_after_method',
                sorted_nodes    =>{ ARRAY => 1 },#One shot attribute setter
        }

        $conversion_ref ={
                print_ref       => 'primary_ref',# role_name => generic_name,
                match_ref       => 'secondary_ref',
        }

Returns: the $passed_ref (only) with the key names restored to the ones passed to this method using the $conversion_ref.
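
The key conversion and its reversal can be sketched as below. The helper rename_keys is hypothetical, and the mapping direction (role key name => generic key name) is an assumption taken from the $mangle_keys hash in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper: rename keys in place following a { from => to } map
sub rename_keys {
    my ( $passed_ref, $conversion_ref ) = @_;
    for my $from ( keys %$conversion_ref ) {
        next if !exists $passed_ref->{$from};
        $passed_ref->{ $conversion_ref->{$from} } = delete $passed_ref->{$from};
    }
    return $passed_ref;
}

# Assumed direction: role key name => generic key name
my $conversion_ref = { print_ref => 'primary_ref', match_ref => 'secondary_ref' };
my $passed_ref     = {
    print_ref     => [ 'first_value' ],
    before_method => '_print_before_method',
};

rename_keys( $passed_ref, $conversion_ref );                 # inbound
# ... the walk would run here using the generic key names ...
rename_keys( $passed_ref, { reverse %$conversion_ref } );    # outbound
print join( ', ', sort keys %$passed_ref ), "\n";
```

Keys that are not named in the conversion ref, such as before_method, pass through untouched in both directions.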

_build_branch( $seed_ref, @arg_list )

Definition: There are times when a role will wish to reconstruct the data branch that led from the 'zeroth' node to the data walker's current position. This private method takes a seed reference and uses data found in the branch ref to recursively append to the front of the seed until a complete branch to the zeroth node is generated. The branch_ref list must be explicitly passed.

Accepts: a list of arguments starting with the $seed_ref to build from. The remaining arguments are just the array elements of the 'branch ref'.

Example:

        $ref = $self->_build_branch(
                $seed_ref,
                @{ $passed_ref->{branch_ref}},
        );

Returns: a data reference with the current path back to the start prepended to the $seed_ref
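
The idea of wrapping the seed from the inside out can be sketched as follows. The function build_branch_sketch is hypothetical and only handles HASH and ARRAY trace entries; it is not the code behind _build_branch.

```perl
use strict;
use warnings;

# Hypothetical rebuild: wrap the seed with the last trace entry, then recurse
sub build_branch_sketch {
    my ( $seed_ref, @branch ) = @_;
    return $seed_ref if !@branch;
    my ( $type, $item, $position ) = @{ pop @branch };
    my $wrapped;
    if ( $type eq 'HASH' ) {
        $wrapped = { $item => $seed_ref };
    }
    elsif ( $type eq 'ARRAY' ) {
        $wrapped = [];
        $wrapped->[$position] = $seed_ref;    # earlier positions stay undef
    }
    else {
        $wrapped = $seed_ref;                 # assumption: other types add no layer
    }
    return build_branch_sketch( $wrapped, @branch );
}

my $rebuilt = build_branch_sketch(
    'b',
    [ 'HASH',  'list', 0, 0 ],
    [ 'ARRAY', '',     1, 1 ],
);
print "rebuilt value: $rebuilt->{list}[1]\n";
```

Only the path to the current position is rebuilt; sibling elements that were not on the trace are absent from the result.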

_extracted_ref_type( $test_ref )

Definition: In order to manage data types necessary for this class a data walker compliant 'Type' tester is provided. This is necessary to support a few non perl-standard types not generated in standard perl typing systems. First, 'undef' is the UNDEF type. Second, strings and numbers both return as 'SCALAR' (not '' or undef). Much of the code in this package runs on dispatch tables that are built around these specific type definitions.

Accepts: It receives a $test_ref that can be undef.

Returns: a data walker type or it confesses.
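
A type tester along these lines could look like the sketch below. The function ref_type_sketch is hypothetical; returning 'OBJECT' for blessed references is an assumption based on the 'OBJECT' key used by the skip attributes, and the real method may recognize more types.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );
use Carp qw( confess );

my %supported = map { $_ => 1 } qw( ARRAY HASH );

# Hypothetical type tester using this document's type names
sub ref_type_sketch {
    my ( $test_ref ) = @_;
    return 'UNDEF'  if !defined $test_ref;
    return 'SCALAR' if !ref( $test_ref );        # plain strings and numbers
    return 'OBJECT' if blessed( $test_ref );     # assumption, see the lead-in
    my $type = reftype( $test_ref );
    return $type if $supported{$type};
    confess "Unsupported reference type: $type";
}

for my $sample ( undef, 'text', 42, [], {} ) {
    print ref_type_sketch( $sample ), "\n";
}
```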

_get_had_secondary

Definition: During the initial processing of data in _process_the_data the existence of a passed secondary ref is tested and stored in the attribute '_had_secondary'. On occasion a role might need to know if a secondary ref existed at any level if it is not represented at the current level.

Accepts: nothing

Returns: True|1 if the secondary ref ever existed

_get_current_level

Definition: On occasion you may need one of the methods to know what level is currently being parsed. This will provide that information in integer format.

has_fixed_primary()

Definition: This method is used to test if the fixed_primary attribute is set.

Accepts: nothing

Returns: $Bool value indicating if the 'fixed_primary' attribute has been set

clear_fixed_primary()

Definition: This method clears the fixed_primary attribute.

Accepts: nothing

Returns: nothing

Definitions

node

Each branch point of a data reference is considered a node. The possible paths deeper into the data structure from the node are followed 'vertically first' in recursive parsing. The original top level reference is considered the 'zeroth' node.

base node type

Recursion 'base' node types are considered to not have any possible deeper branches. Currently that list is SCALAR and UNDEF.

Supported node walking types

ARRAY

HASH

SCALAR

UNDEF

Other node support

Support for Objects is partially implemented and as a consequence '_process_the_data' won't immediately die when asked to parse an object. It will still die, but on a dispatch table call that indicates where there is missing object support, not at the top of the node. This allows for some of the skip attributes to use 'OBJECT' in their definitions.

Supported one shot attributes

For an explanation of one shot attributes see Attributes.

sorted_nodes
skipped_nodes
skip_level
skip_node_tests
change_array_size
fixed_primary

Dispatch Tables

This class uses the role Data::Walk::Extracted::Dispatch to implement dispatch tables. When there is a decision point, that role is used to make the class extensible.

Caveat utilitor

This is not an extension of Data::Walk.

The core class has no external effect. All output comes from additions to the class.

This module uses the 'defined or' ( //= ) and so requires perl 5.010 or higher.

This is a Moose based data handling class. Many coders will tell you Moose and data manipulation don't belong together. They are most certainly right in speed intensive circumstances.

Recursive parsing is not a good fit for all data since very deep data structures will fill up a fair amount of memory! As the module recursively parses through the levels it leaves behind snapshots of the previous level that allow it to keep track of its location.

The passed data references are effectively deep cloned during this process. To leave the primary_ref pointer intact see fixed_primary.

Build/Install from Source

1. Download a compressed file with the code

2. Extract the code from the compressed file. If you are using tar this should work:

        tar -zxvf Data-Walk-Extracted-v0.xx.xx.tar.gz

3. Change (cd) into the extracted directory

4. Run the following commands

(For Windows find what version of make was used to compile your perl)

        perl  -V:make

(then for Windows substitute the correct make function (ex. s/make/dmake/g))

        >perl Makefile.PL

        >make

        >make test

        >make install # As sudo/root

        >make clean

SUPPORT

github Data-Walk-Extracted/issues

TODO

1. Provide full recursion through Objects

2. Support recursion through CodeRefs (Closures)

3. Add a Data::Walk::Diff Role to the package

4. Add a Data::Walk::Top Role to the package

5. Add a Data::Walk::Thin Role to the package

6. Convert test suite to Test2 direct usage

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

Dependencies

version

5.010 (for use of defined or //)

utf8

Class::Inspector

Scalar::Util - reftype

Carp - confess

Moose - 2.1803

MooseX::StrictConstructor

MooseX::HasDefaults::RO

MooseX::Types::Moose

Data::Walk::Extracted::Types

Data::Walk::Extracted::Dispatch
