
NAME

Data::Seek - Search Complex Data Structures

VERSION

version 0.09

SYNOPSIS

    use Data::Seek;

    my $hash   = {...};
    my $seeker = Data::Seek->new(data => $hash);
    my $result = $seeker->search(...);
    my $data   = $result->data;

DESCRIPTION

Data::Seek is used for querying complex data structures. This module allows you to select and return specific node(s) in a hierarchical data structure using a simple and intuitive query syntax. The results can be returned as a list of values, or as a hash object in the same shape as the original.

ENCODING

Before querying, Data::Seek flattens the nested data structure into a collection of endpoint/value pairs. Consider the following data structure.

    {
        'id' => 12345,
        'patient' => {
            'name' => {
                'first' => 'Bob',
                'last'  => 'Bee'
            }
        },
        'medications' => [{
            'aceInhibitors' => [{
                'name'      => 'lisinopril',
                'strength'  => '10 mg Tab',
                'dose'      => '1 tab',
                'route'     => 'PO',
                'sig'       => 'daily',
                'pillCount' => '#90',
                'refills'   => 'Refill 3'
            }],
            'antianginal' => [{
                'name'      => 'nitroglycerin',
                'strength'  => '0.4 mg Sublingual Tab',
                'dose'      => '1 tab',
                'route'     => 'SL',
                'sig'       => 'q15min PRN',
                'pillCount' => '#30',
                'refills'   => 'Refill 1'
            }],
        }]
    }

Flattening the data structure above produces the following structure of endpoint/value pairs.

    {
        'id'                                      => 12345,
        'medications:0.aceInhibitors:0.dose'      => '1 tab',
        'medications:0.aceInhibitors:0.name'      => 'lisinopril',
        'medications:0.aceInhibitors:0.pillCount' => '#90',
        'medications:0.aceInhibitors:0.refills'   => 'Refill 3',
        'medications:0.aceInhibitors:0.route'     => 'PO',
        'medications:0.aceInhibitors:0.sig'       => 'daily',
        'medications:0.aceInhibitors:0.strength'  => '10 mg Tab',
        'medications:0.antianginal:0.dose'        => '1 tab',
        'medications:0.antianginal:0.name'        => 'nitroglycerin',
        'medications:0.antianginal:0.pillCount'   => '#30',
        'medications:0.antianginal:0.refills'     => 'Refill 1',
        'medications:0.antianginal:0.route'       => 'SL',
        'medications:0.antianginal:0.sig'         => 'q15min PRN',
        'medications:0.antianginal:0.strength'    => '0.4 mg Sublingual Tab',
        'patient.name.first'                      => 'Bob',
        'patient.name.last'                       => 'Bee',
    }

The keys of this flattened structure are the endpoint strings that the query expressions are matched against.
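
The flattening step can be sketched in a few lines of plain Perl. This is a minimal illustration, assuming a simple recursive walk; flatten_data is a hypothetical helper, not part of the Data::Seek API, and the module's actual implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Seek): walk the structure and
# record one endpoint/value pair per leaf value.
sub flatten_data {
    my ($node, $path, $out) = @_;
    $out //= {};

    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            # nested hash keys are joined to the parent path with a period
            my $next = defined $path ? "$path.$key" : $key;
            flatten_data($node->{$key}, $next, $out);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $i (0 .. $#$node) {
            # array indices are appended to the parent path with a colon
            my $next = (defined $path ? $path : '') . ":$i";
            flatten_data($node->[$i], $next, $out);
        }
    }
    else {
        # anything else is treated as a leaf value
        $out->{$path} = $node;
    }

    return $out;
}

my $flat = flatten_data({
    id          => 12345,
    patient     => { name => { first => 'Bob', last => 'Bee' } },
    medications => [{ aceInhibitors => [{ name => 'lisinopril' }] }],
});

print "$_ => $flat->{$_}\n" for sort keys %$flat;
```

In this sketch, empty hashes and arrays produce no endpoints, because only leaf values are recorded.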

QUERYING

When the data structure is queried, the criteria (query expressions) are converted into a series of regular expressions that are applied sequentially to filter the endpoints. The result is either a data set of matching nodes or an exception explaining why the search failed.

  • Node Expression

        my $result = $seeker->search(...);
    
        # given "id"
        { id => 12345 }

    The node expression is a part of a criterion which performs an exact match against a node in the data structure. It is a string which can contain letters, numbers, and/or underscores.

  • Step Expression

        my $result = $seeker->search(...);
    
        # given "patient.name.first"
        { patient => { name => { first => "Bob" } } }
    
        # given "patient.name.last"
        { patient => { name => { last => "Bee" } } }

    The step expression is a criterion, or part of a criterion, made up of one or more node expressions separated using the period character, which matches against nodes in the data structure. It is a string which can contain letters, numbers, and/or underscores, separated using periods.

  • Index Expression

        my $result = $seeker->search(...);
    
        # given "medications:0.aceInhibitors:0.dose"
        { medications => [{ aceInhibitors => [{ dose => "1 tab" }] }] }
    
        # given "medications:0.aceInhibitors:0.name"
        { medications => [{ aceInhibitors => [{ name => "lisinopril" }] }] }
    
        # given "medications:0.aceInhibitors:0.pillCount"
        { medications => [{ aceInhibitors => [{ pillCount => "#90" }] }] }

    The index expression is a criterion, or part of a criterion, consisting of a node expression suffixed with a colon followed by a number. It matches only an array element whose index corresponds to the numeric portion of the suffix. It is a string which can contain letters, numbers, and/or underscores, suffixed with a colon followed by a number.

  • Iterator Expression

        my $result = $seeker->search(...);
    
        # given "@medications.@aceInhibitors.dose"
        { medications => [{ aceInhibitors => [{ dose => "1 tab" }] }] }
    
        # given "@medications.@aceInhibitors.name"
        { medications => [{ aceInhibitors => [{ name => "lisinopril" }] }] }
    
        # given "@medications.@aceInhibitors.pillCount"
        { medications => [{ aceInhibitors => [{ pillCount => "#90" }] }] }

    The iterator expression is a criterion, or part of a criterion, consisting of a node expression preceded by an "at" character (@). It denotes that the node expression should match all nodes in the data structure which are mapped to array objects. It is a string which can contain letters, numbers, and/or underscores, preceded by a single "at" character.

  • Wildcard Expression

        my $result = $seeker->search(...);
    
        # given "*"
        { id => 12345 }
    
        # given "*.*.first"
        { patient => { name => { first => "Bob" } } }
    
        # given "*.*.last"
        { patient => { name => { last => "Bee" } } }
    
        # given "patient.*.first"
        { patient => { name => { first => "Bob" } } }
    
        # given "patient.*.last"
        { patient => { name => { last => "Bee" } } }
    
        # given "@*.@*.pillCount"
        {
            medications => [{
                aceInhibitors => [{ pillCount => "#90" }],
                antianginal   => [{ pillCount => "#30" }],
            }],
        }

    The wildcard expression is a criterion, or part of a criterion, in which a single "star" character matches and represents exactly one node expression. It is a string which can contain letters, numbers, underscores, and/or a single star character.

  • Greedy-Wildcard Expression

        my $result = $seeker->search(...);
    
        # given "**.first"
        { patient => { name => { first => "Bob" } } }
    
        # given "**.last"
        { patient => { name => { last => "Bee" } } }
    
        # given "patient.**"
        { patient => { name => { first => "Bob", last => "Bee" } } }
    
        # given "medications**.pillCount"
        {
            medications => [{
                aceInhibitors => [{ pillCount => "#90" }],
                antianginal   => [{ pillCount => "#30" }],
            }],
        }

    The greedy-wildcard expression is a criterion, or part of a criterion, in which a double "star" character matches any number of nodes and represents one or more of any character. It is a string which can contain letters, numbers, underscores, and/or a double star character.
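
The expressions above can be illustrated with a small translation from criteria to regular expressions. The following is a sketch only: criterion_to_regex and seek_endpoints are hypothetical names, the regular expressions are an assumption based on the descriptions above, and merging the matches of several criteria into one result is a simplification. Data::Seek's internals may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not Data::Seek's internals: translate one criterion
# into an anchored regular expression over flattened endpoint strings.
sub criterion_to_regex {
    my ($criterion) = @_;
    my @steps;
    for my $step (split /\./, $criterion) {
        my $name    = $step;
        my $iterate = ($name =~ s/^\@//);    # leading "@" marks an iterator
        my $regex   = join '', map {
              $_ eq '**' ? '.+'              # greedy wildcard: any characters
            : $_ eq '*'  ? '\w+'             # wildcard: exactly one node name
            :              quotemeta($_)     # node or index expression: literal
        } grep { length } split /(\*\*|\*)/, $name;
        $regex .= ':\d+' if $iterate;        # iterator: any array index
        push @steps, $regex;
    }
    my $pattern = join '\.', @steps;
    return qr/\A$pattern\z/;
}

# Hypothetical helper: keep the endpoints matched by any criterion, and
# throw an exception when a criterion matches nothing.
sub seek_endpoints {
    my ($flat, @criteria) = @_;
    my %found;
    for my $criterion (@criteria) {
        my $regex = criterion_to_regex($criterion);
        my @hits  = grep { $_ =~ $regex } keys %$flat;
        die "no data matched the criterion '$criterion'\n" unless @hits;
        @found{@hits} = @{$flat}{@hits};
    }
    return \%found;
}

my $flat = {
    'id'                                      => 12345,
    'medications:0.aceInhibitors:0.pillCount' => '#90',
    'medications:0.antianginal:0.pillCount'   => '#30',
    'patient.name.first'                      => 'Bob',
    'patient.name.last'                       => 'Bee',
};

my $found = seek_endpoints($flat, '@*.@*.pillCount', 'patient.*.first');
print "$_ => $found->{$_}\n" for sort keys %$found;
```

The patterns are anchored at both ends, so in this sketch a criterion must describe a complete endpoint rather than a prefix of one.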

ATTRIBUTES

data

    $seeker->data;
    $seeker->data({...});

The data structure to be introspected. It must be a hash reference, which is coerced into a Data::Object::Hash object.

ignore

    $seeker->ignore;
    $seeker->ignore(1);

Bypass the exceptions thrown when a criterion is invalid or when no matching data can be found. This attribute must be an integer, which is coerced into a Data::Object::Integer object.

METHODS

search

    my $search = $seeker->search('id', 'person.name.*');

Prepare and return a search object which uses the supplied criteria. Introspection is triggered when the result method is invoked. See Data::Seek::Search for usage information.

CONCEPT

The following is a short and simple overview of the strategy and syntax used by Data::Seek to query complex data structures. The overall idea behind Data::Seek is to flatten/fold the data structure, reduce it by applying a series of patterns, then unflatten/unfold it and operate on the new data structure. The introspection strategy is to flatten the data structure, producing a non-hierarchical data structure whose keys represent endpoints within the structure. Endpoints use dot-notation to separate nested hash keys and colons to denote array indices.
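
The unflatten/unfold step can be sketched as the inverse of the flattening step. This is a minimal illustration, assuming endpoints in the dot-and-colon notation described above; unflatten_data is a hypothetical helper and not part of the Data::Seek API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Seek): rebuild a nested structure
# from endpoint/value pairs.
sub unflatten_data {
    my ($flat) = @_;
    my $root = {};

    for my $endpoint (sort keys %$flat) {
        my $ref = \$root;    # reference to the slot currently being filled
        for my $step (split /\./, $endpoint) {
            # "name:0:1" means key "name", then array indices 0 and 1
            my ($key, @indices) = split /:/, $step;
            $ref = \$$ref->{$key};              # descend into the hash key
            $ref = \$$ref->[$_] for @indices;   # descend into array indices
        }
        $$ref = $flat->{$endpoint};
    }

    return $root;
}

my $tree = unflatten_data({
    'patient.name.first'                      => 'Bob',
    'medications:0.aceInhibitors:0.pillCount' => '#90',
});

print $tree->{patient}{name}{first}, "\n";
print $tree->{medications}[0]{aceInhibitors}[0]{pillCount}, "\n";
```

The sketch relies on Perl's autovivification to create the intermediate hashes and arrays. If only a later array index survives the reduction, the earlier positions in the rebuilt array are left undefined.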

AUTHOR

Al Newkirk <anewkirk@ana.io>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Al Newkirk.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.