Michael Shipper > Data-Range-Compare-Stream > Data::Range::Compare::Stream

Download:
Data-Range-Compare-Stream-4.029.tar.gz

Dependencies

Annotate this POD

View/Report Bugs
Module Version: 4.029   Source  

NAME ^

Data::Range::Compare::Stream - Compute intersections of Ranges with Iterators

SYNOPSIS A ^

  use Data::Range::Compare::Stream;
  use Data::Range::Compare::Stream::Iterator::Array;
  use Data::Range::Compare::Stream::Iterator::Consolidate;
  use Data::Range::Compare::Stream::Iterator::Compare::Asc;

  # create the iterator for column_a's Consolidation iterator
  my $column_a=Data::Range::Compare::Stream::Iterator::Array->new();
  $column_a->create_range(3,11);
  $column_a->create_range(17,19);

  # create the iterator for column_b's Consolidation iterator
  my $column_b=Data::Range::Compare::Stream::Iterator::Array->new();
  $column_b->create_range(0,0);
  $column_b->create_range(1,3);
  $column_b->create_range(5,7);
  $column_b->create_range(6,9);
  $column_b->create_range(11,15);
  $column_b->create_range(17,33);

  # sort columns a and be in consolidate order
  $column_a->prepare_for_consolidate_asc;
  $column_b->prepare_for_consolidate_asc;

  # create the consolidator object for column_a our iterator to it
  my $column_a_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_a);

  # create the consolidator object for column_b our iterator to it
  my $column_b_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_b);

  # create the object that will compare columns a and b
  my $compare=new Data::Range::Compare::Stream::Iterator::Compare::Asc;

  # add column a for processing
  $compare->add_consolidator($column_a_consolidator);

  # add column b for processing
  $compare->add_consolidator($column_b_consolidator);


  # now we can compute the intersections of our objects

  while($compare->has_next) {
     # fetch our current result object
    my $row=$compare->get_next;
  
    # if no ranges overlap with this row move on
    next if $row->is_empty;
  
    # now we can output the current range
    my $common_range=$row->get_common;
    my $overlap_count=$row->get_overlap_count;
  
    print "A total of: [$overlap_count] Ranges intersected with Common range: $common_range\n";
  
    my $overlap_ids=$row->get_overlap_ids;
    for each my $consolidator_id (@{$overlap_ids}) {
  
      if($consolidator_id==0) {
  
        my $result=$row->get_consolidator_result_by_id($consolidator_id);
        print "  Column a contained the following overlaps $result\n";
  
      } elsif($consolidator_id==1) {
  
        my $result=$row->get_consolidator_result_by_id$consolidator_id);
        print "  Column b contained the following overlaps $result\n";
  
      }
  
    }
  
    print "\n";
 
  }

DESCRIPTION ^

This library implements an algorithm that can be used to compute gaps, differences, and intersections across multiple sets of 2 dimensional ranges from both a vertical and horizontal perspective. The algorithm itself generates missing data creating contiguous ranges allowing for parallel comparisons. Once missing ranges have been generated what was once despondent sets of ranges can now be aligned and compared for intersections.

Figures 1 through 5 represent aspects of the comparison process: Although these steps are explained individually each operation is completed in parallel.

Figure 1 ( Example Data )

Given 3 complex sets of data we will outline the process used to compute the horizontal intersections and vertical gaps in these sets of ranges. See Figure 1 for the example sets of data.

  Numeric Range set: A
  +----------+
  | 1 - 11   |
  | 13 - 44  |
  | 17 - 23  |
  | 55 - 66  |
  +----------+

  Numeric Range set: B
  +----------+
  | 0  - 1   |
  | 2  - 29  |
  | 88 - 133 |
  +----------+

  Numeric Range set: C
  +-----------+
  | 17  - 29  |
  | 220 - 240 |
  | 241 - 250 |
  +-----------+

Figure 2 ( Numeric range set: A, post consolidation )

Looking over data sets A,B, and C we can see several problems: Set A contains ranges that overlap with each other, and sets A through C do not align on consistent boundaries.

Part of the process of comparing sets of data involves making sure there are no duplicates or overlaps in each individual source of data. The first action taken is the removal of duplicates and consolidation of overlapping ranges: This process can be seen by looking at the resulting conversion for "Numeric Range Set: A" seen in Figure 2. The overlapping range "17 - 23" is contained by a larger range in the set "13 - 44" thus "17 - 33" needs to be removed. Data Sets B and C do not contain any duplicated or overlapping ranges and thus require no changes.

  Consolidated Numeric Range set: A
  +----------+
  | 1 - 11   |
  | 13 - 44  |
  | 55 - 66  |
  +----------+

Figure 3 ( Numeric Ranges, gaps filled )

The next step in the comparison process is iterating through our data and figuring out where the gaps are in each data set. Each set of data contains gaps between ranges, and those gaps need be calculated in order for a proper comparison of the data to begin.

  Numeric Range set: A
  +----------+
  | 1 - 11   |
    12 - 12 -- No Data
  | 13 - 44  |
    45 - 54 -- No Data
  | 55 - 66  |
  +----------+

  Numeric Range set: B
  +----------+
  | 0  - 1   |
  | 2  - 29  |
    30 - 87 -- No Data
  | 88 - 133 |
  +----------+

  Numeric Range set: C
  +-----------+
  | 17  - 29  |
    30  - 219 -- No Data
  | 220 - 240 |
  | 241 - 250 |
  +-----------+

Figure 4 ( Results )

The intersecting range represents overlapping points between ranges or the "Common Range". The concept is not just limited to missing columns, the algorithm itself will also compute missing rows. In Figure 4 Row 0 the Common Range is defined as "0 - 0". This is because the only column with overlapping data is in "Numeric Range B" 0 - 1. 0 - 1 also overlaps with "Numeric Column A: 1 - 11". This means the first common range is "0 - 0", and the second common range is defined as "1 - 1". The relationship of the common range is defined not just by the current ranges being compared but also by the next common range to be computed as well.

Three result rows stand out because they contain no data in sets A,B or C: "45 - 54","67 - 87","134 - 219". These 3 rows represent a horizontal gap in our 3 sources of data and can be excluded from the result set to make the data more consistent for someone reviewing just the intersections of the data from sets A,B, and C see: Figure 5.

  +--------------------------------------------------------------------+
  | Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
  +--------------------------------------------------------------------+
  | 0 - 0        |   No Data       |   0 - 1         |   No Data       |
  | 1 - 1        |   1 - 11        |   0 - 1         |   No Data       |
  | 2 - 11       |   1 - 11        |   2 - 29        |   No Data       |
  | 12 - 12      |   No Data       |   2 - 29        |   No Data       |
  | 13 - 16      |   13 - 44       |   2 - 29        |   No Data       |
  | 17 - 29      |   13 - 44       |   2 - 29        |   17 - 29       |
  | 30 - 44      |   13 - 44       |   No Data       |   No Data       |
  | 45 - 54      |   No Data       |   No Data       |   No Data       |
  | 55 - 66      |   55 - 66       |   No Data       |   No Data       |
  | 67 - 87      |   No Data       |   No Data       |   No Data       |
  | 88 - 133     |   No Data       |   88 - 133      |   No Data       |
  | 134 - 219    |   No Data       |   No Data       |   No Data       |
  | 220 - 240    |   No Data       |   No Data       |   220 - 240     |
  | 241 - 250    |   No Data       |   No Data       |   241 - 250     |
  +--------------------------------------------------------------------+

Figure 5 ( Filtered results )

Looking at the results form the compare process we can see the "Common Range" explains the overlap. Figure 4 shows not only sets of data we have, but it also shows sets of data that we don't have: these additional ranges are a bi-product of the comparison process and have been filtered out of the result set in Figure 5.

  +--------------------------------------------------------------------+
  | Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
  +--------------------------------------------------------------------+
  | 0 - 0        |   No Data       |   0 - 1         |   No Data       |
  | 1 - 1        |   1 - 11        |   0 - 1         |   No Data       |
  | 2 - 11       |   1 - 11        |   2 - 29        |   No Data       |
  | 12 - 12      |   No Data       |   2 - 29        |   No Data       |
  | 13 - 16      |   13 - 44       |   2 - 29        |   No Data       |
  | 17 - 29      |   13 - 44       |   2 - 29        |   17 - 29       |
  | 30 - 44      |   13 - 44       |   No Data       |   No Data       |
  | 55 - 66      |   55 - 66       |   No Data       |   No Data       |
  | 88 - 133     |   No Data       |   88 - 133      |   No Data       |
  | 220 - 240    |   No Data       |   No Data       |   220 - 240     |
  | 241 - 250    |   No Data       |   No Data       |   241 - 250     |
  +--------------------------------------------------------------------+

The final stage, only ranges derived from our original source data is presented and we have excluded any rows that contain none of our original data.

SYNOPSIS B ^

  use Data::Range::Compare::Stream::Iterator::File;
  use Data::Range::Compare::Stream;
  use Data::Range::Compare::Stream::Iterator::Consolidate::OverlapAsColumn;
  use Data::Range::Compare::Stream::Iterator::Compare::Asc;
  
  my $compare=new  Data::Range::Compare::Stream::Iterator::Compare::Asc();
  
  foreach my $file (qw(source_a.src source_b.src source_d.src source_c.src source_d.src)) {
    my $src=Data::Range::Compare::Stream::Iterator::File->new(filename=>$file);
    my $con=new Data::Range::Compare::Stream::Iterator::Consolidate::OverlapAsColumn($src,$compare);
    $compare->add_consolidator($con);
  }
  
  my $format='  | %-12s | %-26s |  %-26s|  %-26s|  %-26s|'."\n";
  my $break="  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+\n";
  printf "$break$format$break","Intersection","Set A",'Set B','Set C','Set D';
  
  while($compare->has_next) {
  
    my $result=$compare->get_next;
    next if $result->is_empty;
  
    my $columns=$result->get_root_results($result);
    my @row=($result->get_common);
    foreach my $id (@{$result->get_root_ids}) {
      if($#{$columns->[$id]}!=-1) {
      push @row, join ', ',map { $_->get_common } @{$columns->[$id]};
      } else {
        push @row,"No Data";
      }
    }
  
    printf $format,@row;
    print $break;
  }

Going Beyond Simple Consolidation ^

SYNOPSIS A and Figures 1 through 5 provide a very basic concept for consolidation and comparison: But an alternate way to deal with duplicates and overlaps is to copy the duplicates and overlaps to additional columns: For every duplicate and overlap found that data is then moved over to an additional column.

SYNOPSES B and Figures 6 through 8 represent aspects of the extended comparison process: Although these steps are explained individually each operation is completed in parallel.

Figure 6 ( Example Data )

Given 4 sets of data we will look at the extended consolidation process.

  +-----------+
  | Set A     |
  +-----------+
  | 1 - 11    |
  +-----------+
  | 13 - 44   |
  +-----------+
  | 17 - 24   |
  +-----------+
  | 55 - 66   |
  +-----------+

  +-----------+
  | Set B     |
  +-----------+
  | 0 - 1     |
  +-----------+
  | 2 - 29    |
  +-----------+
  | 88 - 133  |
  +-----------+

  +-----------+
  | Set C     |
  +-----------+
  | 17 - 29   |
  +-----------+
  | 220 - 240 |
  +-----------+
  | 241 - 250 |
  +-----------+

  +-----------+
  | Set D     |
  +-----------+
  | 0 - 1     |
  +-----------+
  | 0 - 0     |
  +-----------+
  | 1 - 1     |
  +-----------+
  | 5 - 7     |
  +-----------+
  | 7 - 9     |
  +-----------+
  | 11 - 19   |
  +-----------+
  | 12 - 18   |
  +-----------+
  | 17 - 29   |
  +-----------+
  | 220 - 240 |
  +-----------+
  | 241 - 250 |
  +-----------+

Figure 7 ( Moving duplicates and overlaps to new columns )

Duplicates and overlaps are moved to additional columns and compared for a common range: Columns A becomes 2 columns, Column D Becomes 3 Columns. Once the additional columns are created each set of data can be compared for the common range.

  +---------------------+----------------------------+
  | Set A Common Ranges | Duplicates and Overlaps    |
  +---------------------+----------------------------+
  | 1 - 11              | 1 - 11                     |
  | 13 - 16             | 13 - 44                    |
  | 17 - 24             | 13 - 44, 17 - 24           |
  | 25 - 44             | 13 - 44                    |
  | 55 - 66             | 55 - 66                    |
  +---------------------+----------------------------+

  +---------------------+----------------------------+
  | Set B Common Ranges | Duplicates and Overlaps    |
  +---------------------+----------------------------+
  | 0 - 1               | 0 - 1                      |
  | 2 - 29              | 2 - 29                     |
  | 88 - 133            | 88 - 133                   |
  +---------------------+----------------------------+

  +---------------------+----------------------------+
  | Set C Common Ranges | Duplicates and Overlaps    |
  +---------------------+----------------------------+
  | 17 - 29             | 17 - 29                    |
  | 220 - 240           | 220 - 240                  |
  | 241 - 250           | 241 - 250                  |
  +---------------------+----------------------------+

  +---------------------+----------------------------+
  | Set D Common Ranges | Duplicates and Overlaps    |
  +---------------------+----------------------------+
  | 0 - 0               | 0 - 1, 0 - 0               |
  | 1 - 1               | 0 - 1, 1 - 1               |
  | 5 - 6               | 5 - 7                      |
  | 7 - 7               | 5 - 7, 7 - 9               |
  | 8 - 9               | 7 - 9                      |
  | 11 - 11             | 11 - 19                    |
  | 12 - 16             | 11 - 19, 12 - 18           |
  | 17 - 18             | 11 - 19, 12 - 18, 17 - 29  |
  | 19 - 19             | 11 - 19, 17 - 29           |
  | 20 - 29             | 17 - 29                    |
  | 220 - 240           | 220 - 240                  |
  | 241 - 250           | 241 - 250                  |
  +---------------------+----------------------------+

Figure 8

Once the duplicates and overlaps have been moved into additional columns the sets of data can be compared in parallel.

  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | Intersection | Set A                      |  Set B                     |  Set C                     |  Set D                     |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 0 - 0        | No Data                    |  0 - 1                     |  No Data                   |  0 - 1, 0 - 0              |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 1 - 1        | 1 - 11                     |  0 - 1                     |  No Data                   |  0 - 1, 1 - 1              |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 2 - 4        | 1 - 11                     |  2 - 29                    |  No Data                   |  No Data                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 5 - 6        | 1 - 11                     |  2 - 29                    |  No Data                   |  5 - 7                     |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 7 - 7        | 1 - 11                     |  2 - 29                    |  No Data                   |  5 - 7, 7 - 9              |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 8 - 9        | 1 - 11                     |  2 - 29                    |  No Data                   |  7 - 9                     |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 10 - 10      | 1 - 11                     |  2 - 29                    |  No Data                   |  No Data                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 11 - 11      | 1 - 11                     |  2 - 29                    |  No Data                   |  11 - 19                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 12 - 12      | No Data                    |  2 - 29                    |  No Data                   |  11 - 19, 12 - 18          |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 13 - 16      | 13 - 44                    |  2 - 29                    |  No Data                   |  11 - 19, 12 - 18          |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 17 - 18      | 13 - 44, 17 - 24           |  2 - 29                    |  17 - 29                   |  11 - 19, 12 - 18, 17 - 29 |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 19 - 19      | 13 - 44, 17 - 24           |  2 - 29                    |  17 - 29                   |  11 - 19, 17 - 29          |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 20 - 24      | 13 - 44, 17 - 24           |  2 - 29                    |  17 - 29                   |  17 - 29                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 25 - 29      | 13 - 44                    |  2 - 29                    |  17 - 29                   |  17 - 29                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 30 - 44      | 13 - 44                    |  No Data                   |  No Data                   |  No Data                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 55 - 66      | 55 - 66                    |  No Data                   |  No Data                   |  No Data                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 88 - 133     | No Data                    |  88 - 133                  |  No Data                   |  No Data                   |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 220 - 240    | No Data                    |  No Data                   |  220 - 240                 |  220 - 240                 |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+
  | 241 - 250    | No Data                    |  No Data                   |  241 - 250                 |  241 - 250                 |
  +--------------+----------------------------+----------------------------+----------------------------+----------------------------+

Getting Started

The internals of Data::Range::Compare::Stream only support dealing with integers by default: This section covers how to Extend Data::Range::Compare::Stream to support your data types.

Creating IPV4 Range Support See:

Data::Range::Compare::Stream::Cookbook::COMPARE_IPV4

Creating DateTime Range Support See:

Data::Range::Compare::Stream::Cookbook::COMPARE_DateTime

OO Methods ^

This section covers the OO Methods in the package.

SEE ALSO ^

Data::Range::Compare::Stream::Cookbook

AUTHOR ^

Michael Shipper

Source-Forge Project ^

As of version 0.001 the Project has been moved to Source-Forge.net

Data Range Compare https://sourceforge.net/projects/data-range-comp/

COPYRIGHT ^

Copyright 2011 Michael Shipper. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

syntax highlighting: