
Data::Range::Compare::Stream::Iterator::File::MergeSortAsc - On Disk Merge Sort for really big data sets!

use Data::Range::Compare::Stream;
use Data::Range::Compare::Stream::Iterator::File;
use Data::Range::Compare::Stream::Iterator::File::MergeSortAsc;

# create a merge sort iterator over the ranges in somefile.csv
my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
  filename=>'somefile.csv',
);

# fetch the ranges in ascending sorted order
while($iterator->has_next) {
  my $next_range=$iterator->get_next;
  print $next_range,"\n";
}

This module extends Data::Range::Compare::Stream::Iterator::Base and provides an on-disk merge sort for ranges read from source files or from objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.
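The general on-disk merge sort technique can be sketched in plain Perl. This is a minimal illustration under stated assumptions, not the module's implementation: external_sort_ranges and its helpers are hypothetical names, ranges are plain [start, end] array refs, and the ordering is start ascending then end ascending rather than the module's default sort_in_consolidate_order_asc. Input is split into buckets that are sorted in memory and spooled to File::Temp files, and the sorted runs are then merged back together.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical comparator: start ascending, then end ascending.
sub _cmp_ranges {
    my ($x, $y) = @_;
    return $x->[0] <=> $y->[0] || $x->[1] <=> $y->[1];
}

# Reads one "start end" line from a run file; undef at end of file.
sub _read_range {
    my ($fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;
    chomp $line;
    return [ split ' ', $line ];
}

# Hypothetical external merge sort over [start, end] pairs.
sub external_sort_ranges {
    my ($ranges, $bucket_size, $tmpdir) = @_;
    my @runs;

    # Phase 1: sort one bucket at a time in memory and spool it to a temp file.
    for (my $i = 0; $i < @$ranges; $i += $bucket_size) {
        my $last = $i + $bucket_size - 1;
        $last = $#$ranges if $last > $#$ranges;
        my @bucket = sort { _cmp_ranges($a, $b) } @{$ranges}[ $i .. $last ];
        my $tmp = File::Temp->new(defined $tmpdir ? (DIR => $tmpdir) : ());
        print {$tmp} "$_->[0] $_->[1]\n" for @bucket;
        close $tmp or die "close failed: $!";
        push @runs, $tmp;    # keep the object alive so the file is not unlinked yet
    }

    # Phase 2: k-way merge, always emitting the smallest head line.
    my @readers;
    for my $tmp (@runs) {
        open my $in, '<', $tmp->filename or die "open failed: $!";
        push @readers, { fh => $in, head => _read_range($in) };
    }
    my @sorted;
    while (1) {
        my $best;
        for my $reader (@readers) {
            next unless defined $reader->{head};
            $best = $reader
                if !defined $best || _cmp_ranges($reader->{head}, $best->{head}) < 0;
        }
        last unless defined $best;
        push @sorted, $best->{head};
        $best->{head} = _read_range($best->{fh});
    }
    close($_->{fh}) for @readers;
    return \@sorted;    # temp files are removed when @runs goes out of scope
}

my @input  = ([5, 9], [1, 4], [3, 3], [1, 2], [7, 8]);
my $sorted = external_sort_ranges(\@input, 2);
print join(',', map { "$_->[0]-$_->[1]" } @$sorted), "\n";    # prints 1-2,1-4,3-3,5-9,7-8
```

The merge step picks the smallest head by scanning every run, which is simple but costs one comparison per run for each output row; a heap is the usual choice when there are many runs.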
Instance constructor. Arguments are passed as key/value pairs.
At least one of the following source arguments is required:
filename=>'source_file.csv'
# an absolute or relative path to the source file
file_list=>[]
# an array ref of file names, given as absolute or relative paths
iterator_list=>[]
# an array ref of objects that implement or extend Data::Range::Compare::Stream::Iterator::Base
Optional Arguments:
auto_prepare=>0|1
# Default: 0. If set to 1, sort operations happen on object creation.
unlink_result_file=>1|0
# Default: 1. If set to 0, the sorted result file is not deleted.
bucket_size=>4000
# sets the number of ranges to be pre-sorted in memory per bucket
# 2 buckets are created, so the number of objects loaded into memory is bucket_size * 2
NEW_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::File'
# sets the file iterator object to be used when loading spooled files for merging
# make sure you load or require the object class being passed in as an argument!
NEW_ARRAY_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::Array'
# sets the array iterator class
NEW_FROM=>'Data::Range::Compare::Stream'
# deprecated but still supported; see factory_instance.
# sets the class new ranges will be created from
# This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
factory_instance=>$obj
# defines the object that implements $obj->factory($start,$end,$data).
# New ranges are constructed through the factory interface. If no factory_instance
# is provided, an instance of Data::Range::Compare::Stream is assumed.
parse_line=>undef|code_ref
# Default: undef. Sets the code ref used to parse a line.
# If not set, the default internals are used.
# This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
result_to_line=>undef|code_ref
# Default: undef. Sets the code ref used to convert a result into a line that can be parsed.
# If not set, the default internals are used.
# This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
sort_func=>undef|code_ref
# Default: undef. Sets the code ref used to compare objects during the sort.
# If not set, the default internals are used.
tmpdir=>undef|'/some/folder'
# If tmpdir is defined, its value is passed to File::Temp->new(DIR=>$self->{tmpdir}).
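The argument rules above can be summarized in a small sketch. normalize_merge_sort_args is a hypothetical helper, not the module's constructor: it checks that at least one source argument is present, fills in the two documented defaults (auto_prepare=>0, unlink_result_file=>1), and computes the bucket_size * 2 in-memory bound described for bucket_size.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's real constructor: applies the two
# documented defaults and enforces the "at least one source" rule.
sub normalize_merge_sort_args {
    my (%args) = @_;
    my @sources = grep { defined $args{$_} } qw(filename file_list iterator_list);
    die "one of filename, file_list or iterator_list is required\n"
        unless @sources;
    my %defaults = (auto_prepare => 0, unlink_result_file => 1);
    # Caller-supplied values override the defaults.
    return { %defaults, %args };
}

my $opts = normalize_merge_sort_args(filename => 'somefile.csv', bucket_size => 1000);

# Two buckets are created, so bucket_size * 2 ranges are held in memory.
my $max_in_memory = $opts->{bucket_size} * 2;
print "$max_in_memory\n";    # prints 2000
```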
Returns the class that new range objects are constructed from.
$class will contain the name of the class new file iterators are constructed from.
$class will contain the name of the class new array iterators are constructed from.
Returns true when there are more rows to fetch.
Returns the next $result from the given source file.
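The has_next/get_next loop from the synopsis is the whole consumer-side contract. The sketch below uses a hypothetical class, My::ListIterator, to show that contract over an in-memory list. It does not extend Data::Range::Compare::Stream::Iterator::Base, so it only illustrates the loop and is not meant to be passed in iterator_list.

```perl
use strict;
use warnings;

# Hypothetical iterator (not part of the distribution) illustrating the
# has_next/get_next protocol over an in-memory list.
package My::ListIterator;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# True while at least one item remains.
sub has_next {
    my ($self) = @_;
    return $self->{pos} < @{ $self->{items} };
}

# Returns the next item and advances; undef once exhausted.
sub get_next {
    my ($self) = @_;
    return undef unless $self->has_next;
    return $self->{items}[ $self->{pos}++ ];
}

package main;

my $it = My::ListIterator->new('1 4', '5 9');
my @seen;
while ($it->has_next) {
    push @seen, $it->get_next;
}
print join('|', @seen), "\n";    # prints 1 4|5 9
```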
Given a $result from $iterator->get_next, this interface converts the range returned by $result->get_common into a line that can be parsed by $iterator->parse_line($line). Think of this function as a data serializer for range objects generated by an $iterator object. When overloading this function or using a callback, make sure the output of result_to_line can be parsed by parse_line.
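sub result_to_line {
  my ($self,$result)=@_;
  # use the result_to_line callback from the constructor, if one was given
  return $self->{result_to_line}->($result) if defined($self->{result_to_line});
  # default: write the common range as "start end" followed by a newline
  my $range=$result->get_common;
  my $line=$range->range_start_to_string.' '.$range->range_end_to_string."\n";
  return $line;
}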
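If a source file is actually comma-separated, the default space-separated format will not round-trip, so result_to_line and parse_line callbacks have to be supplied as a matching pair. The following is a sketch under stated assumptions: Demo::Range and Demo::Result are hypothetical stand-ins that provide only the methods the default result_to_line uses (get_common, range_start_to_string, range_end_to_string), and the callbacks take the single argument the code above passes them.

```perl
use strict;
use warnings;

# Hypothetical stand-in classes with just the methods result_to_line relies on.
package Demo::Range;
sub new { my ($class, $start, $end) = @_; return bless { start => $start, end => $end }, $class; }
sub range_start_to_string { $_[0]{start} }
sub range_end_to_string   { $_[0]{end} }

package Demo::Result;
sub new { my ($class, $range) = @_; return bless { common => $range }, $class; }
sub get_common { $_[0]{common} }

package main;

# Callback pair for comma-separated lines: parse_line must undo result_to_line.
my $result_to_line = sub {
    my ($result) = @_;
    my $range = $result->get_common;
    return join(',', $range->range_start_to_string, $range->range_end_to_string) . "\n";
};
my $parse_line = sub {
    my ($line) = @_;
    chomp $line;
    return [ split /,/, $line ];
};

my $result = Demo::Result->new(Demo::Range->new(10, 20));
my $line   = $result_to_line->($result);
my $args   = $parse_line->($line);
print "@$args\n";    # prints 10 20
```

With the real module, the pair would be passed as result_to_line=>$result_to_line and parse_line=>$parse_line in the constructor arguments.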
Given a $line, returns an array ref of the arguments required to construct an object that extends or implements Data::Range::Compare::Stream. When overloading this function or passing in a parse_line callback, make sure it accepts the lines that result_to_line produces.
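sub parse_line {
  my ($self,$line)=@_;
  # use the parse_line callback from the constructor, if one was given
  return $self->{parse_line}->($line) if defined($self->{parse_line});
  # default: strip the newline and split on whitespace
  chomp $line;
  return [split /\s+/,$line];
}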
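One consequence of the default split /\s+/ is that a line with leading whitespace produces an empty first field, so the first constructor argument would be an empty string. The sketch below copies the default logic into a standalone sub and compares it with a hypothetical alternative using Perl's split ' ', which skips leading whitespace; either sub has the one-argument shape a parse_line callback receives.

```perl
use strict;
use warnings;

# Standalone copy of the default parsing logic shown above.
sub default_parse {
    my ($line) = @_;
    chomp $line;
    return [ split /\s+/, $line ];
}

# Hypothetical alternative for a parse_line callback: split ' ' skips leading whitespace.
sub trimmed_parse {
    my ($line) = @_;
    chomp $line;
    return [ split ' ', $line ];
}

my $indented = "  1 4\n";
my $default  = default_parse($indented);
my $trimmed  = trimmed_parse($indented);
print scalar(@$default), ' ', scalar(@$trimmed), "\n";    # prints 3 2
```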
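This is the internal compare function used when sorting. If a sort_func callback was passed to the constructor, it is called with both objects; otherwise the common ranges of the two objects are compared with sort_in_consolidate_order_asc.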
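sub sort_method {
  my ($self,$left_range,$right_range)=@_;
  # a sort_func callback from the constructor takes precedence
  return $self->{sort_func}->($left_range,$right_range) if $self->{sort_func};
  # default: compare the common ranges in consolidate order, ascending
  my $cmp=sort_in_consolidate_order_asc($left_range->get_common,$right_range->get_common);
  return $cmp;
}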
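A sort_func callback receives the same two objects that sort_method would otherwise unwrap with get_common. The sketch below assumes the callback should behave like a Perl sort comparator (negative, zero, or positive) and that range objects expose range_start and range_end accessors; Demo::Range and Demo::Result are hypothetical stand-ins, and the ordering shown (start ascending, wider ranges first) is only an illustration, not necessarily the same as sort_in_consolidate_order_asc.

```perl
use strict;
use warnings;

# Hypothetical stand-in classes so the comparator can run on its own.
package Demo::Range;
sub new { my ($class, $start, $end) = @_; return bless { start => $start, end => $end }, $class; }
sub range_start { $_[0]{start} }
sub range_end   { $_[0]{end} }

package Demo::Result;
sub new { my ($class, $range) = @_; return bless { common => $range }, $class; }
sub get_common { $_[0]{common} }

package main;

# Illustrative sort_func: start ascending, then larger end (wider range) first.
my $sort_func = sub {
    my ($left, $right) = @_;
    my ($l, $r) = ($left->get_common, $right->get_common);
    return $l->range_start <=> $r->range_start
        || $r->range_end <=> $l->range_end;
};

my @results = map { Demo::Result->new(Demo::Range->new(@$_)) } ([3, 5], [1, 2], [1, 9]);
my @ordered = sort { $sort_func->($a, $b) } @results;
my $summary = join(',', map { $_->get_common->range_start . '-' . $_->get_common->range_end } @ordered);
print "$summary\n";    # prints 1-9,1-2,3-5
```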

Data::Range::Compare::Stream::Cookbook

Michael Shipper

As of version 0.001 the project has been moved to SourceForge.net.
Data Range Compare https://sourceforge.net/projects/data-range-comp/

Copyright 2011 Michael Shipper. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.