=pod

=head1 NAME

Data::Range::Compare::Stream::Iterator::File::MergeSortAsc - On-disk merge sort for very large data sets

=head1 SYNOPSIS

  use Data::Range::Compare::Stream;
  use Data::Range::Compare::Stream::Iterator::File;
  use Data::Range::Compare::Stream::Iterator::File::MergeSortAsc;

  my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
    filename=>'somefile.csv',
  );

  while($iterator->has_next) {
    my $next_range=$iterator->get_next;
    print $next_range,"\n";
  }

=head1 DESCRIPTION

This module extends Data::Range::Compare::Stream::Iterator::Base and provides an on-disk merge sort for ranges read from files or from objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.
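
An on-disk (external) merge sort reads its input in fixed-size buckets, sorts each bucket in memory, spools it to a temporary file, and finally merges the spooled files. The sketch below illustrates that general technique only; it is not this module's implementation. It assumes one whitespace-separated "start end" range per line and an ordering of start ascending, then end descending. The subs external_merge_sort, cmp_range and read_range are hypothetical; the bucket_size and tmpdir arguments only loosely mirror the constructor options of the same names.

```perl

  use strict;
  use warnings;
  use File::Temp ();

  # Assumed ordering: start ascending, then end descending.
  sub cmp_range {
    my ($l, $r) = @_;
    return $l->[0] <=> $r->[0] || $r->[1] <=> $l->[1];
  }

  # Read one "start end" line from $fh as [start, end]; undef at end of file.
  sub read_range {
    my ($fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;
    chomp $line;
    return [ split ' ', $line ];
  }

  # Hypothetical external merge sort: hold at most $bucket_size ranges in memory,
  # spool each sorted bucket to a temp file, then merge the spooled files.
  sub external_merge_sort {
    my ($in_fh, $bucket_size, $tmpdir) = @_;
    my (@bucket, @spools);

    my $spool_bucket = sub {
      return unless @bucket;
      my $tmp = File::Temp->new(defined $tmpdir ? (DIR => $tmpdir) : ());
      print {$tmp} "$_->[0] $_->[1]\n" for sort { cmp_range($a, $b) } @bucket;
      seek $tmp, 0, 0 or die "seek failed: $!";    # rewind for the merge phase
      push @spools, $tmp;
      @bucket = ();
    };

    while (my $range = read_range($in_fh)) {
      push @bucket, $range;
      $spool_bucket->() if @bucket >= $bucket_size;
    }
    $spool_bucket->();

    # k-way merge: track the head of every spool and always emit the smallest.
    my @heads = map { read_range($_) } @spools;
    my @sorted;
    while (1) {
      my $min;
      for my $i (0 .. $#heads) {
        next unless defined $heads[$i];
        $min = $i if !defined $min || cmp_range($heads[$i], $heads[$min]) < 0;
      }
      last unless defined $min;
      push @sorted, $heads[$min];
      $heads[$min] = read_range($spools[$min]);
    }
    return \@sorted;
  }

  open my $in, '<', \"5 9\n1 3\n1 8\n4 4\n2 2\n" or die "open failed: $!";
  my $sorted = external_merge_sort($in, 2);
  print $_->[0] . ' ' . $_->[1] . "\n" for @$sorted;

```

Only one bucket plus one head line per spooled file is held in memory at a time, which is what lets this kind of sort handle inputs larger than RAM.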

=head2 OO Methods

=over 3

=item * my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(key=>value);

Instance constructor. Arguments are passed as key/value pairs.

At least one of the following source arguments is required:

  filename=>'source_file.csv'
    # An absolute or relative path to the source file.

  file_list=>[]
    # An array ref of file names, given as absolute or relative paths.

  iterator_list=>[]
    # An array ref of objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.

Optional Arguments:

   auto_prepare=>0|1
     # Default: 0. If set to 1, the sort runs when the object is created.

   unlink_result_file=>1|0
     # Default: 1. If set to 0, the sorted result file is not deleted.

   bucket_size=>4000
     # Sets the number of ranges pre-sorted in memory per bucket.
     # Two buckets are created, so up to bucket_size * 2 objects are loaded into memory.

   NEW_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::File'
     # Sets the file iterator class used when loading spooled files for merging.
     # Make sure you load or require the class passed in as this argument!

   NEW_ARRAY_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::Array'
     # Sets the array iterator class.

   NEW_FROM=>'Data::Range::Compare::Stream'
     # Deprecated but still supported; see factory_instance.
     # Sets the class new ranges are created from.
     # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

   factory_instance=>$obj
     # Defines the object that implements $obj->factory($start,$end,$data).
     # New ranges are constructed through this factory interface. If no factory
     # is provided, an instance of Data::Range::Compare::Stream is assumed.

   parse_line=>undef|code_ref
     # Default: undef. Sets the code ref used to parse a line.
     # If not set, the default internals are used.
     # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

   result_to_line=>undef|code_ref
     # Default: undef. Sets the code ref used to convert a result into a line parse_line can read.
     # If not set, the default internals are used.
     # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

   sort_func=>undef|code_ref
     # Default: undef. Sets the code ref used to compare objects during the sort.
     # If not set, the default internals are used.

   tmpdir=>undef|'/some/folder'
     # If defined, its value is passed to File::Temp->new(DIR=>$self->{tmpdir}).
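
Any object passed as factory_instance only needs to implement factory($start,$end,$data), as described above. The following sketch is hypothetical: My::HexFactory converts hexadecimal start and end strings to numbers before delegating to a range class, and My::StubRange stands in for Data::Range::Compare::Stream so the example runs without the module installed. That the real range class accepts ($start, $end, $data) in new is an assumption.

```perl

  use strict;
  use warnings;

  {
    package My::HexFactory;    # hypothetical factory_instance class

    sub new {
      my ($class, %args) = @_;
      # range_class: the class whose ->new($start, $end, $data) builds a range.
      return bless { range_class => $args{range_class} }, $class;
    }

    # The documented interface: factory($start, $end, $data).
    # This sketch treats start and end as hex strings, e.g. "0a" and "ff".
    sub factory {
      my ($self, $start, $end, $data) = @_;
      return $self->{range_class}->new(hex $start, hex $end, $data);
    }
  }

  {
    package My::StubRange;    # hypothetical stand-in for Data::Range::Compare::Stream

    sub new {
      my ($class, $start, $end, $data) = @_;
      return bless { start => $start, end => $end, data => $data }, $class;
    }
  }

  my $factory = My::HexFactory->new(range_class => 'My::StubRange');
  my $range   = $factory->factory('0a', 'ff', 'row 1');
  print "$range->{start} $range->{end} $range->{data}\n";

  # With the real module this might be passed as (untested assumption):
  #   factory_instance => My::HexFactory->new(range_class => 'Data::Range::Compare::Stream')

```

Keeping the target class as a constructor argument means the same factory can be exercised with a stub and then pointed at the real range class.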


=item * my $class=$iterator->NEW_FROM;

Returns the class that new range objects are constructed from.

=item * my $class=$iterator->NEW_ITERATOR_FROM;

Returns the name of the class that new file iterators are constructed from.

=item * my $class=$iterator->NEW_ARRAY_ITERATOR_FROM;

Returns the name of the class that new array iterators are constructed from.

=item * while($iterator->has_next) { ... }

Returns true while there are more ranges to fetch.

=item * my $result=$iterator->get_next;

Returns the next range, in sorted order, from the given sources.
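
The has_next/get_next pair is a simple pull-based protocol: callers loop while has_next is true and fetch one item per get_next call. A minimal sketch of a class honoring that protocol follows; My::ListIterator is hypothetical, holds plain strings instead of range objects, and is not the implementation of Data::Range::Compare::Stream::Iterator::Array or of this module.

```perl

  use strict;
  use warnings;

  {
    package My::ListIterator;    # hypothetical, for illustration only

    sub new {
      my ($class, @items) = @_;
      return bless { items => [@items], pos => 0 }, $class;
    }

    # True while get_next still has an item to return.
    sub has_next {
      my ($self) = @_;
      return $self->{pos} < @{ $self->{items} };
    }

    # Return the current item and advance; undef once exhausted.
    sub get_next {
      my ($self) = @_;
      return undef unless $self->has_next;
      return $self->{items}[ $self->{pos}++ ];
    }
  }

  my $iterator = My::ListIterator->new('1 3', '2 5', '7 9');
  my @seen;
  while ($iterator->has_next) {
    push @seen, $iterator->get_next;
  }
  print "$_\n" for @seen;

```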

=item * my $line=$iterator->result_to_line($range);

Given a $range (a result from $iterator->get_next), converts it into a line that can be parsed by $iterator->parse_line($line). Think of this method as a serializer for the range objects produced by an $iterator object. When overriding this method or passing a callback, make sure the lines result_to_line produces can be parsed by parse_line.

  sub result_to_line {
    my ($self,$result)=@_;
    return $self->{result_to_line}->($result) if defined($self->{result_to_line});

    my $range=$result->get_common;
    my $line=$range->range_start_to_string.' '.$range->range_end_to_string."\n";
    return $line;
  }

=item * my $ref=$iterator->parse_line($line);

Given a $line, returns an array ref holding the arguments required to construct an object that extends or implements Data::Range::Compare::Stream. When overriding this method or passing a callback as a constructor argument, make sure result_to_line produces lines in the format parse_line expects.

  sub parse_line {
    my ($self,$line)=@_;
    return $self->{parse_line}->($line) if defined($self->{parse_line});
    chomp $line;
    [split /\s+/,$line];
  }
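
Because result_to_line writes the lines that parse_line later reads back, a custom pair of callbacks must round-trip. A sketch of comma-separated callbacks follows, assuming the call signatures shown in the default implementations above (the result_to_line callback receives only $result, the parse_line callback only $line). My::StubRange is a hypothetical stand-in that implements just get_common, range_start_to_string and range_end_to_string, so the round trip can be checked without the module installed.

```perl

  use strict;
  use warnings;

  # Candidate parse_line callback: "start,end" becomes [start, end].
  my $parse_csv = sub {
    my ($line) = @_;
    chomp $line;
    return [ split /\s*,\s*/, $line ];
  };

  # Candidate result_to_line callback: must emit what $parse_csv accepts.
  my $csv_line = sub {
    my ($result) = @_;
    my $range = $result->get_common;
    return join(',', $range->range_start_to_string, $range->range_end_to_string) . "\n";
  };

  {
    package My::StubRange;    # hypothetical stand-in, mimics only the calls used above
    sub new { my ($class, $s, $e) = @_; return bless { s => $s, e => $e }, $class }
    sub range_start_to_string { $_[0]{s} }
    sub range_end_to_string   { $_[0]{e} }
    sub get_common            { $_[0] }
  }

  # Round trip: serialize a range, then parse the line back into constructor arguments.
  my $line = $csv_line->(My::StubRange->new(3, 17));
  my $args = $parse_csv->($line);
  print "line=$line", "args=@$args\n";

```

With the module installed, the two code refs would be passed to the constructor as parse_line=>$parse_csv and result_to_line=>$csv_line.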

=item * my $cmp=$iterator->sort_method($left_range,$right_range);

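Internal comparison method used when sorting. It returns a negative, zero, or positive value, as a Perl sort comparator does. If sort_func was passed to the constructor, that code ref is called with the same two arguments; otherwise sort_in_consolidate_order_asc is applied to the result of get_common on each argument.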

  sub sort_method {
    my ($self,$left_range,$right_range)=@_;
    
    return $self->{sort_func}->($left_range,$right_range) if $self->{sort_func};
    my $cmp=sort_in_consolidate_order_asc($left_range->get_common,$right_range->get_common);

    return $cmp;
  }
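
A sort_func code ref receives the same two arguments as sort_method and should return a negative, zero, or positive number, like the body of a Perl sort block. The following is a minimal sketch, not the module's default: it orders by start ascending and, for equal starts, by end descending, which is only an assumption about what consolidate order means and may differ from sort_in_consolidate_order_asc. It also assumes that get_common returns an object with range_start and range_end accessors; My::StubRange is a hypothetical stand-in so the sketch runs without the module installed.

```perl

  use strict;
  use warnings;

  {
    package My::StubRange;    # hypothetical stand-in exposing the assumed accessors
    sub new         { my ($class, $s, $e) = @_; return bless [$s, $e], $class }
    sub range_start { $_[0][0] }
    sub range_end   { $_[0][1] }
    sub get_common  { $_[0] }
  }

  # Candidate sort_func: start ascending, then end descending (assumed tie-break).
  my $by_start_then_wider = sub {
    my ($left, $right) = @_;
    my ($l, $r) = ($left->get_common, $right->get_common);
    return $l->range_start() <=> $r->range_start()
        || $r->range_end()   <=> $l->range_end();
  };

  my @ranges = map { My::StubRange->new(@$_) } ([5, 9], [1, 3], [1, 8], [2, 2]);
  my @sorted = sort { $by_start_then_wider->($a, $b) } @ranges;
  print join(' ', map { $_->range_start . '-' . $_->range_end } @sorted), "\n";

```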


=back

=head1 SEE ALSO

L<Data::Range::Compare::Stream::Cookbook>

=head1 AUTHOR

Michael Shipper

=head1 SourceForge Project

As of version 0.001 the project has moved to SourceForge.net.

L<Data Range Compare|https://sourceforge.net/projects/data-range-comp/>

=head1 COPYRIGHT

Copyright 2011 Michael Shipper.  All rights reserved.

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut