package Test::Excel;

use strict; use warnings;

use Carp;
use IO::File;
use Readonly;
use Data::Dumper;
use Test::Builder ();
use Scalar::Util 'blessed';
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::Utility qw(int2col col2int);

require Exporter;

our @ISA    = qw(Exporter);
our @EXPORT = qw(cmp_excel compare_excel column_row letter_to_number number_to_letter cells_within_range);

=head1 NAME

Test::Excel - Interface to test and compare Excel files.

=head1 VERSION

Version 1.24

=head1 AWARD

Test::Excel has been granted the "Famous Software Award" by Download.FamousWhy.com on Wed 17 Nov 2010.

http://download.famouswhy.com/test_excel/

=cut

our $VERSION = '1.24';

$|=1;

our $DEBUG = 0;
Readonly my $ALMOST_ZERO          => 10**-16;
Readonly my $IGNORE               => 1;
Readonly my $SPECIAL_CASE         => 2;
Readonly my $MAX_ERRORS_PER_SHEET => 0;

=head1 DESCRIPTION

This module is meant for testing custom-generated Excel files. It currently provides two
functions, C<cmp_excel> and C<compare_excel>, which compare two Excel files to see whether they
are I<visually> similar. C<cmp_excel> is meant for use in test scripts, whereas C<compare_excel>
can be used standalone.

=head1 RULE

Both cmp_excel() and compare_excel() accept an optional third parameter, RULE, which lets you
apply your own comparison rules. It must be a reference to a HASH. If either sheet or spec is
given, both tolerance and sheet_tolerance are required (and vice versa); swap_check, error_limit
and message (only relevant to cmp_excel()) are optional.

    +-----------------+---------------------------------------------------------------------+
    | Key             | Description                                                         |
    +-----------------+---------------------------------------------------------------------+
    | sheet           | "|"-separated list of sheet names, e.g. 'Sheet1|Sheet2'.            |
    | tolerance       | Number. Relative tolerance outside 'sheet'/'spec', e.g. 10**-12     |
    | sheet_tolerance | Number. Relative tolerance for 'sheet'/'spec' cells, e.g. 0.20      |
    | spec            | Path to the specification file.                                     |
    | swap_check      | Number (optional) (1 or 0). Row swapping check. Default is 0.       |
    | error_limit     | Number (optional). Maximum errors per sheet. Default is 0.          |
    | message         | String (optional). Test message; only used by cmp_excel().          |
    +-----------------+---------------------------------------------------------------------+
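
The tolerances are relative rather than absolute. The standalone sketch below mirrors the check
that compare_excel() applies to numeric cells when a RULE is given: a cell passes when
abs(expected - got) / abs(expected) does not exceed the tolerance in force. The helper
within_tolerance() and the sample RULE values are hypothetical and not part of Test::Excel.

    use strict; use warnings;

    # Hypothetical helper (not exported by Test::Excel): the relative
    # difference test applied to numeric cells when a RULE is supplied.
    # Assumes $exp is non-zero.
    sub within_tolerance {
        my ($got, $exp, $tolerance) = @_;
        my $difference = abs($exp - $got) / abs($exp);
        return ($difference <= $tolerance) ? 1 : 0;
    }

    # Sample RULE, as it might be passed to cmp_excel() or compare_excel().
    my $rule = { sheet => 'Sheet1|Sheet2', tolerance => 10**-12, sheet_tolerance => 0.20 };

    # 110 vs 100 is a 10% relative difference.
    print within_tolerance(110, 100, $rule->{sheet_tolerance}), "\n";  # 1
    print within_tolerance(110, 100, $rule->{tolerance}), "\n";        # 0

Because the module only fails a cell when the tolerance is strictly smaller than the difference,
a difference exactly equal to the tolerance still passes.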

=head1 What is "Visually" Similar?

This module uses C<Spreadsheet::ParseExcel> to parse the Excel files, then compares the parsed
data structures for differences. It ignores certain components of the Excel file, such as
embedded fonts, images, forms and annotations, and focuses on the sheet names, dimensions and
cell values of each worksheet instead. Future versions will likely support font and image
comparisons, but not in this
initial release.
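
Cell values are compared one at a time. As a rough, standalone approximation of the default check
used when no RULE is given, the sketch below compares values that look like numbers numerically
(treating two values whose magnitude is below 10**-16 as equal) and compares everything else as
case-insensitive text. The helper cells_match() is hypothetical; the number pattern is equivalent
to the one in the module source.

    use strict; use warnings;

    # Equivalent to the pattern Test::Excel uses to decide whether a cell is numeric.
    my $NUMBER = qr/^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?$/;

    # Hypothetical helper approximating the default (no RULE) cell comparison.
    sub cells_match {
        my ($got, $exp) = @_;
        if ($got =~ $NUMBER && $exp =~ $NUMBER) {
            # Values that are both effectively zero count as equal.
            return 1 if abs($got) < 10**-16 && abs($exp) < 10**-16;
            return ($got == $exp) ? 1 : 0;
        }
        # Everything else is compared as case-insensitive text.
        return (uc($got) eq uc($exp)) ? 1 : 0;
    }

    print cells_match('1.50',  '1.5'),   "\n";   # 1: numerically equal
    print cells_match('Total', 'TOTAL'), "\n";   # 1: differs only in case
    print cells_match('abc',   'abd'),   "\n";   # 0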

=head1 METHODS

=head2 cmp_excel($got, $exp, { ...rule... })

This function tells you whether the two Excel files are "visually" different, ignoring
differences in embedded fonts/images and metadata. Both $got and $exp can be either
Spreadsheet::ParseExcel::Workbook instances or file paths (which are in turn parsed with
Spreadsheet::ParseExcel). It reports its result through Test::Builder, so it is meant for use in
test scripts (TEST MODE).

    use strict; use warnings;
    use Test::More tests => 2;
    use Test::Excel;
    use Spreadsheet::ParseExcel;

    cmp_excel('foo.xls', 'bar.xls', { message => 'EXCELs are identical.' });

    # or, with already-parsed workbooks

    my $foo = Spreadsheet::ParseExcel::Workbook->Parse('foo.xls');
    my $bar = Spreadsheet::ParseExcel::Workbook->Parse('bar.xls');
    cmp_excel($foo, $bar, { message => 'EXCELs are identical.' });

=cut

sub cmp_excel
{
    my $got  = shift;
    my $exp  = shift;
    my $rule = shift;

    _validate_rule($rule);
    $rule->{test} = 1;
    compare_excel($got, $exp, $rule);
}

=head2 compare_excel($got, $exp, { ...rule... })

This function tells you whether the two Excel files are "visually" different, ignoring
differences in embedded fonts/images and metadata. Both $got and $exp can be either
Spreadsheet::ParseExcel::Workbook instances or file paths (which are in turn parsed with
Spreadsheet::ParseExcel). It returns a true value if the files are similar and a false value
otherwise, so it is meant for use outside a test script (STANDALONE MODE).

    use strict; use warnings;
    use Test::Excel;

    print "EXCELs are identical.\n"
        if compare_excel("foo.xls", "bar.xls");
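
When a RULE is given, a tolerance is chosen for each numeric cell. The parsed spec is keyed by the
upper-cased sheet name, then the one-based column number, then the one-based row number, where a
value of 1 marks a cell to ignore and 2 a cell to compare with sheet_tolerance; numbers on sheets
listed under sheet also use sheet_tolerance, and all remaining numbers use tolerance. The sketch
below reproduces that choice on a hand-built spec hash. The helper tolerance_for() is
hypothetical, and it matches sheet names case-insensitively and exactly, whereas the module uses
a looser pattern match.

    use strict; use warnings;

    # Hand-built stand-in for Test::Excel::parse() output:
    # { SHEETNAME => { column (1-based) => { row (1-based) => flag } } }
    # where flag 1 = ignore the cell, 2 = compare with sheet_tolerance.
    my $spec = { SHEET1 => { 1 => { 3 => 2 }, 2 => { 8 => 1 } } };
    my $rule = { sheet => 'Sheet2', tolerance => 10**-12, sheet_tolerance => 0.20 };

    # Hypothetical helper: returns undef for ignored cells, else the tolerance to use.
    sub tolerance_for {
        my ($spec, $rule, $sheet, $col, $row) = @_;   # $col and $row are 1-based
        my $flag   = $spec->{ uc($sheet) }{$col}{$row};
        return undef if defined $flag && $flag == 1;
        my @sheets = split /\|/, ($rule->{sheet} // '');
        return $rule->{sheet_tolerance}
            if (defined $flag && $flag == 2) || grep { uc($_) eq uc($sheet) } @sheets;
        return $rule->{tolerance};
    }

    print tolerance_for($spec, $rule, 'Sheet1', 1, 3), "\n";   # 0.2 (spec range)
    print tolerance_for($spec, $rule, 'Sheet2', 5, 5), "\n";   # 0.2 (listed sheet)
    print tolerance_for($spec, $rule, 'Sheet1', 4, 4), "\n";   # 1e-12
    my $t = tolerance_for($spec, $rule, 'Sheet1', 2, 8);
    print defined $t ? "compare\n" : "ignored\n";              # ignored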

=cut

sub compare_excel
{
    my $got  = shift;
    my $exp  = shift;
    my $rule = shift;

    croak("ERROR: Unable to locate file [$got].\n") unless (blessed($got) || -f $got);
    croak("ERROR: Unable to locate file [$exp].\n") unless (blessed($exp) || -f $exp);
    _log_message("INFO: Excel comparison [$got] [$exp]\n") if $DEBUG;

    # Accept either ready-made Spreadsheet::ParseExcel::Workbook objects or file paths.
    unless (blessed($got) && $got->isa('Spreadsheet::ParseExcel::Workbook'))
    {
        $got = Spreadsheet::ParseExcel::Workbook->Parse($got)
            || croak("ERROR: Couldn't create Spreadsheet::ParseExcel::Workbook instance with: [$got]\n");
    }
    unless (blessed($exp) && $exp->isa('Spreadsheet::ParseExcel::Workbook'))
    {
        $exp = Spreadsheet::ParseExcel::Workbook->Parse($exp)
            || croak("ERROR: Couldn't create Spreadsheet::ParseExcel::Workbook instance with: [$exp]\n");
    }

    my (@gotWorkSheets, @expWorkSheets);
    my ($message, $status, $error, $error_limit, $spec, $test, $TESTER);

    $status = 1;
    $test = $rule->{test}                if ((ref($rule) eq 'HASH') && exists($rule->{test}));
    _validate_rule($rule)                unless (defined($test) && ($test));
    $spec = parse($rule->{spec})         if exists($rule->{spec});
    $error_limit = $rule->{error_limit}  if exists($rule->{error_limit});
    $message     = $rule->{message}      if exists($rule->{message});
    $error_limit = $MAX_ERRORS_PER_SHEET unless defined $error_limit;

    @gotWorkSheets = $got->worksheets();
    @expWorkSheets = $exp->worksheets();

    $TESTER = Test::Builder->new if (defined($test) && ($test));
    if (scalar(@gotWorkSheets) != scalar(@expWorkSheets))
    {
        $error = "ERROR: Sheets count mismatch. ";
        $error .= "Got: [".scalar(@gotWorkSheets)."] exp: [".scalar(@expWorkSheets)."]\n";
        _log_message($error);
        if (defined($test) && ($test))
        {
            $TESTER->ok(0, $message);
            return;
        }
        return 0;
    }

    my ($i, @sheets);
    @sheets = split(/\|/,$rule->{sheet})
        if (exists($rule->{sheet}) && defined($rule->{sheet}));

    for ($i=0; $i<scalar(@gotWorkSheets); $i++)
    {
        my ($error_on_sheet);
        my ($gotWorkSheet, $expWorkSheet);
        my ($gotSheetName, $expSheetName);
        my ($gotRowMin, $gotRowMax, $gotColMin, $gotColMax);
        my ($expRowMin, $expRowMax, $expColMin, $expColMax);

        $error_on_sheet = 0;
        $gotWorkSheet   = $gotWorkSheets[$i];
        $expWorkSheet   = $expWorkSheets[$i];
        $gotSheetName   = $gotWorkSheet->get_name();
        $expSheetName   = $expWorkSheet->get_name();
        if (uc($gotSheetName) ne uc($expSheetName))
        {
            $error = "ERROR: Sheetname mismatch. Got: [$gotSheetName] exp: [$expSheetName].\n";
            _log_message($error);
            if (defined($test) && ($test))
            {
                $TESTER->ok(0, $message);
                return;
            }
            return 0;
        }

        ($gotRowMin, $gotRowMax) = $gotWorkSheet->row_range();
        ($gotColMin, $gotColMax) = $gotWorkSheet->col_range();
        ($expRowMin, $expRowMax) = $expWorkSheet->row_range();
        ($expColMin, $expColMax) = $expWorkSheet->col_range();

        if ($DEBUG > 1)
        {
            _log_message("\n");
            _log_message("INFO:[$gotSheetName]:[$gotRowMin][$gotColMin]:[$gotRowMax][$gotColMax]");
            _log_message("INFO:[$expSheetName]:[$expRowMin][$expColMin]:[$expRowMax][$expColMax]");
        }

        if (defined($gotRowMax) && defined($expRowMax) && ($gotRowMax != $expRowMax))
        {
            $error  = "\nERROR: Max row counts mismatch in sheet [$gotSheetName]. ";
            $error .= "Got[$gotRowMax] Expected: [$expRowMax]\n";
            _log_message($error);
            if (defined($test) && ($test))
            {
                $TESTER->ok(0, $message);
                return;
            }
            return 0;
        }

        if (defined($gotColMax) &&  defined($expColMax) && ($gotColMax != $expColMax))
        {
            $error  = "\nERROR: Max column counts mismatch in sheet [$gotSheetName]. ";
            $error .= "Got[$gotColMax] Expected: [$expColMax]\n";
            _log_message($error);
            if (defined($test) && ($test))
            {
                $TESTER->ok(0, $message);
                return;
            }
            return 0;
        }

        my ($row, $col, $swap);
        for ($row=$gotRowMin; $row<=$gotRowMax; $row++)
        {
            for ($col=$gotColMin; $col<=$gotColMax; $col++)
            {
                my ($gotData, $expData, $error);
                $gotData = $gotWorkSheet->{Cells}[$row][$col]->{Val};
                $expData = $expWorkSheet->{Cells}[$row][$col]->{Val};

                next if ( defined($spec)
                          &&
                          exists($spec->{uc($gotSheetName)}->{$col+1}->{$row+1})
                          &&
                          ($spec->{uc($gotSheetName)}->{$col+1}->{$row+1} == $IGNORE) );

                if (defined($gotData) && defined($expData))
                {
                    if (($gotData =~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/)
                        &&
                        ($expData =~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/))
                    {
                        if ((abs($gotData) < $ALMOST_ZERO) && (abs($expData) < $ALMOST_ZERO))
                        {
                            # Can be treated as the same.
                            next;
                        }
                        else
                        {
                            if (defined($rule))
                            {
                                my ($compare_with, $difference);
                                $difference = abs($expData - $gotData) / abs($expData);

                                if ( ( defined($spec)
                                       &&
                                       exists($spec->{uc($gotSheetName)}->{$col+1}->{$row+1})
                                       &&
                                       ($spec->{uc($gotSheetName)}->{$col+1}->{$row+1} == $SPECIAL_CASE)
                                     )
                                     ||
                                     ( scalar(@sheets)
                                       &&
                                       grep(/$gotSheetName/,@sheets)
                                     ) )
                                {
                                    print "\nINFO: [NUMBER]:[$gotSheetName]:[SPC][".($row+1)."][".($col+1)."]:[$gotData][$expData] ... "
                                        if $DEBUG > 1;
                                    $compare_with = $rule->{sheet_tolerance};
                                }
                                else
                                {
                                    print "\nINFO: [NUMBER]:[$gotSheetName]:[STD][".($row+1)."][".($col+1)."]:[$gotData][$expData] ... "
                                        if $DEBUG > 1;
                                    $compare_with = $rule->{tolerance};
                                }

                                if ($compare_with < $difference)
                                {
                                    print "[FAIL]" if $DEBUG > 1;
                                    $difference = sprintf("%02f", $difference);
                                    $status = 0;
                                }
                                else
                                {
                                    $status = 1;
                                    print "[PASS]" if $DEBUG > 1;
                                }
                            }
                            else
                            {
                                print "\nINFO: [NUMBER]:[$gotSheetName]:[N/A][".($row+1)."][".($col+1)."]:[$gotData][$expData] ... "
                                    if $DEBUG > 1;
                                if ($expData != $gotData)
                                {
                                    print "[FAIL]" if $DEBUG > 1;
                                    $status = 0;
                                }
                                else
                                {
                                    $status = 1;
                                    print "[PASS]" if $DEBUG > 1;
                                }
                            }
                        }
                    }
                    else
                    {
                        if (uc($gotData) ne uc($expData))
                        {
                            _log_message("INFO: [STRING]:[$gotSheetName]:[$expData][$gotData] ... [FAIL]");
                            $status = 0;
                        }
                        else
                        {
                            $status = 1;
                            _log_message("INFO: [STRING]:[$gotSheetName]:[STD][".($row+1)."][".($col+1)."]:[$gotData][$expData] ... [PASS]")
                                if $DEBUG > 1;
                        }
                    }

                    if (exists($rule->{swap_check}) && defined($rule->{swap_check}) && ($rule->{swap_check}))
                    {
                        if ($status == 0)
                        {
                            $error_on_sheet++;
                            push @{$swap->{exp}->{number_to_letter($col-1)}}, $expData;
                            push @{$swap->{got}->{number_to_letter($col-1)}}, $gotData;

                            if (($error_on_sheet >= $error_limit) && ($error_on_sheet % 2 == 0) && !_is_swapping($swap))
                            {
                                _log_message("ERROR: Max error per sheet reached.[$error_on_sheet]\n");
                                if (defined($test) && ($test))
                                {
                                    $TESTER->ok($status, $message);
                                    return;
                                }
                                return $status;
                            }
                        }
                    }
                }
            } # col

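            # Stop early once the per-sheet error limit has been reached. Errors are only
            # counted when swap_check is enabled, so skip this when none were recorded.
            if (($error_on_sheet > 0) && ($error_on_sheet >= $error_limit)
                && ($error_on_sheet % 2 == 0) && !_is_swapping($swap))
            {
                if (defined($test) && ($test))
                {
                    $TESTER->ok($status, $message);
                    return;
                }
                return $status;
            }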

        } # row

        if (exists($rule->{swap_check}) && defined($rule->{swap_check}) && ($rule->{swap_check}))
        {
            if (($error_on_sheet > 0) && _is_swapping($swap))
            {
                print "\n\nWARN: SWAP OCCURRED.\n\n";
                $status = 1;
            }
        }
        print "INFO: [$gotSheetName]: ..... [OK].\n" if $DEBUG == 1;
    } # sheet


    if (defined($test) && ($test))
    {
        $TESTER->ok($status, $message);
        return;
    }
    return $status;
}

=head2 parse()

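This method parses the specification file provided by the user. The spec file is expected to be
in the format shown below, with each key and value separated by whitespace.

    sheet       Sheet1
    range       A3:B14
    range       B5:C5
    sheet       Sheet2
    range       A1:B2
    ignorerange B3:B8

Entries are grouped as a C<sheet> line followed by one or more C<range> or C<ignorerange> lines.
Numeric cells within a C<range> are compared using C<sheet_tolerance>, cells within an
C<ignorerange> are skipped, and blank lines and lines starting with C<#> are ignored. Any other
line causes parse() to croak.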

    use strict; use warnings;
    use Test::Excel;

    my $data = Test::Excel::parse('spec-1.txt');
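
To illustrate the format, the standalone sketch below (not the module's parse(), which reads from
a file and relies on cells_within_range()) reads a spec from a string and builds a hash of the
same shape: upper-cased sheet name, then one-based column number, then one-based row number,
mapped to 2 for range cells and 1 for ignorerange cells. The helpers col_number() and
spec_from_string() are hypothetical.

    use strict; use warnings;

    # Hypothetical helper: column letters to a 1-based number (A=1, Z=26, AA=27).
    sub col_number {
        my $n = 0;
        $n = $n * 26 + (ord($_) - ord('A') + 1) for split //, uc shift;
        return $n;
    }

    # Hypothetical sketch of the spec format; returns
    # { SHEET => { col (1-based) => { row (1-based) => 2 (range) or 1 (ignorerange) } } }
    sub spec_from_string {
        my ($text) = @_;
        my ($sheet, %data);
        open my $fh, '<', \$text or die "Cannot read spec: $!";
        while (my $line = <$fh>) {
            chomp $line;
            next if $line !~ /\w/ || $line =~ /^#/;   # skip blank and comment lines
            if ($line =~ /^sheet\s+(.*)/i) {
                $sheet = uc $1;
            }
            elsif (defined $sheet
                   && $line =~ /^(range|ignorerange)\s+([A-Z]+)(\d+):([A-Z]+)(\d+)/i) {
                my ($kind, $c1, $r1, $c2, $r2) = ($1, $2, $3, $4, $5);
                my $flag = (lc($kind) eq 'range') ? 2 : 1;
                for my $col (col_number($c1) .. col_number($c2)) {
                    $data{$sheet}{$col}{$_} = $flag for $r1 .. $r2;
                }
            }
            else {
                die "Invalid spec line [$line]\n";
            }
        }
        close $fh;
        return \%data;
    }

    my $data = spec_from_string("sheet Sheet1\nrange A3:B4\nignorerange C1:C1\n");
    print join(',', sort keys %{ $data->{SHEET1} }), "\n";   # 1,2,3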

=cut

sub parse
{
    my $spec = shift;
    return unless defined $spec;

    croak("ERROR: Unable to locate spec file [$spec].\n")
        unless (-f $spec);

    my ($handle, $row, $sheet, $cells, $data);
    $handle = IO::File->new($spec)
        || croak("ERROR: Couldn't open file [$spec][$!].\n");

    $sheet = undef;
    $data  = undef;
    while ($row = <$handle>)
    {
        chomp($row);
        next unless $row =~ /\w/;
        next if $row =~ /^#/;

        if ($row =~ /^sheet\s+(.*)/i)
        {
            $sheet = $1;
        }
        elsif (defined($sheet) && ($row =~ /^range\s+(.*)/i))
        {
            $cells = Test::Excel::cells_within_range($1);
            foreach (@{$cells})
            {
                $data->{uc($sheet)}->{$_->{col}+1}->{$_->{row}} = $SPECIAL_CASE;
            }
        }
        elsif (defined($sheet) && ($row =~ /^ignorerange\s+(.*)/i))
        {
            $cells = Test::Excel::cells_within_range($1);
            foreach (@{$cells})
            {
                $data->{uc($sheet)}->{$_->{col}+1}->{$_->{row}} = $IGNORE;
            }
        }
        else
        {
            croak("ERROR: Invalid format data [$row] found in spec file.\n");
        }
    }
    $handle->close();

    return $data;
}

=head2 column_row()

This method accepts a cell address, such as 'A23', and returns its column letters and row number
as a list, e.g. ('A', 23). It croaks if the address is invalid.

    use strict; use warnings;
    use Test::Excel;

    my $cell = 'A23';
    my ($col, $row) = Test::Excel::column_row($cell);

=cut

sub column_row
{
    my $cell = shift;
    return unless defined $cell;

    croak("ERROR: Invalid cell address [$cell].\n")
        unless ($cell =~ /([A-Za-z]+)(\d+)/);

    return ($1, $2);
}

=head2 letter_to_number()

This method accepts column letters and returns the equivalent zero-based column number. It simply
wraps Spreadsheet::ParseExcel::Utility::col2int().

    use strict; use warnings;
    use Test::Excel;

    my $number = Test::Excel::letter_to_number('AB');
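
The standalone sketch below performs the same zero-based conversion as col2int() without any
module, so 'A' maps to 0 and 'AB' to 27. The name letters_to_index() is hypothetical.

    use strict; use warnings;

    # Hypothetical standalone equivalent of letter_to_number() for one- and
    # two-letter columns: letters to a zero-based index (A=0, Z=25, AA=26, AB=27).
    sub letters_to_index {
        my ($letters) = @_;
        my $n = 0;
        $n = $n * 26 + (ord($_) - ord('A') + 1) for split //, uc $letters;
        return $n - 1;
    }

    print letters_to_index('A'),  "\n";   # 0
    print letters_to_index('AB'), "\n";   # 27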

=cut

sub letter_to_number
{
    my $letter = shift;
    return col2int($letter);
}

=head2 number_to_letter()

This method accepts a zero-based column number and returns the equivalent column letters. It
simply wraps Spreadsheet::ParseExcel::Utility::int2col().

    use strict; use warnings;
    use Test::Excel;

    my $letter = Test::Excel::number_to_letter(27);
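
int2col() also counts from zero, so number_to_letter(0) returns 'A' and number_to_letter(27)
returns 'AB'. The standalone sketch below performs the same conversion without any module; the
name index_to_letters() is hypothetical.

    use strict; use warnings;

    # Hypothetical standalone equivalent of number_to_letter():
    # zero-based column index to letters (0=A, 25=Z, 26=AA, 27=AB).
    sub index_to_letters {
        my ($n) = @_;
        my $letters = '';
        do {
            $letters = chr(ord('A') + $n % 26) . $letters;
            $n = int($n / 26) - 1;
        } while ($n >= 0);
        return $letters;
    }

    print index_to_letters(0),  "\n";   # A
    print index_to_letters(27), "\n";   # AB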

=cut

sub number_to_letter
{
    my $number = shift;
    return int2col($number);
}

=head2 cells_within_range()

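This method accepts an address range, such as 'A1:B3', and returns a reference to an array of all
cell addresses within the range, row by row. Each cell is a hash reference with a zero-based
C<col> and a one-based C<row>.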

    use strict; use warnings;
    use Test::Excel;

    my $range = 'A1:B3';
    my $cells = Test::Excel::cells_within_range($range);
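
The standalone sketch below shows that expansion for 'A1:B3' without needing
Spreadsheet::ParseExcel. The helper expand_range() is hypothetical, and it converts the column
letters locally rather than calling letter_to_number().

    use strict; use warnings;

    # Hypothetical standalone sketch of cells_within_range(): expands a range into
    # { col => zero-based column, row => one-based row } entries, row by row.
    sub expand_range {
        my ($range) = @_;
        my ($c1, $r1, $c2, $r2) = $range =~ /^([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)$/
            or die "Invalid range [$range]\n";
        my $to_index = sub {
            my $n = 0;
            $n = $n * 26 + (ord($_) - ord('A') + 1) for split //, uc shift;
            return $n - 1;
        };
        my @cells;
        for my $row ($r1 .. $r2) {
            push @cells, { col => $_, row => $row } for $to_index->($c1) .. $to_index->($c2);
        }
        return \@cells;
    }

    my $cells = expand_range('A1:B3');
    print scalar(@$cells), "\n";                                   # 6
    print join(' ', map { "$_->{col}:$_->{row}" } @$cells), "\n";  # 0:1 1:1 0:2 1:2 0:3 1:3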

=cut

sub cells_within_range
{
    my $range = shift;
    return unless defined $range;

    croak("ERROR: Invalid range [$range].\n")
        unless ($range =~ /(\w+\d+):(\w+\d+)/);

    my ($from, $to, $row, $col, $cells);
    my ($min_row, $min_col, $max_row, $max_col);

    $from = $1; $to = $2;
    ($min_col, $min_row) = column_row($from);
    ($max_col, $max_row) = column_row($to);
    $min_col = letter_to_number($min_col);
    $max_col = letter_to_number($max_col);

    for ($row = $min_row; $row <= $max_row; $row++)
    {
        for ($col = $min_col; $col <= $max_col; $col++)
        {
            push @{$cells}, { col => $col, row => $row };
        }
    }

    return $cells;
}

sub _is_swapping
{
    my $data = shift;
    return 0 unless defined $data;

    foreach (keys %{$data->{exp}})
    {
        my $exp = $data->{exp}->{$_};
        my $out = $data->{out}->{$_};

        return 0 if grep(/$exp->[0]/,@{$out});
    }
    return 1;
}

sub _log_message
{
    my $message = shift;
    return unless defined($message);

    print {*STDOUT} "\n".$message;
}

sub _validate_rule
{
    my $rule = shift;
    return unless defined $rule;

    croak("ERROR: Invalid RULE definitions. It has to be reference to a HASH.\n")
        unless (ref($rule) eq 'HASH');

    my ($keys, $valid);
    $keys = scalar(keys(%{$rule}));
    return if (($keys == 1) && exists($rule->{message}));

    croak("ERROR: Rule has more than 8 keys defined.\n")
        if $keys > 8;

    $valid = {'message'         => 1,
              'sheet'           => 2,
              'spec'            => 3,
              'tolerance'       => 4,
              'sheet_tolerance' => 5,
              'error_limit'     => 6,
              'swap_check'      => 7,
              'test'            => 8,};
    foreach (keys %{$rule})
    {
        croak("ERROR: Invalid key found in the rule definitions.\n")
            unless exists($valid->{$_});
    }

    if ((exists($rule->{spec}) && defined($rule->{spec}))
        ||
        (exists($rule->{sheet}) && defined($rule->{sheet})))
    {
        croak("ERROR: Missing key sheet_tolerance in the rule definitions.\n")
            unless (exists($rule->{sheet_tolerance}) && defined($rule->{sheet_tolerance}));
        croak("ERROR: Missing key tolerance in the rule definitions.\n")
            unless (exists($rule->{tolerance}) && defined($rule->{tolerance}));
    }
    else
    {
        if ( (exists($rule->{sheet_tolerance}) && defined($rule->{sheet_tolerance}))
             ||
             (exists($rule->{tolerance}) && defined($rule->{tolerance})) )
        {
            croak("ERROR: Missing key sheet/spec in the rule definitions.\n")
                unless ((exists($rule->{sheet}) && defined($rule->{sheet}))
                        ||
                        (exists($rule->{spec}) && defined($rule->{spec})));
        }
    }
}

=head1 DEBUG

Debug mode can be turned on or off by setting the package variable $DEBUG, for example:

   $Test::Excel::DEBUG = 1;

A value of 1 prints a status line per sheet; set it to anything greater than 1 for fine-grained,
per-cell debug information, e.g.

   $Test::Excel::DEBUG = 2;

=head1 NOTES

It should be clearly noted that this module does not claim to provide a fool-proof comparison of
generated Excel files. In fact, there are still a number of ways in which I want to expand the
existing comparison functionality. This module I<is> actively being developed for a number of
projects I am currently working on, so expect many changes to happen. If you have any
suggestions, comments or questions, please feel free to contact me.

=head1 CAVEATS

Testing large Excel files can take a long time because, well, we are doing a lot of computation.
In fact, this module's test suite includes tests against several large Excel files; however, I am
not including those in this distribution for obvious reasons.

=head1 BUGS

None  that I am aware of. Of course, if you find a bug, let me know, and I will be sure to fix
it.  This  is still a very early version, so it is always possible that I have just "gotten it 
wrong" in some places.

=head1 SEE ALSO

=over 4

=item C<Spreadsheet::ParseExcel> - I could not have written this without this module.

=back

=head1 ACKNOWLEDGEMENTS

=over 4

=item John McNamara (author of Spreadsheet::ParseExcel).

=item Kawai Takanori (author of Spreadsheet::ParseExcel::Utility).

=item Stevan Little (author of Test::PDF).

=back

=head1 AUTHOR

Mohammad S Anwar, E<lt>mohammad.anwar@yahoo.comE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2010-2011 by Mohammad S Anwar.

This  library  is free software; you can redistribute it and/or modify it under the same terms
as Perl itself.

=head1 DISCLAIMER

This  program  is  distributed  in  the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

=cut

1;
__END__