Module Version: 0.03

NAME

OpenOffice::Parse::SXC - Perl extension for parsing OpenOffice SXC files

SYNOPSIS

  use OpenOffice::Parse::SXC qw( parse_sxc );

  # Non-OO way:

  my @rows      = parse_sxc( "file.sxc" );
  for( @rows ) {
    print join(",", @$_ ),"\n";   # each row is an array reference
  }

  # OO way:

  package MyDataHandler;        # Set up a handler object
  sub new {
    my $type            = shift;
    my $self            = {};
    bless $self, $type;
    return $self;
  }
  sub row {
    my $self            = shift;
    my $SXC             = shift;
    my $row_data        = shift;
    print $self->{worksheet},": ",join(",", @$row_data),"\n";   # Simple csv values printed...
  }
  sub worksheet {
    my $self            = shift;
    my $SXC             = shift;
    my $worksheet       = shift;
    $self->{worksheet}  = $worksheet;
  }
  sub workbook {
    my $self            = shift;
    my $SXC             = shift;
    my $workbook        = shift || "unknown_workbook";
  }
  1;

  package main;

  my $SXC       = OpenOffice::Parse::SXC->new( OPTIONS );
  $SXC->set_data_handler( MyDataHandler->new );
  $SXC->parse_file( "file.sxc" );

DESCRIPTION

OpenOffice::Parse::SXC parses an SXC file (OpenOffice spreadsheet) and passes data back through a callback object that you register with the SXC object.

The major benefit of being able to read directly from an OpenOffice spreadsheet is that it allows SXC files to be directly used as a development tool.

The data returned contains no formatting or formula information, only the text displayed in the spreadsheet.

This module requires XML::Parser and the compression utility unzip to be installed.

DATA CONVERSIONS

The data this module provides is exactly what you would see in the OpenOffice application, which may differ from what you entered. For example, the module provides the result of a function, not the function itself. If you enter 19.95 into a cell and format that cell as a currency type, you would see $19.95 (for example), and that is what you get when you parse the spreadsheet with this module.

EXPORT

None by default.

EXPORT_OK

parse_sxc SXC_FILENAME:

Parses an SXC file returning a list of lists containing the cell data.
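In Perl terms, a "list of lists" is a list of array references, one per row. The sketch below uses hand-written stand-in data in place of a real parse_sxc() call (the sample values are assumptions for illustration only), so it runs without an SXC file:

```perl
use strict;
use warnings;

# Stand-in for the return value of parse_sxc(): one array reference per row.
# The cell values here are made up for illustration.
my @rows = (
    [ 'name',   'price'  ],
    [ 'widget', '$19.95' ],
);

# Each row must be dereferenced before its cells can be joined.
for my $row (@rows) {
    print join(',', @$row), "\n";
}

# A single cell is addressed by row index, then column index.
my $cell = $rows[1][1];
print "second row, second cell: $cell\n";
```

Cells are plain strings as displayed in the spreadsheet, so a value formatted as currency arrives with its currency symbol attached.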

csv_quote STRING:

Quotes a string in "CSV format". The transformation converts each double-quote to two double-quotes, then wraps the entire string in double-quotes. All newlines are removed!
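The transformation can be sketched in a few lines. my_csv_quote below is a hypothetical stand-in written from the description above, not the module's own code; treating an undefined value as an empty string and removing only "\n" characters are assumptions:

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the documented behaviour of csv_quote().
# The module's real implementation may differ in detail.
sub my_csv_quote {
    my ($string) = @_;
    $string = '' unless defined $string;   # assumption: undef becomes empty
    $string =~ s/\n//g;                    # remove all newlines
    $string =~ s/"/""/g;                   # double each double-quote
    return qq{"$string"};                  # wrap the whole string in quotes
}

print my_csv_quote('say "hi"'), "\n";      # prints "say ""hi"""
print my_csv_quote("two\nlines"), "\n";    # prints "twolines"
```

Because newlines are dropped rather than preserved inside the quotes, a multi-line cell always ends up on one output line.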

dump_sxc_file SXC_FILENAME:

Prints out a Dumper'ed version of the entire SXC XML tree. Used for debugging.

PUBLIC METHODS

new OPTIONS

Create a new SXC object.

parse FILEHANDLE

Parse the data in filehandle FILEHANDLE.

parse_file SXC_FILENAME

Parse the file SXC_FILENAME. This method calls parse().

get_current_worksheet_name

Returns the name of the current worksheet. This is only useful to the DATA HANDLER object (i.e., during processing).

get_option OPTION_NAME

Returns the value of the option OPTION_NAME.

set_options OPTION_NAME => VALUE, ...

Sets one or more options.

set_data_handler

Sets the DATA HANDLER. See the synopsis, and the DATA HANDLER section for details.

get_data_handler

Gets the DATA HANDLER.

OPTIONS

The following options can be used (in new() or set_options()):

worksheets => [ LIST_OF_WORKSHEETS_TO_PROCESS ]

An SXC 'workbook' consists of multiple 'worksheets' (internally referred to as tables). Use this option to list the worksheets you would like to process; if it is not used, ALL worksheets are processed.

no_trim => 1

If this option is NOT specified, the trailing empty cells in each row are spliced out. Specify it to keep them.
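The default trimming can be pictured with the sketch below. trim_trailing_empty is a hypothetical helper, not part of the module, and it assumes that an "empty" cell is one that is undefined or the empty string:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the default (no_trim not set) behaviour.
# Assumption: a cell counts as empty when it is undef or ''.
sub trim_trailing_empty {
    my ($row) = @_;
    my @cells = @$row;                       # work on a copy of the row
    while (@cells && (!defined $cells[-1] || $cells[-1] eq '')) {
        pop @cells;                          # drop one trailing empty cell
    }
    return \@cells;
}

my $trimmed = trim_trailing_empty([ 'a', '', 'b', '', '' ]);
print scalar(@$trimmed), " cells: ", join('|', @$trimmed), "\n";   # prints 3 cells: a||b
```

Only trailing cells are removed; the empty cell between 'a' and 'b' survives, so column positions stay aligned.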

DATA HANDLER

The DATA HANDLER is what the SXC object calls upon to do work while it parses an SXC file. It expects the DATA HANDLER object to implement the following methods:

row:

Called for each row of data.

worksheet:

Called each time a new worksheet is encountered. Note: there is no callback for when a worksheet ends.

workbook:

Called each time a new workbook is encountered. (This helps when the same SXC object is used to process multiple files.) As with worksheet(), there is no callback for the end of a workbook.

Each method is called on the handler object and gets the SXC object as the first argument and the data as the second argument: worksheet gets the name of the worksheet, workbook gets the filename of the SXC file, and row receives a list reference to all the cells in that row.

The interesting callback is the row() function, and often it's the only function of any interest. If you want to avoid creating a class and just want to implement a row() callback, you can do something like this:

  sub Whatever::row {
    my($self, $SXC, $row_data) = @_;
    print join(",", map { csv_quote( $_ ) } @$row_data ),"\n";
  }
  sub Whatever::worksheet {}
  sub Whatever::workbook {}
  $SXC->set_data_handler( bless {}, "Whatever" );
  $SXC->parse_file( ... );
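A handler does not have to print. As a minimal sketch, the hypothetical CollectingHandler class below stores rows per worksheet. To keep the example runnable without the module or an SXC file, the callbacks are invoked by hand in the documented order, with undef standing in for the SXC object:

```perl
use strict;
use warnings;

# Hypothetical handler that collects rows per worksheet instead of printing.
package CollectingHandler;

sub new {
    my ($class) = @_;
    my $self = { current => undef, sheets => {} };
    return bless $self, $class;
}

sub workbook { }                             # nothing to do per file here

sub worksheet {
    my ($self, $sxc, $name) = @_;
    $self->{current}       = $name;          # remember the active worksheet
    $self->{sheets}{$name} = [];
}

sub row {
    my ($self, $sxc, $row_data) = @_;
    push @{ $self->{sheets}{ $self->{current} } }, [ @$row_data ];   # copy the row
}

package main;

# Stand-in for the parser: call the callbacks directly with sample data.
my $handler = CollectingHandler->new;
$handler->workbook(undef, 'file.sxc');
$handler->worksheet(undef, 'Sheet1');
$handler->row(undef, [ 'name', 'price' ]);
$handler->row(undef, [ 'widget', '$19.95' ]);

print scalar(@{ $handler->{sheets}{Sheet1} }), " rows collected\n";   # prints 2 rows collected
```

With the real module, such an object would be registered through set_data_handler() before calling parse_file(), as in the synopsis. Copying each row in row() is a defensive choice, in case the parser reuses the list reference between calls.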

AUTHOR

Desmond Lee <deslee@shaw.ca>

SEE ALSO

sxc2csv.
