package XML::XPathScript;
use strict;
use warnings;
use Carp;
# $Revision$ - $Date$
=pod
=head1 NAME
XML::XPathScript - a Perl framework for XML stylesheets
=head1 SYNOPSIS
use XML::XPathScript;
# the short way
my $xps = XML::XPathScript->new;
my $transformed = $xps->transform( $xml, $stylesheet );
# having the output piped to STDOUT directly
my $xps = XML::XPathScript->new( xml => $xml, stylesheet => $stylesheet );
$xps->process;
# caching the compiled stylesheet for reuse and
# outputting to multiple files
my $xps = XML::XPathScript->new( stylesheetfile => $filename )
foreach my $xml (@xmlfiles) {
my $transformed = $xps->transform( $xml );
# do stuff with $transformed ...
};
# Making extra variables available to the stylesheet dialect:
my $xps = XML::XPathScript->new;
$xps->compile( qw/ $foo $bar / );
# in stylesheet, $foo will be set to 'a'
# and $bar to 'b'
$xps->transform( $xml, $stylesheet, [ 'a', 'b' ] );
=head1 DESCRIPTION
XPathScript is a stylesheet language similar in many ways to XSLT (in
concept, not in appearance), for transforming XML from one format to
another (possibly HTML, but XPathScript also shines for non-XML-like
output).
Like XSLT, XPathScript offers a dialect to mix verbatim portions of
documents and code. Also like XSLT, it leverages the powerful
``templates/apply-templates'' and ``cascading stylesheets'' design
patterns, that greatly simplify the design of stylesheets for
programmers. The availability of the I<XPath> query language inside
stylesheets promotes the use of a purely document-dependent,
side-effect-free coding style. But unlike XSLT which uses its own
dedicated control language with an XML-compliant syntax, XPathScript
uses Perl which is terse and highly extendable.
The result of the merge is an extremely powerful tool for rendering
complex XML documents into other formats. Stylesheets written in
XPathScript are very easy to create, extend and reuse, even if they
manage hundreds of different XML tags.
=head1 STYLESHEET WRITER DOCUMENTATION
If you are interested to write stylesheets, refers to the
B<XML::XPathScript::Stylesheet> manpage. You might also want
to take a peek at the manpage of B<xpathscript>, a program
bundled with this module to perform XPathScript transformations
via the command line.
=head1 STYLESHEET UTILITY METHODS
Those methods are meants to be used from within a stylesheet.
=head2 current
$xps = XML::XPathScript->current
This class method returns
the stylesheet object currently being applied. This can be called from
anywhere within the stylesheet, except a BEGIN or END block or
similar. B<Beware though> that using the return value for altering (as
opposed to reading) stuff from anywhere except the stylesheet's top
level is unwise.
=cut
sub current {
croak 'Wrong context for calling current()'
unless defined $XML::XPathScript::current;
return $XML::XPathScript::current;
}
=head2 interpolation
$interpolate = $XML::XPathScript::current->interpolation
$interpolate = $XML::XPathScript::current->interpolation( $boolean )
Gets (first call form) or sets (second form) the XPath interpolation
boolean flag. If true, values set in C< pre > and C< post >
may contain expressions within curly braces, that will be
interpreted as XPath expressions and substituted in place.
For example, when interpolation is on, the following code
$template->set( link => { pre => '<a href="{@url}">',
post => '</a>' } );
is enough for rendering a C<< <link> >> element as an HTML hyperlink.
The interpolation-less version is slightly more complex as it requires a
C<testcode>:
sub link_testcode {
my ($node, $t) = @_;
my $url = $node->findvalue('@url');
$t->set({ pre => "<a href='$url'>",
post => "</a>" });
return DO_SELF_AND_KIDS();
};
Interpolation is on by default.
=cut
sub interpolation {
my $self = shift;
return $self->interpolating( @_ );
}
sub interpolating {
my $self=shift;
if ( @_ ) {
$self->processor->set_interpolation(
$self->{interpolating} = shift
);
}
return $self->{interpolating} || 0;
}
=head2 interpolation_regex
$regex = $XML::XPathScript::current->interpolation_regex
$XML::XPathScript::current->interpolation_regex( $regex )
Gets or sets the regex to use for interpolation. The value to be
interpolated must be capture by $1.
By default, the interpolation regex is qr/{(.*?)}/.
Example:
$XML::XPathScript::current->interpolation_regex( qr#\|(.*?)\|# );
$template->set( bird => { pre => '|@name| |@gender| |@type|' } );
=cut
sub interpolation_regex {
my $self = shift;
if ( my $regex = shift ) {
$self->processor->set_interpolation_regex(
$self->{interpolation_regex} = $regex
)
}
return $self->{interpolation_regex};
}
=head2 binmode
Declares that the stylesheet output is B<not> in UTF-8, but instead in
an (unspecified) character encoding embedded in the stylesheet source
that neither Perl nor XPathScript should have any business dealing
with. Calling C<< XML::XPathScript->current()->binmode() >> is an
B<irreversible> operation with the consequences outlined in L</The
Unicode mess>.
=cut "
sub binmode {
my ($self)=@_;
$self->{binmode}=1;
$self->{processor}->enable_binmode;
binmode ORIGINAL_STDOUT if (! defined $self->{printer});
return;
}
=pod "
=head1 TECHNICAL DOCUMENTATION
The rest of this POD documentation is B<not> useful to programmers who
just want to write stylesheets; it is of use only to people wanting to
call existing stylesheets or more generally embed the XPathScript
engine into some wider framework.
I<XML::XPathScript> is an object-oriented class with the following features:
=over
=item *
an I<embedded Perl dialect> that allows the merging of the stylesheet
code with snippets of the output document. Don't be afraid, this is
exactly the same kind of stuff as in I<Text::Template>, I<HTML::Mason>
or other similar packages: instead of having text inside Perl (that
one I<print()>s), we have Perl inside text, with a special escaping
form that a preprocessor interprets and extracts. For XPathScript,
this preprocessor is embodied by the I<xpathscript> shell tool (see
L</xpathscript Invocation>) and also available through this package's
API;
=item *
a I<templating engine>, that does the apply-templates loop, starting
from the top XML node and applying templates to it and its subnodes as
directed by the stylesheet.
=back
When run, the stylesheet is expected to fill in the I<template object>
$template, which is a lexically-scoped variable made available to it at
preprocess time.
=cut "
use vars qw( $XML_parser $debug_level );
use Symbol;
use File::Basename;
use XML::XPathScript::Processor;
use XML::XPathScript::Template;
our $VERSION = '1.54';
$XML_parser = 'XML::LibXML';
my %use_parser = (
'XML::LibXML' => 'use XML::LibXML',
'XML::XPath' => <<'END_USE',
use XML::XPath 1.0;
use XML::XPath::XMLParser;
use XML::XPath::Node;
use XML::XPath::NodeSet;
use XML::Parser;
END_USE
);
die "parser $XML_parser unknown\n" unless $use_parser{$XML_parser};
eval $use_parser{$XML_parser}.";1"
or die "couldn't import $XML_parser";
# internal variable for debugging information.
# 0 is total silence and 10 is complete verbiage
$debug_level = 0;
sub import
{
my $self = shift @_;
if ( grep { $_ eq 'XML::XPath' } @_ ) {
$XML::XPathScript::XML_parser = 'XML::XPath';
}
elsif ( grep { $_ eq 'XML::LibXML' } @_ ) {
$XML::XPathScript::XML_parser = 'XML::LibXML';
}
return;
}
=pod "
=head1 METHODS
=head2 new
$xps = XML::XPathScript->new( %arguments )
Creates a new XPathScript translator. The recognized named arguments are
=over
=item xml => $xml
$xml is a scalar containing XML text, or a reference to a filehandle
from which XML input is available, or an I<XML::XPath> or
I<XML::libXML> object.
An XML::XPathscript object without an I<xml> argument
to the constructor is only able to compile stylesheets (see
L</SYNOPSIS>).
=item stylesheet => $stylesheet
$stylesheet is a scalar containing the stylesheet text, or a reference
to a filehandle from which the stylesheet text is available. The
stylesheet text may contain unresolved C<< <!--#include --> >>
constructs, which will be resolved relative to ".".
=item stylesheetfile => $filename
Same as I<stylesheet> but let I<XML::XPathScript> do the loading
itself. Using this form, relative C<< <!--#include --> >>s in the
stylesheet file will be honored with respect to the dirname of
$filename instead of "."; this provides SGML-style behaviour for
inclusion (it does not depend on the current directory), which is
usually what you want.
=item compiledstylesheet => $function
Re-uses a previous return value of I<compile()> (see L</SYNOPSIS> and
L</compile>), typically to apply the same stylesheet to several XML
documents in a row.
=item interpolation_regex => $regex
Sets the interpolation regex. Whatever is
captured in $1 will be used as the xpath expression.
Defaults to qr/{(.*?)}/.
=back
=cut "
sub new {
my $class = shift;
die "Invalid hash call to new" if @_ % 2;
my %params = @_;
my $self = \%params;
bless $self, $class;
$self->{processor} = XML::XPathScript::Processor->new;
$self->set_xml( $params{xml} ) if $params{xml};
$self->interpolation( exists $params{interpolation}
? $params{interpolation} : 1 );
$self->interpolation_regex( $params{interpolation_regex}
|| qr/{(.*?)}/ );
if ( $XML::XPathScript::XML_parser eq 'XML::XPath' ) {
require XML::XPath;
require XML::XPath::XMLParser;
require XML::XPath::Node;
require XML::XPath::NodeSet;
require XML::Parser;
}
else {
require XML::LibXML;
}
croak $@ if $@;
return $self;
}
=head2 transform
$xps->transform( $xml, $stylesheet, \@args )
Transforms the document $xml with the $stylesheet (optionally passing to
the stylesheet the argument array @args) and returns the result.
If the passed $xml or $stylesheet is undefined, the previously loaded xml
document or stylesheet is used.
E.g.,
# vanilla-flavored transformation
my $xml = '<doc>...</doc>';
my $stylesheet = '<% ... %>';
my $transformed = $xps->transform( $xml, $stylesheet );
# transform many documents
$xps->set_stylesheet( $stylesheet );
for my $xml ( @xml_documents ) {
my $transformed = $xps->transform( $xml );
# do stuff with $transformed ...
}
# do many transformation of a document
$xps->set_xml( $xml );
for my $stylesheet ( @stylesheets ) {
my $transformed = $xps->transform( undef, $stylesheet );
# do stuff with $transformed ...
}
=cut
sub transform {
my( $self, $xml, $stylesheet, $args ) = @_;
my $output;
$self->set_xml( $xml ) if $xml;
if ( $stylesheet ) {
$self->{compiledstylesheet} = undef;
$self->{stylesheet} = $stylesheet;
}
$self->process( \$output, $args ? @$args : () );
return $output;
}
=head2 set_dom
$xps->set_dom( $dom )
Set the DOM of the document to process. I<$dom>
must be a node object of one of the supported
parsers (XML::LibXML, XML::XPath, B::XPath).
=cut
sub set_dom {
my( $self, $dom ) = @_;
$self->{dom} = $dom;
$self->{processor}->set_dom( $dom );
return $self;
}
=head2 set_xml
$xps->set_xml( $xml )
Sets the xml document to $xml. $xml can be a file, a file handler
reference, a string, or a XML::LibXML or XML::XPath node.
=cut
sub set_xml {
my( $self, $xml ) = @_;
$self->{xml} = $xml;
my $retval = ref $xml ? $self->_set_xml_ref()
: $self->_set_xml_scalar()
;
$self->{processor}->set_dom( $self->{dom} );
return $retval;
# FIXME
my $xpath;
# a third option should be auto, for which we
# would use the already-defined object
if( $XML_parser eq 'auto' )
{
if (UNIVERSAL::isa($self->{xml},"XML::XPath"))
{
$xpath=$self->{xml};
$XML_parser = 'XML::XPath';
}
elsif(UNIVERSAL::isa($self->{xml},"XML::LibXML" ))
{
$xpath=$self->{xml};
$XML_parser = 'XML::LibXML';
}
}
if (UNIVERSAL::isa($self->{xml},"XML::XPath"))
{
if( $XML_parser eq 'XML::XPath' or $XML_parser eq 'auto' )
{
$xpath=$self->{xml};
$XML_parser = 'XML::XPath';
}
else # parser if XML::LibXML
{
$xpath = XML::LibXML->parse_string( $self->{xml}->toString )->documentElement;
}
}
elsif (UNIVERSAL::isa($self->{xml},"XML::libXML"))
{
if( $XML_parser eq 'XML::LibXML' or $XML_parser eq 'auto' )
{
$xpath=$self->{xml};
$XML_parser = 'XML::LibXML';
}
else # parser if xpath
{
$xpath = new XML::XPath( xml => $self->{xml}->toString );
}
}
else
{
$XML_parser = 'XML::LibXML' if $XML_parser eq 'auto';
if (ref($self->{xml}))
{
$xpath= ( $XML_parser eq 'XML::LibXML' ) ?
XML::LibXML->new->parse_fh( $self->{xml} )->documentElement :
XML::XPath->new(ioref => $self->{xml})
}
}
$self->{dom} = $xpath;
}
sub _set_xml_ref {
my $self = shift;
my $xml = $self->{xml};
if ( $XML_parser eq 'XML::LibXML' ) {
if ( $xml->isa( 'XML::LibXML::Document' ) ) {
$self->{dom} = $xml;
return;
}
if ( $xml->isa( 'XML::LibXML::Node' ) ) {
my $dom = XML::LibXML::Document->new;
$dom->setDocumentElement( $xml );
$self->{dom} = $dom;
return;
}
}
else { # XML::XPath
if ( $xml->isa( 'XML::XPath' ) ) {
$self->{dom} = $xml;
return;
}
if( $xml->isa( 'XML::XPath::Node' ) ) {
# evil hack
my $dom = XML::XPath->new( xml => $xml->toString );
$self->{dom} = $dom;
return;
}
}
# try to read it as an io
$self->{dom} = $XML_parser eq 'XML::LibXML'
? XML::LibXML->new->parse_fh( $xml )->documentElement
: XML::XPath->new(ioref => $xml)
;
return;
}
sub _set_xml_scalar {
my $self = shift;
my $xml = $self->{xml};
# is it a file?
if( index( $xml, "\n" ) == -1 and # quick'n'dirty checks
index( $xml, '<' ) == -1 and # for non-filename characters
index( $xml, '>' ) == -1 and -f $xml ) {
open my $fh, '<', $xml or croak "couldn't open xml file $xml: $!";
$self->{dom} = $XML_parser eq 'XML::LibXML'
? XML::LibXML->new->parse_file( $xml )->documentElement
: XML::XPath->new( filename => $xml )
;
return;
}
# then it must be a string
$self->{dom} = $XML_parser eq 'XML::LibXML'
? XML::LibXML->new->parse_string( $xml )->documentElement
: XML::XPath->new( xml => $xml );
return;
}
=head2 set_stylesheet
$xps->set_stylesheet( $stylesheet )
Sets the processor's stylesheet to $stylesheet.
=cut
sub set_stylesheet {
my ( $self, $stylesheet ) = @_;
$self->{compiledstylesheet} = undef;
$self->{stylesheet} = $stylesheet;
$self->compile if $self->{stylesheet};
}
=pod "
=head2 process
$xps->process
$xps->process( $printer )
$xps->process( $printer, @varvalues )
Processes the document and stylesheet set at construction time, and
prints the result to STDOUT by default. If $printer is set, it must be
either a reference to a filehandle open for output, or a reference to
a string, or a reference to a subroutine which does the output, as in
open my $fh, '>', 'transformed.txt'
or die "can't open file transformed.txt: $!";
$xps->process( $fh );
my $transformed;
$xps->process( \$transformed );
$xps->process( sub {
my $output = shift;
$output =~ y/<>/%%/;
print $output;
} );
If the stylesheet was I<compile()>d with extra I<varname>s, then the
calling code should call I<process()> with a corresponding number of
@varvalues. The corresponding lexical variables will be set
accordingly, so that the stylesheet code can get at them (looking at
L</SYNOPSIS>) is the easiest way of getting the meaning of this
sentence).
=cut "
sub process {
my ($self, $printer, @extravars) = @_;
do { $$printer="" } if (UNIVERSAL::isa($printer, "SCALAR"));
$self->{printer}=$printer if $printer;
croak "xml document not defined" unless $self->{dom};
# FIXME
eval { $self->{dom}->ownerDocument->setEncoding( "UTF-8" ) }
if $XML_parser eq 'XML::LibXML';
{
local *ORIGINAL_STDOUT;
*ORIGINAL_STDOUT = *STDOUT;
local *STDOUT;
# Perl 5.6.1 dislikes closed but tied descriptors (causes SEGVage)
*STDOUT = *ORIGINAL_STDOUT if $^V lt v5.7.0;
tie *STDOUT, __PACKAGE__;
$self->compile unless $self->{compiledstylesheet};
my $retval = $self->{compiledstylesheet}->( $self, @extravars );
untie *STDOUT;
return $retval;
}
}
=head2 extract
$xps->extract( $stylesheet )
$xps->extract( $stylesheet, $filename )
$xps->extract( $stylesheet, @includestack ) # from include_file() only
The embedded dialect parser. Given $stylesheet, which is either a
filehandle reference or a string, returns a string that holds all the
code in real Perl. Unquoted text and C<< <%= stuff %> >> constructs in
the stylesheet dialect are converted into invocations of I<<
XML::XPathScript->current()->print() >>, while C<< <% stuff %> >>
constructs are transcripted verbatim.
C<< <!-- #include --> >> constructs are expanded by passing their
filename argument to L</include_file> along with @includestack (if any)
like this:
$self->include_file($includefilename,@includestack);
@includestack is not interpreted by I<extract()> (except for the first
entry, to create line tags for the debugger). It is only a bandaid for
I<include_file()> to pass the inclusion stack to itself across the
mutual recursion existing between the two methods (see
L</include_file>). If I<extract()> is invoked from outside
I<include_file()>, the last invocation form should not be used.
This method does a purely syntactic job. No special framework
declaration is prepended for isolating the code in its own package,
defining $t or the like (L</compile> does that). It may be overriden
in subclasses to provide different escape forms in the stylesheet
dialect.
=cut "
sub extract {
my ($self,$stylesheet,@includestack) = @_;
my $filename = $self->{stylesheet_dependencies}[0] || "stylesheet";
my $contents = $self->read_stylesheet( $stylesheet );
my @tokens = split /(<%[-=~#@]*|-?%>)/, $contents;
no warnings qw/ uninitialized /;
my $script;
my $line = 1;
TOKEN:
while ( @tokens ) {
my $token = shift @tokens;
if ( -1 == index $token, '<%' ) {
$line += $token =~ tr/\n//;
$token =~ s/\s+$// if -1 < index $tokens[0], '<%'
and -1 < index $tokens[0], '-';
$token =~ s/\|/\\\|/g;
# check for include
$token =~ s{<!--#include.+file=(['"])(.*?)\1.*?-->}
{ '|);'
. $self->include_file( $2, @includestack)
. 'print(q|'}seg;
$script .= 'print(q|'.$token.'|);' if length $token;
next TOKEN;
}
$script .= "\n#line $line $filename\n";
my $opening_tag = $token;
my $code;
my $closing_tag;
my $level = 1;
while( @tokens ) {
my $t = shift @tokens;
$level++ if -1 < index $t, '<%';
$level-- if -1 < index $t, '%>';
if ( $level == 0 ) {
$closing_tag = $t;
last;
}
$code .= $t;
}
die "stylesheet <% %>s are unbalanced: $opening_tag$code\n"
unless $closing_tag;
$line += $code =~ tr/\n//;
if ( -1 < index $opening_tag, '=' ) {
$script .= 'print( '.$code.' );';
}
elsif ( -1 < index $opening_tag, '~' ) {
$code =~ s/^\s+//;
$code =~ s/\s+$//;
$script .= 'print $processor->apply_templates( qq<'. $code .'> );';
}
elsif( -1 < index $opening_tag, '#' ) {
# do nothing
}
elsif( -1 < index $opening_tag, '@' ) {
$code =~ s/^\s+(\S+).*?\n//; # strip first line
my $tag = $1
or die "tag name missing in <%\@ %> at line $line\n";
my $here_delimiter = 'END_TAG';
while ( $code =~ /$here_delimiter/ ) {
$here_delimiter .= 'x';
}
$script .= <<END_SNIPPET;
\$template->set( $tag => { content => <<'$here_delimiter' } );
$code
$here_delimiter
END_SNIPPET
}
else {
# always add a ';', just in case
$script .= $code . ';';
}
if ( -1 < index $closing_tag, '-' ) {
$tokens[0] =~ s/^\s*//;
my $temp = $&;
$line += $temp =~ tr/\n//;
}
}
return $script;
# FIXME not needed anymore
# <%- -%> magic
$contents =~ s#(\s+)<%-([=~]?)#<%$2$1#gs;
$contents =~ s#-%>(\s+)#$1%>#gs;
# <%~ %> magic
$contents =~ s#<%~\s+(\S+)\s+%>#<%= apply_templates( qq<$1> ) %>#gs;
$script="#line 1 $filename\n",
$line = 1;
while ($contents =~ /\G(.*?)(<!--#include|<%[=#]?)/gcs) {
my ($text, $type) = ($1, $2);
$line += $text =~ tr/\n//; # count \n's in text
$text =~ s/\|/\\\|/g;
$script .= "print(q|$text|);";
$script .= "\n#line $line $filename\n";
if ($type eq '<%=') {
$contents =~ /\G(.*?)%>/gcs || die "No terminating '%>' after line $line";
my $perl = $1;
$script .= "print( $perl );\n";
$line += $perl =~ tr/\n//;
}
elsif ($type eq '<!--#include') {
my %params;
while ($contents =~ /\G(\s+(\w+)\s*=\s*(["'])([^\3]*?)\3|\s*-->)/gcs) {
last if $1 eq '-->';
$params{$2} = $4 if (defined $2);
}
die "No matching file attribute in #include at line $line"
unless $params{file};
no warnings qw/ uninitialized /;
$script .= $self->include_file($params{file},@includestack);
}
else {
$contents =~ /\G(.*?)%>/gcs || die "No terminating '%>' after line $line";
my $perl = $1;
if( $type ne '<%#' ) {
$perl =~ s/;?$/;/s; # add on ; if its missing. As in <% $foo = 'Hello' %>
$script .= $perl;
}
$line += $perl =~ tr/\n//;
}
}
if ($contents =~ /\G(.+)/gcs) {
my $text = $1;
$text =~ s/\|/\\\|/g;
$script .= "print(q|$text|);";
}
return $script;
}
=head2 read_stylesheet
$string = $xps->read_stylesheet( $stylesheet )
Read the $stylesheet (which can be a filehandler or a string).
Used by I<extract> and exists such that it can be overloaded in
I<Apache::AxKit::Language::YPathScript>.
=cut
sub read_stylesheet
{
my( $self, $stylesheet ) = @_;
# $stylesheet can be a filehandler
# or a string
if( ref($stylesheet) ) {
local $/;
return <$stylesheet>;
}
else {
return $stylesheet;
}
}
=head2 include_file
$xps->include_file( $filename )
$xps->include_file( $filename, @includestack )
Resolves a C<< <!--#include file="foo" --> >> directive on behalf of
I<extract()>, that is, returns the script contents of
I<$filename>. The return value must be de-embedded too, which means
that I<extract()> has to be called recursively to expand the contents
of $filename (which may contain more C<< <!--#include --> >>s etc.)
$filename has to be slash-separated, whatever OS it is you are using
(this is the XML way of things). If $filename is relative (i.e. does
not begin with "/" or "./"), it is resolved according to the basename
of the stylesheet that includes it (that is, $includestack[0], see
below) or "." if we are in the topmost stylesheet. Filenames beginning
with "./" are considered absolute; this gives stylesheet writers a way
to specify that they really really want a stylesheet that lies in the
system's current working directory.
@includestack is the include stack currently in use, made up of all
values of $filename through the stack, lastly added (innermost)
entries first. The toplevel stylesheet is not in @includestack
(that is, the outermost call does not specify an @includestack).
This method may be overridden in subclasses to provide support for
alternate namespaces (e.g. ``axkit://'' URIs).
=cut "
sub include_file {
my ($self, $filename, @includestack) = @_;
if ( $filename !~ m#^\.?/# ) {
# We guarantee that all values we insert into @includestack begin
# either with "/" or "./". This allows us to do the relative
# directory thing, and at the same time we get to safely ignore
# bizarre URIs inserted by inheriting classes.
my $reldir = $includestack[0] && $includestack[0] =~ m#^\.?/#
? dirname($includestack[0])
: '.'
;
$filename = "$reldir/$filename";
}
# are we going recursive?
if ( grep { $_ eq $filename } @includestack ) {
warn 'loop detected in stylesheet include chain: ',
join( ' => ', reverse(@includestack), $filename ), "\n";
return undef;
}
my $stylesheet;
unless ( $stylesheet = $self->{stylesheet_cache}{$filename} ) {
open my $fh, '<', $filename
or Carp::croak "Can't read include file '$filename': $!";
$stylesheet = $self->{stylesheet_cache}{$filename}
= $self->read_stylesheet( $fh );
}
return $self->extract($stylesheet, $filename, @includestack);
}
=pod "
=head2 I<compile()>
=head2 I<compile(varname1, varname2,...)>
Compiles the stylesheet set at I<new()> time and returns an anonymous
CODE reference.
I<varname1>, I<varname2>, etc. are extraneous arguments that will be
made available to the stylesheet dialect as lexically scoped
variables. L</SYNOPSIS> shows how to use this feature to pass variables
to AxKit XPathScript stylesheets, which explains this
feature better than a lengthy paragraph would do.
The return value is an opaque token that encapsulates a compiled
stylesheet. It should not be used, except as the
I<compiledstylesheet> argument to I<new()> to initiate new objects and
amortize the compilation time. Subclasses may alter the type of the
return value, but will need to overload I<process()> accordingly of
course.
The I<compile()> method is idempotent. Subsequent calls to it will
return the very same token, and calls to it when a
I<compiledstylesheet> argument was set at I<new()> time will return
said argument.
=cut "
# Internal documentation: the return value is an anonymous sub whose
# prototype is
# &$compiledfunc($xpathscriptobj, $val1, $val2,...);
sub compile {
my ($self,@extravars) = @_;
$self->{compiledstylesheet} = undef;
my $stylesheet;
$self->{stylesheet_cache} = {};
if (exists $self->{stylesheet}) {
$stylesheet=$self->{stylesheet};
}
elsif (exists $self->{stylesheetfile}) {
# This hack fails if $self->{stylesheetfile} contains
# double quotes. I think we can ignore this and get
# away.
$stylesheet=qq:<!--#include file="$self->{stylesheetfile}" -->:;
}
else {
die "Cannot compile without a stylesheet\n";
};
my $script = $self->extract($stylesheet);
my $package=gen_package_name();
my $extravars = join ',', @extravars;
my $processor = $self->{processor};
# needs to be eval'ed first for the constants
# to be seen
eval "package $package;"
."\$processor->import_functional();";
my $eval = <<EOT;
package $package;
no strict; # Don't moan on sloppyly
no warnings; # written stylesheets
use $XML_parser;
sub {
my (\$self, $extravars ) = \@_;
my \$processor = processor();
local \$XML::XPathScript::current=\$self;
my \$t = \$processor->{template}
= XML::XPathScript::Template->new();
my \$template = \$t;
local \$XML::XPathScript::trans = \$t;
#\$processor->{doc} = \$self->{dom};
#\$processor->{parser} = '$XML_parser';
#\$processor->{binmode} = \$self->{binmode};
#\$processor->{is_interpolating} = \$self->interpolation;
#\$processor->{interpolation_regex} = \$self->interpolation_regex;
$script
}
EOT
#warn "script ready for compil: $eval";
local $^W;
$self->debug( 10, "Compiling code:\n $eval" );
my $retval = eval $eval;
die $@ unless defined $retval;
return $self->{compiledstylesheet} = $retval;
}
=head2 print
$xps->print($text)
Outputs a chunk of text on behalf of the stylesheet. The default
implementation is to use the second argument to L</process>.
Overloading this
method in a subclass provides yet another method to redirect output.
=cut "
sub print {
no warnings qw/ uninitialized /;
my ($self, @text)=@_;
my $printer=$self->{printer};
if (!defined $printer) {
print ORIGINAL_STDOUT @text;
} elsif (ref($printer) eq 'CODE') {
$printer->(@text);
} elsif (UNIVERSAL::isa($printer, 'SCALAR')) {
$$printer.= join '', @text;
} else {
local $\=undef;
print $printer @text;
};
return;
}
# $self->debug( $level, $message )
# Display debugging information
sub debug {
warn $_[2] if $_[1] <= $debug_level;
}
=head2 get_stylesheet_dependencies
@files = $xps->get_stylesheet_dependencies
Returns the files the loaded stylesheet depends on (i.e., has been
included by the stylesheet or one of its includes). The order in which
files are returned by the function has no special signification.
=cut
sub get_stylesheet_dependencies {
my $self = shift;
$self->compile unless $self->{compiledstylesheet};
return sort keys %{$self->{stylesheet_cache}};
}
=head2 processor
$processor = $xps->processor
Returns the processor object associated with I<$xps>.
=cut
sub processor {
return $_[0]->{processor};
}
=head1 FUNCTIONS
#=head2 gen_package_name
#
#Generates a fresh package name in which we would compile a new
#stylesheet. Never returns twice the same name.
=cut "
do {
my $uniquifier;
sub gen_package_name {
$uniquifier++;
return "XML::XPathScript::STYLESHEET$uniquifier";
}
};
=head2 document
$nodeset = $xps->document( $uri )
Reads XML given in $uri, parses it and returns it in a nodeset.
=cut
sub document {
# warn "Document function called\n";
my( $self, $uri ) = @_;
my( $results, $parser );
if( $XML_parser eq 'XML::XPath' ) {
my $xml_parser = XML::Parser->new(
ErrorContext => 2,
Namespaces => $XML::XPath::VERSION < 1.07 ? 1 : 0,
# ParseParamEnt => 1,
);
$parser = XML::XPath::XMLParser->new(parser => $xml_parser);
$results = XML::XPath::NodeSet->new();
}
elsif ( $XML_parser eq 'XML::LibXML' ) {
$parser = XML::LibXML->new;
$results = XML::LibXML::Document->new;
}
else {
$self->die( "xml parser not valid: $XML_parser" );
}
my $newdoc;
# TODO: must handle axkit: scheme a little more cleverly
if ($uri =~ /^\w\w+:/ and $uri !~ /^axkit:/ ) { # assume it's scheme://foo uri
eval {
$self->debug( 5, "trying to parse $uri" );
eval "use LWP::Simple";
$newdoc = $parser->parse_string( LWP::Simple::get( $uri ) );
$self->debug( 5, "Parsed OK into $newdoc\n" );
};
if (my $E = $@) {
$self->debug("Parse of '$uri' failed: $E" );
}
}
else {
$self->debug(3, "Parsing local: $uri\n");
if( $XML_parser eq 'XML::LibXML' ) {
$newdoc = $parser->parse_file( $uri );
} elsif( $XML_parser eq 'XML::XPath' ) {
$newdoc = XML::XPath->new( filename => $uri );
}
else { die "invalid parser: $XML_parser\n"; }
}
if( $newdoc ) {
if( $XML_parser eq 'XML::LibXML' ) {
$results = $newdoc->documentElement();
}
elsif( $XML_parser eq 'XML::XPath' ) {
$results = $newdoc->findnodes('/')->[0]->getChildNodes->[0];
}
}
$self->debug(8, "XPathScript: document() returning");
return $results;
}
sub TIEHANDLE { my $self = ''; bless \$self, $_[0] }
sub PRINT {
my $self = shift;
return XML::XPathScript::current()->print( @_ );
}
sub BINMODE {
return XML::XPathScript::current()->binmode( @_ );
}
1;
__END__
=head1 BUGS
Please send bug reports to <bug-xml-xpathscript@rt.cpan.org>,
or via the web interface at
http://rt.cpan.org/Public/Dist/Display.html?Name=XML-XPathScript .
=head1 AUTHORS
Current maintainers:
Yanick Champoux <yanick@cpan.org>
and Dominique Quatravaux <domq@cpan.org>
Created by Matt Sergeant <matt@sergeant.org>
=head1 THANKS
Thanks to Tim Nelson for pretty nifty suggestions and
patches. We sure hope the new B<insteadofchildren>
tag will make XSL users flock to XPS like ants to
a melting chocolate bunny, as he promised. ;-)
=head1 LICENSE
This is free software. You may distribute it under the same terms as
Perl itself.
=head1 SEE ALSO
L<XML::XPathScript::Stylesheet>, L<XML::XPathScript::Processor>,
L<XML::XPathScript::Template>, L<XML::XPathScript::Template::Tag>
Guide of the original Axkit XPathScript:
http://axkit.org/wiki/view/AxKit/XPathScriptGuide
XPath documentation from W3C:
http://www.w3.org/TR/xpath
Unicode character table:
http://www.unicode.org/charts/charindex.html
=cut
# Local Variables:
# mode:cperl
# tab-width:8
# End: