
NAME

PApp::XML - pxml sections and more

SYNOPSIS

 use PApp::XML;

DESCRIPTION

Apart from providing XML convenience functions, the PApp::XML module manages XML templates containing pappxml directives and perl code similar to phtml sections. Together with stylesheets (PApp::XSLT) this can be used to almost totally separate content from layout. Imagine a database containing XML documents with customized tags. A stylesheet can then be used to transform these XML documents into html + special pappxml directives that can be used to create links etc.

Functions for XML-Generation

xml_quote $string

Quotes (and returns) the given string so that its contents won't be interpreted by an XML parser (it quotes ', ", < and &, and also > where it would otherwise form ]]>). Example:

   print xml_quote q( <xx> & <[[]]> );
   => &lt;xx> &amp; &lt;[[]]&gt;
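To make these quoting rules concrete, here is a small standalone sketch. The name my_xml_quote is hypothetical and this is not the module's actual source; it simply applies the rules listed above and reproduces the example output.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the rules described for xml_quote.
sub my_xml_quote {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;       # first, so entities added below are not re-quoted
    $s =~ s/</&lt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    $s =~ s/]]>/]]&gt;/g;    # only a ">" that closes "]]>" is quoted
    return $s;
}

print my_xml_quote(q( <xx> & <[[]]> )), "\n";
```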
xml_cdata $string

Does the same thing as xml_quote, but using CDATA constructs, rather than quoting individual characters. Example:

   print xml_cdata q(hi ]]> there);
   => <![CDATA[hi ]]]]><![CDATA[> there]]>
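The trick behind the example is that "]]>" cannot occur inside a CDATA section, so it has to be split across two sections. A minimal sketch of that idea (my_xml_cdata is a hypothetical name, not the module's code):

```perl
use strict;
use warnings;

# Hypothetical sketch of CDATA quoting: every "]]>" is split so that the
# current section ends after "]]" and a new section starts with ">".
sub my_xml_cdata {
    my ($s) = @_;
    $s =~ s/]]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $s . ']]>';
}

print my_xml_cdata(q(hi ]]> there)), "\n";
# prints <![CDATA[hi ]]]]><![CDATA[> there]]>
```

An XML parser concatenates the two sections back into "hi ]]> there".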
xml_unquote $string

Unquotes (and returns) an XML string (by resolving its entities and CDATA sections). Currently, only the named predefined xml entities and numerical character entities are resolved. Everything else is silently ignored. Example:

   print xml_unquote q( <![CDATA[text1]]> &amp; text2&#x21; );
   => text1 & text2!
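A standalone sketch of these unquoting rules follows. my_xml_unquote is hypothetical, and it assumes that "silently ignored" means unknown entities are left untouched rather than removed.

```perl
use strict;
use warnings;

# The five entities predefined by the XML specification.
my %PREDEF = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Hypothetical sketch: CDATA content is copied literally; outside CDATA,
# predefined and numeric (decimal or hex) character references are resolved.
# Assumption: any other entity is left as-is.
sub my_xml_unquote {
    my ($s) = @_;
    my $out = '';
    # The capturing group makes split keep the CDATA sections as separate parts.
    for my $part (split /(<!\[CDATA\[.*?\]\]>)/s, $s) {
        if ($part =~ /^<!\[CDATA\[(.*)\]\]>$/s) {
            $out .= $1;
        } else {
            $part =~ s{&(?:#x([0-9a-fA-F]+)|#([0-9]+)|(lt|gt|amp|quot|apos));}
                      { defined $1 ? chr(hex $1)
                      : defined $2 ? chr($2)
                      :              $PREDEF{$3} }ge;
            $out .= $part;
        }
    }
    return $out;
}

print my_xml_unquote(q( <![CDATA[text1]]> &amp; text2&#x21; )), "\n";
```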
xml_attr $attr => $value [, $attr2 => $value2, ...]

Returns fully quoted $attr => $value pairs. Example:

   print xml_attr authors => q(Alan Cox & Linus "kubys" Torvalds);
   => authors="Alan Cox &amp; Linus &quot;kubys&quot; Torvalds"
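The quoting needed inside a double-quoted attribute value can be sketched as follows. my_xml_attr is hypothetical; in particular, whether the real function separates or prefixes pairs with a space is not specified here, so the sketch simply joins them with single spaces.

```perl
use strict;
use warnings;

# Hypothetical sketch: quote each value for use inside "..." and join
# the name="value" pairs with spaces.
sub my_xml_attr {
    my @pairs = @_;
    my @out;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/"/&quot;/g;
        push @out, qq($name="$value");
    }
    return join ' ', @out;
}

print my_xml_attr(authors => q(Alan Cox & Linus "kubys" Torvalds)), "\n";
```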
xml_tag $element_name, [$attr => $value, ...] [, $content_or_undef]

Generates a tag from the given element name, content and attribute name => value pairs. If content is undef, an empty tag will be generated. Example:

   print xml_tag "p", align => "center", undef;
   => <p align="center"/>

As a very special courtesy hack for you, if you omit the content argument entirely, only an opening tag will be generated.
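The three calling conventions (content given, content undef, content omitted) can be sketched in plain Perl. my_xml_tag is hypothetical; it assumes, without confirmation from the text, that the content is recognised as an odd trailing argument and is inserted without further quoting.

```perl
use strict;
use warnings;

# Hypothetical sketch of the xml_tag modes:
#   content given   -> <name attrs>content</name>
#   content undef   -> <name attrs/>
#   content omitted -> <name attrs>     (opening tag only)
sub my_xml_tag {
    my ($name, @args) = @_;
    my $has_content = @args % 2;        # odd count: last argument is content
    my $content = $has_content ? pop @args : undef;
    my $attrs = '';
    while (my ($k, $v) = splice @args, 0, 2) {
        $v =~ s/&/&amp;/g;
        $v =~ s/</&lt;/g;
        $v =~ s/"/&quot;/g;
        $attrs .= qq( $k="$v");
    }
    return "<$name$attrs>" unless $has_content;
    return "<$name$attrs/>" unless defined $content;
    return "<$name$attrs>$content</$name>";
}

print my_xml_tag("p", align => "center", undef), "\n";
print my_xml_tag("p", align => "center"), "\n";
print my_xml_tag("b", "bold text"), "\n";
```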

Functions for Analyzing XML

($msg, $line, $col, $byte) = xml_check $string [, $prolog, $epilog]

Checks whether the given document is well-formed (as opposed to valid). This merely tries to parse the string as an XML document. Nothing is returned if the document is well-formed.

Otherwise it returns the error message, line (one-based), column (zero-based) and character position (zero-based) of the point where the error occurred.

The optional argument $prolog is prepended to the string, while $epilog is appended (i.e. the document is "$prolog$string$epilog"). The cool thing is that the epilog/prolog strings are not counted in the error position (and yes, they should be free of any errors!).

(Hint: Remember to utf8_upgrade before calling this function or make sure that an encoding is given in the xml declaration).

xml_errorparser $xml, [$offset, $message]

This function takes a slightly damaged XML document or fragment and tries to repair it. During this process it annotates many errors with error messages in <error> elements. It also offers the option of adding a custom error message around the specified offset in the document.

This function currently works best with HTML or HTML-like input, and tries very hard not to place error messages at places where they won't be visible.

The result should be parseable by XML parsers, but be warned that not every case will be fixed.

xml_encoding xml-string [DEPRECATED]

Convenience function to detect the encoding used by the given xml string. It uses a variety of heuristics (mainly as given in appendix F of the XML specification). UCS4 and UTF-16 are ignored, mainly because I don't want to get into the byte-swapping business (maybe write an interface module for gconv?). The XML declaration itself is ignored.

Functions for Modifying XML

($version, $encoding, $standalone) = xml_remove_decl $xml[, $encoding]

Removes the XML declaration, if any, from the given string and returns its parameters. If the declaration is missing, ("1.0", $encoding || xml_encoding(), "yes") is returned.
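A rough sketch of the declaration-stripping step, assuming a simple regex is good enough for well-formed declarations. my_xml_remove_decl is hypothetical: it takes the fallback encoding as a plain argument instead of calling xml_encoding, and it keeps the "yes" default for standalone even when a declaration without a standalone attribute is present, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip a leading XML declaration from $_[0] in place
# and return (version, encoding, standalone).
sub my_xml_remove_decl {
    my $fallback = $_[1] // 'utf-8';    # stands in for xml_encoding()
    my ($version, $encoding, $standalone) = ('1.0', $fallback, 'yes');
    if ($_[0] =~ s/\A<\?xml\s+(.*?)\?>//s) {
        my $decl = $1;
        $version    = $1 if $decl =~ /\bversion\s*=\s*["']([^"']+)["']/;
        $encoding   = $1 if $decl =~ /\bencoding\s*=\s*["']([^"']+)["']/;
        $standalone = $1 if $decl =~ /\bstandalone\s*=\s*["']([^"']+)["']/;
    }
    return ($version, $encoding, $standalone);
}

my $doc = q(<?xml version="1.0" encoding="iso-8859-1" standalone="no"?><a/>);
my ($version, $encoding, $standalone) = my_xml_remove_decl($doc);
print "$version $encoding $standalone, remaining: $doc\n";
```

Because the first argument is modified in place, it must be a variable, not a literal.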

($version, $encoding, $standalone) = xml2utf8 xml-string[, encoding]

Tries to convert the given string into utf8 (in place). Currently only supports UTF-8 and ISO-8859-1, but could be extended easily to handle everything Expat can. Uses xml_encoding to autodetect the encoding unless an explicit encoding argument is given.

It returns the xml declaration parameters (where encoding is always utf-8). The xml declaration itself will be removed from the string.

expand_pi $xml, { pi => coderef, pi2 => coderef... }

Takes an xml string and expands all processing instructions given in the second argument by calling the respective coderef. The resulting string is returned.

The coderef receives a single argument: the (unquoted) contents of the processing instruction.

This function uses a regex (without backtracking in the common case) and should be fast.

For example, to execute sql commands using sql processing instructions, use something like this:

   Test xml string: <?sql select id from table where mtime = 7?>

   $expanded =
      expand_pi $xml, {
         sql => sub {
            xml_quote join "", sql_ufetch $_[0];
         },
      };
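The dispatch idea behind expand_pi can be sketched with a single substitution. my_expand_pi and its regex are hypothetical (the module's own regex is not shown here); the sketch passes the PI data through unchanged and leaves processing instructions without a handler untouched.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace each <?target data?> that has a handler
# with the handler's return value; keep all other PIs as they are.
sub my_expand_pi {
    my ($xml, $handlers) = @_;
    $xml =~ s{(<\?([A-Za-z_][\w.-]*)\s*(.*?)\?>)}{
        # copy the captures into lexicals so the handler gets a plain string
        my ($whole, $target, $data) = ($1, $2, $3);
        exists $handlers->{$target} ? $handlers->{$target}->($data) : $whole;
    }gse;
    return $xml;
}

my $xml = '<p><?sql select id from table where mtime = 7?></p>';
print my_expand_pi($xml, { sql => sub { "[result of: $_[0]]" } }), "\n";
```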
xml_include $document, $base [, $uri_handler($uri, $base) ]

Expands any xinclude:include elements in the given $document by calling $uri_handler with the href attribute (as a URI object) and the current base URI. The $uri_handler should fetch the document and return it (or undef on error).

Example (see http://www.w3.org/TR/xinclude/ for the definition of xinclude):

   <document xmlns:xinclude="http://www.w3.org/2001/XInclude">
      <xinclude:include href="http://some.host/otherdoc.xml"/>
      <xinclude:include href="/etc/passwd" parse="text"/>
   </document>

The result of running xml_include on this document will have the first include element replaced by the document element (and its contents) of http://some.host/otherdoc.xml and the second include element replaced by a (correctly quoted) copy of your /etc/passwd file.

Another common example is embedding stylesheet fragments into larger stylesheets. Using xinclude for these cases is faster than xsl's include/import mechanism, since xinclude expansion can be done once after the file is loaded, while xsl's include mechanism is evaluated on every parse.

   <include xmlns="http://www.w3.org/2001/XInclude"
            href="style/xtable.xsl"
            parse="verbatim"/>

At the moment this function always returns utf-8 documents, regardless of the input encoding used (included text is inserted as is; any conversion must be done in the uri handler).

This function does not conform to http://www.w3.org/TR/xmlbase/.

In addition to parse="xml" and parse="text", this function also supports parse="verbatim" (insert text verbatim, i.e. like xslt's disable-output-escaping="yes") and parse="pxml" (parse xml file as pxml). The types xml-fragment and pxml-fragment are also under consideration.

pod2xml $pod

Converts a POD string (which can be either a fragment or a whole document) into XML.

The PApp::XML Factory Class

new PApp::XML parameter => value...

Creates a new PApp::XML template object with the specified behaviour. It can be used as an object factory to create new PApp::XML::Template objects.

 special        a hashref containing special => coderef pairs. If a
                special is encountered, the given coderef will be compiled
                in instead (i.e. it will be called each time the fragment
                is print'ed). The coderef will be called with a reference
                to the attribute hash, the element's contents (as a
                string) and the PApp::XML::Template object used to print
                the string.

                If a reference to a coderef is given (e.g. \sub {}),
                the coderef will be called during parsing and the
                resulting string will be added to the compiled subroutine.
                The arguments are the same, except that the contents are
                not given as string but as a magic token that must be
                inserted into the return value.

                The return value is expected to be in "phtml"
                (PApp::Parser) format; the magic "contents" token must
                not occur in code sections.
                
 html           html output mode enable flag

At the moment there is one predefined special named slink, which maps almost directly onto a call to slink (a leading underscore in an attribute name is changed into a minus (-) to allow for one-shot arguments), e.g.:

 <papp:special _special="slink" module="kill" name="Bill" _doit="1">
    Do it to Bill!
 </papp:special>

might get changed to (note that module is treated specially):

 slink "Do it to Bill!", "kill", -doit => 1, name => "Bill";
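The attribute mapping described for the slink special can be sketched like this. slink_args is a hypothetical helper that only builds the argument list; it sorts the remaining attributes so the output is stable, which the real special need not do.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the attributes of
# <papp:special _special="slink" ...> into slink-style arguments.
sub slink_args {
    my ($content, %attr) = @_;
    delete $attr{_special};               # selects the special, not an argument
    my $module = delete $attr{module};    # passed positionally after the content
    my @args = ($content, $module);
    for my $name (sort keys %attr) {      # sorted only to make the order stable
        (my $key = $name) =~ s/^_/-/;     # _doit becomes -doit (one-shot argument)
        push @args, $key => $attr{$name};
    }
    return @args;
}

my @args = slink_args("Do it to Bill!",
    _special => "slink", module => "kill", name => "Bill", _doit => 1);
print join(", ", map { qq("$_") } @args), "\n";
```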

In an XSLT stylesheet one could define:

  <xsl:template match="link">
     <papp:special _special="slink">
        <xsl:for-each select="@*">
           <xsl:copy/>
        </xsl:for-each>
        <xsl:apply-templates/>
     </papp:special>
  </xsl:template>

Which defines a link element that can be used like this:

  <link module="kill" name="bill" _doit="1">Kill Bill!</link>
$pappxml->dom2template($dom, {special}, key => value...)

Compiles the given DOM into a PApp::XML::Template object and returns it. An additional set of specials only used to parse this DOM can be passed as a hashref (this argument is optional). Additional key => value pairs will be added to the template's attribute hash. The template will be evaluated in the caller's package (e.g. to get access to __ and similar functions).

On error, nothing is returned. Use the error method to get more information about the problem.

In addition to the syntax accepted by PApp::PCode::pxml2pcode, this function evaluates certain XML elements (please note that I consider the "papp" namespace to be reserved):

 papp:special _special="special-name" attributes...

   Evaluate the special with the name given by the attribute _special
   after evaluating its content. The special will receive two arguments:
   a hashref with all additional attributes and a string representing an
   already evaluated code fragment.
 
 papp:unquote

   Expands ("unquotes") some (but not all) entities, namely lt, gt, amp,
   quot, apos. This can be easily used within a stylesheet to create
   verbatim html or perl sections, e.g.

   <papp:unquote><![CDATA[
      <: echo "hallo" :>
   ]]></papp:unquote>

   An XSLT stylesheet that converts <phtml> sections just like in papp files
   might look like this:

   <xsl:template match="phtml">
      <papp:unquote>
         <xsl:apply-templates/>
      </papp:unquote>
   </xsl:template>
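A sketch of the entity expansion papp:unquote performs, assuming a single left-to-right pass so that "&amp;lt;" becomes "&lt;" rather than "<". papp_unquote is a hypothetical name, not a function exported by PApp::XML.

```perl
use strict;
use warnings;

# Only the five named entities are expanded; numeric references are not.
my %UNQUOTE = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Hypothetical sketch of papp:unquote's expansion, done in a single pass.
sub papp_unquote {
    my ($s) = @_;
    $s =~ s/&(lt|gt|amp|quot|apos);/$UNQUOTE{$1}/g;
    return $s;
}

print papp_unquote('&lt;: echo "hallo" :&gt;'), "\n";   # <: echo "hallo" :>
```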
$err = $pappxml->error

Returns information about an error as a PApp::Exception object.

$template->localvar([content]) [WIZARDRY]

Create a local variable that can be used inside specials and return a string representation of it (i.e. a magic token that represents the lvalue of the variable when compiled). Can only be called during compilation.

$template->gen_surl(<surl-arguments>) [WIZARDRY]

Returns a string representing a perl statement returning the surl.

$template->gen_slink(<surl-arguments>) [WIZARDRY]

Returns a string representing a perl statement returning the slink.

$template->attr(key, [newvalue])

Returns the attribute value for the given key. If newvalue is given, replaces the attribute and returns the previous value.
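The get/set convention of attr can be illustrated with a tiny stand-in class. My::Template is hypothetical, not PApp::XML::Template; checking the number of arguments (rather than definedness) lets a caller explicitly set a value to undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a template's attribute hash.
{
    package My::Template;

    sub new {
        my ($class, %attr) = @_;
        my $self = { attr => {%attr} };
        return bless $self, $class;
    }

    # attr(key)           -> current value
    # attr(key, newvalue) -> stores newvalue, returns the previous value
    sub attr {
        my ($self, $key, @new) = @_;
        my $old = $self->{attr}{$key};
        $self->{attr}{$key} = $new[0] if @new;
        return $old;
    }
}

my $t = My::Template->new(name => "homepage", ctime => 1000);
my $prev = $t->attr(ctime => 2000);    # $prev is 1000
print "previous ctime: $prev, current: ", $t->attr('ctime'), "\n";
```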

$template->print

Prints the template (executing any required specials). You can capture the output using the PApp::capture function.

Wizard Example

In this section I'll try to sketch out a "wizard example" that shows how PApp::XML could be used in the real world.

Consider an application that fetches most or all content (even layout) from a database and uses a stylesheet to map xml content to html, which allows for almost total separation of layout and content. It would have an init section loading an XSLT stylesheet and defining a content factory:

   use XML::XSLT; # ugly module, but it works great!
   use PApp::XML;

   # create the parser
   my $xsl = "$PApp::Config{LIBDIR}/stylesheet.xsl";
   $xslt_parser = XML::XSLT->new($xsl, "FILE");

   # create a content factory
   $tt_content_factory = new PApp::XML
      html => 1, # we want html output
      special => {
         include => sub {
            my ($attr, $content) = @_;
            get_content($attr->{name})->print;
         },
      };

   # create a cache (XSLT is quite slow)
   use Tie::Cache;
   tie %content_cache, Tie::Cache::, { MaxCount => 30, WriteSync => 0};

Here we define an include special that inserts another document in place. What does get_content (see the definition of include) look like?

   <macro name="get_content" args="$name $special"><phtml><![CDATA[<:
      my $cache = $content_cache{"$lang\0$name"};
      unless ($cache) {
         $cache = $content_cache{"$lang\0$name"} = [
            undef,
            0,
         ];
      }
      if ($cache->[1] < time) {
         $cache->[0] = fetch_content $name, $special;
         $cache->[1] = time + 10;
      }
      $cache->[0];
   :>]]></phtml></macro>
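Outside of a phtml macro, the same expiry-based caching can be sketched as a closure. make_cached_fetch and its fetch/ttl/clock options are hypothetical; the clock is injectable only so that the logic can be tested without waiting.

```perl
use strict;
use warnings;

# Hypothetical sketch of get_content-style caching: each entry is
# [value, expiry_time] keyed by "$lang\0$name"; stale entries are refetched.
sub make_cached_fetch {
    my (%opt) = @_;
    my $fetch = $opt{fetch};
    my $ttl   = $opt{ttl}   // 10;
    my $clock = $opt{clock} // sub { time };
    my %cache;
    return sub {
        my ($lang, $name) = @_;
        my $entry = $cache{"$lang\0$name"} ||= [undef, 0];
        if ($entry->[1] < $clock->()) {
            $entry->[0] = $fetch->($name);
            $entry->[1] = $clock->() + $ttl;
        }
        return $entry->[0];
    };
}

my $get_content = make_cached_fetch(fetch => sub { "body of $_[0]" });
print $get_content->('en', 'homepage'), "\n";
```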

get_content is nothing more than a wrapper around fetch_content. Its sole purpose is to cache documents, since parsing and transforming an XML file is quite slow (please note that I include the current language when caching documents since, of course, the documents get translated). In non-speed-critical applications you could just substitute fetch_content for get_content:

   <macro name="fetch_content" args="$name $special"><phtml><![CDATA[<:
      sql_fetch \my($id, $_name, $ctime, $body),
                "select id, name, unix_timestamp(ctime), body from content where name = ?",
                $name;
      unless ($id) {
         ($id, $_name, $ctime, $body) =
            (undef, undef, undef, "");
      }

      parse_content (gettext $body, {
         special => $special,
         id      => $id,
         name    => $name,
         ctime   => $ctime,
         lang    => $lang,
      });
   :>]]></phtml></macro>

fetch_content actually fetches the content string from the database. In this example, a content object has a name (which is used to reference it), a timestamp and a body, which is the actual document. After fetching the content object it uses parse_content to transform the xml snippet into a perl sub that can be efficiently executed:

   <macro name="parse_content" args="$body $attr"><phtml><![CDATA[<:
      my $content = eval {
         $xslt_parser->transform_document(
             '<?xml version="1.0" encoding="iso-8859-1" standalone="no"?'.'>'.
             "<ttt_fragment>".
             $body.
             "</ttt_fragment>",
             "STRING"
         );
         my $dom = $xslt_parser->result_tree;
         $tt_content_factory->dom2template($dom, %$attr);
      };
      if ($@) {
         my $line = $@ =~ /mismatched tag at line (\d+), column \d+, byte \d+/ ? $1 : -1;
         # create a fancy error message
      }
      $content || parse_content("");
   :>]]></phtml></macro>

As you can see, it uses XML::XSLT's transform_document, which does the string -> DOM translation for us and also transforms the XML code through the stylesheet. After that it uses dom2template to compile the document into perl code and returns it.

An example stylesheet would look like this:

   <xsl:template match="ttt_fragment">
      <xsl:apply-templates/>
   </xsl:template>

   <xsl:template match="p|em|h1|h2|br|tt|hr|small">
      <xsl:copy>
         <xsl:apply-templates/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="include">
      <papp:special _special="include" name="{@name}"/>
   </xsl:template>

   <!-- add the earlier XSLT examples here -->

This stylesheet would transform the following XML snippet:

   <p>Look at
      <link module="product" productid="7">our rubber-wobber-cake</link>
      before it is <em>sold out</em>!
      <include name="product_description_7"/>
   </p>

Which would be turned into something like this:

   <p>Look at
      <papp:special _special="slink" module="product" productid="7">
         our rubber-wobber-cake
      </papp:special>
      before it is <em>sold out</em>!
      <papp:special _special="include" name="product_description_7"/>
   </p>

Now go back and try to understand the above code! But wait! Consider that you had a content editor installed as the module content_editor, as I happen to have. Now let's introduce the editable_content macro:

   <macro name="editable_content" args="$name %special"><phtml><![CDATA[<:

      my $content;

      :>
   #if access_p "admin"
      <table border=1><tr><td>
      <:
         sql_fetch \my($id), "select id from content where name = ?", $name;
         if ($id) {
            :><?sublink [current_locals], __"[Edit the content object \"$name\"]", "content_editor_edit", contentid => $id:><:
         } else {
            :><?sublink [current_locals], __"[Create the content object \"$name\"]", "content_editor_edit", contentname => $name:><:
         }

         $content = get_content($name,\%special);
         $content->print;
      :>
      </table>
   #else
      <:
         $content = get_content($name,\%special);
         $content->print;
      :>
   #endif
      <:

      return $content;
   :>]]></phtml></macro>

What does this do? Easy: If you are logged in as admin (i.e. have the "admin" access right), it displays a link that lets you edit the object directly. As normal user it just displays the content as-is. It could be used like this:

   <perl><![CDATA[
      header;
      my $content = editable_content("homepage");
      footer last_changed => $content->ctime;
   ]]></perl>

Disregarding header and footer, this would create a page fully dynamically out of a database, together with last-modified information, which could be edited on the web. Obviously this approach could be extended to any complexity.

SEE ALSO

PApp.

AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/