NAME

PDF::Template - Perl extension for separation of data and PDF document layout.

SYNOPSIS

  use PDF::Template;
  my $rpt = PDF::Template->new(FILENAME => 'rpt_allpwps.xml');

  # Set some parameters
  $rpt->param(REPORT_NAME=>'P-9: PWP Booklet');
  $rpt->param(OUTER=>\@arrayofhashrefsofarrayrefsofhashrefsorsomething);

  # Write out a PDF file
  $rpt->write_file('rpt_allpwps.pdf');

DESCRIPTION

PDF::Template generates PDF files from common Perl data structures. Like HTML::Template, it separates layout from programming, to some extent.

Although the design allows for additional providers, this module currently REQUIRES PDFLib (pdflib.com).

MOTIVATION

I need to create PDF documents for many of the HTML pages I produce. I immediately adopted a templating tool for my HTML needs; however, there was no similar tool for PDF. After a few iterations of my own tools, I could take it no longer and had to write this.

Programming Reference

The only object you need to be concerned about, programmatically, is the PDF::Template object.

new()

Produce a new report object. This can take optional parameters:

  • filename

    This is the path to the XML specification for the PDF layout.

  • openaction

    Controls the initial presentation of the PDF when Acrobat opens it. May be set to one of these values: retain, fitpage, fitwidth, fitheight, fitbox. Defaults to 'fitpage'.

  • openmode

    Controls the initial presentation of the PDF when Acrobat opens it. May be set to one of these values: none, bookmarks, thumbnails, fullscreen. Defaults to 'none'.

  • info

    This is a hash reference containing info about the PDF document. The hash can contain keys such as Title, Subject, Author, Keywords, and Creator. This information is visible by clicking File->Document Info->General in the Acrobat viewer. If not specified, Creator and Author are set to "PDF::Template".

param()

param() can be called in a number of ways:

1) To set the value of a parameter:

      # For simple TMPL_VARs:
      $self->param(PARAM => 'value');

      # With a subroutine reference that gets called to get the value
      # of the scalar.  The sub will receive the template object as a
      # parameter.
      $self->param(PARAM => sub { return 'value' });

      # And TMPL_LOOPs:
      $self->param(LOOP_PARAM => 
                   [ 
                    { PARAM => VALUE_FOR_FIRST_PASS, ... }, 
                    { PARAM => VALUE_FOR_SECOND_PASS, ... } 
                    ...
                   ]
                  );

2) To set the values of a number of parameters:

      # For simple TMPL_VARs:
      $self->param(PARAM => 'value',
                   PARAM2 => 'value'
                  );

      # And with some TMPL_LOOPs:
      $self->param(PARAM => 'value', 
                   PARAM2 => 'value',
                   LOOP_PARAM => 
                   [ 
                    { PARAM => VALUE_FOR_FIRST_PASS, ... }, 
                    { PARAM => VALUE_FOR_SECOND_PASS, ... } 
                    ...
                   ],
                   ANOTHER_LOOP_PARAM => 
                   [ 
                    { PARAM => VALUE_FOR_FIRST_PASS, ... }, 
                    { PARAM => VALUE_FOR_SECOND_PASS, ... } 
                    ...
                   ]
                  );

3) To set the values of a number of parameters using a hash-ref:

      $self->param(
                   { 
                      PARAM => 'value', 
                      PARAM2 => 'value',
                      LOOP_PARAM => 
                      [ 
                        { PARAM => VALUE_FOR_FIRST_PASS, ... }, 
                        { PARAM => VALUE_FOR_SECOND_PASS, ... } 
                        ...
                      ],
                      ANOTHER_LOOP_PARAM => 
                      [ 
                        { PARAM => VALUE_FOR_FIRST_PASS, ... }, 
                        { PARAM => VALUE_FOR_SECOND_PASS, ... } 
                        ...
                      ]
                    }
                   );
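As a minimal sketch of how the loop structures above might be built from ordinary data, the following prepares a parameter hash in plain Perl. The record fields and the parameter names REPORT_NAME, ROWS, NAME and TOTAL are illustrative assumptions; a real template defines its own names.

```perl
use strict;
use warnings;

# Hypothetical source records; the field names are illustrative only.
my @records = (
    { name => 'Alice', total => 12.5 },
    { name => 'Bob',   total => 7 },
);

# One hash reference per loop pass, keyed by the names that the
# template's variables would refer to.
my @rows = map {
    +{ NAME => $_->{name}, TOTAL => sprintf('%.2f', $_->{total}) }
} @records;

# Simple values and the loop, collected for the hash-ref form of param().
my %params = (
    REPORT_NAME => 'Monthly totals',
    ROWS        => \@rows,
);

printf "%d loop passes prepared\n", scalar @{ $params{ROWS} };
```

With a PDF::Template object in $rpt, the whole hash could then be handed over in one call, as in form 3: $rpt->param(\%params).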

write_file(filename)

This method writes a PDF file. "filename" will most likely need to be a fully qualified path, for example '/home/daf/report.pdf'.

get_buffer()

Get a buffer containing the PDF. This is useful if you are going to stream the PDF directly to a browser:

  my $buf = $rpt->get_buffer();
  print "Content-Type: application/pdf\n";
  print "Content-Length: " . length($buf) . "\n";
  print "Content-Disposition: inline; filename=hello.pdf\n\n";
  print $buf;
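The same header logic can be wrapped in a small helper. This is only a sketch: pdf_response is a hypothetical function, not part of PDF::Template, and the buffer here is a stand-in string so the example runs without PDFLib.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Template): builds a CGI-style
# response from a buffer such as the one returned by get_buffer().
sub pdf_response {
    my ($buf, $filename) = @_;
    my $headers = join '',
        "Content-Type: application/pdf\n",
        'Content-Length: ' . length($buf) . "\n",
        "Content-Disposition: inline; filename=$filename\n\n";
    return $headers . $buf;
}

# Stand-in bytes; in real use this would be $rpt->get_buffer().
my $buf = "%PDF-1.4\n...stand-in body...\n";

binmode STDOUT;    # avoid newline translation of the binary PDF data
print pdf_response($buf, 'hello.pdf');
```

Calling binmode on STDOUT matters mostly on platforms that translate line endings, where the PDF bytes could otherwise be altered on output.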

XML Reference

PDF layout is defined in XML. Programmatically, all you need to know are the few methods discussed above. Most of what there is to learn about PDF::Template is the specification of template elements. This section is a reference for those elements.

Example XML code can be found in the examples subdirectory.

All XML objects fall into one of two categories: Containers or Elements.

A Word on Layout

Coordinates

A coordinate is a pair (x,y) of numbers representing a point on a page. The 'x' part of the pair represents the distance from the left edge of the page, while the 'y' component represents the distance from the bottom.

Coordinates for PDF::Template are based on an origin of (0,0) in the lower left corner of the document. Coordinates are measured in points, so a position of (72,72) corresponds to a point one inch from the bottom and one inch from the left of the page.
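Because the origin is at the bottom left, positions that are naturally measured from the top of the page have to be converted. The following sketch shows the arithmetic; the helper names are hypothetical, and the A4 height of 841.89 points is the usual rounded value for 297 mm.

```perl
use strict;
use warnings;

use constant POINTS_PER_INCH => 72;

# Convert inches to points.
sub inches_to_points {
    my ($inches) = @_;
    return $inches * POINTS_PER_INCH;
}

# Convert a distance measured down from the top edge into a Y coordinate
# measured up from the bottom edge, given the page height in points.
sub y_from_top {
    my ($page_height, $distance_from_top) = @_;
    return $page_height - $distance_from_top;
}

my $a4_height = 841.89;    # A4 height in points (297 mm)

# A point one inch from the left edge and one inch below the top edge.
my $x = inches_to_points(1);
my $y = y_from_top($a4_height, inches_to_points(1));
printf "(%.2f, %.2f)\n", $x, $y;
```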

Pagination

The challenge in writing a PDF template class, as opposed to an HTML or text-based template, is pagination. Simply stated, the pagination problem is that of determining:

  • What is the Y position of a given element?

  • Where should a page break occur?

Some items, such as those found in headers or footers of reports, are fixed and should always appear in the same position on each page.

Containers

There are only a few containers.

<PAGEDEF>

A pagedef can have the following attributes:

  • pagesize

    Indicates the size of the page. Can be A3 or A4.

  • landscape

    Set this parameter to '1' to swap width and height. Default is portrait mode.

  • nopagenumber

    Set this to '1' to exclude these pages from the global page number count. This could be useful, for example, for a title page. Defaults to '0'. Page numbers are accessible in the global '__PAGE__' variable.

<LOOP>

This is the standard looping construct.

Within a loop, several additional variables are available:

  • __FIRST__

  • __LAST__

  • __INNER__

  • __ODD__
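The meaning of these variables is not spelled out here. The sketch below assumes they behave like the loop context variables of the same names in HTML::Template, which may not match PDF::Template exactly; loop_context is a hypothetical helper for illustration.

```perl
use strict;
use warnings;

# Assumed semantics, borrowed from HTML::Template's loop context
# variables; PDF::Template's own behaviour may differ.
sub loop_context {
    my ($index, $count) = @_;    # $index is zero-based
    my $first = $index == 0          ? 1 : 0;
    my $last  = $index == $count - 1 ? 1 : 0;
    return {
        __FIRST__ => $first,
        __LAST__  => $last,
        __INNER__ => (!$first && !$last) ? 1 : 0,    # neither first nor last
        __ODD__   => ($index % 2 == 0)   ? 1 : 0,    # 1st, 3rd, 5th pass
    };
}

# Show the flags for each pass of a four-row loop.
for my $i (0 .. 3) {
    my $ctx = loop_context($i, 4);
    print join(' ', map { "$_=$ctx->{$_}" } sort keys %$ctx), "\n";
}
```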

Loops can have the following attributes:

  • Y

    If the current Y position has not yet been set when this loop is encountered, it will be set to Y.

  • Y2

    The loop will cause a page break when the current Y position exceeds this value.

  • MAXITERS

    If set, this determines the maximum number of rows in a loop that can appear per page. If you want only 3 items to appear per page, set MAXITERS=3.

<ROW H='20'>

A row is a container of elements that has a specific height, specified by the H attribute. Rows typically exist inside loops. A row is rendered at the current Y position. The Y position is then updated by subtracting the row's height.
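To make the interaction of Y, Y2, MAXITERS and the row height concrete, here is a simplified model of loop pagination. It is not the module's actual algorithm. It assumes that every page restarts at the loop's Y, and that a page break happens when the next row would extend below Y2 or when MAXITERS rows have been placed. The function paginate_rows and its argument names are hypothetical.

```perl
use strict;
use warnings;

# Simplified model of loop pagination, not the module's real code.
# Arguments: y (start position), y2 (lower limit), h (row height),
# rows (number of rows), max_iters (optional rows-per-page limit).
sub paginate_rows {
    my (%args) = @_;
    my @pages = ([]);
    my $y = $args{y};
    for my $row (1 .. $args{rows}) {
        my $on_page = scalar @{ $pages[-1] };
        my $full    = defined $args{max_iters} && $on_page >= $args{max_iters};
        # Break only if the page already holds a row, so an oversized
        # row cannot cause an endless run of empty pages.
        if ($on_page && ($full || $y - $args{h} < $args{y2})) {
            push @pages, [];
            $y = $args{y};
        }
        push @{ $pages[-1] }, { row => $row, y => $y };
        $y -= $args{h};    # the row's height is subtracted from the current Y
    }
    return \@pages;
}

my $pages = paginate_rows(y => 700, y2 => 100, h => 20, rows => 75);
printf "%d rows on page 1, %d pages in total\n",
    scalar @{ $pages->[0] }, scalar @$pages;
```

Under these assumptions, 20-point rows between Y=700 and Y2=100 fit 30 to a page, so 75 rows span three pages.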

<IF name='' is=''>

This is the construct necessary for conditional inclusion of elements in the page.

'name' is the name of a variable passed in through the param() function.

The 'is' parameter can be either 'true' or 'false'. If it is set to 'true', the elements are included if 'name' evaluates to true. If set to 'false', the elements are included if name evaluates to false.

A more traditional if/else structure is not achievable in XML. An if/else can be implemented in PDF::Template as:

  <if name='beavis' is='true'>
    ... beavis stuff here ...
  </if>
  <if name='beavis' is='false'>
    ... Hopefully this is never executed
  </if>
  

I considered nesting <true> and <false> tags in the if, but I think the notation I chose is simpler for the average case.

<ALWAYS>

Use this tag to indicate that the elements in this container will appear on every page. This is most useful when a LOOP element in a PAGEDEF causes it to span multiple pages. In this case, you could use ALWAYS to make headers and footers appear on every page. Otherwise, items before the LOOP would only appear on the first page and items after the loop would only appear on the last page.

Elements

In general, an element represents a specific item on a PDF.

Bookmark

 <bookmark name="">Bookmark text, possibly with vars...</bookmark>

Inserts a top level bookmark into the document. The text of the bookmark is the text between the two tags. This text may contain <var> objects.

PDF supports nested bookmarks. I have not yet implemented these.

font

 <font face='Courier' size='12'></font>
 <font face="Century Gothic" encoding="host"/>  # On win32: a truetype font

Changes the current font. Size is font size in points (72pts=1 inch). Face is the name of the font. Currently only the PDF core fonts are supported:

  • Courier

  • Courier-Bold

  • Courier-Oblique

  • Courier-BoldOblique

  • Helvetica

  • Helvetica-Bold

  • Helvetica-Oblique

  • Helvetica-BoldOblique

  • Times-Roman

  • Times-Bold

  • Times-Italic

  • Times-BoldItalic

  • Symbol

  • ZapfDingbats

On Windows systems, you may specify truetype fonts by adding encoding="host" to the tag and specifying the name of the font in the face parameter, as in the second example above.

Image

 <image type='jpeg' scale='' x='' y=''>/file/name/here.jpg</image>
 <image type='gif'><var name='fname'></var></image>
 <image border='1' color='255,0,0'>something.gif</image>
 

Inserts an image into the document.

Type should be one of 'png', 'gif', 'jpeg', or 'tiff'. If type is omitted, it will automatically be set as the lowercase file extension (jpg maps to jpeg). If the file extension cannot be determined, an error will be generated.
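The type inference rule can be sketched as follows. The function image_type_for is a hypothetical illustration of the rule as described, not the module's own code, and the error message is invented.

```perl
use strict;
use warnings;

# Sketch of the described rule: take the file extension, lowercase it,
# and map jpg to jpeg. Dies if no extension can be found.
sub image_type_for {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/\\]+)\z/
        or die "Cannot determine image type for '$path'\n";
    my $type = lc $ext;
    $type = 'jpeg' if $type eq 'jpg';
    return $type;
}

print image_type_for('/file/name/here.JPG'), "\n";
print image_type_for('something.gif'), "\n";
```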

The path to the image is between the start and end tags. It may contain text and variables.

You may have to play with the scale parameter. It is passed directly to PDFLib.

Images may have borders by specifying BORDER='1'. The border will be drawn in the current color (probably black) unless you also specify a COLOR attribute. Colors are specified as RGB values.

The scale can also be calculated automatically from a desired image width (W) or height (H). Only one of the attributes W, H, or SCALE can be specified for an image.
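One plausible reading of the automatic calculation is a simple ratio of the desired size to the image's natural size. The sketch below assumes that interpretation; image_scale, its argument names, and the default of 1 when nothing is given are all assumptions rather than documented behaviour.

```perl
use strict;
use warnings;

# Assumed model: the scale is the desired size divided by the natural
# size. Arguments: natural_w, natural_h, and at most one of w, h, scale.
sub image_scale {
    my (%opt) = @_;
    my @given = grep { defined $opt{$_} } qw(w h scale);
    die "Specify at most one of W, H or SCALE\n" if @given > 1;
    return $opt{scale}               if defined $opt{scale};
    return $opt{w} / $opt{natural_w} if defined $opt{w};
    return $opt{h} / $opt{natural_h} if defined $opt{h};
    return 1;    # assumption: no scaling when nothing is specified
}

# A 400x200 image shown 100 points wide.
print image_scale(natural_w => 400, natural_h => 200, w => 100), "\n";
```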

Line

 <line x1='50' y1='50' x2='100' y2='100' width='2' color='0,255,0' />
 

Draws a line from (x1,y1) to (x2,y2). Width is 1 unless specified with the width parameter. The line is drawn in the current color (probably black) unless an RGB color is specified with the COLOR parameter.
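Color attributes throughout the templates use the same 'r,g,b' notation. As an illustration, the hypothetical helper below validates such a string and converts each component to a fraction between 0 and 1. It assumes the 0 to 255 range that the textbox bgcolor attribute uses also applies here.

```perl
use strict;
use warnings;

# Hypothetical helper: parse 'r,g,b' (each 0..255, an assumed range)
# into three fractions between 0 and 1.
sub parse_rgb {
    my ($spec) = @_;
    # The -1 limit keeps trailing empty fields, so '1,2,3,' is rejected.
    my @parts = split /\s*,\s*/, $spec, -1;
    die "Expected 'r,g,b', got '$spec'\n"
        unless @parts == 3 && !grep { !/\A\d+\z/ || $_ > 255 } @parts;
    return map { $_ / 255 } @parts;
}

my @green = parse_rgb('0,255,0');
print "@green\n";
```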

Page-Break

 <page-break></page-break>

Inserts a page break. If you are using it within a loop, consider

 <if name="__LAST__" is="false">
 <page-break></page-break>
 </if>

to avoid an extra page break at the end of the loop.

Circle

  <circle x='50' y='50' r='25' color='255,0,0' fillcolor='0,0,100' width='2' />
  

Inserts a circle. The 'x' and 'y' parameters are its center and the 'r' parameter is its radius. If the circle is contained in a loop, Y will act as an offset from the loop's current Y position; otherwise, it will function as an absolute coordinate.

The color parameter is optional and determines the color of the line.

The width parameter is optional and determines the width of the line. It defaults to 1.

The fillcolor parameter is optional and determines the color of the interior of the circle.

TextBox

 <textbox name='' border='' bgcolor='r,g,b'>insert text here</textbox>
 
 <textbox>
   Hello, <var name='username' />, how are you today?
 </textbox>

Places text on the page.

  • border

    Set this to 1 to draw a black border around the text box. If omitted, defaults to no border.

  • bgcolor

    The background color for the box can be set with the bgcolor attribute. This attribute takes r, g, and b values from 0 to 255. Unfortunately, it does not look like PDF supports different foreground colors for text.

  • X

    If an 'X' attribute is specified, it will be used as the X coordinate for the left hand side of this textbox. If X is omitted, the current X position will be used. Omission of X may be useful when you want text to immediately follow a previous text box.

  • Y

    This is potentially the most confusing attribute, as it behaves in one of two ways.

    If the TextBox is in a container other than a PageDef, the Y attribute is treated as an offset from the current Y position. In this case, it can be omitted (equivalent to an offset of 0) or specified, in which case it is subtracted from the current Y position prior to rendering text.

    If the TextBox is in the PageDef container, the Y position must be specified and is treated as an absolute position.

  • LMARGIN

    If this parameter is used, text drawn in the box is moved to the right. This can be used to keep text from touching the border when border='1' is specified.

  • RMARGIN

    If this parameter is used, the right edge of the text drawn in the box is moved to the left. This can be used to keep text from touching the border when border='1' is specified, especially if text is right justified.
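The two behaviours of the Y attribute described above can be summarised in a few lines. This is a sketch of the rule as stated, not the module's code; textbox_y and its argument names are hypothetical.

```perl
use strict;
use warnings;

# Sketch of the Y rule for a textbox. Arguments: in_pagedef (true if the
# textbox sits directly in a pagedef), y (the attribute, may be omitted),
# current_y (the current Y position of the enclosing container).
sub textbox_y {
    my (%opt) = @_;
    if ($opt{in_pagedef}) {
        die "Y is required for a textbox directly inside a pagedef\n"
            unless defined $opt{y};
        return $opt{y};                         # absolute position
    }
    return $opt{current_y} - ($opt{y} // 0);    # offset from the current Y
}

print textbox_y(in_pagedef => 1, y => 400), "\n";    # absolute
print textbox_y(current_y => 500, y => 10), "\n";    # offset within a row
```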

Pos

 <pos Y='400'></pos>

This is still an experimental element. Currently it only takes one parameter, 'Y', which sets the absolute Y position.

I may add X info or relative movement. Let me know if you have an opinion.

AUTHOR

David Ferrance (dave@ferrance.com)

I maintain forums at http://www.ferrance.com for the discussion of modules I have written. I prefer you post questions in the forums (rather than email) because they may be of use to other people.

LICENSE

PDF::Template - Create PDF files from XML Templates.

Copyright (C) 2002 David Ferrance (dave@ferrance.com). All Rights Reserved.

This module is free software. It may be used, redistributed and/or modified under the same terms as perl itself.

SEE ALSO

perl(1), HTML::Template
