NAME

Text::Merge - v.0.36 General purpose text/data merging methods in Perl.

SYNOPSIS

        $merge = new Text::Merge;

        $merge->line_by_line();         # query
        $merge->line_by_line(0);        # turn off
        $merge->line_by_line(1);        # turn on

        $merge->set_delimiters('<<', '>>');            # user defined delims

        $success = $merge->publish($template, \%data);
        $success = $merge->publish($template, \%data, \%actions);
        $success = $merge->publish($template, $item);

        $success = $merge->publish_to($handle, $template, \%data);
        $success = $merge->publish_to($handle, $template, \%data, \%actions);
        $success = $merge->publish_to($handle, $template, $item);

        $text = $merge->publish_text($template, \%data);
        $text = $merge->publish_text($template, \%data, \%actions);
        $text = $merge->publish_text($template, $item);

        $success = $merge->publish_email($mailer, $headers, $template, \%data);
        $success = $merge->publish_email($mailer, $headers, $template, 
                                                             \%data, \%actions);
        $success = $merge->publish_email($mailer, $headers, $template, $item);

        $datahash = $merge->cgi2data();        # if you used "CGI(:standard)"
        $datahash = $merge->cgi2data($cgi);    # if you just used CGI.pm

DESCRIPTION

The Text::Merge package is designed to provide a quick, versatile, and extensible way to combine presentation templates and data structures. The Text::Merge package attempts to do this by assuming that templates are constructed with text and that objects consist of data and functions that operate on that data. Text::Merge is very simple, in that it works on one file and one object at a time, although an extension exists to display lists (Text::Merge::Lists) and Text::Merge itself could easily be extended further.

This is not XML and is intended merely to "flatten" the learning curve for non-programmers who design display pages for programmers or to provide programmers with a quick way of merging page templates with data sets or objects without extensive research.

The templates can be interpreted "line by line" or taken as a whole.

Technical Details

This object is normally inherited and so the new() function is the constructor. It just blesses an anonymous HASH reference, sets two flags within that HASH, and returns it. I am acutely aware of the criticisms of the overuse of OOP (Object Oriented Programming). This module needs to be OO because of its extensibility and encapsulation; I wanted to impose classification of the objects to allow the greatest flexibility in context of implementation. Text::Merge is generally used on web servers, and can become integrated quickly into the httpd using mod_perl, hence the encapsulation and inheritance provided by the Perl OO model clearly outweighed the constraints thereby imposed. That's my excuse...what's yours?

There are four public publishing methods for the Text::Merge object: publish(), publish_to(), publish_text(), and publish_email(). The first, publish(), sends output to the currently selected file handle (normally STDOUT). The second, publish_to(), sends output to a file handle that you specify. The third, publish_text(), returns the merged output as a text block. The last, publish_email(), sends the merged output as a formatted e-mail message to the designated mailer.

Support is provided to merge the data and the functions performed on that data with a text template that contains substitution tag markup used to designate the action or data conversion. Data is stored in a HASH that is passed by reference to the publishing methods. The keys of the data hash correspond to the field names of the data, and they are associated with their respective values. Actions (methods) are similarly referenced in a hash, keyed by the action name used in the template.

Here is a good example of a publishing call in Perl:

        $obj = new Text::Merge;
        %data = ( 'Name'=>'John Smith', 'Age'=>34, 'Sex'=>'not enough' );
        %actions = ( 'Mock' => \&mock_person,  'Laud' => \&laud_person );
        $obj->publish($template, \%data, \%actions);

In this example, mock_person() and laud_person() would be subroutines that took a single hash reference, the data set, as an argument. In this way you can create dynamic or complex composite components and reference them with a single tag in the template. The actions HASH has been found to be useful for default constructs that can be difficult to code manually, giving page designers an option to work with quickly.
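
As a minimal sketch of what such action subroutines might look like: the bodies of mock_person() and laud_person() are not given here, so the ones below are invented purely for illustration. Each takes the data HASH reference and returns the text to substitute. The last lines call an action by its key name, which is how an ACT: tag is described as finding its subroutine; no Text::Merge call is made.

```perl
use strict;
use warnings;

my %data = ( 'Name' => 'John Smith', 'Age' => 34 );

# Hypothetical action bodies: each receives the data hash reference as its
# only argument and returns the text to be substituted.
sub mock_person {
    my ($data) = @_;
    return "$data->{Name} acts older than $data->{Age}.";
}

sub laud_person {
    my ($data) = @_;
    return "$data->{Name} is a credit to us all.";
}

my %actions = ( 'Mock' => \&mock_person, 'Laud' => \&laud_person );

# Look up an action by its key name and call it with the data set.
my $lauded = $actions{'Laud'}->(\%data);
print "$lauded\n";    # prints "John Smith is a credit to us all."
```

Keeping the subroutines free of any output calls (they only return a string) lets the same action serve publish(), publish_text() and the other publishing methods alike.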

Markup Tags

Simply put, tags are replaced with what they designate. A tag generally consists of a prefix, followed by a colon, then either an action name or a field name followed by zero or more formatting directives separated by colons. In addition, blocks of output can be contained within curly brackets in certain contexts for conditional display.

REF: tags

Simple data substitution is achieved with the REF: tag. Here is an example of the use of a REF: tag in context. Assume we have a key-value pair in our data HASH associating the key 'Animal' with the value of 'turtle':

        The quick brown REF:Animal jumped over the lazy dog.

when filtered, becomes:

        The quick brown turtle jumped over the lazy dog.

The REF: tag designators may also contain one or more format directives. These are chained left to right, and act to convert the data before it is displayed. For example:

        REF:Animal:lower:trunc3

would result in the first three letters of the SCALAR data value associated with Animal in lower case. See the section, Data Conversions Formats, for a list of the available SCALAR data formatting directives. Note that some conversions may be incompatible or contradictory. The system will not necessarily warn you of such cases, so be forewarned.
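
The left-to-right chaining can be pictured with a small stand-alone sketch. This is not the module's own code: the dispatch table, the apply_formats() helper and the handling of only three directives are assumptions made for illustration.

```perl
use strict;
use warnings;

# Directives that take no numeric suffix, as a dispatch table.
my %simple = (
    lower => sub { lc $_[0] },
    upper => sub { uc $_[0] },
);

# Apply each directive in turn, left to right.
sub apply_formats {
    my ($value, @directives) = @_;
    for my $directive (@directives) {
        if ($directive =~ /^trunc(\d+)$/) {
            $value = substr($value, 0, $1);
        }
        elsif (my $code = $simple{$directive}) {
            $value = $code->($value);
        }
    }
    return $value;
}

my %data = ( 'Animal' => 'Turtle' );
my ($prefix, $field, @directives) = split /:/, 'REF:Animal:lower:trunc3';
my $out = apply_formats($data{$field}, @directives);
print "$out\n";    # prints "tur"
```

Because each directive sees the output of the one before it, order matters: trunc3 followed by upper and upper followed by trunc3 happen to agree, but directives such as words## and trunc## generally will not.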

Any REF: tag designator can be surrounded by curly brace pairs containing text that would be included in the merged response only if the result of the designator is not empty (has a length). There must be no spaces between the tag and the curly braced text. If line-by-line mode is turned off, then the conditional text block may span multiple lines. For example:

        The {quick brown }REF:Animal{ jumps over where the }lazy dog lies.

Might result in:

        The quick brown fox jumps over where the lazy dog lies.

or, if the value associated with the data key 'Animal' was undefined, empty, or zero:

        The lazy dog lies.
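
The behavior of the conditional braces can be imitated in a few lines of plain Perl. The sketch below is only an approximation written for this example: the regular expression, the helper names, and the decision to treat only undefined or zero-length values as empty are all assumptions, and format directives are left out.

```perl
use strict;
use warnings;

# Optional {prefix}, the REF: tag, optional {suffix}, with no spaces between.
my $ref_tag = qr/(?:\{([^{}]*)\})?REF:(\w+)(?:\{([^{}]*)\})?/;

sub expand_ref {
    my ($data, $before, $field, $after) = @_;
    my $value = $data->{$field};
    # Assumption: "empty" here means undefined or zero length only.
    return '' unless defined $value && length $value;
    $before = '' unless defined $before;
    $after  = '' unless defined $after;
    return $before . $value . $after;
}

sub merge_refs {
    my ($template, $data) = @_;
    $template =~ s/$ref_tag/expand_ref($data, $1, $2, $3)/ge;
    return $template;
}

my $template = 'The {quick brown }REF:Animal{ jumps over where the }lazy dog lies.';
my $with     = merge_refs($template, { 'Animal' => 'fox' });
my $without  = merge_refs($template, {});
print "$with\n";       # The quick brown fox jumps over where the lazy dog lies.
print "$without\n";    # The lazy dog lies.
```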

IF: tags

The IF: tag designator performs a conditional display. The syntax is as follows:

        IF:FieldName:formats{Text to display}

This designator would result in the string Text to display being returned if the formatted data value is not empty. The curly braced portion is required, and no curly braces are allowed before the designator.

NEG: tags

The NEG: tag designator is similar to the IF: tag, but the bracketed text is processed only if the formatted data value is empty (zero length) or zero. Effectively the NEG: can be thought of as if not. Here is an example:

        NEG:FieldName:formats{Text to display if the result is empty.}
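
A rough stand-alone sketch of the two conditional tags follows. It is not taken from the module: it assumes that IF: is the exact opposite of NEG:, uses Perl's own notion of a false value (undefined, empty string, or zero) as the emptiness test, ignores format directives, and uses a made-up field name, Stock.

```perl
use strict;
use warnings;

# IF: or NEG:, a field name, then the required braced text.
my $cond_tag = qr/(IF|NEG):(\w+)\{([^{}]*)\}/;

sub expand_conditional {
    my ($data, $kind, $field, $text) = @_;
    # Assumption: undefined, '' and 0 all count as empty.
    my $empty = $data->{$field} ? 0 : 1;
    return $text if $kind eq 'IF'  && !$empty;
    return $text if $kind eq 'NEG' &&  $empty;
    return '';
}

sub merge_conditionals {
    my ($template, $data) = @_;
    $template =~ s/$cond_tag/expand_conditional($data, $1, $2, $3)/ge;
    return $template;
}

my $template = 'IF:Stock{In stock.}NEG:Stock{Sold out.}';
my $some = merge_conditionals($template, { 'Stock' => 3 });
my $none = merge_conditionals($template, { 'Stock' => 0 });
print "$some\n";    # prints "In stock."
print "$none\n";    # prints "Sold out."
```

Pairing an IF: and a NEG: on the same field, as in this template, is a simple way to get an either/or message without writing an action.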

ACT: tags

The ACT: tag designates that an action is to be performed (a subroutine call) to obtain the result for substitution. The key name specified in the designator is used to look up the reference to the appropriate subroutine, and the data HASH reference is passed as the sole argument to that subroutine. The returned value is the value used for the substitution.

ACT: is intended to be used to insert programmatic components into the document. It can only specify action key names and has no equivalent tags to IF: and NEG:. The curly brace rules for the ACT: tag are exactly the same as those for the REF: tag.

Conditional Text Braces

All tags support conditional text surrounded by curly braces. If the line_by_line() switch is set, then the entire tag designator must be on a single line of text, but if the switch is OFF (default) then the conditional text can span multiple lines.

The two conditional tags, IF: and NEG:, require a single conditional text block, surrounded by curly braces, immediately following (suffixing) the field name or format string. For example:

        IF:SomeField{this text will print}

The REF: and ACT: tags allow for curly braces both at the beginning (prefixing) and at the end (suffixing). For example:

        {Some optional text }REF:SomeValue{ more text.}

Command Braces

You may bracket entire constructs (along with any conditional text) with double square brackets to set them off from the rest of the document. The square brackets would be removed during substitution:

        The [[IF:VerboseVar{quick, brown }]]fox jumped over the lazy dog.

assuming that 'VerboseVar' represented some data value, the above example would result in one of:

        The quick, brown fox jumped over the lazy dog.

or

        The fox jumped over the lazy dog.

Data Conversion Formats

Here is a list of the data conversion formats with a summary of each. The behavior in exceptional cases is not fully specified, but all of the conversions work to some satisfactory degree. These conversion methods will treat all values as SCALAR values:

        upper       -  converts all lowercase letters to uppercase
        lower       -  converts all uppercase letters to lowercase
        proper      -  treats the string as a Proper Noun
        trunc##     -  truncates the scalar to ## characters (## is an integer)
        words##     -  reduces to ## words separated by spaces (## is an integer)
        paragraph## -  converts to a paragraph ## columns wide
        indent##    -  indents plain text ## spaces
        int         -  converts the value to an integer
        float       -  converts the value to a floating point value
        string      -  converts the numeric value to a string (does nothing)
        detab       -  replaces tabs with spaces, aligned to 8-char columns
        html        -  replaces newlines with HTML <BR> tags
        dollars     -  converts the value to 2 decimal places
        percent     -  converts the value to a percentage
        abbr        -  converts a time value to m/d/yy format
        short       -  converts a time value to m/d/yy H:MMpm format
        time        -  converts a time value to H:MMpm (localtime am/pm)
        24h         -  converts a time value to 24hour format (localtime)
        dateonly    -  converts a time value to Jan. 1, 1999 format
        date        -  same as 'dateonly' with 'time'
        ext         -  converts a time value to extended format:
                        Monday, January 12th, 1999 at 12:20pm
        unix        -  converts a time value to UNIX date string format
        escape      -  performs a browser escape on the value (&#123;)
        unescape    -  performs a browser unescape (numeric only)
        urlencode   -  performs a url encoding on the value (%3B)
        urldecode   -  performs a url decoding (reverse of urlencode)

Most of the values are self-explanatory; however, a few may need explanation:

The trunc format must be suffixed with an integer digit to define at most how many characters should be displayed, as in trunc14.

The html format just inserts a <BR> construct at every newline in the string. This allows text to be displayed appropriately in some cases.

The escape format performs an HTML escape on all of the reserved characters of the string. This allows values to be displayed correctly on browsers in most cases. If your data is not prefiltered, it is usually a good idea to use escape on strings where HTML formatting is prohibited. For example a '$' value would be converted to '&#36;'.

The unescape format does the reverse of an escape format, however it does not operate on HTML mnemonic escapes, allowing special characters to remain intact. This can be used to reverse escapes inherent in the use of other packages.

The urlencode format converts a value (text string) to URL-encoded form, replacing special characters with their %xx equivalents. The urldecode format does the reverse, decoding the %xx sequences of a URL-encoded value back to the original characters.
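
The four escaping formats can be approximated in plain Perl as shown below. This is an illustrative sketch and not the module's code: the function names are invented, the sets of characters left untouched are guesses, and the values are assumed to be plain single-byte strings.

```perl
use strict;
use warnings;

# Replace each character outside a conservative safe set (an assumption)
# with its decimal character reference, so '$' becomes '&#36;'.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9 .,_-])/'&#' . ord($1) . ';'/ge;
    return $value;
}

# Reverse numeric references only; mnemonic ones such as &amp; are untouched.
sub unescape_value {
    my ($value) = @_;
    $value =~ s/&#(\d+);/chr($1)/ge;
    return $value;
}

# Replace everything but unreserved URL characters with %XX.
sub urlencode_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Turn each %XX sequence back into the character it stands for.
sub urldecode_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

my $escaped = escape_value('$5');
my $encoded = urlencode_value('a;b');
print "$escaped\n";    # prints "&#36;5"
print "$encoded\n";    # prints "a%3Bb"
```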

Item Support

The publishing methods all require at the very least a template, a data set, and the action set; although either the data set or the action set or both could be empty or null. You may also bundle this information into a single HASH (suitable for blessing as a class) with the key 'Data' associated with the data HASH reference, and the key 'Actions' associated with the action HASH reference. A restatement of a previous example might look like this:

        $obj = new Text::Merge;
        $data = { 'Name'=>'John Smith', 'Age'=>34, 'Sex'=>'not enough' };
        $actions = { 'Mock' => \&mock_person,  'Laud' => \&laud_person };
        $item = { 'Data' => $data,  'Actions' => $actions };
        $obj->publish($template, $item);

In addition, if you specify a key 'ItemType' in your $item and give it a value, then the item reference will be handed to any methods invoked by the ACT: tags, rather than just the data hash. This allows you to construct items that can be merged with templates. For example, the following code is valid:

        %data = ( 'Author' => 'various',  'Title' => 'The Holy Bible' );
        %actions = ( 'Highlight' => \&highlight_item );
        $item = { 'ItemType'=>'book', 'Data'=>\%data, 'Actions'=>\%actions };
        bless $item, 'Some::Example::Class';
        $obj->publish($template, $item);

In this last example, the designator ACT:Highlight would result in the object $item being passed as the only argument to the subroutine highlight_item() referenced in the action HASH.
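
The rule for what an action receives can be sketched as follows. The call_action() helper and the body of highlight_item() are hypothetical and exist only to show the rule at work; they are not part of Text::Merge.

```perl
use strict;
use warnings;

# Hypothetical action: because 'ItemType' is set, it receives the whole item.
sub highlight_item {
    my ($item) = @_;
    return "*** $item->{Data}{Title} ($item->{ItemType}) ***";
}

# Hand the action the item if it has an 'ItemType', otherwise just its data.
sub call_action {
    my ($item, $name) = @_;
    my $action = $item->{Actions}{$name} or return '';
    my $arg = $item->{ItemType} ? $item : $item->{Data};
    return $action->($arg);
}

my $item = {
    'ItemType' => 'book',
    'Data'     => { 'Author' => 'various', 'Title' => 'The Holy Bible' },
    'Actions'  => { 'Highlight' => \&highlight_item },
};
my $highlighted = call_action($item, 'Highlight');
print "$highlighted\n";    # prints "*** The Holy Bible (book) ***"
```

An action written this way can reach both the data and anything else stored on the item, which is what makes blessed items useful as merge sources.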

Line by Line Mode

By default, the publishing methods slurp in the entire template and process it as a text block. This allows for multi-line conditional text blocks. However, in some cases the resulting output may be very large, or you may want the output to be generated line by line for some other reason (such as unbuffered output). This is accomplished through the line_by_line() method, which accepts an optional boolean value: if the value is given, it becomes the current setting; if it is omitted, the current setting is returned. Note that this has the most notable impact on the publish(), publish_to() and publish_email() methods, since the results of the merge operations are sent to a handle. If the line by line switch is set, then the publish_text() method will substitute line by line, but will still return the entire merged document as a single text block (not line by line).

This is turned OFF by default.

Templates

Templates consist of text documents that contain special substitution designators as described previously. The template arguments passed to the publishing functions can take one of three forms:

File Handle

This is a FileHandle object, not a glob. You must use the FileHandle package that comes with the Perl distribution for this type of template argument. Processing begins at the current file position and continues until the end of file condition is reached.

File Path

If the argument is a scalar string with no whitespace, it is assumed to be a file path. The template at that location will be used when merging the document.

Text Block

If the argument is a scalar string that contains whitespace, it is assumed to be the actual text template. Substitution will be performed on a locally scoped copy of this argument.

Note that you should not use this type of template argument if your template is very large and you are using line by line mode. In this case you should use a FileHandle or file path argument.
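
The three-way distinction described in this section can be summed up in a short sketch. The template_kind() function and the example path are made up for illustration; the module's real test may differ in detail.

```perl
use strict;
use warnings;
use FileHandle;
use Scalar::Util qw(blessed);

# Classify a template argument by the rules described above.
sub template_kind {
    my ($template) = @_;
    return 'handle' if blessed($template) && $template->isa('FileHandle');
    return 'text'   if $template =~ /\s/;    # any whitespace means a text block
    return 'path';                           # otherwise treated as a file path
}

my @kinds = map { template_kind($_) }
    ( FileHandle->new, '/some/dir/page.tmpl', 'Dear REF:Name, welcome.' );
print "@kinds\n";    # prints "handle path text"
```

One practical consequence of the whitespace rule is that a one-word text template, such as a lone tag, would be taken for a file path; adding a trailing newline or space is enough to mark it as text.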

Methods

new()

This method gives us a blessed hash reference, with the following attribute keys:

        _Text_Merge_LineMode

Other keys can be added by objects which inherit Text::Merge.

line_by_line($setting)

This method returns the current setting if the $setting argument is omitted. Otherwise it resets the line-by-line mode to the setting requested. A non-zero value tells the publishing methods to process the template line by line. For those methods that output results to a handle, the results will also be echoed line by line.
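
A combined getter and setter of this kind is commonly written as in the sketch below. The package My::Merge::Sketch is a hypothetical stand-in, not Text::Merge itself; only the key name _Text_Merge_LineMode is taken from this documentation.

```perl
use strict;
use warnings;

package My::Merge::Sketch;    # hypothetical stand-in, not Text::Merge itself

sub new {
    my ($class) = @_;
    return bless { _Text_Merge_LineMode => 0 }, $class;
}

# With an argument, store it as 0 or 1; in every case return the flag.
sub line_by_line {
    my ($self, $setting) = @_;
    $self->{_Text_Merge_LineMode} = $setting ? 1 : 0 if @_ > 1;
    return $self->{_Text_Merge_LineMode};
}

package main;

my $merge = My::Merge::Sketch->new;
my $before = $merge->line_by_line();    # query: 0 (off by default)
$merge->line_by_line(1);                # turn on
my $after = $merge->line_by_line();     # query: 1
print "$before $after\n";               # prints "0 1"
```

Testing the argument count rather than the value of $setting is what lets line_by_line(0) turn the mode off while line_by_line() merely queries it.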

set_delimiters($start, $end)

This method assigns a new command delimiter set for the tags (double square brackets by default). The 'colon' character is not allowed within the delimiter, and the delimiter may not be a single curly bracket. Both the $start and $end delimiters must be provided, and they cannot be identical.
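
The restrictions on delimiters can be written out as a small check. This is a sketch of the stated rules only: the delimiters_ok() function is invented for this example, and what set_delimiters() itself does when handed a bad pair is not specified here.

```perl
use strict;
use warnings;

# Return 1 if the pair satisfies the rules stated above, 0 otherwise.
sub delimiters_ok {
    my ($start, $end) = @_;
    for my $delim ($start, $end) {
        return 0 unless defined $delim && length $delim;   # both are required
        return 0 if $delim =~ /:/;                          # no colon allowed
        return 0 if $delim eq '{' || $delim eq '}';         # no lone curly bracket
    }
    return $start ne $end ? 1 : 0;                          # must not be identical
}

my @results = map { delimiters_ok(@$_) }
    ( [ '<<', '>>' ], [ '|', '|' ], [ '{', '}' ], [ '<:', ':>' ] );
print "@results\n";    # prints "1 0 0 0"
```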

publish($template, $dataref, $actionref)

This is the normal publishing method. It merges the specified template with the data and any provided actions. The output is sent to the currently selected handle, normally STDOUT.

publish_to($handle, $template, $dataref, $actionref)

This is similar to the normal publishing method. It merges the specified template with the data and any provided actions. The output is sent to the specified $handle or to the currently selected handle, normally STDOUT, if the $handle argument is omitted.

publish_text($template, $dataref, $actionref)

This method works similarly to the publish_to() method, except that it returns the filtered output as text rather than sending it to a filehandle.

publish_email($mailer, $headers, $filepath, $data, $actions)

This method is similar to publish() but opens a handle to $mailer and sends the merged data formatted as an e-mail message. $mailer may contain the sequences RECIPIENT and/or SUBJECT. If either does not exist, it will be echoed at the beginning of the email (in the form of a header), allowing e-mail to be passed preformatted. This is the preferred method: use a mailer that can be told to accept the "To:", "Subject:" and "Reply-To:" fields within the body of the passed message and do not specify the RECIPIENT or SUBJECT tags in the $mailer string. Returns false if failed, true if succeeded. The recommended mail program is 'sendmail'. $headers is a HASH reference, containing the header information. Only the following header keys are recognized:

        To
        Subject
        Reply-To
        CC
        From (works for privileged users only)

The values associated with these keys will be used to construct the desired e-mail message header. Security-minded site administrators might put hooks in here or, even better, clean the data as a precaution, both to protect access to the system and to avoid accidental mistakes.
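
One way to picture both the header construction and the data cleaning is the sketch below. It is an assumption about how such a header block might be built, not the module's code: header_block() is a hypothetical helper, the addresses are placeholders, and flattening newlines is just one possible precaution against a value that tries to smuggle in extra header lines.

```perl
use strict;
use warnings;

# The header keys listed above, in the order they will be written.
my @recognized = ('To', 'Subject', 'Reply-To', 'CC', 'From');

sub header_block {
    my ($headers) = @_;
    my $block = '';
    for my $key (@recognized) {
        my $value = $headers->{$key};
        next unless defined $value && length $value;
        $value =~ s/[\r\n]+/ /g;    # keep each header on a single line
        $block .= "$key: $value\n";
    }
    return $block . "\n";           # blank line separates headers from body
}

my $block = header_block({
    'To'      => 'reader@example.com',
    'Subject' => "Hello\nBcc: someone\@example.com",
    'X-Other' => 'ignored because it is not a recognized key',
});
print $block;
```

Here the embedded newline in the Subject value is flattened to a space, so the would-be Bcc line stays inside the subject text instead of becoming a header of its own.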

Note: the $mailer argument string should begin with the type of pipe required for your request. For sendmail, this argument would look something like (note the vertical pipe):

        '|/usr/bin/sendmail -t'

Be careful not to run this with write permission on the sendmail file and forget the process pipe!!!

cgi2data($cgi)

This method converts CGI.pm parameters to a data hash reference suitable for merging. The $cgi parameter is a CGI object and is optional, but you must have imported the :standard methods from CGI.pm if you omit the $cgi parameter. This method returns a hash reference containing the parameters as data. Basically it turns list values into list references and puts everything in a hash keyed by field name.

PREREQUISITES

This module was written and tested in Perl 5.005 and runs with -Tw set and use strict. It requires use of the package FileHandle which is part of the standard perl distribution.

AUTHOR

This software is released under the Perl Artistic License. Derive what you wish, as you wish, but please attribute releases and include derived source code. (C) 1997-2004 by Steven D. Harris, perl@nullspace.com