Formatter - The Formatter API specification




Formatters are Perl Modules conforming to the following specification. Formatters are intended to assist the conversion between different markup syntaxes.


The basic idea of Formatters is to have a simple and standard way to convert from one format to another. This is a common problem across many applications, so a simple API that all applications can use is desirable.

Formatters operate on strings. A Formatter converts a string from one format to another. For example, suppose you have a plain text string, possibly with a bit of syntax, and you want to convert it to HTML. You simply use the appropriate Formatter module and call the format constructor method on it, with the text string as parameter. You may then call either the document or the fragment method to get HTML returned.


Module naming convention

A Formatter module should be named with the format it is converted to first, then the format it is converted from. For example, the module Formatter::HTML::Textile will convert from the Textile syntax to HTML.


format($string [, further parameters])

This is a constructor method and shall initialize the formatter. As its first argument it must take a string with the text that one wants converted. You may pass additional parameters to the constructor, but the Formatter must not rely on them being present. It must not issue a warning or croak if the parameters are not present, and it must use a sensible default for any missing parameter.

This method must return the object as a blessed reference.
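
The constructor requirements above can be sketched roughly as follows. The package name Formatter::HTML::Sketch, the named-parameter calling style, and the charset parameter with its utf-8 default are all assumptions made for illustration; the specification only requires that missing parameters are tolerated and defaulted.

```perl
package Formatter::HTML::Sketch;   # hypothetical module name
use strict;
use warnings;

# Constructor: takes the source text plus optional named parameters.
# Missing parameters fall back to defaults without warning or croaking.
sub format {
    my ($class, $text, %params) = @_;
    my $self = {
        text    => defined $text ? $text : '',
        # 'charset' and its default are assumptions of this sketch
        charset => defined $params{charset} ? $params{charset} : 'utf-8',
    };
    return bless $self, $class;    # blessed reference, as required
}

package main;

my $plain = Formatter::HTML::Sketch->format("Hello, world");
my $tuned = Formatter::HTML::Sketch->format("Hello, world", charset => 'iso-8859-1');
print ref($plain), "\n";
```

Both calls succeed; the first one silently picks up the default.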


document([$charset])

The document method may be called on the object after it has been initialized with the format method. It takes an optional parameter that specifies the character set of the document. The document method must include the charset declaration as appropriate for the output format. For HTML, this is a meta element, as specified in section 5.2.2 of the HTML4 specification, http://www.w3.org/TR/REC-html40/charset.html#spec-char-encoding. For XML, it can be set with the encoding declaration in the Prolog.

It must return a full document. In the case where an underlying helper module has no concept of a full document, the method must nevertheless make a best effort to return something that can be regarded as a standalone document.
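
As an illustration, a document method for HTML output might look like the sketch below. The package name, the _escape helper, and the choice to wrap the text in a pre element are assumptions of this example, not part of the specification.

```perl
package Formatter::HTML::Sketch;   # hypothetical module name
use strict;
use warnings;

sub format {
    my ($class, $text) = @_;
    return bless { text => defined $text ? $text : '' }, $class;
}

# Escape the characters that are special in HTML text and attribute values.
sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Return a complete HTML document; the charset parameter is optional.
sub document {
    my ($self, $charset) = @_;
    my $meta = '';
    if (defined $charset) {
        # Charset declaration as a meta element, per the HTML4 specification
        $meta = '<meta http-equiv="Content-Type" content="text/html; charset='
              . _escape($charset) . '">' . "\n";
    }
    return qq{<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n}
         . "<html>\n<head>\n" . $meta . "<title></title>\n</head>\n<body>\n"
         . "<pre>" . _escape($self->{text}) . "</pre>\n</body>\n</html>\n";
}

package main;

my $doc = Formatter::HTML::Sketch->format("1 < 2")->document('utf-8');
print $doc;
```

When no charset is passed, this sketch simply omits the meta element rather than guessing one.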


fragment

The fragment method may be called on the object after it has been initialized with the format method. It shall return only a minimal fragment of the converted text, with as little markup added as possible. In the case where only a full document is available from an underlying helper module, it should make a best effort to strip it down to a minimal fragment.
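
Where a helper only produces full HTML documents, the stripping step might be sketched as below. The function name body_contents is hypothetical, and the regular expression is a deliberate simplification; a real formatter would more likely use an HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a full HTML document to the contents of its body.
sub body_contents {
    my ($html) = @_;
    if ($html =~ m{<body[^>]*>(.*?)</body\s*>}is) {
        my $inner = $1;
        $inner =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        return $inner;
    }
    return $html;   # no body element found: treat the input as a fragment already
}

my $full = "<html><head><title>t</title></head>\n<body>\n<p>Hello</p>\n</body></html>";
my $fragment = body_contents($full);
print "$fragment\n";
```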


links

This method should return all links found in the input plain text string as an arrayref where each element is a hash reference with the keys url and title, the former containing the URL, the latter the text of the link. If none can be found, an empty list should be returned. If no title can be found, the title key should have an empty string. The title must contain only character data, no markup.
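
The shape of the return value can be illustrated with the sketch below. The extract_links function and the link syntax it recognizes ("[title](url)" and "<url>") are made up for this example; each Formatter would recognize the links of its own input syntax. The sketch returns a reference to an empty array when nothing is found, which is one possible reading of "an empty list".

```perl
use strict;
use warnings;

# Hypothetical helper: collect links from text that uses the made-up
# syntax "[title](url)" for titled links and "<url>" for bare ones.
sub extract_links {
    my ($text) = @_;
    my @links;
    while ($text =~ m{\[([^\]]*)\]\((\S+?)\)|<(https?://[^>\s]+)>}g) {
        if (defined $2) {
            push @links, { url => $2, title => $1 };
        } else {
            push @links, { url => $3, title => '' };   # no title: empty string
        }
    }
    return \@links;
}

my $links = extract_links("See [the spec](http://example.org/spec) or <http://example.org/>.");
printf "%s (%s)\n", $_->{url}, $_->{title} for @$links;
```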


title

This method should return the title of the document or undef if none can be found.

Return Perl character strings

The methods that return strings must make sure those strings are proper character-oriented strings, not byte sequences as they were prior to Perl 5.6.
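
The difference between a byte sequence and a character string can be seen in the short sketch below, which uses the core Encode module. That the input arrives as UTF-8 encoded bytes is an assumption of the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they might arrive from a file or socket: "e acute" encoded as UTF-8.
my $bytes = "caf\xc3\xa9";

# decode() turns the byte sequence into a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 5, characters: 4"
```

A Formatter that reads or receives encoded bytes would decode them like this before returning text from its methods.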

Inheritance from other modules

A Formatter module may inherit methods from other modules. It may inherit all the methods mentioned above if they exist in a suitable parent class, and also other methods, to aid setting syntax-specific parameters.
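
A rough sketch of such inheritance follows. Both package names are hypothetical, and My::Textile::Helper stands in for whatever existing conversion module a Formatter might build on; its flavor method plays the role of a syntax-specific parameter setter.

```perl
package My::Textile::Helper;       # hypothetical stand-in for a helper module
use strict;
use warnings;

sub new     { my ($class) = @_; return bless { flavor => 'default' }, $class }
sub flavor  { my ($self, $f) = @_; $self->{flavor} = $f if defined $f; return $self->{flavor} }
sub process { my ($self, $text) = @_; return "<p>$text</p>" }

package Formatter::HTML::Sketch;   # hypothetical Formatter module
use strict;
use warnings;
our @ISA = ('My::Textile::Helper');

# The Formatter constructor delegates object creation to the parent class.
sub format {
    my ($class, $text) = @_;
    my $self = $class->SUPER::new();
    $self->{text} = defined $text ? $text : '';
    return $self;
}

sub fragment { my ($self) = @_; return $self->process($self->{text}) }

package main;

my $f = Formatter::HTML::Sketch->format("Hello");
$f->flavor('strict');              # inherited, syntax-specific setter
print $f->fragment, "\n";
```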

Formatter module implementors are encouraged to contact the API author(s) to discuss methods that should be included in the API.

Meaning of fragment vs. document

It is to be anticipated that some formats have no concept of a full document, and others no concept of a fragment. To save the user the trouble of dealing with an error situation, the Formatter must make a best effort to return both. What is meant by a fragment and a full document varies from format to format, and must be dealt with on a per-format basis.

In the case where it really doesn't make sense to return either a fragment or document, the Formatter may produce a warning, but must nevertheless return a best effort fragment or document.

For HTML, a full document is understood to be a complete and valid HTML document. The largest possible HTML fragment consists of the child elements of the body element, excluding body itself.

For XML, any well-formed XML document can be a full document, and any well-balanced XML region can be a fragment. An XML fragment should not contain a Prolog or Document Type Declaration.


Kjetil Kjernsmo, <kjetilk@cpan.org>


This specification is currently maintained in a Subversion repository. The trunk can be checked out anonymously using e.g.:

  svn checkout http://svn.kjernsmo.net/Formatter/trunk Formatter


The Formatter API was originally conceived on the openguides channel on irc.perl.org. In particular, Tom Insam was an important architect of the API.


The module Formatter::HTML::Preformatted contains a minimal Formatter by the author of the specification.


Copyright (C) 2005 by Kjetil Kjernsmo

This specification can be redistributed and/or modified under the same terms as Perl itself. The author asks that only modules conformant with the specification use the Formatter:: namespace.

