NAME

DocSet::Doc - A Base Document Class

SYNOPSIS

   # e.g. a subclass would do
   use DocSet::Doc::HTML2HTML ();
   my $doc = DocSet::Doc::HTML2HTML->new(%args);
   $doc->scan();
   my $meta = $doc->meta();
   my $toc  = $doc->toc();
   $doc->render();

   # internal methods
   $doc->src_read();
   $doc->src_filter();

DESCRIPTION

This superclass implements the core methods for scanning a single document of a given format and rendering it into another format. It provides subclasses with hooks that can change the default behavior. Note that this class cannot be used as is: you must subclass it and implement the required methods listed under ABSTRACT METHODS.

METHODS

  • new

  • init

  • scan

    Scans the document into a parsed tree and retrieves its meta and toc data if possible.

  • render

    Renders the output document and writes it to its final destination.

  • src_read

    Fetches the source of the document. The source can be read from different media, e.g. file://, http://, a relational DB or OCR :) (support for these is left to subclasses).

    A subclass may implement a "source" filter. For example, if the source document is written in an extended POD, the source filter may convert it into standard POD. If the source includes template directives, these can be pre-processed as well.

    The document's content comes out of this method ready for parsing and conversion into other formats.

  • meta

    A simple get/set accessor for the meta attribute.

  • toc

    A simple get/set accessor for the toc attribute.

  • transform_src_doc

      my $doc_src_path = $self->transform_src_doc($path);

    Searches for the source document $path in the search paths defined by the configuration file's search_paths attribute (similar to the @INC search in Perl). If the document is found, its path is resolved relative to abs_doc_root and returned. If it is not found, undef is returned.
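
The @INC-style lookup described for transform_src_doc can be sketched as follows. This is only an illustration, assuming the search paths are plain directories on disk; the function name resolve_src_doc and its argument list are hypothetical and are not part of DocSet::Doc.

```perl
use strict;
use warnings;
use File::Spec ();

# Hypothetical helper, not the module's actual implementation.
# Tries $path under each directory in @$search_paths, in order,
# and returns the first hit as a path relative to $abs_doc_root.
sub resolve_src_doc {
    my ($path, $abs_doc_root, $search_paths) = @_;

    for my $dir (@$search_paths) {
        my $candidate = File::Spec->catfile($dir, $path);
        next unless -e $candidate;

        # Normalize to an absolute path, then express it relative to the root.
        my $abs = File::Spec->rel2abs($candidate);
        return File::Spec->abs2rel($abs, $abs_doc_root);
    }

    return;    # nothing found: undef in scalar context
}
```

As with @INC, the order of the directories matters: the first directory that contains the document wins.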

ABSTRACT METHODS

These methods must be implemented by the sub-classes:

retrieve_meta_data

Retrieves the meta data that describes the input document and stores it in the meta object attribute. Different document formats may provide different meta information. The only required meta field is title.
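
A minimal sketch of what a retrieve_meta_data implementation might look like. To keep the example runnable on its own, it uses a small stand-in base class (My::BaseDoc) rather than DocSet::Doc itself. The class names, the content attribute, and the use of a plain hash reference for the meta data are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real base class so the sketch runs on its own;
# only the get/set behavior of meta() is modeled here.
package My::BaseDoc;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

sub meta {
    my $self = shift;
    $self->{meta} = shift if @_;
    return $self->{meta};
}

# Hypothetical subclass for a plain-text format whose first
# non-blank line is treated as the title.
package My::TextDoc;
our @ISA = ('My::BaseDoc');

sub retrieve_meta_data {
    my $self = shift;

    my ($title) = grep { /\S/ } split /\n/, $self->{content};
    $title = 'Untitled' unless defined $title;
    $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    # title is the only required meta field
    $self->meta({ title => $title });
    return;
}

package main;

my $doc = My::TextDoc->new(content => "\n  Getting Started  \n\nBody text.\n");
$doc->retrieve_meta_data();
print $doc->meta->{title}, "\n";    # prints "Getting Started"
```

A real subclass would inherit from DocSet::Doc and extract whatever additional fields its source format offers.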

These methods may optionally be implemented by the sub-classes:

src_filter

A subclass may want to preprocess the source document before it is processed further. This method is called after the source has been read. By default it does nothing.
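
The hook can be pictured with the sketch below, which expands a template placeholder before parsing. It runs against a stand-in base class (My::FilterBase), not DocSet::Doc. The raw, content and version attributes, the [% version %] placeholder, and the choice to call src_filter from inside src_read are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Stand-in base class: models only the documented order of events
# (the source is read, then the filter hook runs; the default hook is a no-op).
package My::FilterBase;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

sub src_read {
    my $self = shift;
    $self->{content} = $self->{raw};    # a real class would read a file or URL here
    $self->src_filter();
    return;
}

sub src_filter { return }               # default: nothing happens

# Hypothetical subclass that expands a [% version %] placeholder
# before the document is parsed.
package My::TemplatedDoc;
our @ISA = ('My::FilterBase');

sub src_filter {
    my $self = shift;
    $self->{content} =~ s/\[%\s*version\s*%\]/$self->{version}/g;
    return;
}

package main;

my $doc = My::TemplatedDoc->new(
    raw     => "Release [% version %]\n",
    version => '1.2',
);
$doc->src_read();
print $doc->{content};    # prints "Release 1.2"
```

Because the default hook does nothing, subclasses that need no preprocessing can simply leave src_filter out.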

AUTHORS

Stas Bekman <stas (at) stason.org>