NAME

HTML::ToDocBook - Converts an XHTML file into DocBook.

VERSION

This describes version 0.03 of HTML::ToDocBook.

SYNOPSIS

    use HTML::ToDocBook;

    my $obj = HTML::ToDocBook->new(%args);

    $obj->convert(infile=>$filename);

    # convert HTML file
    $obj->convert(infile=>$filename, html=>1);

DESCRIPTION

This module converts an XHTML file into DocBook format using both heuristics and XSLT processing. By default, it expects the input file to be correct XHTML. Other programs, such as HTML Tidy (http://tidy.sourceforge.net/), can correct files for you; this module does not.

Note also that the conversion is very simple: it doesn't deal with elements such as <div> or <span>, whose meaning it has no way of guessing. (Some of them, however, are turned into DocBook tags if they have class names which match those tags.) The module does not merge multiple XHTML files into a single document; instead, it converts each XHTML file into a <chapter>, with each header becoming a section (sect1 to sect5). The <title> tag is used for the chapter title.

There are likely to be validity errors, depending on how good the original HTML was. There may be broken links, <xref> elements that should be <link>s, and overuse of <emphasis> and <emphasis role="bold">.

METHODS

new

    my $conv = HTML::ToDocBook->new();

    my $conv = HTML::ToDocBook->new(stylesheet=>$stylesheet);

Arguments:

stylesheet

A replacement XSLT stylesheet to use for conversions instead of the built-in one. This can either be a file name or a string containing the entire stylesheet.

convert

    $obj->convert(infile=>$filename,
                html=>1);

Arguments:

infile

The name of the file to convert.

html

Parse the input as HTML rather than XML.
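The following is a minimal sketch of how the two arguments might be combined when converting a batch of files. The helper convert_args and its extension-based rule are hypothetical and not part of the module. The documentation above does not say what convert returns or where the output is written, so the sketch relies on neither; it also assumes, without confirmation, that a failed conversion raises an exception.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::ToDocBook): build the argument
# list for convert().
# Assumption: a .htm/.html extension means the file may not be well-formed,
# so the html flag is switched on; anything else is treated as XHTML.
sub convert_args {
    my ($filename) = @_;
    my %args = (infile => $filename);
    $args{html} = 1 if $filename =~ /\.html?\z/i;
    return %args;
}

# Only attempt the conversion when files were given and the module is installed.
if (@ARGV and eval { require HTML::ToDocBook; 1 }) {
    my $conv = HTML::ToDocBook->new();
    for my $file (@ARGV) {
        # eval traps any exception raised during the conversion
        eval { $conv->convert(convert_args($file)); 1 }
            or warn "Could not convert $file: $@";
    }
}
```

If all of your input is already valid XHTML, the helper is unnecessary and convert(infile=>$filename) is enough.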

Private Methods

These are not guaranteed to be stable.

insert_sections

    my $str = $obj->insert_sections($string);

This inserts <div class="sectN"> tags to enclose all levels of header. These will then be picked up by the XSLT stylesheet and converted into section tags.
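To illustrate the idea of enclosing headers in section divs, here is a rough, self-contained sketch. It is not the module's own implementation: the function name wrap_sections is hypothetical, and it assumes that <hN> maps to class "sectN" and that a section runs until the next header of the same or a shallower level.

```perl
use strict;
use warnings;

# Illustrative only: wrap each <hN> header and the content that follows it
# in <div class="sectN">, nesting deeper levels inside shallower ones.
sub wrap_sections {
    my ($html) = @_;
    my @open;        # levels of the divs currently open, outermost first
    my $out = '';

    # Split so that header start tags come back as separate pieces.
    for my $piece (split /(<h[1-5]\b[^>]*>)/i, $html) {
        if ($piece =~ /^<h([1-5])\b/i) {
            my $level = $1;
            # A new header closes every open section at the same or a deeper level.
            $out .= '</div>' x scalar(grep { $_ >= $level } @open);
            @open = grep { $_ < $level } @open;
            $out .= qq{<div class="sect$level">};
            push @open, $level;
        }
        $out .= $piece;
    }

    # Close whatever is still open, before </body> if there is one.
    my $close = '</div>' x scalar(@open);
    $out =~ s{(</body>)}{$close$1}i or $out .= $close;
    return $out;
}

print wrap_sections('<h1>A</h1><p>x</p><h2>B</h2><p>y</p><h1>C</h1>'), "\n";
```

A regular expression is enough for a sketch like this, but it would be fooled by header-like text inside comments or CDATA sections; a real parser is the safer choice for anything beyond illustration.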

REQUIRES

    Cwd
    File::Basename
    File::Spec
    XML::LibXML
    XML::LibXSLT
    HTML::SimpleParse
    Test::More

INSTALLATION

To install this module, run the following commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

Or, if you're on a platform (like DOS or Windows) that doesn't like the "./" notation, you can do this:

   perl Build.PL
   perl Build
   perl Build test
   perl Build install

In order to install somewhere other than the default, such as in a directory under your home directory like "/home/fred/perl", run

   perl Build.PL --install_base /home/fred/perl

as the first step instead.

This will install the files underneath /home/fred/perl.

You will then need to alter the PERL5LIB variable so that Perl can find the modules, and the PATH variable so that your shell can find the script.

Change your PATH to include /home/fred/perl/script (where the script will be):

        PATH=/home/fred/perl/script:${PATH}

and change the PERL5LIB variable to add /home/fred/perl/lib:

        PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
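
If you would rather not change your environment, a script can add the private library directory itself with the standard lib pragma. This is a small sketch that assumes the /home/fred/perl install base used above; substitute your own path.

```perl
use strict;
use warnings;

# Assumption: the module was installed with --install_base /home/fred/perl,
# so its libraries live under /home/fred/perl/lib.
use lib '/home/fred/perl/lib';

# Show the directories Perl will search for modules, in order.
print "$_\n" for @INC;
```

This affects only the script that contains the use lib line, so the PATH change is still needed to run the installed script by name.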

SEE ALSO

perl(1).

BUGS

Please report any bugs or feature requests to the author.

AUTHOR

    Kathryn Andersen (RUBYKAT)
    perlkat AT katspace dot com
    http://www.katspace.org/tools

COPYRIGHT AND LICENCE

The XSLT stylesheet is based on the one at http://wiki.docbook.org/topic/Html2DocBook by Jeff Beal.

Copyright (c) 2006 by Kathryn Andersen

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
