HTML::WikiConverter - Convert HTML to wiki markup
    use HTML::WikiConverter;
    my $wc = HTML::WikiConverter->new( dialect => 'MediaWiki' );
    print $wc->html2wiki( $html );
HTML::WikiConverter is an HTML to wiki converter. It can convert HTML source into a variety of wiki markups, called wiki "dialects". The following dialects are supported:
DokuWiki, Kwiki, MediaWiki, MoinMoin, Oddmuse, PhpWiki, PmWiki, SlipSlap, TikiWiki, UseMod, WakkaWiki, WikkaWiki
Note that while dialects usually produce satisfactory wiki markup, not all features of all dialects are supported. Consult individual dialects' documentation for details of supported features. Suggestions for improvements, especially in the form of patches, are very much appreciated.
    my $wc = HTML::WikiConverter->new( dialect => $dialect, %attrs );
Returns a converter for the specified wiki dialect. Dies if $dialect is not provided or its dialect module is not installed on your system. Attributes may be specified in %attrs; see "ATTRIBUTES" for a list of recognized attributes.
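Because the constructor dies rather than returning an error value, callers that cannot guarantee a dialect module is installed may want to trap the failure. The following is a minimal sketch, assuming the exception can be caught with a plain eval block; the wording of the error message is not specified here, so it is only passed through to warn.

```perl
use strict;
use warnings;

# Try to build a converter; both the require and the constructor can die.
my $wc = eval {
    require HTML::WikiConverter;
    HTML::WikiConverter->new( dialect => 'MediaWiki' );
};
my $err = $@;

if ( defined $wc ) {
    print "MediaWiki converter ready\n";
}
else {
    # Either HTML::WikiConverter itself or the MediaWiki dialect is missing.
    warn "Could not create converter: $err";
}
```

Loading the module with require inside the eval, rather than with use at the top of the file, lets the same block also catch the case where HTML::WikiConverter itself is not installed.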
    $wiki = $wc->html2wiki( $html );
    $wiki = $wc->html2wiki( html => $html );
    $wiki = $wc->html2wiki( file => $file );
    $wiki = $wc->html2wiki( file => $file, slurp => $slurp );
Converts HTML source to wiki markup for the current dialect. Accepts either an HTML string $html or an HTML file $file to read from.
You may optionally bypass HTML::Parser's incremental parsing of HTML files (thus slurping the file in all at once) by giving $slurp a true value.
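The calling forms above can be exercised side by side. The sketch below converts the same small HTML snippet once from a string and once from a temporary file with slurping enabled. The sample HTML and the use of File::Temp are assumptions made for illustration, and the block does nothing if the module or the MediaWiki dialect is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a converter if possible; $wc stays undef when the module is missing.
my $wc = eval {
    require HTML::WikiConverter;
    HTML::WikiConverter->new( dialect => 'MediaWiki' );
};

my ( $from_string, $from_file );
if ( defined $wc ) {
    my $html = '<p>Hello <b>world</b></p>';

    # Form 1: convert an HTML string directly.
    $from_string = $wc->html2wiki($html);

    # Form 2: write the same HTML to a temporary file and convert that,
    # slurping the file instead of parsing it incrementally.
    my ( $fh, $file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
    print {$fh} $html;
    close $fh or die "Cannot close $file: $!";
    $from_file = $wc->html2wiki( file => $file, slurp => 1 );

    print "From string: $from_string\n";
    print "From file:   $from_file\n";
}
```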
my $html = $wc->parsed_html;
Returns HTML::TreeBuilder's string representation of the last-parsed syntax tree, showing how the input HTML was parsed internally. Useful for debugging.
my @dialects = HTML::WikiConverter->available_dialects;
Returns a list of all available dialects by searching the directories in
You may configure HTML::WikiConverter using a number of attributes. These may be passed as arguments to the new constructor, or can be called as object methods on a converter object.
Some dialects allow other attributes in addition to those below. Consult individual dialect documentation for details.
(Required) Dialect to use for converting HTML into wiki markup. See the "DESCRIPTION" section above for a list of dialects. new will fail if the dialect given is not installed on your system.
URI to use for converting relative URIs to absolute ones. This effectively ensures that the src and href attributes of image and anchor tags, respectively, are absolute before converting the HTML to wiki markup, which is necessary for wiki dialects that handle internal and external links separately. Relative URLs are only converted to absolute ones if the base_uri argument is present. Defaults to
URI or a reference to a list of URIs used in determining which links are wiki links. This assumes that URLs to wiki pages are created by joining the wiki_uri with the (possibly escaped) wiki page name. For example, the English Wikipedia might use
    my $wc = HTML::WikiConverter->new(
        dialect  => $dialect,
        wiki_uri => [
            'http://en.wikipedia.org/wiki/',
            'http://en.wikipedia.org/w/index.php?action=edit&title='
        ]
    );
Ward's wiki might use
    my $wc = HTML::WikiConverter->new(
        dialect  => $dialect,
        wiki_uri => 'http://c2.com/cgi/wiki?'
    );
The default is undef, meaning that all links will be treated as external links.
See also the wiki_page_extractor method, which provides a more flexible way of specifying how to extract page titles from URLs.
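To make the prefix-joining assumption concrete, the sketch below shows one way a page title could be recovered from a URL given one or more wiki_uri prefixes. It is an illustration only, not the module's built-in extractor. The helper name title_from_url is hypothetical, and the percent-decoding and underscore-to-space handling are assumptions about how a wiki escapes page names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the page title if $url starts with one of
# the given prefixes, or undef if it does not look like a wiki link.
sub title_from_url {
    my ( $url, $wiki_uri ) = @_;

    # Accept either a single prefix or a reference to a list of prefixes.
    my @prefixes = ref($wiki_uri) eq 'ARRAY' ? @$wiki_uri : ($wiki_uri);

    for my $prefix (@prefixes) {
        next unless defined($prefix) && length($prefix);
        next unless index( $url, $prefix ) == 0;

        # Whatever follows the prefix is taken to be the escaped title.
        my $title = substr( $url, length($prefix) );
        return undef unless length($title);

        # Assumed unescaping: decode %XX sequences, underscores to spaces.
        $title =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        $title =~ tr/_/ /;
        return $title;
    }
    return undef;    # no prefix matched: treat as an external link
}

my @wikipedia = (
    'http://en.wikipedia.org/wiki/',
    'http://en.wikipedia.org/w/index.php?action=edit&title=',
);

# prints "Perl language"
print title_from_url( 'http://en.wikipedia.org/wiki/Perl_language', \@wikipedia ), "\n";

# prints "FrontPage"
print title_from_url( 'http://c2.com/cgi/wiki?FrontPage', 'http://c2.com/cgi/wiki?' ), "\n";
```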
wiki_page_extractor can be used instead of wiki_uri, giving you a more flexible way to extract page titles from URLs.
The attribute takes a coderef that extracts a wiki page title from the given URL. If undef (the default), the built-in extractor (which attempts to extract wiki page titles from URIs based on the value of the wiki_uri attribute) will be used instead.
The extractor subroutine will be passed two arguments, the current HTML::WikiConverter object and a URI object. The return value should be the title of the wiki page extracted from the URI given. If no page title can be found or the URI does not refer to a wiki page, then the extractor should return undef, which will fall back to the built-in extractor (which functions as mentioned previously).
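As an illustration of the extractor contract, here is a sketch of a subroutine for a wiki that carries the page name in a title query parameter. The host wiki.example.org and the subroutine name extract_title are hypothetical. The sketch relies on URI objects stringifying to the full URL, which also lets it be tried with plain strings.

```perl
use strict;
use warnings;

# Hypothetical extractor: receives the converter object and a URI object,
# returns the page title or undef.
sub extract_title {
    my ( $wc, $uri ) = @_;

    # URI objects stringify to the full URL, so a regex is enough here.
    my $url = "$uri";

    # Only handle links into our (hypothetical) wiki.
    return undef unless $url =~ m{^http://wiki\.example\.org/index\.php\?};

    # Pull the value of the "title" parameter out of the query string.
    if ( $url =~ /[?&;]title=([^&;#]+)/ ) {
        my $title = $1;
        $title =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # decode %XX
        return $title;
    }

    return undef;    # a wiki URL without a title parameter
}

# prints "Main_Page"
print extract_title( undef, 'http://wiki.example.org/index.php?title=Main_Page&action=view' ), "\n";
```

A subroutine like this would then be supplied as wiki_page_extractor => \&extract_title in the constructor's attribute list.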
Helps HTML::TreeBuilder parse HTML fragments by wrapping HTML in <html> and </html> before passing it through html2wiki. Boolean, enabled by default.
Specifies the encoding used by the HTML to be converted. Also determines the encoding of the wiki markup returned by the html2wiki method. Defaults to
Removes HTML comments from the input before conversion to wiki markup. Boolean, enabled by default.
Removes the HTML head element from the input before converting. Boolean, enabled by default.
Removes all HTML script elements from the input before converting. Boolean, enabled by default.
Consult HTML::WikiConverter::Dialects for documentation on how to write your own dialect module for HTML::WikiConverter. Or if you're not up to the task, drop me an email and I'll have a go at it when I get a spare moment.
David J. Iberri,
Please report any bugs or feature requests to bug-html-wikiconverter at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-WikiConverter. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
You can also look for information at:
Thanks to Tatsuhiko Miyagawa for suggesting Bundle::HTMLWikiConverter as well as providing code for the available_dialects() class method.
Copyright 2006 David J. Iberri, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.