MediaWiki::CleanupHTML - clean up MediaWiki-generated HTML by removing MediaWiki-specific embellishments.
version 0.0.3
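    use strict;
    use warnings;

    use MediaWiki::CleanupHTML;

    # Input and output file names (taken from the command line here).
    my ($filename, $processed_filename) = @ARGV;

    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open '$filename' - $!";

    my $cleaner = MediaWiki::CleanupHTML->new({ fh => $fh });

    open my $out_fh, '>:encoding(UTF-8)', $processed_filename
        or die "Cannot open '$processed_filename' for output - $!";

    $cleaner->print_into_fh($out_fh);

    # Free the HTML::TreeBuilder tree once the output has been written.
    $cleaner->destroy_resources();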
The HTML rendered on MediaWiki pages is full of MediaWiki-specific embellishments such as edit sections. This module attempts to clean it up and produce more straightforward HTML. Note that the HTML returned by the MediaWiki APIs may not always be available (for instance, if the wiki is down), so this module should be considered a fallback.
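MediaWiki::CleanupHTML->new({ fh => $fh })
The constructor. Accepts, via the fh key, the filehandle from which to read the XHTML.
$cleaner->print_into_fh($out_fh)
Outputs the cleaned-up HTML to the filehandle $out_fh. The filehandle should be able to process UTF-8 output (for example, by being opened with the :encoding(UTF-8) layer).
$cleaner->destroy_resources()
Destroys the allocated resources (the HTML::TreeBuilder tree, etc.). Must be called before the object is destroyed.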
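Because destroy_resources() must be called before the object goes away, it can help to make sure it still runs when writing the output fails. The following is a minimal sketch that assumes only the three methods documented above; the helper name cleanup_file is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

use MediaWiki::CleanupHTML;

# cleanup_file() is a hypothetical helper, not part of MediaWiki::CleanupHTML.
# It cleans up $in_filename into $out_filename and makes sure that
# destroy_resources() is called even if writing the output dies.
sub cleanup_file {
    my ($in_filename, $out_filename) = @_;

    open my $fh, '<:encoding(UTF-8)', $in_filename
        or die "Cannot open '$in_filename' - $!";

    my $cleaner = MediaWiki::CleanupHTML->new({ fh => $fh });

    # Trap any error from the output step so that the cleanup below still runs.
    my $ok = eval {
        open my $out_fh, '>:encoding(UTF-8)', $out_filename
            or die "Cannot open '$out_filename' for output - $!";
        $cleaner->print_into_fh($out_fh);
        close $out_fh
            or die "Cannot close '$out_filename' - $!";
        1;
    };
    my $error = $@;

    # Always release the HTML::TreeBuilder tree before $cleaner goes away.
    $cleaner->destroy_resources();
    close $fh;

    # Re-throw the original error, if there was one.
    die $error unless $ok;

    return;
}
```

A call such as cleanup_file('page.html', 'page-clean.html') (file names are illustrative) then behaves like the synopsis, except that the tree is freed on both the success and the failure path.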
Shlomi Fish, http://www.shlomifish.org/ .
Please report any bugs or feature requests to bug-mediawiki-cleanuphtml at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=MediaWiki-CleanupHTML. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc MediaWiki::CleanupHTML
You can also look for information at:
MetaCPAN
http://metacpan.org/release/MediaWiki-CleanupHTML
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=MediaWiki-CleanupHTML
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/MediaWiki-CleanupHTML
CPAN Ratings
http://cpanratings.perl.org/d/MediaWiki-CleanupHTML
The developers of HTML::TreeBuilder::XPath, HTML::TreeBuilder and related modules for their helpful code.
Copyright 2012 Shlomi Fish.
This program is distributed under the MIT (X11) License: http://www.opensource.org/licenses/mit-license.php
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Shlomi Fish <shlomif@cpan.org>
This software is Copyright (c) 2012 by Shlomi Fish.
This is free software, licensed under:
The MIT (X11) License
Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/NoAuth/Bugs.html?Dist=MediaWiki-CleanupHTML or by email to bug-mediawiki-cleanuphtml@rt.cpan.org.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
Search CPAN
The default CPAN search engine, useful to view POD in HTML format.
http://search.cpan.org/dist/MediaWiki-CleanupHTML
CPAN Forum
CPAN Forum is a web forum for discussing Perl modules.
http://cpanforum.com/dist/MediaWiki-CleanupHTML
CPANTS
CPANTS is a website that analyzes the Kwalitee (code metrics) of a distribution.
http://cpants.perl.org/dist/overview/MediaWiki-CleanupHTML
CPAN Testers
CPAN Testers is a network of smoke testers who run automated tests on uploaded CPAN distributions.
http://www.cpantesters.org/distro/M/MediaWiki-CleanupHTML
CPAN Testers Matrix
CPAN Testers Matrix is a website that provides a visual overview of the test results for a distribution on various Perls/platforms.
http://matrix.cpantesters.org/?dist=MediaWiki-CleanupHTML
CPAN Testers Dependencies
CPAN Testers Dependencies is a website that shows a chart of the test results of all dependencies for a distribution.
http://deps.cpantesters.org/?module=MediaWiki::CleanupHTML
The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)
http://bitbucket.org/shlomif/perl-mediawiki-cleanuphtml
hg clone ssh://hg@bitbucket.org/shlomif/perl-mediawiki-cleanuphtml
To install MediaWiki::CleanupHTML, copy and paste the appropriate command into your terminal.
cpanm
cpanm MediaWiki::CleanupHTML
CPAN shell
perl -MCPAN -e shell
cpan> install MediaWiki::CleanupHTML
For more information on module installation, please visit the detailed CPAN module installation guide.