HTML2XHTML - Wrapper for a command-line program that converts HTML 3.x/4.x to XHTML 1.0
To convert an HTML file (and no external CSS file) to XHTML:
my $foo = HTML2XHTML->new(encoding => 'UTF-8', file_name => '../foo/foo.html');
my $foo = HTML2XHTML->new(encoding => 'ISO 8859-1', file => '..\foo\foo.html');
my $foo = HTML2XHTML->new(encoding => 'ISO 8859-1', file => 'C:/My Documents/foo/foo.html');
my $foo = HTML2XHTML->new(encoding => 'ISO 8859-1', file_name => '../foo/foo.html');
To convert HTML files and external CSS files to XHTML:
my $foo = HTML2XHTML->new(encoding => 'UTF-16', file_name => 'foo.html, foo.css, foo2.html, foo3.html, bar.css');
To convert a directory to XHTML:
my $foo = HTML2XHTML->new(encoding => 'UTF-8', dir => '../foo');
To convert the current directory of HTML files and external CSS files to XHTML:
my $foo = HTML2XHTML->new(encoding => 'UTF-8', dir => 'current');
HTML2XHTML is my first attempt at writing a module and distribution. It is a pure Perl implementation written to convert an HTML 4.01 Transitional page to XHTML 1.0 Strict (it does not check for the use of deprecated tags). I had been writing valid, conforming HTML 4.01 for almost a year but had never taken the extra steps to use XHTML. The module also converts external CSS documents and whole directories of HTML and/or CSS documents.
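As a rough illustration of what such a conversion involves, the sketch below applies two XHTML 1.0 rules to a single start tag: element and attribute names become lowercase, and empty elements such as br and img are closed with " />". The helper xhtmlify_tag is hypothetical and is not HTML2XHTML's own code; a regex like this only copes with simple, well-formed tags, and it neither adds missing quotes around attribute values nor expands minimized attributes such as checked.

```perl
use strict;
use warnings;

# Illustrative only: NOT HTML2XHTML's conversion code.
# Empty elements of XHTML 1.0 Strict; these must be closed, e.g. <br />.
my %EMPTY = map { $_ => 1 } qw(area base br col hr img input link meta param);

sub xhtmlify_tag {
    my ($tag) = @_;    # a single start tag, e.g. '<BR>' or '<IMG SRC="a.gif">'
    my ($name, $rest) = $tag =~ m{^<\s*([A-Za-z][A-Za-z0-9]*)(.*?)\s*/?\s*>$}s
        or return $tag;    # leave anything unrecognised untouched
    $name = lc $name;      # element names must be lowercase in XHTML
    # Lowercase attribute names (the word before '='), leaving values alone.
    $rest =~ s/(\s)([A-Za-z-]+)(?=\s*=)/$1\L$2/g;
    # Close empty elements; other start tags keep a plain '>'.
    return $EMPTY{$name} ? "<$name$rest />" : "<$name$rest>";
}

print xhtmlify_tag('<BR>'), "\n";                      # <br />
print xhtmlify_tag('<IMG SRC="a.gif" ALT="">'), "\n";  # <img src="a.gif" alt="" />
print xhtmlify_tag('<P CLASS="x">'), "\n";             # <p class="x">
```

The space before "/>" follows the XHTML 1.0 compatibility guidelines, which recommend it so that older HTML browsers still recognise the empty element.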
There is support for approximately 75 HTML 3.x/4.x tags, including tags deprecated in HTML 3.x or 4.x (to preserve the existing document structure). In the next version, deprecated tags may be replaced programmatically with their updated tag or module so that the output (really) conforms to XHTML 1.0 Strict.
The original script was designed to be run from the command line with option flags for either a directory or the files to convert. Two days after completing it, I decided to create a module to act as a wrapper for the command-line program.
For those interested in command-line options, see below:
You can convert a whole directory by putting 'dir' or 'directory' after the encoding option, followed by the name of the directory ('current' converts the current directory).
perl convert_xhtml2.pl encoding UTF-8 dir|directory [relative|absolute directory path] foo
perl convert_xhtml2.pl encoding UTF-8 dir current
perl convert_xhtml2.pl encoding UTF-8 directory ../foo/
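The directory form can be pictured with a small helper that resolves the directory argument and lists the documents it contains. files_to_convert is hypothetical and not part of HTML2XHTML; it maps 'current' to the working directory using Cwd, which the module lists as a requirement, and the extension list (.html, .htm, .css) and the non-recursive scan are assumptions.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper (not part of HTML2XHTML): resolve the directory
# argument and list the HTML/CSS files a directory conversion would
# plausibly process. 'current' means the working directory, as in the
# command-line examples. Subdirectories are not scanned (assumption).
sub files_to_convert {
    my ($dir) = @_;
    $dir = getcwd() if $dir eq 'current';
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
    # Keep regular files ending in .html, .htm or .css (case-insensitive).
    my @files = grep { /\.(?:html?|css)\z/i && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @files;
}

# Example: list what would be converted in the current directory.
print "$_\n" for files_to_convert('current');
```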
You can also specify an HTML document, a CSS document, or multiple documents separated by commas, including the directory path if a document is not in the same directory as the program.
perl convert_xhtml2.pl encoding ISO 8859-1 [relative|absolute directory path] foo.html
perl convert_xhtml2.pl encoding ISO 8859-1 ../bar/foo.html, bar.css
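A comma-separated list like the ones above has to be split and trimmed before each document can be handled. This is a minimal sketch, assuming documents are told apart by their .html/.htm or .css extension; split_file_list is a hypothetical name, not a function provided by HTML2XHTML.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML2XHTML): split a comma-separated
# list of documents, trim surrounding whitespace, and sort the entries
# into HTML and CSS files by extension (an assumed rule).
sub split_file_list {
    my ($list) = @_;
    my (@html, @css);
    for my $file (split /,/, $list) {
        $file =~ s/^\s+|\s+$//g;      # trim spaces around each name
        next unless length $file;     # ignore empty entries, e.g. "a.html,,b.css"
        if    ($file =~ /\.css\z/i)   { push @css,  $file }
        elsif ($file =~ /\.html?\z/i) { push @html, $file }
        else  { warn "Skipping '$file': not an .html, .htm or .css file\n" }
    }
    return (\@html, \@css);
}

my ($html, $css) = split_file_list('foo.html, foo.css, foo2.html, foo3.html, bar.css');
print "HTML: @$html\n";    # HTML: foo.html foo2.html foo3.html
print "CSS:  @$css\n";     # CSS:  foo.css bar.css
```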
my $foo = HTML2XHTML->new(encoding => 'ISO 8859-1', file_name => '../foo/foo.html');
This constructor invokes the command-line program with the options passed. Besides the encoding, the options can be either a comma-separated list of file names (absolute or relative paths) or the name of a directory.
For files, 'file' is accepted as an abbreviated form of the 'file_name' key.
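The relationship between the constructor options and the command line can be sketched as an argument-list builder. build_command is hypothetical and is not HTML2XHTML's source: it treats 'file' as an alias for 'file_name', checks 'dir' first when both are given, and passes the encoding and the comma-separated file list each as a single argument, which is an assumption about how convert_xhtml2.pl parses its arguments.

```perl
use strict;
use warnings;

# Hypothetical sketch (not HTML2XHTML's source): build an argument list
# shaped like the documented command line:
#   perl convert_xhtml2.pl encoding <ENCODING> dir <DIRECTORY>
#   perl convert_xhtml2.pl encoding <ENCODING> <file1>, <file2>, ...
sub build_command {
    my (%opt) = @_;

    # 'file' is the documented abbreviation of 'file_name'.
    $opt{file_name} = delete $opt{file}
        if exists $opt{file} && !exists $opt{file_name};

    die "encoding is required\n" unless defined $opt{encoding};

    # $^X is the path of the running perl interpreter.
    my @cmd = ($^X, 'convert_xhtml2.pl', 'encoding', $opt{encoding});

    if (defined $opt{dir}) {
        push @cmd, 'dir', $opt{dir};      # directory form (checked first)
    }
    elsif (defined $opt{file_name}) {
        push @cmd, $opt{file_name};       # whole list as one argument (assumed)
    }
    else {
        die "either dir or file_name/file is required\n";
    }
    return @cmd;
}

my @cmd = build_command(encoding => 'ISO 8859-1', file => '../foo/foo.html');
print join(' ', @cmd), "\n";

# A real wrapper would then run it with the list form of system, e.g.:
# system(@cmd) == 0 or die "convert_xhtml2.pl failed: $?\n";
```

The list form of system bypasses the shell, so values containing spaces, such as 'ISO 8859-1' or 'My Documents', reach the program without being split into separate words.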
This module requires Cwd and Perl 5.001 or later.
Nothing is exported by default.
XHTML 1.0 The Extensible HyperText Markup Language (Second Edition): http://www.w3.org/TR/xhtml1/
Character encoding: http://en.wikipedia.org/wiki/Character_encoding
Copyright (C) 2006-2007 by Obiora Embry.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.