
@<Biblio::Document::Parser::Utils> - utility module for handling International characters and document conversion

Biblio::Document::Parser::Utils provides some utility functions for handling international characters and for conversion of documents to plaintext.

use Biblio::Document::Parser::Utils qw( normalise_multichars );
print normalise_multichars( $str );

Convert multi-char international characters into single UTF-8 chars, e.g.: ¨o => ö These appear in pdftotext output from PDFs generated by pdflatex.
This function takes either a filename or a URL as a parameter, and aims to return a string containing the lines in the file. A hash of converters is provided in ParaTools/Utils.pm, which should be customised for your system.
For URLs, the file is first downloaded to a temporary directory, then converted, whereas local files are copied straight into the temporary directory. For this reason, some care should be taken when handling very large files.
Simple function to convert a string into an encoded URL (i.e. spaces to %20, etc). Takes the unencoded URL as a parameter, and returns the encoded version.

Tim Brody <tdb01r@ecs.soton.ac.uk> Mike Jewell <moj@ecs.soton.ac.uk> (packaging)