Biblio::Document::Parser::Utils - utility module for handling international characters and document conversion


Biblio::Document::Parser::Utils provides some utility functions for handling international characters and for conversion of documents to plaintext.


        use Biblio::Document::Parser::Utils qw( normalise_multichars );

        print normalise_multichars( $str );


$str = normalise_multichars( $str )

Converts multi-character sequences for international characters into single UTF-8 characters, e.g. ¨o => ö. Such sequences appear in pdftotext output from PDFs generated by pdflatex.
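
The conversion described above can be sketched with the core Unicode::Normalize module. This is only an illustration of the idea, not the module's actual implementation: the function name compose_spacing_diacritics and the two-entry accent table are hypothetical, and the sketch assumes the accent always precedes its base letter, as in the example above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical table (an assumption, not taken from the module):
# spacing accent => equivalent combining mark.
my %COMBINING = (
    "\x{00A8}" => "\x{0308}",    # diaeresis
    "\x{00B4}" => "\x{0301}",    # acute accent
);

sub compose_spacing_diacritics {
    my ($str) = @_;

    # Move each spacing accent behind the letter that follows it,
    # swapping it for the matching combining mark.
    $str =~ s/([\x{00A8}\x{00B4}])(\p{L})/$2$COMBINING{$1}/g;

    # NFC composes "letter + combining mark" into one precomposed character.
    return NFC($str);
}
```

With this sketch, compose_spacing_diacritics("\x{00A8}o") returns the single character U+00F6 (ö). Input that contains no accent sequences comes back unchanged, apart from NFC normalisation.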

$content = ParaTools::Utils::get_content($location)

This function takes either a filename or a URL as a parameter, and aims to return a string containing the lines in the file. A hash of converters is provided in ParaTools/, which should be customised for your system.

For URLs, the file is first downloaded to a temporary directory and then converted, whereas local files are copied straight into the temporary directory. Because either route writes a second copy of the document to disk, some care should be taken when handling very large files.
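
A minimal sketch of the fetch-then-convert flow described above, under stated assumptions: the name fetch_content_sketch and the %CONVERTERS table are hypothetical and do not reproduce the hash shipped with the module, the URL branch assumes the non-core LWP::Simple module is installed, and the PDF entry assumes a pdftotext binary on the PATH.

```perl
use strict;
use warnings;
use File::Copy qw( copy );
use File::Spec;
use File::Temp qw( tempdir );

# Hypothetical converter table: file extension => code ref returning text.
my %CONVERTERS = (
    txt => sub {
        my ($path) = @_;
        open( my $fh, '<', $path ) or die "cannot read $path: $!\n";
        local $/;                          # slurp the whole file
        return scalar <$fh>;
    },
    pdf => sub {
        my ($path) = @_;
        # Assumes pdftotext is installed; '-' sends the text to stdout.
        open( my $fh, '-|', 'pdftotext', $path, '-' )
            or die "cannot run pdftotext: $!\n";
        local $/;
        return scalar <$fh>;
    },
);

sub fetch_content_sketch {
    my ($location) = @_;

    # Work on a copy inside a temporary directory, removed at exit.
    my $dir = tempdir( CLEANUP => 1 );
    my ($name) = $location =~ m{([^/\\]+)$};
    my $local = File::Spec->catfile( $dir, defined $name ? $name : 'document' );

    if ( $location =~ m{^[a-z][a-z0-9+.-]*://}i ) {
        require LWP::Simple;               # loaded only when a URL is given
        my $status = LWP::Simple::getstore( $location, $local );
        die "download failed ($status)\n"
            unless LWP::Simple::is_success($status);
    }
    else {
        copy( $location, $local ) or die "copy failed: $!\n";
    }

    # Pick a converter by file extension.
    my ($ext) = lc($local) =~ /\.([^.]+)$/;
    my $convert = $CONVERTERS{ defined $ext ? $ext : '' }
        or die "no converter for $local\n";
    return $convert->($local);
}
```

Keeping the converters as code refs in one hash means that supporting another format on a given system is a matter of adding an entry, which is the kind of customisation the text above asks for.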

$escaped_url = ParaTools::Utils::url_escape($string)

Simple function to convert a string into an encoded URL (e.g. spaces become %20). Takes the unencoded URL as a parameter, and returns the encoded version.
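
As an illustration of this kind of encoding, here is a sketch using only core Perl. The name url_escape_sketch is hypothetical, and the set of characters left untouched (the RFC 3986 unreserved and reserved characters, plus "%" so that existing escapes survive) is an assumption; the module's own function may escape a different set.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

sub url_escape_sketch {
    my ($url) = @_;

    # Percent-encoding works on bytes, so encode to UTF-8 first.
    my $bytes = encode_utf8($url);

    # Escape every byte that is not an RFC 3986 unreserved or reserved
    # character. '%' is kept so existing escapes are not double-encoded.
    $bytes =~ s{([^A-Za-z0-9\-._~:\/?#\[\]\@!\$&'()*+,;=%])}
               {sprintf( '%%%02X', ord $1 )}ge;

    return $bytes;
}
```

For example, url_escape_sketch('http://example.org/my file.pdf') returns 'http://example.org/my%20file.pdf'.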


Tim Brody, Mike Jewell (packaging)

