NAME

Biblio::Document::Parser::Utils - utility module for handling international characters and document conversion

DESCRIPTION

Biblio::Document::Parser::Utils provides some utility functions for handling international characters and for conversion of documents to plaintext.

SYNOPSIS

        use Biblio::Document::Parser::Utils qw( normalise_multichars );

        print normalise_multichars( $str );

METHODS

$str = normalise_multichars( $str )

Convert multi-character representations of international characters into single UTF-8 characters, e.g. ¨o => ö. These sequences appear in pdftotext output from PDFs generated by pdflatex.
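
The conversion described above can be sketched as follows. This is a minimal illustration and not the module's own implementation: the function name sketch_normalise and the accent table are hypothetical, and only the diaeresis and acute accents are handled. It assumes the input is a decoded character string in which the spacing accent precedes its base letter, as in the example above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical table: spacing accent => equivalent Unicode combining mark.
my %COMBINING = (
    "\x{00A8}" => "\x{0308}",    # diaeresis
    "\x{00B4}" => "\x{0301}",    # acute accent
);

# Illustrative stand-in for normalise_multichars; not the module's code.
sub sketch_normalise {
    my ($str) = @_;

    # Put the accent after its base letter as a combining mark ...
    $str =~ s/([\x{00A8}\x{00B4}])([A-Za-z])/$2$COMBINING{$1}/g;

    # ... then compose each letter + mark pair into a single character.
    return NFC($str);
}
```

Composing through NFC avoids listing every accented letter by hand; a real accent table would need to cover whatever sequences pdftotext actually emits on your system.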

$content = ParaTools::Utils::get_content($location)

This function takes either a filename or a URL as a parameter, and attempts to return a string containing the contents of the file, converted to plain text. A hash of converters is provided in ParaTools/Utils.pm and should be customised for your system.

For URLs, the file is first downloaded to a temporary directory and then converted; local files are copied straight into the temporary directory. Either way a full copy of the file is written to disk, so some care should be taken when handling very large files.
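
A rough sketch of the local-file path through this process is given below. Everything in it is an assumption made for illustration: the function name sketch_get_content, the %CONVERTERS table, its keying by file extension, and the pdftotext command line are hypothetical and do not reproduce the hash shipped in ParaTools/Utils.pm. The download step for URLs is left out.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Copy qw( copy );
use File::Basename qw( basename );

# Hypothetical converter table: file extension => sub returning the
# command (as a list) that writes plain text to the output path.
my %CONVERTERS = (
    pdf => sub { ( 'pdftotext', '-raw', $_[0], $_[1] ) },
);

# Illustrative stand-in for get_content, local files only.
sub sketch_get_content {
    my ($path) = @_;

    # Work on a copy inside a temporary directory, removed at exit.
    my $dir  = tempdir( CLEANUP => 1 );
    my $work = "$dir/" . basename($path);
    copy( $path, $work ) or die "cannot copy $path: $!";

    # Convert when a converter is registered; otherwise read as-is.
    my ($ext) = $work =~ /\.([^.\/]+)$/;
    my $text = $work;
    if ( defined $ext and my $conv = $CONVERTERS{ lc($ext) } ) {
        $text = "$work.txt";
        system( $conv->( $work, $text ) ) == 0
            or die "converter failed for $path";
    }

    # Slurp the plain-text result into a single string.
    open my $fh, '<', $text or die "cannot open $text: $!";
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

Passing the command to system as a list avoids the shell, so filenames containing spaces need no quoting.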

$escaped_url = ParaTools::Utils::url_escape($string)

Simple function to convert a string into an encoded URL (e.g. spaces become %20). Takes the unencoded URL as a parameter, and returns the encoded version.
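
One way such an escaping function might look is sketched below. The name sketch_url_escape and the choice of which characters to leave alone are assumptions, not the module's actual behaviour: here the RFC 3986 unreserved and reserved characters, plus "%", pass through unchanged so that the structure of the URL survives, and the input is assumed to be a decoded character string.

```perl
use strict;
use warnings;

# Characters left untouched (an assumption): RFC 3986 unreserved and
# reserved characters, plus '%' so existing escapes are not re-encoded.
my $SAFE = q{A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%};

# Illustrative stand-in for url_escape; not the module's code.
sub sketch_url_escape {
    my ($url) = @_;

    # Escape byte by byte, so encode characters as UTF-8 first.
    utf8::encode($url);

    # Replace every other byte with its %XX hexadecimal form.
    $url =~ s/([^${SAFE}])/sprintf( '%%%02X', ord($1) )/ge;
    return $url;
}
```

Because "%" is left alone, a URL that is already escaped passes through unchanged, at the cost of not escaping a literal percent sign.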

AUTHOR

Tim Brody <tdb01r@ecs.soton.ac.uk>

Mike Jewell <moj@ecs.soton.ac.uk> (packaging)