Mike Jewell > Biblio-Document-Parser-1.10 > Biblio::Document::Parser::Standard

NAME

Biblio::Document::Parser::Standard - document parsing functionality

SYNOPSIS

  use Biblio::Document::Parser::Standard;
  use Biblio::Document::Parser::Utils;

  # First fetch the document content.
  my $content = Biblio::Document::Parser::Utils::get_content("http://www.foo.com/myfile.pdf");

  # Create a parser and extract the references.
  my $doc_parser = Biblio::Document::Parser::Standard->new();
  my @references = $doc_parser->parse($content);

  # Print a list of the extracted references.
  foreach my $reference (@references) { print "-> $reference\n"; }

DESCRIPTION

Biblio::Document::Parser::Standard provides a fairly simple implementation of reference extraction from documents.

Several reference styles are supported, including numeric and indented. Two-column documents are converted into single-column documents before parsing. This is a very experimental module, and it still contains a few hard-coded constants that could probably be improved.
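To make the idea of a "numeric" reference style concrete, here is a minimal sketch of how numbered entries could be separated from a block of text. It is an illustration only and is not the algorithm this module uses: the helper name split_numeric_references() and the assumption that entries begin with "[n]" or "n." are both hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Biblio::Document::Parser::Standard.
# Assumption: each reference starts with a marker such as "[12]" or "12."
# and may continue over several lines.
sub split_numeric_references {
    my ($text) = @_;
    my @references;
    my $current;
    foreach my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # skip blank lines
        if ($line =~ /^(?:\[\d+\]|\d+\.)\s+(.*)$/) {
            # A numbered marker starts a new reference.
            push @references, $current if defined $current;
            $current = $1;
        }
        elsif (defined $current) {
            # Any other line continues the current reference.
            $current .= " $line";
        }
    }
    push @references, $current if defined $current;
    return @references;
}

my $text = <<'END';
[1] A. Author. First title.
    Journal of Examples, 2001.
[2] B. Writer. Second title. 2002.
END

print "-> $_\n" for split_numeric_references($text);
```

A continuation line that itself begins with a number and a full stop would be mistaken for a new entry, which is the kind of case where heuristics and hand-tuned constants become necessary.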

METHODS

$parser = Biblio::Document::Parser::Standard->new()

The new() method creates a new parser instance.

@references = $parser->parse($lines, [%options])

The parse() method takes the document content as a string (see the get_content() function in Biblio::Document::Parser::Utils for one way to obtain it) and returns a list of plain-text references suitable for passing to a CiteParser module.
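Since parse() works on a string, text that is already in memory can in principle be passed to it directly. The sketch below assumes this works for plain text that did not come from get_content(), which has not been verified here. The wrapper extract_references() is a hypothetical name, and the module is loaded at run time so that the script still runs (and reports no references) when the module is not installed.

```perl
use strict;
use warnings;

# Assumption: the module may not be installed, so load it at run time.
my $have_parser = eval { require Biblio::Document::Parser::Standard; 1 };

# Hypothetical wrapper, not part of the module: returns the references
# found in $text, or an empty list when the parser is unavailable.
sub extract_references {
    my ($text) = @_;
    return () unless $have_parser;
    my $parser = Biblio::Document::Parser::Standard->new();
    return $parser->parse($text);
}

my $text = "References\n[1] A. Author. A title. 2001.\n";
my @references = extract_references($text);

printf "%d reference(s) found\n", scalar @references;
print "-> $_\n" for @references;
```

Calling parse() in list context, as above, matches the documented signature, which returns a list of references.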

CHANGES

- 2003/05/13 Removed Perl warnings generated from parse() by adding checks on the regexps

AUTHOR

Mike Jewell <moj@ecs.soton.ac.uk>
Tim Brody <tdb01r@ecs.soton.ac.uk>
