Module Version: 1.00

NAME

Text::Corpus::Inspec::Document - Parse Inspec abstracts for research.

SYNOPSIS

  use Text::Corpus::Inspec;
  use Text::Corpus::Inspec::Document;
  use Data::Dump qw(dump);

  # $corpusDirectory must hold the path to a local copy of the Inspec corpus.
  my $corpus = Text::Corpus::Inspec->new (corpusDirectory => $corpusDirectory);

  # Fetch the first document in the corpus and dump each of its parts.
  my $document = $corpus->getDocument (index => 0);
  dump $document->getBody;
  dump $document->getCategories;
  dump $document->getContent;
  dump $document->getTitle;
  dump $document->getUri;

DESCRIPTION

Text::Corpus::Inspec::Document provides methods for accessing specific portions of Inspec abstracts for research and testing of information processing methods.

CONSTRUCTOR

new

The method new creates an instance of the Text::Corpus::Inspec::Document class with the following parameters:

filename or uri
 filename => '...' or uri => '...'

Either filename or uri must be given, and its value must be the path name of the corpus document to be parsed. If the file does not exist, undef is returned. The path provided is the value returned by getUri.
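Because new returns undef for a missing file, callers will usually want to check the result before calling any accessor. The following is a minimal sketch of that check, not part of the module: load_document is a hypothetical helper, and the class name is passed in as an argument so the sketch does not depend on how the module is loaded.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): construct one corpus
# document, or die with a readable message if the constructor returns undef.
sub load_document {
    my ($class, $path) = @_;

    # Per the documentation above, new returns undef if the file does not exist.
    my $document = $class->new(filename => $path);
    die "cannot load Inspec document '$path'\n" unless defined $document;

    return $document;
}

# Usage, assuming Text::Corpus::Inspec::Document is installed and $path
# names a file in a local copy of the corpus:
#   require Text::Corpus::Inspec::Document;
#   my $document = load_document('Text::Corpus::Inspec::Document', $path);
#   print $document->getUri, "\n";
```

Dying on failure is only one choice; returning undef to the caller and skipping the file would suit a batch run over many documents.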

METHODS

getBody

  getBody ()

getBody returns an array reference of strings, the sentences that make up the body of the article.

getCategories

  getCategories (type => 'all')

The method getCategories returns an array reference of strings that are the categories assigned to the document. The type must be 'all', 'controlled', or 'uncontrolled', and specifies the set of categories to be returned. 'uncontrolled' categories are those assigned to the document by an editor without machine assistance, whereas 'controlled' categories were assigned with machine assistance. The option 'all' returns the union of the 'controlled' and 'uncontrolled' categories. The default is 'all'.
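The three type values can be compared side by side. The sketch below assumes only the getCategories call described above; categories_by_type and union_of are hypothetical helpers, not part of the module. They work on any object that offers getCategories.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the categories for each documented type.
sub categories_by_type {
    my ($document) = @_;
    my %by_type;
    for my $type (qw(all controlled uncontrolled)) {
        # getCategories returns an array reference of strings.
        $by_type{$type} = $document->getCategories(type => $type);
    }
    return \%by_type;
}

# Hypothetical helper: sorted union of several array references, with
# duplicates removed, for comparison against the 'all' list.
sub union_of {
    my (@lists) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } map { @$_ } @lists;
    return [ sort @unique ];
}

# Usage, assuming $document was obtained as in the SYNOPSIS:
#   my $by_type = categories_by_type($document);
#   my $union   = union_of($by_type->{controlled}, $by_type->{uncontrolled});
#   print scalar(@$union), " distinct categories\n";
```

If 'all' is indeed the union, the sorted 'all' list should match the result of union_of. The order in which getCategories returns its strings is not documented here, so the sketch sorts before comparing.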

getContent

  getContent ()

getContent returns an array reference of strings, the sentences that form the content of the article, that is, its title and body.
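Since getContent returns a reference to a list of sentence strings, most downstream processing starts by flattening that list. The sketch below shows one way to do so; document_text and token_count are hypothetical helpers, not part of the module, and the whitespace tokenization is only an illustrative assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: join the sentences from getContent into one string,
# one sentence per line.
sub document_text {
    my ($document) = @_;
    my $sentences = $document->getContent;    # array reference of strings
    return join "\n", @$sentences;
}

# Hypothetical helper: count whitespace-separated tokens in the content.
sub token_count {
    my ($document) = @_;
    my @tokens = split ' ', document_text($document);
    return scalar @tokens;
}

# Usage, assuming $document was obtained as in the SYNOPSIS:
#   print document_text($document), "\n";
#   print token_count($document), " tokens\n";
```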

getTitle

  getTitle ()

getTitle returns an array reference of strings, usually just one, that hold the title of the article.

getUri

  getUri ()

getUri returns the URI of the document.

INSTALLATION

For installation instructions see Text::Corpus::Inspec.

AUTHOR

 Jeff Kubina <jeff.kubina@gmail.com>

COPYRIGHT

Copyright (c) 2009 Jeff Kubina. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

KEYWORDS

inspec, english corpus, information processing

SEE ALSO

File::Slurp, Lingua::EN::Sentence, Text::Corpus::Inspec
