NAME

Treex::Block::Read::BaseAlignedReader - abstract ancestor for parallel-corpora document readers

VERSION

version 0.13095

SYNOPSIS

  # in scenarios
  Read::MyAlignedFormat en=english.txt de=german.txt

  # Zones can also differ in selectors; any number of zones can be read
  Read::MyAlignedFormat en_ref=ref1,ref2 en_moses=mos1,mos2 en_tectomt=tmt1,tmt2

DESCRIPTION

This class serves as a common ancestor for document readers that read several zones at once, usually parallel sentences in two (or more) languages. Each reader parameter is named after a zone, and its value is a space- or comma-separated list of filenames to be loaded into that zone. The class is designed to implement the Treex::Core::DocumentReader interface.
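The parameter format described above can be illustrated with a small stand-alone sketch. It is not the code used by this class: the helpers split_filenames and split_zone_label are hypothetical names introduced here, and the sketch assumes that a zone label is either a bare language code (en) or a language and a selector joined by an underscore (en_ref), as the SYNOPSIS suggests. The special value - for STDIN is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): split one parameter value
# into filenames on commas and/or whitespace, dropping empty fields.
sub split_filenames {
    my ($value) = @_;
    return grep { length $_ } split /[\s,]+/, $value;
}

# Hypothetical helper: split a zone label such as "en_ref" into
# language and selector; a bare "en" gets an empty selector.
sub split_zone_label {
    my ($label) = @_;
    my ( $language, $selector ) = split /_/, $label, 2;
    return ( $language, defined $selector ? $selector : '' );
}

# Parameters as they might appear in a scenario line.
my %params = (
    en_ref   => 'ref1,ref2',
    en_moses => 'mos1 mos2',
);

# Build one list of filenames per zone label.
my %files_for_zone;
for my $label ( keys %params ) {
    $files_for_zone{$label} = [ split_filenames( $params{$label} ) ];
}

for my $label ( sort keys %files_for_zone ) {
    my ( $language, $selector ) = split_zone_label($label);
    printf "%s (language=%s, selector=%s): %s\n",
        $label, $language, $selector,
        join( ' ', @{ $files_for_zone{$label} } );
}
```

Every zone in this sketch ends up with the same number of files, so the n-th file of each zone can be loaded into the n-th document.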

In derived classes you need to define the next_document method, and you can use the next_filenames and new_document methods.
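The division of labour between the abstract ancestor and a derived reader can be sketched in plain Perl. The packages My::AbstractReader and My::LineReader below are stand-ins invented for this illustration; they are not Treex classes, and a real reader would extend Treex::Block::Read::BaseAlignedReader and build its documents with new_document instead.

```perl
use strict;
use warnings;

# Stand-in base class for illustration only (NOT the Treex class).
package My::AbstractReader;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}

# Abstract method: the base implementation only fails loudly.
sub next_document {
    my ($self) = @_;
    die ref($self) . " must override next_document\n";
}

# Hypothetical derived reader that supplies the missing method.
package My::LineReader;
our @ISA = ('My::AbstractReader');

sub next_document {
    my ($self) = @_;

    # Hand out one queued item per call; undef once the queue is empty.
    return shift @{ $self->{queue} };
}

package main;

my $reader = My::LineReader->new( queue => [ 'doc1', 'doc2' ] );
while ( defined( my $doc = $reader->next_document ) ) {
    print "read $doc\n";
}

# Calling the method on the base class itself is a fatal error.
my $ok = eval { My::AbstractReader->new->next_document; 1 };
print "base class failed: $@" if !$ok;
```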

ATTRIBUTES

any parameter in the form of a valid zone_label

A space- or comma-separated list of filenames, or - for STDIN.

file_stem (optional)

How to name the loaded documents. This attribute is saved to the same-named attribute of each document, and document writers use it to decide where to save the files.

METHODS

next_document

This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)

next_filenames

Returns a hashref of filenames (full paths) to be loaded. The keys of the hash are zone labels, the values are the filenames.

new_document($load_from?)

Returns a new empty document with the attributes loaded_from, file_stem, file_number and path pre-filled; their values are guessed from current_filenames.

current_filenames

Returns the filenames most recently returned by next_filenames.

number_of_documents

Returns the number of documents that will be read by this reader.
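As an illustration of the hashref that next_filenames is described as returning, the following stand-alone sketch walks several per-zone file lists in parallel. The function make_filename_iterator is a hypothetical name, the sketch is not the implementation used by this class, and the assumption that nothing is returned once the files run out is ours.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the reader's internal state (not Treex code):
# returns a closure that yields one { zone_label => filename } hashref
# per call, and nothing once any zone has run out of files.
sub make_filename_iterator {
    my (%files_for_zone) = @_;
    my $index = 0;
    return sub {
        return if !%files_for_zone;    # no zones, nothing to read
        my %current;
        for my $label ( keys %files_for_zone ) {
            my $filename = $files_for_zone{$label}[$index];
            return if !defined $filename;    # lists exhausted
            $current{$label} = $filename;
        }
        $index++;
        return \%current;
    };
}

my $next_filenames = make_filename_iterator(
    en => [ 'en1.txt', 'en2.txt' ],
    de => [ 'de1.txt', 'de2.txt' ],
);

# Each iteration corresponds to one document with one file per zone.
while ( my $filenames = $next_filenames->() ) {
    for my $label ( sort keys %{$filenames} ) {
        print "$label <- $filenames->{$label}\n";
    }
}
```

In this sketch the number of documents equals the length of the shortest file list.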

SEE ALSO

Treex::Block::Read::BaseReader, Treex::Block::Read::BaseAlignedTextReader

AUTHOR

Martin Popel

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
