Treex::Core::DocumentReader - interface for all document readers
Document readers are the Treex mechanism for loading the documents that Treex processes.
The documents may be stored in files (in various formats), read from
STDIN, retrieved from a socket, etc.
These methods must be implemented in classes that consume this role.
Return the next document (a Treex::Core::Document object).
Return the total number of documents that this reader will produce.
If the number is not known in advance,
undef should be returned.
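As a minimal sketch of the two required methods, the plain-Perl class below serves documents from an in-memory list. It is an illustration under stated assumptions, not a real Treex reader: the class name My::ListReader is hypothetical, it does not use Moose or consume the actual role, plain strings stand in for Treex::Core::Document objects, and returning undef from next_document once the list is exhausted is this sketch's own convention.

```perl
use strict;
use warnings;

package My::ListReader;    # hypothetical example class, not part of Treex

sub new {
    my ($class, @docs) = @_;
    # 'docs' holds the documents; 'pos' counts how many were returned so far.
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Return the next document; returns undef (in scalar context) when exhausted.
sub next_document {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{pos}++ ];
}

# The total is known in advance here; a reader consuming STDIN
# or a socket would typically return undef instead.
sub number_of_documents {
    my ($self) = @_;
    return scalar @{ $self->{docs} };
}

package main;

my $reader = My::ListReader->new('doc1', 'doc2', 'doc3');
print 'Documents: ', $reader->number_of_documents(), "\n";    # Documents: 3
while (defined(my $doc = $reader->next_document())) {
    print "$doc\n";
}
```

A reader over a stream would keep next_document the same in spirit but answer undef from number_of_documents, since the count is only known after the stream ends.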
Is the document that was most recently returned by
$self->next_document() supposed to be processed by this job?
Job indices and document numbers are 1-based, so, for example, with
jobs = 5 and
jobindex = 3 we want to load documents number 3, 8, 13, 18, ...; with
jobs = 5 and
jobindex = 5 we want to load documents number 5, 10, 15, 20, ...
In general, this job loads those documents where
(doc_number-1) % jobs == (jobindex-1).
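The selection rule can be checked directly against the two examples. The helper name is_for_job is hypothetical and only illustrates the formula; it is not claimed to be the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: is document $doc_number (1-based) assigned to
# job $jobindex (1-based) when the work is split among $jobs jobs?
sub is_for_job {
    my ($doc_number, $jobs, $jobindex) = @_;
    my $slot = ($doc_number - 1) % $jobs;    # 0-based position within each round of jobs
    return $slot == $jobindex - 1;
}

my @job3 = grep { is_for_job($_, 5, 3) } 1 .. 20;
my @job5 = grep { is_for_job($_, 5, 5) } 1 .. 20;
print "@job3\n";    # 3 8 13 18
print "@job5\n";    # 5 10 15 20
```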
Return the next document that should be processed by this job.
If jobindex is set, documents are selected "modulo the number of jobs",
i.e. only documents satisfying the condition above are returned.
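One way to realise this is to keep calling next_document and skip documents that belong to other jobs. The sketch below is a plain-Perl illustration, not the module's implementation: the class My::JobReader and the methods is_current_for_job and next_document_for_job are hypothetical names, and treating an unset jobindex as "every document belongs to this job" is an assumption.

```perl
use strict;
use warnings;

package My::JobReader;    # hypothetical class for illustration

sub new {
    my ($class, %args) = @_;
    return bless {
        docs       => $args{docs},        # array ref of documents
        jobs       => $args{jobs},        # may be undef: no parallel jobs
        jobindex   => $args{jobindex},    # 1-based, may be undef
        doc_number => 0,                  # number of the last returned document
    }, $class;
}

sub next_document {
    my ($self) = @_;
    return if $self->{doc_number} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{doc_number}++ ];
}

# Apply (doc_number-1) % jobs == (jobindex-1) to the last returned document.
sub is_current_for_job {
    my ($self) = @_;
    return 1 if !defined $self->{jobindex};
    my $slot = ($self->{doc_number} - 1) % $self->{jobs};
    return $slot == $self->{jobindex} - 1;
}

# Skip documents assigned to other jobs; undef when nothing is left.
sub next_document_for_job {
    my ($self) = @_;
    while (defined(my $doc = $self->next_document())) {
        return $doc if $self->is_current_for_job();
    }
    return;
}

package main;

my $r = My::JobReader->new(docs => [qw(a b c d e)], jobs => 2, jobindex => 1);
my @mine;
while (defined(my $doc = $r->next_document_for_job())) {
    push @mine, $doc;
}
print "@mine\n";    # a c e
```

Each job still reads every document but keeps only its own share, which is cheap for readers where reading is fast compared to processing.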
Return the total number of documents that this reader will produce for this job.
It is computed from the total number of documents, jobs and jobindex.
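Since the selection rule is a simple modulo, the per-job count has a closed form. The function below is a hypothetical helper with its own closed form; how the module itself performs this computation is not stated here. It assumes 1 <= jobindex <= jobs, passes through undef when the total is unknown, and returns the full total when jobindex is unset.

```perl
use strict;
use warnings;

# Hypothetical helper: how many of documents 1..$total satisfy
# (doc_number-1) % $jobs == ($jobindex-1)?  Assumes 1 <= $jobindex <= $jobs.
sub documents_for_job {
    my ($total, $jobs, $jobindex) = @_;
    return undef  if !defined $total;       # total unknown in advance
    return $total if !defined $jobindex;    # no splitting into jobs
    return 0      if $total < $jobindex;    # this job's first document never comes
    # Documents jobindex, jobindex+jobs, jobindex+2*jobs, ... up to $total.
    return int(($total - $jobindex) / $jobs) + 1;
}

printf "%d\n", documents_for_job(20, 5, 3);    # 4  (3, 8, 13, 18)
printf "%d\n", documents_for_job(23, 5, 3);    # 5  (3, 8, 13, 18, 23)
```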
Start reading again from the first document.
This implementation just sets the attribute
doc_number to zero.
You can add further behavior with the Moose method modifier
after 'restart' => sub { ... };
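In a Moose class consuming this role, the extra behavior goes into an after 'restart' modifier. To keep the sketch runnable without Moose or Treex, the example below shows the plain-Perl equivalent: a subclass whose restart first calls the inherited one and then does its own work. The class names and the cache attribute are hypothetical; the base restart only mirrors the documented behavior of resetting doc_number.

```perl
use strict;
use warnings;

package My::BaseReader;    # hypothetical stand-in for a reader consuming the role

sub new {
    my ($class) = @_;
    return bless { doc_number => 0 }, $class;
}

# Mirrors the documented default: just reset doc_number.
sub restart {
    my ($self) = @_;
    $self->{doc_number} = 0;
    return;
}

package My::CachingReader;    # hypothetical reader with extra restart behavior
use parent -norequire, 'My::BaseReader';

sub new {
    my ($class) = @_;
    my $self = $class->SUPER::new();
    $self->{cache} = {};
    return $self;
}

# Plain-Perl counterpart of Moose's  after 'restart' => sub { ... };
# run the inherited restart first, then the additional step.
sub restart {
    my ($self, @args) = @_;
    $self->SUPER::restart(@args);
    %{ $self->{cache} } = ();    # extra behavior: drop cached data
    return;
}

package main;

my $reader = My::CachingReader->new();
$reader->{doc_number} = 7;
$reader->{cache}{foo} = 1;
$reader->restart();
print "$reader->{doc_number}\n";    # 0
```

With Moose the wrapper is generated for you, and the modifier's return value is ignored, just as the extra step above adds nothing to what restart returns.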
Martin Popel <email@example.com>
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.