NAME

Treex::Block::Read::CoNLLX

VERSION

version 2.20151102

DESCRIPTION

Document reader for the CoNLL-X format. Each token is on a separate line in the following format:

    ord<tab>form<tab>lemma<tab>cpos<tab>pos<tab>features<tab>head<tab>deprel

Sentences are separated by a blank line. The sentences are stored into bundles in the document.

See http://ilk.uvt.nl/conll/#dataformat.
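
As an illustration of the line format described above, the following Python sketch splits CoNLL-X text into sentences and tokens. It is independent of Treex and is not the module's implementation. The names `Token` and `parse_conllx` are hypothetical. The sketch assumes that every token line has at least the eight tab-separated columns listed above, that `ord` and `head` are integers, and that any further columns can be ignored.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One token line; field names follow the column names in the description."""
    ord: int
    form: str
    lemma: str
    cpos: str
    pos: str
    features: str
    head: int
    deprel: str


def parse_conllx(text: str) -> List[List[Token]]:
    """Split CoNLL-X text into sentences, each a list of Token tuples."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence (if any).
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}: {line!r}")
        ord_, form, lemma, cpos, pos, feats, head, deprel = cols[:8]
        current.append(Token(int(ord_), form, lemma, cpos, pos, feats, int(head), deprel))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Invented sample data: two sentences separated by a blank line.
sample = (
    "1\tThe\tthe\tDT\tDT\t_\t2\tNMOD\n"
    "2\tcat\tcat\tNN\tNN\t_\t3\tSBJ\n"
    "3\tsleeps\tsleep\tVB\tVBZ\t_\t0\tROOT\n"
    "\n"
    "1\tHello\thello\tUH\tUH\t_\t0\tROOT\n"
)
sentences = parse_conllx(sample)
```

In this sketch each inner list plays the role that a bundle plays in a Treex document: one sentence per container.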

ATTRIBUTES

from

Space- or comma-separated list of filenames.

lines_per_doc

Number of sentences (not lines, despite the attribute name) per document.
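
To make the sentence-based counting concrete, here is a minimal Python sketch that groups already-parsed sentences into documents of a fixed size. It is not taken from Treex; `chunk_sentences` and `per_doc` are hypothetical names, and the sketch only covers a positive limit. How the module behaves when the attribute is unset is not described here and is not modelled.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_sentences(sentences: Iterable[T], per_doc: int) -> Iterator[List[T]]:
    """Yield lists of at most `per_doc` sentences, preserving their order."""
    if per_doc < 1:
        raise ValueError("per_doc must be a positive integer in this sketch")
    doc: List[T] = []
    for sentence in sentences:
        doc.append(sentence)
        if len(doc) == per_doc:
            yield doc
            doc = []
    if doc:
        # The final document may hold fewer sentences than the limit.
        yield doc


docs = list(chunk_sentences(["s1", "s2", "s3", "s4", "s5"], 2))
```

The unit being counted is a whole sentence, so a document boundary never falls between two tokens of the same sentence.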

feat_is_iset

1 if the features field is a serialization of Interset (e.g. pos=adj|prontype=dem|number=plu|case=dat|person=3); the field is then read directly into the Interset representation of each node. 0 by default.
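
The serialization shown in the example is a list of name=value pairs joined by the | character. The Python sketch below decodes such a string into a dictionary. It is an illustration only, not the Interset library or the Treex code; `parse_features` is a hypothetical name, and treating an empty string or `_` as "no features" is an assumption based on the CoNLL-X convention for empty fields.

```python
from typing import Dict


def parse_features(feats: str) -> Dict[str, str]:
    """Decode 'name=value|name=value' into a dict of feature names to values."""
    if feats in ("", "_"):
        # Assumption: '_' marks an empty features field, as in CoNLL-X.
        return {}
    result: Dict[str, str] = {}
    for pair in feats.split("|"):
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"not a name=value pair: {pair!r}")
        result[name] = value
    return result


features = parse_features("pos=adj|prontype=dem|number=plu|case=dat|person=3")
```

Values are kept as plain strings here; any further interpretation of individual values is left out of the sketch.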

deprel_is_afun

1 if the deprel field is an afun (e.g. Sb, Obj_M, Pnom); the value is then read directly into the afun field of each node. A _M suffix is stripped and is_member is set to 1 for that node. 0 by default.
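
The suffix handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's code: `split_afun` is a hypothetical name, and only the `_M` suffix mentioned in the description is handled.

```python
from typing import Tuple


def split_afun(deprel: str) -> Tuple[str, bool]:
    """Return (afun, is_member): strip a trailing '_M' and flag membership."""
    suffix = "_M"
    if deprel.endswith(suffix) and len(deprel) > len(suffix):
        return deprel[: -len(suffix)], True
    return deprel, False


pairs = [split_afun(label) for label in ("Sb", "Obj_M", "Pnom")]
```

With the labels from the description, `Obj_M` yields the afun `Obj` with the membership flag set, while `Sb` and `Pnom` pass through unchanged.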

METHODS

next_document

Loads a document.

SEE

Treex::Block::Read::BaseTextReader, Treex::Core::Document, Treex::Core::Bundle

AUTHOR

David Mareček

COPYRIGHT AND LICENSE

Copyright © 2011-2013 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.