

NAME

Treex::Tool::Segment::EN::RuleBased - rule-based sentence segmenter for English

VERSION

version 0.08171

DESCRIPTION

Sentence boundaries are detected by regex rules that match end-of-sentence punctuation ([.?!]) followed by an uppercase letter. This class adds an English-specific list of "unbreakers", i.e. tokens that usually do not end a sentence even when they are followed by a period and a capital letter.

See Treex::Block::W2A::Segment
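
The idea described above can be illustrated with a minimal standalone sketch. It is not the code of this module and does not use the Treex API: the function name segment_sketch, the short unbreaker list and the placeholder character are all assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical, deliberately short list of English "unbreakers".
# The list shipped with the real module is not reproduced here.
my @UNBREAKERS   = qw(Mr Mrs Ms Dr Prof St vs etc);
my $UNBREAKER_RE = join '|', map { quotemeta($_) } @UNBREAKERS;

# Placeholder that temporarily replaces a period which must not end
# a sentence (assumed not to occur in the input text).
my $MARK = "\x{1}";

# Illustrative helper (not part of Treex): split $text into sentences.
# A boundary is [.?!] followed by whitespace and an uppercase letter,
# unless the token right before the period is an unbreaker.
sub segment_sketch {
    my ($text) = @_;

    # 1. Protect periods that directly follow an unbreaker.
    $text =~ s/\b($UNBREAKER_RE)\./$1$MARK/g;

    # 2. Split after end-sentence punctuation when whitespace and an
    #    uppercase letter follow.
    my @sentences = split /(?<=[.?!])\s+(?=\p{Lu})/, $text;

    # 3. Restore the protected periods.
    s/$MARK/./g for @sentences;

    return @sentences;
}

my @out = segment_sketch(
    'Dr. Smith met Mr. Jones. They talked! Was it useful? Yes.'
);
print "$_\n" for @out;
```

In this sketch "Dr." and "Mr." do not trigger a break even though each is followed by a capital letter, so the sample text is split into four sentences. Protecting the periods first keeps the splitting regex itself simple; a real segmenter would need a much longer unbreaker list and more rules.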

AUTHOR

Martin Popel <popel@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
