Treex::Block::W2A::EN::FixTokenization - fix some issues in output of tokenizer
version 2.20151102
This block merges some abbreviations that contain periods into a single token. For example, "e. g." is one token in the Penn Treebank (with tag FW). Using only Treex::Block::W2A::EN::Tokenize, we get four tokens (e . g .), which the parser may distribute across different clauses, and this is hard to fix afterwards.
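The merging idea can be illustrated with a minimal standalone sketch. It is not the block's actual implementation and does not use the Treex API: the helper name merge_abbreviations, the %MERGE list, and the choice to write the merged token without an inner space are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical list of abbreviations to merge (illustrative only;
# the real block may cover a different set).
my %MERGE = map { $_ => 1 } ( 'e.g.', 'i.e.' );

# Hypothetical helper, not part of Treex: scans a token list and joins
# any four-token run such as ("e", ".", "g", ".") into one token.
sub merge_abbreviations {
    my @tokens = @_;
    my @out;
    my $i = 0;
    while ( $i < @tokens ) {
        # Only try to merge when four tokens are still available.
        if ( $i + 3 < @tokens ) {
            my $candidate = join '', @tokens[ $i .. $i + 3 ];
            if ( $MERGE{ lc $candidate } ) {
                push @out, $candidate;
                $i += 4;
                next;
            }
        }
        push @out, $tokens[$i];
        $i++;
    }
    return @out;
}

# Tokens as a period-splitting tokenizer might produce them.
my @merged = merge_abbreviations( 'Use', 'fruit', ',', 'e', '.', 'g', '.', 'apples' );
print join( ' ', @merged ), "\n";
```

Doing the merge before parsing keeps the abbreviation together as one unit, so the parser cannot split it across clauses.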
Treex::Core::Block
Martin Popel <popel@ufal.mff.cuni.cz>
Copyright © 2009 - 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Treex::EN, copy and paste the appropriate command into your terminal.
cpanm
cpanm Treex::EN
CPAN shell
perl -MCPAN -e shell
install Treex::EN
For more information on module installation, please visit the detailed CPAN module installation guide.