Treex::Core::Run + treex - applying Treex blocks and/or scenarios on data
version 0.08664
In bash:
    > treex myscenario.scen -- data/*.treex
    > treex My::Block1 My::Block2 -- data/*.treex
In Perl:
    use Treex::Core::Run q(treex);
    treex([qw(myscenario.scen -- data/*.treex)]);
    treex([qw(My::Block1 My::Block2 -- data/*.treex)]);
Treex::Core::Run lets you apply a block, a scenario, or a mixture of both to a set of data files. It is designed to be used primarily from the bash command line, through a thin front-end script called treex. However, the same list of arguments can be passed as an array reference to the function treex() imported from Treex::Core::Run.
Note that this module supports distributed processing: simply add the -p switch. There are two ways to process the data in parallel. By default, an SGE cluster's qsub is expected to be available. If you have no cluster but want to parallelize the computation on a multicore machine, add the --local switch.
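The choice between the two parallel modes can be sketched as a small shell helper. This is only an illustration: the switches -p, -j and --local come from the treex usage text, while the helper name pick_parallel_args, the qsub availability check, and the job count of 4 are assumptions made for the example.

```shell
# pick_parallel_args JOBS: print treex switches for a parallel run.
# Hypothetical helper, not part of Treex.
pick_parallel_args() {
    jobs=$1
    if command -v qsub >/dev/null 2>&1; then
        # qsub found: let -p submit the jobs to the SGE cluster
        printf '%s\n' "-p -j $jobs"
    else
        # no cluster: run the jobs on the local (multicore) machine
        printf '%s\n' "-p --local -j $jobs"
    fi
}

args=$(pick_parallel_args 4)
# Dry run: print the command line; remove "echo" to actually run treex.
echo treex $args myscenario.scen -- data/*.treex
```

Printing the command first makes it easy to check which mode was selected before any jobs are started.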
Creates a new runner and runs the scenario given in the parameters.
    usage: treex [-?dEegjLpqSsv] [long options...] scenario [-- treex_files]
    scenario is a sequence of blocks or *.scen files
    options:
        -? --usage --help            Prints this usage information.
        -s --save                    save all documents
        -q --quiet                   Warning, info and debug messages are
                                     suppressed. Only fatal errors are reported.
        --cleanup                    Delete all temporary files.
        -e --error_level             Possible values: ALL, DEBUG, INFO, WARN, FATAL
        -E --forward_error_level     messages with this level or higher will be
                                     forwarded from the distributed jobs to the
                                     main STDERR
        -L --language --lang         shortcut for adding "Util::SetGlobal
                                     language=xy" at the beginning of the scenario
        -S --selector                shortcut for adding "Util::SetGlobal
                                     selector=xy" at the beginning of the scenario
        -g --glob                    Input file mask whose expansion is left to
                                     Perl, e.g. --glob '*.treex'
        -p --parallel                Parallelize the task on SGE cluster
                                     (using qsub).
        -j --jobs                    Number of jobs for parallelization,
                                     default 10. Requires -p.
        --jobindex                   Not to be used manually. If number of jobs
                                     is set to J and modulo set to M, only I-th
                                     files fulfilling I mod J == M are processed.
        --outdir                     Not to be used manually. Directory for
                                     collecting standard and error outputs in
                                     parallelized processing.
        --qsub                       Additional parameters passed to qsub.
                                     Requires -p.
        --local                      Run jobs locally (might help with
                                     multi-core machines). Requires -p.
        --priority                   Priority for qsub (an integer in the range
                                     -1023 to 1024, default=0). Requires -p.
        --watch                      re-run when the given file is changed
                                     TODO better doc
        --workdir                    working directory for temporary files in
                                     parallelized processing (if not specified,
                                     directories such as 001-cluster-run,
                                     002-cluster-run etc. are created)
        -d --dump_scenario           Just dump (print to STDOUT) the given
                                     scenario and exit.
        --survive                    Continue collecting jobs' outputs even if
                                     some of them crashed (risky, use with care!).
        -v --version                 Print treex and perl version
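The rule quoted for --jobindex (only the I-th files with I mod J == M are processed) can be illustrated with a few lines of shell. This is a sketch of the arithmetic only, not Treex's own code: the function name files_for_job is hypothetical, and counting file positions from 1 is an assumption, since the usage text does not say where I starts.

```shell
# files_for_job J M FILE...: print the files whose position I satisfies
# I mod J == M. Hypothetical helper; positions are assumed to start at 1.
files_for_job() {
    j=$1
    m=$2
    shift 2
    i=0
    for f in "$@"; do
        i=$((i + 1))
        if [ $((i % j)) -eq "$m" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# With 3 jobs, the job with remainder 1 gets positions 1 and 4:
# prints a.treex and d.treex
files_for_job 3 1 a.treex b.treex c.treex d.treex e.treex
```

Splitting by remainder spreads the files evenly across jobs without any job needing to know the total number of files in advance.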
Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>
Martin Popel <popel@ufal.mff.cuni.cz>
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.