Uplug - a toolbox for processing (parallel) text corpora
$module = 'pre/basic'; %args = ( '-in' => $input_file_name, '-ci' => $input_char_encoding ); my $uplug=Uplug->new($module, %args); # create a new uplug module $uplug->load(); # load it $uplug->run(); # and run it
This library provides the main methods for loading Uplug modules and running them. Configuration files describe the module and its parameters (see Uplug::Config). Each module may contain a number of sub-modules. Each of them can usually calls the uplug scripts provided in the package.
More information on how to use the Uplug toolkit with the provided modules can be found here: uplug
Add-ons and language-specific modules can be downloaded from the Uplug project website at bitbucket: https://bitbucket.org/tiedemann/uplug
$uplug = new Uplug ( $module, %args )
Construct a new Uplug object for the given Uplug module ($module refers to a configuration file). Module arguments are specified in
%args and depend on the module. For more information about specific Uplug modules, use the Uplug startup script:
uplug -h module-name
Load the module given in the constructor and all its sub-modules. This also creates temporary configuration files with adjusted parameters in
Run all commands specified in all sub-modules. Pipeline commands will be constructed according to the sequence of sub-modules and the specifications in the Uplug configuration files. The will be simply executed as external system calls.
Load all sub-modules and adjust input and output according to the configuration files and the current pipe-line. Output streams will be used as input streams with the same name for the next sub-module. This method tries to find possible pipelines for combining commands.
$cmd = $uplug->command()
Return a sequence of system commands for the entire pipeline. Commands are separated by ';'. System command may include several pipelines through STDIN/STDOUT.
Change the input settings in a particular configuration.
Change the output settings in a particular configuration.
Set/return data files available in the current pipeline.
Check whether their is an input stream that can read from STDIN.
Check whether their is an output stream that can write to STDOUT.
Checks whether a particular file is already in use in the current pipeline (avoids writing over files that a command still reads from).
Return a new temporary file (in data/runtime).
For the latest sources, language packs, additional modules and tools: Please, have a look at the project website at https://bitbucket.org/tiedemann/uplug