nat-initmat - initialize a sparse matrix with word co-occurrence counts.
nat-initmat <crp1> <crp2> [<exc1> <exc2>] <matrix>
This tool is used internally by nat-these and is not intended to be used on its own. It takes two corpus files created by nat-pre and allocates a sparse matrix in which row indexes correspond to word identifiers in the source corpus and column indexes correspond to word identifiers in the target corpus. Each cell counts how often the corresponding source and target words co-occur in the same sentence. The resulting matrix is then written to the matrix file.
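The counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the NATools implementation: the function name `cooccurrence_counts`, the in-memory list-of-lists input, and the assumption that sentences are paired by position are all hypothetical, and the real tool reads the binary corpus files produced by nat-pre.

```python
from collections import defaultdict


def cooccurrence_counts(source_sentences, target_sentences):
    """Count how often each (source id, target id) pair shares a sentence.

    Both arguments are lists of sentences; each sentence is a list of
    integer word identifiers. Sentences are assumed to be paired by position.
    """
    matrix = defaultdict(int)  # sparse: only non-zero cells are stored
    for src, tgt in zip(source_sentences, target_sentences):
        for s in src:          # row index: source word identifier
            for t in tgt:      # column index: target word identifier
                matrix[(s, t)] += 1
    return dict(matrix)


# Hypothetical toy corpora: two sentence pairs of word identifiers.
source = [[1, 2], [2, 3]]
target = [[10, 11], [11]]
counts = cooccurrence_counts(source, target)
print(counts[(2, 11)])  # source word 2 and target word 11 share both sentences
```

Storing only the cells that are actually incremented is what keeps the matrix sparse: most source and target word pairs never occur together, so they take no space.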
Optionally, you can pass two exclude lists (<exc1> and <exc2> in the synopsis), as returned by the nat-words2id tool. Words on these lists are ignored: no co-occurrences are counted for them.
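A minimal Python sketch of how exclude lists could affect the counting follows. It is an assumption-laden illustration, not the tool's code: the function `cooccurrence_with_exclusions`, its parameter names, and the use of Python sets of word identifiers to represent the exclude lists are all hypothetical.

```python
def cooccurrence_with_exclusions(source_sentences, target_sentences,
                                 source_exclude=frozenset(),
                                 target_exclude=frozenset()):
    """Count co-occurrences, skipping any word identifier that is excluded."""
    counts = {}
    for src, tgt in zip(source_sentences, target_sentences):
        # Drop excluded identifiers before pairing, so they never reach a cell.
        kept_src = [s for s in src if s not in source_exclude]
        kept_tgt = [t for t in tgt if t not in target_exclude]
        for s in kept_src:
            for t in kept_tgt:
                counts[(s, t)] = counts.get((s, t), 0) + 1
    return counts


# Hypothetical toy data: exclude source word 2 and target word 10.
source = [[1, 2], [2, 3]]
target = [[10, 11], [11]]
filtered = cooccurrence_with_exclusions(source, target,
                                        source_exclude={2},
                                        target_exclude={10})
print(filtered)
```

In this sketch an excluded word contributes neither a row nor a column, so every cell that would involve it stays empty.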
The matrix is saved to disk and can be processed later by the EM-algorithm methods IPFP (nat-ipfp), Sample A (nat-samplea) and Sample B (nat-sampleb).
nat-words2id, nat-pre, NATools documentation
Copyright (C) 2002-2009 Alberto Simoes and Jose Joao Almeida
Copyright (C) 1998 Djoerd Hiemstra
GNU GENERAL PUBLIC LICENSE (LGPL) Version 2 (June 1991)