Text::Perfide::PartialAlign - Split large bitexts into smaller files.
    use Text::Perfide::PartialAlign;

    my $foo = Text::Perfide::PartialAlign->new();
    ...
Writes subcorpora to files.
Prints a short description and usage details.
Receives an array of lines of a text (each line is an array of words). Calculates the frequency of each word.
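The frequency count described above can be sketched as follows. The sub name `token_freq` is hypothetical (the module's own function name is not shown here), and the sketch only assumes the documented input shape: a list of lines, each an array reference of words.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): count how often each word occurs
# in a corpus given as a list of lines, each line an array ref of words.
sub token_freq {
    my @lines = @_;
    my %freq;
    for my $line (@lines) {
        $freq{$_}++ for @$line;
    }
    return %freq;
}

my %freq = token_freq([qw(the cat sat)], [qw(the dog ran)]);
print "$_ => $freq{$_}\n" for sort keys %freq;
```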
Receives a hash mapping token => freq. Returns a hash containing only the elements whose freq == 1.
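A minimal sketch of this filter, assuming the hash is passed as a flat list. The name `hapaxes` is hypothetical and is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): keep only the tokens whose
# frequency is exactly 1, preserving the token => freq shape.
sub hapaxes {
    my %freq = @_;
    return map { $_ => $freq{$_} } grep { $freq{$_} == 1 } keys %freq;
}

my %hap = hapaxes(the => 2, cat => 1, dog => 1);
print join(', ', sort keys %hap), "\n";   # prints: cat, dog
```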
Builds a hash with term => positions, where each position is the number of a sentence in which the term occurs.
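One way to build such an index is sketched below. The name `term_positions` and the choice to store positions as an array reference per term are assumptions made for illustration. Sentences are numbered from 0.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): map each term to the list of
# sentence numbers (0-based) in which it occurs.
sub term_positions {
    my ($corpus) = @_;                 # array ref of array refs of words
    my %pos;
    for my $i (0 .. $#$corpus) {
        for my $word (@{ $corpus->[$i] }) {
            # Record each sentence at most once per term.
            push @{ $pos{$word} }, $i
                unless $pos{$word} && $pos{$word}[-1] == $i;
        }
    }
    return \%pos;
}

my $pos = term_positions([ [qw(the cat)], [qw(a dog)], [qw(the end)] ]);
print "the: @{ $pos->{the} }\n";
```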
Sorts an array of pairs and removes duplicate pairs.
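A possible implementation is sketched below. It assumes that pairs are two-element array references of numbers and that sorting is by first coordinate, then by second. The name `sort_uniq_pairs` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): sort pairs by first coordinate,
# then second, and drop pairs equal to their predecessor.
sub sort_uniq_pairs {
    my @pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @out;
    for my $p (@pairs) {
        push @out, $p
            unless @out && $out[-1][0] == $p->[0] && $out[-1][1] == $p->[1];
    }
    return @out;
}

my @sorted = sort_uniq_pairs([3, 4], [1, 2], [3, 4], [1, 1]);
print join(' ', map { "($_->[0],$_->[1])" } @sorted), "\n";
```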
Receives two pairs. Checks if both coordinates of the first pair are strictly lower than the corresponding coordinates of the second pair.
Receives two pairs...
Receives two pairs. Checks if both coordinates of the first pair are lower than or equal to the corresponding coordinates of the second pair.
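The strict and non-strict pair comparisons can be sketched as two small predicates. The names `pair_less` and `pair_less_eq` are hypothetical, and pairs are assumed to be two-element array references.

```perl
use strict;
use warnings;

# Hypothetical predicates (names assumed) over pairs [x, y].
# True when both coordinates of $p are strictly lower than those of $q.
sub pair_less {
    my ($p, $q) = @_;
    return $p->[0] < $q->[0] && $p->[1] < $q->[1];
}

# True when both coordinates of $p are lower than or equal to those of $q.
sub pair_less_eq {
    my ($p, $q) = @_;
    return $p->[0] <= $q->[0] && $p->[1] <= $q->[1];
}

print pair_less([1, 2], [3, 4])    ? "less\n"          : "not less\n";
print pair_less_eq([1, 2], [1, 4]) ? "less or equal\n" : "not less or equal\n";
```

Note that two pairs can be incomparable under both predicates, for example [1, 5] and [3, 4], which is why selecting a chain is not a plain sort.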
Receives an array of pairs. Using dynamic programming, selects the maximal chain.
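As an illustration of the idea, the sketch below finds a longest chain of pairs in which both coordinates strictly increase, using a simple quadratic dynamic programme. It is not necessarily how the module or the original partialAlign.py implements the selection. The name `maximal_chain` and the strict ordering are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): longest chain of pairs [x, y] in
# which both coordinates strictly increase from one pair to the next.
sub maximal_chain {
    my @pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    return () unless @pairs;

    my (@len, @prev);   # best chain length ending at i, and its predecessor
    my $best = 0;       # index where the longest chain found so far ends
    for my $i (0 .. $#pairs) {
        ($len[$i], $prev[$i]) = (1, -1);
        for my $j (0 .. $i - 1) {
            next unless $pairs[$j][0] < $pairs[$i][0]
                     && $pairs[$j][1] < $pairs[$i][1];
            ($len[$i], $prev[$i]) = ($len[$j] + 1, $j)
                if $len[$j] + 1 > $len[$i];
        }
        $best = $i if $len[$i] > $len[$best];
    }

    # Walk the predecessor links back from the best end point.
    my @chain;
    for (my $i = $best; $i >= 0; $i = $prev[$i]) {
        unshift @chain, $pairs[$i];
    }
    return @chain;
}

my @chain = maximal_chain([1, 1], [2, 5], [3, 2], [4, 3], [5, 6]);
print join(' ', map { "($_->[0],$_->[1])" } @chain), "\n";
```

The quadratic loop keeps the sketch short. For very large sets of candidate pairs a faster longest-increasing-subsequence variant would be preferable.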
Finds unique terms common to both corpora. The notion of equality can be extended with two lists of correspondences.
Returns a reference to a hash containing the elements common to the hashes pointed to by the references $l1Hap and $l2Hap.
$l1_to_l2 and $l2_to_l1 are references to hashes containing correspondences between words in language1 and language2 and vice-versa.
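The matching step might look like the sketch below. The name `common_hapaxes`, the shape of the correspondence hash (word => array reference of translations), and the shape of the returned hash (language1 term => matching language2 term) are all assumptions. For brevity only $l1_to_l2 is consulted; the reverse map would be handled symmetrically.

```perl
use strict;
use warnings;

# Hypothetical helper (name and data shapes assumed): find terms of
# $l1Hap that also occur in $l2Hap, either literally or through one of
# the translations listed in $l1_to_l2.
sub common_hapaxes {
    my ($l1Hap, $l2Hap, $l1_to_l2) = @_;
    $l1_to_l2 ||= {};
    my %common;
    for my $term (keys %$l1Hap) {
        # A term matches itself first, then any listed translation.
        for my $cand ($term, @{ $l1_to_l2->{$term} || [] }) {
            if (exists $l2Hap->{$cand}) {
                $common{$term} = $cand;
                last;
            }
        }
    }
    return \%common;
}

my $common = common_hapaxes(
    { london => 1, casa => 1, gato => 1 },
    { london => 1, house => 1, dog => 1 },
    { casa => [ 'home', 'house' ] },
);
print "$_ => $common->{$_}\n" for sort keys %$common;
```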
Selects a chain trying to obey the maximalChunkSize constraint.
Given a file name, splits the segments and words into an array of arrays.
Returns: a reference to the array of arrays, a reference to an array of pairs with the offsets of the start and end of each segment, and a reference to the full text.
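A rough sketch of such a reader follows. It assumes one segment per line, whitespace-separated words, and offsets counted in characters of the text as read, with the end offset exclusive. The name `read_segments` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): read a file with one segment per
# line and return (\@corpus, \@offsets, \$text).
sub read_segments {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $text = '' unless defined $text;

    my (@corpus, @offsets);
    my $start = 0;
    for my $line (split /\n/, $text) {
        my $end = $start + length $line;
        push @corpus,  [ split ' ', $line ];   # words of this segment
        push @offsets, [ $start, $end ];       # [start, end) in $text
        $start = $end + 1;                     # skip the newline
    }
    return (\@corpus, \@offsets, \$text);
}
```

With these offsets, `substr($$text, $start, $end - $start)` recovers the original segment exactly, including its internal spacing.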
Given a corpus and start and end positions, returns a string with the contents within the given range.
Concatenates all the words in the lines in the range $first..$last-1 of the corpus.
Retrieves from the original text the substring from the beginning of the segment $first to the end of the segment $last.
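The two retrieval styles can be sketched side by side. The names `words_in_range` and `text_in_range` are hypothetical, joining words with a single space is an assumption, and offsets are assumed to be [start, end) pairs of character positions, one per segment.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed): concatenate the words of lines
# $first .. $last-1 of the corpus, separated by single spaces.
sub words_in_range {
    my ($corpus, $first, $last) = @_;
    return join ' ', map { @$_ } @{$corpus}[$first .. $last - 1];
}

# Hypothetical helper (name assumed): substring of the original text
# from the beginning of segment $first to the end of segment $last.
sub text_in_range {
    my ($text_ref, $offsets, $first, $last) = @_;
    my $start = $offsets->[$first][0];
    my $end   = $offsets->[$last][1];
    return substr $$text_ref, $start, $end - $start;
}

my $text    = "a b\nc\nd e";
my $corpus  = [ [qw(a b)], [qw(c)], [qw(d e)] ];
my $offsets = [ [0, 3], [4, 5], [6, 9] ];
print words_in_range($corpus, 0, 2), "\n";
print text_in_range(\$text, $offsets, 1, 2), "\n";
```

Note the differing conventions taken from the descriptions above: the word-based range excludes $last, while the substring-based range includes segment $last.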
Parses a given file with correspondences between two given languages. The file must follow this DSL:

    file           : header correspondence*
    header         : 'langs:' L1, L2
    correspondence : term (',' term)* '=' term (',' term)*
    term           : word (\s word)*
Does not yet support multi-word terms nor multi-term correspondences!
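A minimal parser for the currently supported subset (single-word terms, one term on each side of '=') could look like the sketch below. It takes the file contents as a string to stay self-contained. The name `parse_correspondences`, the return values, and the storage of translations as array references are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (name and return shape assumed): parse a
# 'langs:' header followed by lines of the form "word = word".
sub parse_correspondences {
    my ($text) = @_;
    my @lines  = grep { /\S/ } split /\n/, $text;   # ignore blank lines
    my $header = shift @lines;
    defined($header) && $header =~ /^\s*langs:\s*(\w+)\s*,\s*(\w+)\s*$/
        or die "Missing or malformed 'langs:' header\n";
    my ($l1, $l2) = ($1, $2);

    my (%l1_to_l2, %l2_to_l1);
    for my $line (@lines) {
        # Single-word terms only; commas (multi-term lines) are rejected.
        my ($left, $right) = $line =~ /^\s*([^\s=,]+)\s*=\s*([^\s=,]+)\s*$/
            or die "Cannot parse line: $line\n";
        push @{ $l1_to_l2{$left} },  $right;
        push @{ $l2_to_l1{$right} }, $left;
    }
    return ($l1, $l2, \%l1_to_l2, \%l2_to_l1);
}

my ($l1, $l2, $fwd, $rev) =
    parse_correspondences("langs: pt, en\ncasa = house\ngato = cat\n");
print "$l1 -> $l2: casa = $fwd->{casa}[0]\n";
```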
<andrefs at cpan.org>
Please report any bugs or feature requests to
bug-text-perfide-partialalign at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Perfide-PartialAlign. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command:

    perldoc Text::Perfide::PartialAlign
Based on the original script partialAlign.py bundled with hunalign -- http://mokk.bme.hu/resources/hunalign/ .
Thanks to Daniel Varga for helping us to understand how partialAlign.py works.
Copyright 2012 Andre Santos.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.