Search results for "distribution:OPUS-Tools TIEDEMANN"
OPUS::Tools - a collection of tools for processing OPUS corpora
This is not a library but just a collection of scripts for processing/converting OPUS corpora. Download corpus data in XML from <http://opus.lingfil.uu.se>...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
opus-cat - read a document from OPUS and print to STDOUT
"opus-cat" prints a file from OPUS to STDOUT. It is able to read from ZIP archives and finds files in the common OPUS file structure....
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
xml2opus - convert XML files to OPUS (add sentence boundary markers)
"xml2opus" adds sentence boundaries to XML files within any of the XML tags....
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
tmx2opus - convert TMX into OPUS XML
"tmx2opus" converts TMX files into OPUS format. It handles translation units with several languages and it also does sentence-splitting based on Lingua::Sentence. Regional codes can be removed from the language attribute. If the "outfile" has the ext...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
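The tmx2opus entry above notes that regional codes can be removed from the language attribute. A minimal Python sketch of that idea, not taken from the script itself: the function name `strip_region` is hypothetical, and it assumes the region is separated from the base language by a hyphen or underscore.

```python
import re

def strip_region(lang_code):
    """Return the base language of a code such as 'en-US' or 'pt_BR'.

    Assumption: the region (if any) follows the first '-' or '_'.
    This is an illustration, not the logic used by tmx2opus.
    """
    return re.split(r"[-_]", lang_code, maxsplit=1)[0]

for code in ["en-US", "pt_BR", "sv"]:
    print(code, "->", strip_region(code))
```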
tmx2moses - convert TMX into Moses plain text
"tmx2moses" converts TMX files into Moses plain text format. It handles translation units with several languages. Regional codes can be removed from the language attribute....
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
opus-read - read sentence alignment in XCES align format
"opus-read" is a simple script that reads sentence alignments stored in XCES align format and prints the aligned sentences to STDOUT. It requires monotonic alignments (ascending order, no crossing links) of sentences in linked XML files. Linked XML fi...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
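For readers unfamiliar with XCES align files, the following Python sketch shows roughly what reading such a file involves. It is an illustration under stated assumptions, not the opus-read implementation: the sample document, its file names, and the function name `read_links` are made up, and each `xtargets` attribute is assumed to hold source and target sentence IDs separated by a semicolon.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files are downloaded from OPUS.
SAMPLE = """<cesAlign version="1.0">
<linkGrp targType="s" fromDoc="en/doc1.xml" toDoc="sv/doc1.xml">
<link id="SL1" xtargets="s1;s1" />
<link id="SL2" xtargets="s2 s3;s2" />
<link id="SL3" xtargets=";s3" />
</linkGrp>
</cesAlign>"""

def read_links(xces_text):
    """Return (fromDoc, toDoc, source_ids, target_ids) for every link."""
    root = ET.fromstring(xces_text)
    links = []
    for grp in root.iter("linkGrp"):
        for link in grp.iter("link"):
            # "s2 s3;s2" -> source IDs "s2 s3", target IDs "s2"
            src, _, trg = link.get("xtargets", "").partition(";")
            links.append((grp.get("fromDoc"), grp.get("toDoc"),
                          src.split(), trg.split()))
    return links

for from_doc, to_doc, src_ids, trg_ids in read_links(SAMPLE):
    print(from_doc, src_ids, "<->", to_doc, trg_ids)
```

An empty side, as in the third link, simply yields an empty ID list; fetching the sentence text from the linked XML files is left out of this sketch.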
opus-index
script for indexing parallel corpora from OPUS using CWB...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
opus2multi
Usage: opus2multi [OPTIONS] xmldir pivot [lang-ids]*. Combines sentence alignments for several language pairs, using a pivot language as the intermediate language for all other languages. <xmldir> should be the path to the XML directory that contains sentence align...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
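The pivot idea behind opus2multi can be sketched in a few lines of Python. This is a simplified illustration, not the script's algorithm: the function name `combine_via_pivot` and the input layout (one dictionary per language, keyed by a tuple of pivot sentence IDs) are assumptions, and only pivot units that appear identically in every language pair are kept.

```python
def combine_via_pivot(alignments):
    """Combine pivot-to-language alignments into multilingual rows.

    alignments: {lang: {pivot_ids (tuple): target_ids (tuple)}}
    Returns a list of (pivot_ids, {lang: target_ids}) for the pivot
    units shared by all languages. Assumption: units must match exactly.
    """
    langs = sorted(alignments)
    if not langs:
        return []
    shared = set(alignments[langs[0]])
    for lang in langs[1:]:
        shared &= set(alignments[lang])
    return [(pivot_ids, {lang: alignments[lang][pivot_ids] for lang in langs})
            for pivot_ids in sorted(shared)]

# Hypothetical data with English as the pivot language.
example = {
    "sv": {("s1",): ("s1",), ("s2", "s3"): ("s2",)},
    "de": {("s1",): ("s1", "s2"), ("s2",): ("s3",)},
}
for pivot_ids, row in combine_via_pivot(example):
    print(pivot_ids, row)
```

In the example only the pivot unit `("s1",)` survives, because the second pivot unit is segmented differently in the two language pairs.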
moses2opus - convert aligned plain text files into OPUS XML
"moses2opus" converts a sentence-aligned corpus in plain text format (Moses format or fast_align format) into the OPUS XML-based format. It also does sentence-splitting based on Lingua::Sentence...
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
opus-swap-align
Swap languages in XCES sentence alignment files....
TIEDEMANN/OPUS-Tools-0.2.2 - 26 Aug 2020 09:34:28 UTC
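Swapping languages in an XCES alignment file amounts to exchanging the two document attributes and the two halves of each link. The Python sketch below illustrates this under stated assumptions; it is not the opus-swap-align code, and the sample document and the function name `swap_alignment` are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample alignment file.
SWAP_SAMPLE = """<cesAlign version="1.0">
<linkGrp targType="s" fromDoc="en/doc1.xml" toDoc="sv/doc1.xml">
<link id="SL1" xtargets="s1 s2;s1" />
</linkGrp>
</cesAlign>"""

def swap_alignment(xces_text):
    """Return the alignment XML with source and target sides exchanged."""
    root = ET.fromstring(xces_text)
    for grp in root.iter("linkGrp"):
        from_doc, to_doc = grp.get("fromDoc"), grp.get("toDoc")
        if from_doc is not None and to_doc is not None:
            grp.set("fromDoc", to_doc)
            grp.set("toDoc", from_doc)
        for link in grp.iter("link"):
            # "s1 s2;s1" becomes "s1;s1 s2"
            src, _, trg = link.get("xtargets", "").partition(";")
            link.set("xtargets", f"{trg};{src}")
    return ET.tostring(root, encoding="unicode")

print(swap_alignment(SWAP_SAMPLE))
```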