NAME

WWW::Translate::Apertium - Open source machine translation

VERSION

Version 0.10 April 11, 2009

SYNOPSIS

    use WWW::Translate::Apertium;
    
    my $engine = WWW::Translate::Apertium->new();
    
    my $translated_string = $engine->translate($string);
    
    # default language pair is Catalan -> Spanish
    # change to Spanish -> Galician:
    $engine->from_into('es-gl');
    
    # check current language pair:
    my $current_langpair = $engine->from_into();
    
    # get available language pairs:
    my %pairs = $engine->get_pairs();
    
    # default output format is 'plain_text'
    # change to 'marked_text':
    $engine->output_format('marked_text');
    
    # check current output format:
    my $current_format = $engine->output_format();
    
    # configure a new Apertium object to store unknown words
    # (a separate variable avoids redeclaring $engine in the same scope):
    my $storing_engine = WWW::Translate::Apertium->new(
                                                output => 'marked_text',
                                                store_unknown => 1,
                                              );
    
    # get unknown words for source language = Aranese
    my $oc_aran_unknown_href = $storing_engine->get_unknown('oc_aran');

DESCRIPTION

Apertium is an open source shallow-transfer machine translation engine designed primarily to translate between related languages, although it also handles less closely related ones. It is being developed by the Department of Software and Computing Systems at the University of Alicante. The linguistic data is being developed by research teams from the University of Alicante, the University of Vigo and the Pompeu Fabra University. For more details, see http://www.apertium.org/.

WWW::Translate::Apertium provides an object-oriented interface to the Apertium online machine translation web service, based on Apertium 3.0.

Currently, Apertium supports the following language pairs:

- Bidirectional

- Single Direction

NOTE: The underlying translation retrieval method changed in version 0.06. The current module is based on the Apertium web service, which serves the translations faster than the previous web scraping approach.

Summary of changes since version 0.05 that may have an impact on legacy code:

- This module expects UTF-8 text and returns UTF-8 text. You can also send text encoded in Latin-1, but Latin-1 support will be phased out soon.

- Some language codes have changed: the code for Brazilian Portuguese is now pt_BR, and the code for Aranese is now oc_aran (formerly oc, which is now the language code for Occitan).

CONSTRUCTOR

new()

Creates and returns a new WWW::Translate::Apertium object.

    my $engine = WWW::Translate::Apertium->new();

WWW::Translate::Apertium recognizes the following parameters:

The default parameter values can be overridden when creating a new Apertium engine object:

    my %options = (
                    lang_pair => 'es-ca',
                    output => 'marked_text',
                    store_unknown => 1,
                  );

    my $engine = WWW::Translate::Apertium->new(%options);

METHODS

$engine->translate($string)

Returns the translation of $string generated by Apertium, encoded as UTF-8. If the server is down, the translate method emits a warning and returns undef.
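Because translate can return undef, callers may want to check the result before using it. The following is a minimal sketch of one way to do that, not part of the module: translate_or_original is a hypothetical helper, and Local::OfflineEngine is a stand-in class that only mimics the documented failure behaviour so the example runs without network access.

```perl
use strict;
use warnings;

# Stand-in for a WWW::Translate::Apertium object (an assumption for this
# sketch). It mimics only the documented failure case: translate() returns undef.
{
    package Local::OfflineEngine;
    sub new       { return bless {}, shift }
    sub translate { return undef }
}

# Hypothetical helper: return the translation, or the original text
# when the engine could not produce one.
sub translate_or_original {
    my ($engine, $text) = @_;
    my $translation = $engine->translate($text);
    return defined $translation ? $translation : $text;
}

my $offline_engine = Local::OfflineEngine->new();
print translate_or_original($offline_engine, 'Bon dia'), "\n";
```

With a real engine object, the same helper would pass the translation through unchanged whenever the service responds.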

The input $string must be a UTF-8 encoded string (to prepare it, you can use the Encode module or, if you are reading the text from a file, a PerlIO layer).
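As an illustration of the Encode route, the sketch below converts Latin-1 bytes to UTF-8 using only the core Encode module. The sample word is invented for the example, and which of the two resulting forms (decoded characters or UTF-8 encoded bytes) your code should pass on depends on how the rest of your program handles text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample input (invented): "año" as Latin-1 bytes, with ñ stored as 0xF1.
my $latin1_bytes = "a\xF1o";

# Decode the bytes into a Perl character string...
my $characters = decode('ISO-8859-1', $latin1_bytes);

# ...then encode those characters as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $characters);

printf "%d Latin-1 bytes -> %d UTF-8 bytes\n",
    length($latin1_bytes), length($utf8_bytes);
```

The byte count grows from 3 to 4 because ñ takes two bytes in UTF-8.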

If you translate a string literal included in your code and display the result in the output window of your code editor, add the following statement to your code to avoid a "Wide character in print" warning:

    binmode(STDOUT, ':utf8');

$engine->from_into($lang_pair)

Changes the engine language pair to $lang_pair. When called with no argument, it returns the value of the current engine language pair.
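The pair codes shown in the synopsis, such as 'es-gl' for Spanish -> Galician, join a source code and a target code with a hyphen. A small sketch of splitting such a string follows; it assumes every pair code has this source-target shape, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Pair code as used in the synopsis: Spanish -> Galician.
my $lang_pair = 'es-gl';

# Split on the first hyphen only, so the target part is kept whole.
my ($source, $target) = split /-/, $lang_pair, 2;

print "source: $source, target: $target\n";
```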

$engine->get_pairs()

Returns a hash containing the available language pairs. The hash keys are the language codes, and the values are the corresponding language names.

$engine->output_format($format)

Changes the engine output format to $format. When called with no argument, it returns the value of the current engine output format.

$engine->get_unknown($lang_code)

If the engine was configured to store unknown words, this method returns a reference to a hash containing the unknown words (keys) detected during the current machine translation session for the specified source language, along with their frequencies (values).
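The returned structure lends itself to a simple frequency report. The sketch below sorts a hash of that shape by descending frequency; the sample words and counts are invented so the example runs offline, whereas a real session would obtain the reference from $engine->get_unknown($lang_code).

```perl
use strict;
use warnings;

# Invented sample data in the documented shape: word => frequency.
my $unknown_href = { 'blog' => 3, 'wifi' => 1, 'podcast' => 3 };

# Most frequent words first; ties are broken alphabetically
# so the output order is stable.
my @by_frequency = sort {
    $unknown_href->{$b} <=> $unknown_href->{$a} or $a cmp $b
} keys %$unknown_href;

printf "%-10s %d\n", $_, $unknown_href->{$_} for @by_frequency;
```

A list like this can help decide which words are worth adding to a dictionary first.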

The valid values of $lang_code for the source language are (in alphabetical order):

DEPENDENCIES

LWP::UserAgent

URI::Escape

HTML::Entities

SEE ALSO

WWW::Translate::interNOSTRUM

REFERENCES

Apertium project website:

http://www.apertium.org/

If you want to get the real thing, you can download the Apertium code and build it on your local machine. You will find detailed setup instructions in the Apertium wiki:

http://wiki.apertium.org/wiki/Installation

ACKNOWLEDGEMENTS

Many thanks to Mikel Forcada Zubizarreta, coordinator of the Transducens research team of the Department of Software and Computing Systems at the University of Alicante, who kindly answered my questions during the development of this module, and to Xavier Noria and João Albuquerque for useful suggestions. The author is also grateful to Francis Tyers, a member of the Apertium team, who provided essential feedback for the latest versions of this module.

AUTHOR

Enrique Nell, <perl_nell@telefonica.net>

COPYRIGHT AND LICENSE

Copyright (C) 2007-2009 by Enrique Nell, all rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.