
NAME

MetaTrans::Base - Abstract base class for creating meta-translator plug-ins

SYNOPSIS

    # This is not a working example. It serves for illustration only.
    # For a working one, see the MetaTrans::UltralinguaNet source code.

    package MetaTrans::MyPlugin;

    use strict;
    use warnings;
    use base qw(MetaTrans::Base);    # loads MetaTrans::Base and sets @ISA

    use HTTP::Request;
    use URI::Escape;

    sub new
    {
        my $class   = shift;
        my %options = @_;

        $options{host_server} = "www.some-online-translator.com"
            unless (defined $options{host_server});

        my $self = MetaTrans::Base->new(%options);
        $self = bless $self, $class;

        # supported translation directions:
        #   English <-> German
        #   English <-> French
        #   English <-> Spanish

        $self->set_languages('eng', 'ger', 'fre', 'spa');

        $self->set_dir_1_to_all('eng');
        $self->set_dir_all_to_1('eng');

        return $self;
    }

    sub create_request
    {
        my $self           = shift;
        my $expression     = shift;
        my $src_lang_code  = shift;
        my $dest_lang_code = shift;

        # our-language-codes-to-server-language-codes conversion table
        my %table = (eng => 'eng', ger => 'deu', fre => 'fra', spa => 'esp');

        return HTTP::Request->new('GET',
            'http://www.some-online-translator.com/translate.cgi?' .
            'expr=' . uri_escape($expression) . '&' .
            'src='  . $table{$src_lang_code}  . '&' .
            'dst='  . $table{$dest_lang_code}
        );
    }

    sub process_response
    {
        my $self           = shift;
        my $contents       = shift;

        # we don't care about these here, but 
        # in some cases we might need to care
        my $src_lang_code  = shift;
        my $dest_lang_code = shift;

        my @result;
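        # under /x, literal spaces in the pattern are ignored, so the
        # whitespace in the HTML has to be matched explicitly with \s
        while ($contents =~ m|
            <td\s+class="expr">([^<]*)</td>
            \s*
            <td\s+class="trns">([^<]*)</td>
        |gsix)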
        {
            my $expression  = $1;
            my $translation = $2;

            # add some $expression and $translation normalization code here

            push @result, ($expression, $translation);
        }
        
        return @result;
    }

    1;

DESCRIPTION

This class serves as a base for creating MetaTrans plug-ins, especially those that extract data from online translators. Please see MetaTrans first. MetaTrans::Base already contains many of the features a MetaTrans plug-in must have, which makes creating new plug-ins easy.

To perform a translation using an online translator (e.g. http://www.ultralingua.net/) one needs to do two things:

1. Emulate sending a form.
2. Process the HTML output the web server sends in response.

To create a MetaTrans plug-in using MetaTrans::Base, one only needs to do a bit more. The first step is to derive from MetaTrans::Base and "override" the following two abstract methods:

$plugin->create_request($expression, $src_lang_code, $dest_lang_code)

Should return an HTTP::Request object to be used by LWP::UserAgent for retrieving the HTML output that contains the translation of $expression from the language with $src_lang_code to the language with $dest_lang_code. This basically emulates sending a form.
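
A minimal sketch of the URL-building part of create_request, using only core Perl. The server URL, its parameter names (expr, src, dst) and the %server_code table are the made-up ones from the SYNOPSIS; percent_encode and build_query_url are hypothetical helpers, and percent_encode merely stands in for URI::Escape's uri_escape.

```perl
use strict;
use warnings;

# Hypothetical table: MetaTrans language codes -> codes used by the
# (made-up) online translator from the SYNOPSIS.
my %server_code = (eng => 'eng', ger => 'deu', fre => 'fra', spa => 'esp');

# Percent-encode a string for use in a query; only the RFC 3986
# unreserved characters are left as they are.
sub percent_encode {
    my ($s) = @_;
    utf8::encode($s);    # characters -> UTF-8 bytes (on our local copy)
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build the URL that create_request would wrap in an HTTP::Request.
sub build_query_url {
    my ($expression, $src, $dest) = @_;
    die "unsupported language code\n"
        unless exists $server_code{$src} && exists $server_code{$dest};
    return 'http://www.some-online-translator.com/translate.cgi?'
         . 'expr=' . percent_encode($expression)
         . '&src=' . $server_code{$src}
         . '&dst=' . $server_code{$dest};
}

print build_query_url('hot dog', 'eng', 'ger'), "\n";
# http://www.some-online-translator.com/translate.cgi?expr=hot%20dog&src=eng&dst=deu
```

In create_request itself, the resulting string would then be passed to HTTP::Request->new('GET', $url).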

$plugin->process_response($contents, $src_lang_code, $dest_lang_code)

This method should extract translations from the HTML code ($contents) returned by the web server in response to the request. The translations must be returned as a list of the following form:

    (expression_1, translation_1, expression_2, translation_2, ...)

The character encoding must be UTF-8! In addition, all expressions and their translations should be normalized so that all grammar and meaning information is in parentheses or after a semicolon. For example, if you request an English to French translation of "dog" from the http://www.ultralingua.net/ translator, the first line of the result is

    dog n. : 1. chien n.m.,f. chienne 2. pitou n.m. (Familier) (Québécisme)

The MetaTrans::UltralinguaNet module returns it as

    ('dog (n.)', 'chien (n.m.,f.)', 'dog (n.)', 'pitou (n.m.)')
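
Because the result is one flat list, a caller has to read it two elements at a time. A small sketch of doing so; pairs_from_flat is a hypothetical helper, not part of MetaTrans::Base, and the sample data is the UltralinguaNet result shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn (expr_1, trans_1, expr_2, trans_2, ...) into
# a list of [expression, translation] array refs.
sub pairs_from_flat {
    my @flat = @_;
    die "odd number of elements, expected expression/translation pairs\n"
        if @flat % 2;
    my @pairs;
    while (my ($expr, $trans) = splice @flat, 0, 2) {
        push @pairs, [$expr, $trans];
    }
    return @pairs;
}

my @flat = ('dog (n.)', 'chien (n.m.,f.)', 'dog (n.)', 'pitou (n.m.)');
for my $pair (pairs_from_flat(@flat)) {
    print "$pair->[0] -> $pair->[1]\n";
}
# dog (n.) -> chien (n.m.,f.)
# dog (n.) -> pitou (n.m.)
```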

The next step is specifying the list of languages supported by the plug-in, i.e. which languages we are able to translate from and which to. This can easily be done by calling the appropriate methods inherited from MetaTrans::Base. Please see "SPECIFYING SUPPORTED LANGUAGES".

The last step is setting the host_server attribute to the name of the online translator used by the plug-in. See ATTRIBUTES.

The MetaTrans::UltralinguaNet source code should serve as a good example of how to create a MetaTrans plug-in derived from MetaTrans::Base.

CONSTRUCTOR METHODS

MetaTrans::Base->new(%options)

This method constructs a new MetaTrans::Base object and returns it. Key/value pair arguments may be provided to set up the initial state. The following options correspond to attribute methods described below:

   KEY                  DEFAULT
   ---------------      ----------------    
   host_server          'unknown.server'
   script_name          undef
   timeout              5
   matching             M_START
   match_at_bounds      1

Please note that since MetaTrans::Base is an abstract class, calling the constructor only makes sense in derived classes.
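
For a plug-in written without MetaTrans::Base, the following sketch shows one way to get similar option handling: defaults from the table above, overridden by the key/value pairs passed to new, plus get/set accessors. The class name My::PluginSketch is hypothetical, and matching is left out because the values of the M_* constants are defined by MetaTrans::Base.

```perl
use strict;
use warnings;

package My::PluginSketch;    # hypothetical class, not part of MetaTrans

# Defaults from the constructor table; 'matching' is omitted because the
# M_* constant values belong to MetaTrans::Base.
my %DEFAULTS = (
    host_server     => 'unknown.server',
    script_name     => undef,
    timeout         => 5,
    match_at_bounds => 1,
);

sub new {
    my ($class, %options) = @_;
    my $self = { %DEFAULTS, %options };    # options override defaults
    return bless $self, $class;
}

# Generate get/set accessors: $obj->attr reads, $obj->attr($value) sets.
for my $attr (keys %DEFAULTS) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$attr" } = sub {
        my $self = shift;
        $self->{$attr} = shift if @_;
        return $self->{$attr};
    };
}

package main;

my $plugin = My::PluginSketch->new(timeout => 10);
print $plugin->host_server, ' ', $plugin->timeout, "\n";    # unknown.server 10
```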

ATTRIBUTES

$plugin->host_server
$plugin->host_server($name)

Get/set the name of the online translator used by the plug-in. It is only used to inform the user where the translation comes from and hence can be set to any meaningful value. By convention, it is set to the online translator's base URL with the 'http://' stripped. For example, MetaTrans::UltralinguaNet sets host_server to 'www.ultralingua.net'.

$plugin->script_name
$plugin->script_name($name)

Get/set the name of the script that runs this plug-in as a command line application. The script uses this name to identify itself when printing usage. If unset, the script name is extracted from the $0 variable. See the run method.
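
A sketch of the fallback described above, assuming the name is taken as the base name of $0 (the module may keep the full path instead); usage_name is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: prefer the script_name attribute, otherwise fall
# back to the file name of the running script ($0).
sub usage_name {
    my ($script_name) = @_;
    return defined $script_name ? $script_name : basename($0);
}

print 'Usage: ', usage_name(undef), " ...\n";
```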

$plugin->timeout
$plugin->timeout($secs)

Get/set the time in seconds we want to wait for a reply from the online translator before timing out.

$plugin->matching
$plugin->matching($type)

Get/set how the found translations are matched against the searched expression. In addition to the translation of the searched expression, some online translators also return translations of related expressions. For example, when translating "dog" from English to French, we may also get translations of "dog days" or "every dog has his day". If this is not what we want, we can set matching to an appropriate value:

MetaTrans::Base::M_EXACT

Match only those expressions which are the same as the searched one. Matching is case-insensitive and ignores grammar information, i.e. everything in parentheses or after a semicolon. The same applies below.

Examples:

    'Dog'  matches        'dog'      (case-insensitive)
    'Hund' matches        'Hund; r'  (grammar information ignored)
    'dog'  does not match 'dog bite' (not an exact match)

MetaTrans::Base::M_START

Match those expressions which are prefixed with the searched expression.

Examples:

    'Dog'  matches        'dog bite'      (case-insensitive)
    'Hund' matches        'Hund ist los'
    'Hund' does not match 'bissiger Hund' ('Hund' is not a prefix)

MetaTrans::Base::M_EXPR

Match those expressions which contain the searched expression, no matter where.

Examples:

    'Big Dog' matches        'very big dog'
    'big dog' does not match 'big angry dog' ('big dog' is not a substring)

MetaTrans::Base::M_WORDS

Match those expressions which contain all the words of the searched expression.

Examples:

    'big dog' matches        'big angry dog'
    'big dog' does not match 'angry dog'     (not all words are contained)

MetaTrans::Base::M_ALL

Return all translations without any filtering.

You can

    use MetaTrans::Base qw(:match_consts);

to import the matching constant names (M_EXACT, M_START, ...) into your program's namespace.

$plugin->match_at_bounds
$plugin->match_at_bounds($bool)

Get/set the match-at-boundaries flag. Setting it to a true value makes matching behave slightly differently: subexpressions and words are matched at word boundaries only. In practice this means that with matching set to M_WORDS, the expression "big dog" does not match "big angry doggie", while it would with match-at-boundaries set to a false value. The same applies to M_START and M_EXPR. The option has no effect when matching is set to M_EXACT or M_ALL.

$plugin->default_dir
$plugin->default_dir($src_lang_code, $dest_lang_code)

Get/set the default translation direction. It may only be set to a supported direction, see "SPECIFYING SUPPORTED LANGUAGES". Returns the old value as an array of two language codes.

SPECIFYING SUPPORTED LANGUAGES

Every MetaTrans plug-in has to specify its supported languages and translation directions. MetaTrans::Base provides several methods for doing so. The first step is specifying the list of all languages that appear on the left or right side of any supported translation direction. Suppose your plug-in supports the following ones:

    English -> French
    English -> German
    French  -> Spanish

Then the list of supported languages is simply English, French, German and Spanish.

The arguments passed to these methods must be language codes, not language names. Please see MetaTrans::Languages for a complete list.

$plugin->set_languages(@language_codes)

Set supported languages to the ones specified by @language_codes. In the above example one would call:

    $plugin->set_languages('eng', 'fre', 'ger', 'spa');

$plugin->set_dir_1_to_1($src_lang_code, $dest_lang_code)

Add support for translating from the language with $src_lang_code to the language with $dest_lang_code. Both languages must already have been declared as supported. The method returns a true value on success and a false value on error. To declare the directions from the above example as supported, we would simply call:

    $plugin->set_dir_1_to_1('eng', 'fre');
    $plugin->set_dir_1_to_1('eng', 'ger');
    $plugin->set_dir_1_to_1('fre', 'spa');

$plugin->unset_dir_1_to_1($src_lang_code, $dest_lang_code)

Remove support for translating from the language with $src_lang_code to the language with $dest_lang_code. Both languages must already have been declared as supported. The method returns a true value on success and a false value on error.

$plugin->set_dir_1_to_spec($src_lang_code, @dest_lang_codes)

Add support for translating from the language with $src_lang_code to all languages whose codes are in @dest_lang_codes. The direction from the $src_lang_code language to itself won't be set as supported, even if $src_lang_code is listed in @dest_lang_codes. However, calling

    $plugin->set_dir_1_to_1($src_lang_code, $src_lang_code);

will do the job if this is what you want. If some of the @dest_lang_codes are unsupported, this only results in warning messages: the supported codes are used and the others are ignored. The method returns the number of directions set as supported on (partial) success, 0 on error.
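
The rules for set_dir_1_to_spec (no source-to-itself direction, warnings for unknown codes, a count of directions set) can be made concrete with a small standalone model. This is a hypothetical sketch of the described behaviour using plain hashes, not the MetaTrans::Base implementation; the subs only borrow the documented method names.

```perl
use strict;
use warnings;

my %supported_lang;    # language code => 1
my %dir;               # "src dest"    => 1

sub set_languages { %supported_lang = map { ($_ => 1) } @_; return }

sub set_dir_1_to_1 {
    my ($src, $dest) = @_;
    return 0 unless $supported_lang{$src} && $supported_lang{$dest};
    $dir{"$src $dest"} = 1;
    return 1;
}

sub set_dir_1_to_spec {
    my ($src, @dests) = @_;
    return 0 unless $supported_lang{$src};
    my $count = 0;
    for my $dest (@dests) {
        next if $dest eq $src;    # src -> src is never set here
        if (!$supported_lang{$dest}) {
            warn "unsupported language code '$dest' ignored\n";
            next;
        }
        $dir{"$src $dest"} = 1;
        $count++;
    }
    return $count;
}

sub is_supported_dir { my ($src, $dest) = @_; return exists $dir{"$src $dest"} }

my @all_languages = ('eng', 'fre', 'ger', 'spa');
set_languages(@all_languages);
print set_dir_1_to_spec('eng', @all_languages), "\n";    # 3
```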

Example:

    my @all_languages = ('eng', 'fre', 'ger', 'spa');
    $plugin->set_languages(@all_languages);
    $plugin->set_dir_1_to_spec('eng', @all_languages);

... will result in the following supported translation directions:

    English -> French
    English -> German
    English -> Spanish

$plugin->set_dir_1_to_all($src_lang_code)

This is just a shorter way of writing:

    $plugin->set_dir_1_to_spec($src_lang_code, @all_codes);

where @all_codes is an array of codes of all supported languages.

$plugin->set_dir_spec_to_1($dest_lang_code, @src_lang_codes)

This works exactly like set_dir_1_to_spec, with the sides reversed.

$plugin->set_dir_all_to_1($dest_lang_code)

This is just a shorter way of writing:

    $plugin->set_dir_spec_to_1($dest_lang_code, @all_codes);

where @all_codes is an array of codes of all supported languages. Example:

    my @src_lang_codes = ('ger', 'fre', 'spa');
    $plugin->set_languages('eng', 'por', @src_lang_codes);
    $plugin->set_dir_spec_to_1('eng', @src_lang_codes);

... will result in the following supported translation directions:

    German  -> English
    French  -> English
    Spanish -> English

But if we replaced the last line with

    $plugin->set_dir_all_to_1('eng');

the result would have been:

    Portuguese -> English
    German     -> English
    French     -> English
    Spanish    -> English

PLUG-IN REQUIRED METHODS

These are the methods MetaTrans expects every plug-in to provide. You only need to worry about them if you are writing a plug-in from scratch. If you are deriving from MetaTrans::Base, all these methods are inherited. They make use of the abstract methods create_request and process_response, the attribute values, and the supported translation directions specified using the set_dir_* methods. If you only want to use MetaTrans::Base as a base class for your plug-in, you can stop reading here: everything you need to know has been covered above.

If you are writing a plug-in from scratch, you have to make sure it provides all the methods specified in this section with the appropriate functionality. In addition, every MetaTrans plug-in has to provide the attribute methods specified in the ATTRIBUTES section.

$plugin->is_supported_dir($src_lang_code, $dest_lang_code)

Returns true value if the translation direction is supported from language with $src_lang_code to language with $dest_lang_code, false value otherwise.

$plugin->get_all_src_lang_codes

Returns a list of all language codes the plug-in is able to translate from. For example, ('eng', 'fre') will be returned if the supported translation directions are:

    English -> French
    English -> Spanish
    French  -> Spanish

$plugin->get_dest_lang_codes_for_src_lang_code($src_lang_code)

Returns a list of all language codes the plug-in is able to translate to from the language with $src_lang_code. If called with 'eng' as a parameter in the above example, the returned value would be ('fre', 'spa').
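
One way to answer both queries is to index the supported directions by source language. A standalone sketch using the directions from the example above; the data structure is an assumption, the subs only borrow the documented method names, and the source codes are sorted just to make the output predictable, since no order is specified here.

```perl
use strict;
use warnings;

# Supported directions from the example above (assumed representation).
my @directions = (['eng', 'fre'], ['eng', 'spa'], ['fre', 'spa']);

# Index destination codes by source code.
my %dests_for;
push @{ $dests_for{ $_->[0] } }, $_->[1] for @directions;

sub get_all_src_lang_codes { return sort keys %dests_for }

sub get_dest_lang_codes_for_src_lang_code {
    my ($src) = @_;
    return @{ $dests_for{$src} || [] };
}

print join(', ', get_all_src_lang_codes()), "\n";                      # eng, fre
print join(', ', get_dest_lang_codes_for_src_lang_code('eng')), "\n";  # fre, spa
```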

$plugin->translate($expression [, $src_lang_code, $dest_lang_code])

Returns the translations of $expression as a list of strings in UTF-8 character encoding, each holding one expression-translation pair separated by " = ". An example output is:

    ("dog = chien", "dog = pitou", "dog days = canicule")

If $src_lang_code -> $dest_lang_code is an unsupported translation direction, an error is printed and undef is returned. The string 'timeout' is returned if a timeout occurs when querying the online translator, and the string 'error' is returned on any other error.

The default translation direction (see the default_dir attribute) is used if the method is called with the first argument only.
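
A caller has to tell these special return values apart from a real result. A sketch of a caller-side check; describe_result is hypothetical, it assumes translate was called in list context (so the undef case arrives as the one-element list (undef)), and treating an empty list as "nothing found" is an assumption as well.

```perl
use strict;
use warnings;

# Hypothetical caller-side check of what $plugin->translate(...) returned,
# collected into an array ref. Possible values, as documented:
#   (undef)      unsupported translation direction
#   ('timeout')  the online translator did not answer in time
#   ('error')    any other error
#   otherwise    strings of the form "expression = translation"
sub describe_result {
    my @r = @{ $_[0] };
    return 'no translations'       unless @r;    # assumed: nothing found
    return 'unsupported direction' if @r == 1 && !defined $r[0];
    return 'timed out'             if @r == 1 && $r[0] eq 'timeout';
    return 'failed'                if @r == 1 && $r[0] eq 'error';
    my @pairs = map { [ split / = /, $_, 2 ] } @r;
    return join '; ', map { "$_->[0] -> $_->[1]" } @pairs;
}

print describe_result(['dog = chien', 'dog = pitou', 'dog days = canicule']), "\n";
# dog -> chien; dog -> pitou; dog days -> canicule
print describe_result(['timeout']), "\n";    # timed out
```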

$plugin->get_trans_command($expression, $src_lang_code, $dest_lang_code, $append)

This method is a very ugly hack, and it is the reason why writing MetaTrans plug-ins from scratch is discouraged. See MetaTrans for more information on why it is required.

The get_trans_command method is expected to return an array containing a command which, if run using the Proc::SyncExec::sync_popen_noshell function, prints the translations of $expression from the $src_lang_code language to the $dest_lang_code language (the first element of the array is the program name, followed by the list of arguments). The command also needs to contain options corresponding to the current plug-in attribute values, so that it behaves accordingly. Each line of the output must correspond to one translation and have the following form:

    expression = translation

In addition, the $append string, if specified, should be appended to each line of the output.
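
Whatever command get_trans_command returns, its output has to follow the line format above. A sketch of producing that output from a flat expression/translation list; format_trans_lines is hypothetical, the ' [example]' suffix is an arbitrary sample $append value, and the Proc::SyncExec side is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: one "expression = translation" line per pair,
# with the optional $append string added to the end of each line.
sub format_trans_lines {
    my ($append, @flat) = @_;    # @flat: (expr_1, trans_1, expr_2, ...)
    $append = '' unless defined $append;
    my @lines;
    while (my ($expr, $trans) = splice @flat, 0, 2) {
        push @lines, "$expr = $trans$append\n";
    }
    return @lines;
}

print format_trans_lines(' [example]', 'dog', 'chien', 'dog', 'pitou');
# dog = chien [example]
# dog = pitou [example]
```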

STATIC FUNCTIONS

is_exact_match($in_expr, $found_expr)

Returns a true value if the $found_expr expression matches the input expression $in_expr when using M_EXACT matching (see the matching attribute).

is_match_at_start($in_expr, $found_expr, $at_bounds)

Returns a true value if the $found_expr expression matches the input expression $in_expr when using M_START matching (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.

is_match_expr($in_expr, $found_expr, $at_bounds)

Returns a true value if the $found_expr expression matches the input expression $in_expr when using M_EXPR matching (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.

is_match_words($in_expr, $found_expr, $at_bounds)

Returns a true value if the $found_expr expression matches the input expression $in_expr when using M_WORDS matching (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.

strip_grammar_info($expression)

Returns $expression with all grammar and meaning information (everything in parentheses or after a semicolon) removed, in Perl's internal UTF-8 format (see Encode).
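
The matching rules described for the matching attribute and for the static functions above can be approximated in a few lines of Perl. This is a standalone sketch, not the module's code: strip_grammar, exact_match, match_at_start, match_expr and match_words are hypothetical names, parentheses are assumed not to nest, input is assumed to be a Perl character string already, and "at word boundaries" is read as "not preceded or followed by a word character".

```perl
use strict;
use warnings;

# Remove grammar/meaning info: everything after the first semicolon and
# every "(...)" group; then tidy up whitespace.
sub strip_grammar {
    my ($s) = @_;
    $s =~ s/;.*//s;
    $s =~ s/\([^()]*\)//g;
    $s =~ s/\s+/ /g;
    $s =~ s/^ //;
    $s =~ s/ $//;
    return $s;
}

# Matching is case-insensitive and ignores grammar info.
sub norm { return lc strip_grammar($_[0]) }

# Regex for $text, optionally required to sit at word boundaries.
sub bounded {
    my ($text, $at_bounds) = @_;
    return $at_bounds ? qr/(?<!\w)\Q$text\E(?!\w)/ : qr/\Q$text\E/;
}

sub exact_match { return norm($_[0]) eq norm($_[1]) ? 1 : 0 }

sub match_at_start {
    my ($in, $found, $at_bounds) = @_;
    my $re = bounded(norm($in), $at_bounds);
    return norm($found) =~ /^$re/ ? 1 : 0;
}

sub match_expr {
    my ($in, $found, $at_bounds) = @_;
    my $re = bounded(norm($in), $at_bounds);
    return norm($found) =~ $re ? 1 : 0;
}

sub match_words {
    my ($in, $found, $at_bounds) = @_;
    my $f = norm($found);
    for my $word (split ' ', norm($in)) {
        return 0 unless $f =~ bounded($word, $at_bounds);
    }
    return 1;
}

print exact_match('Hund', 'Hund; r'), "\n";                    # 1
print match_words('big dog', 'big angry doggie', 1), "\n";     # 0
```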

convert_to_utf8($input_encoding, $string)

Converts $string from $input_encoding to UTF-8 encoding. In addition, all HTML entities contained in $string are converted to the corresponding UTF-8 characters. This can be very useful when writing the process_response method.
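
A rough sketch of the two conversion steps with the core Encode module: decode the bytes from $input_encoding, then replace HTML entities. to_utf8_chars is a hypothetical name, and the named-entity table is a deliberately tiny assumption (a full table is available, for example, from the non-core HTML::Entities module). The sketch returns a Perl character string; encode('UTF-8', ...) turns it into UTF-8 bytes when needed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Tiny named-entity table for illustration only.
my %ENTITY = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'",
              nbsp => "\x{A0}", eacute => "\x{E9}");

# Decode bytes from $input_encoding into characters, then replace
# numeric and (known) named HTML entities by the characters they denote.
sub to_utf8_chars {
    my ($input_encoding, $bytes) = @_;
    my $s = decode($input_encoding, $bytes);
    $s =~ s/&#(\d+);/chr($1)/ge;                 # decimal, e.g. &#233;
    $s =~ s/&#x([0-9A-Fa-f]+);/chr(hex $1)/ge;   # hex, e.g. &#xE9;
    # named entities last, so "&amp;lt;" is decoded only once, to "&lt;"
    $s =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;
    return $s;
}

print encode('UTF-8', to_utf8_chars('iso-8859-1', "Qu\xe9b&eacute;cisme")), "\n";
# Québécisme
```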

OTHER METHODS

$plugin->run

Run the plug-in as a command line application. This is very useful for testing and debugging. Try executing the following script to see what it does:

    #!perl
    use strict;
    use warnings;

    # load a plug-in class derived from MetaTrans::Base
    use MetaTrans::UltralinguaNet;

    # instantiate an object
    my $plugin = MetaTrans::UltralinguaNet->new;

    # run it
    $plugin->run;

BUGS

Please report any bugs or feature requests to bug-metatrans@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

AUTHOR

Jan Pomikalek, <xpomikal@fi.muni.cz>

COPYRIGHT & LICENSE

Copyright 2004 Jan Pomikalek, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

MetaTrans, MetaTrans::Languages, MetaTrans::UltralinguaNet, HTTP::Request, URI::Escape
