NAME

Lingua::Lid - Interface to the language and encoding identifier "lid"

SYNOPSIS

    use Lingua::Lid qw/:all/;

    # Identify the language and character encoding of...

    # ...a string
    $result = lid_fstr("This is a short English sentence.");

    # ...a plain text file
    $result = lid_ffile("/path/to/a/file.txt");

    # ...if $result is undef, an error occurred:
    die Lingua::Lid::errstr() unless $result;

    print "Lingua::Lid v$Lingua::Lid::VERSION, using lid v",
        lid_version(), "\n";

DESCRIPTION

The Perl extension Lingua::Lid provides a Perl interface to Lingua-Systems' language and character encoding identification library lid, which is required to build and use this extension.

The interface is implemented in the XS language and makes the functions of the lid C library available to Perl applications and modules in an easy-to-use way.

Lingua::Lid is thread-safe and can be used by more than one thread simultaneously if it was compiled with lid v3.0.0 or above.

This man page covers the usage of the Lingua::Lid Perl extension only. For more information on lid and a list of supported languages and character encodings, have a look at its manual, which is both included in its distribution and freely available at http://www.lingua-systems.com/language-identifier/lid-library/.

Lingua::Lid aims to follow the C interface as closely as is reasonable while respecting common Perl conventions. Have a look at "COMPARISON TO THE C INTERFACE" for details.

EXPORTS

No symbols are exported by default.

Each function needed must either be imported explicitly by name, or all provided functions can be imported at once with the export tag :all:

  use Lingua::Lid qw/lid_ffile lid_fstr/; # or
  use Lingua::Lid qw/:all/;

The function Lingua::Lid::errstr() is not exportable and has to be called with its full package name.

FUNCTIONS

lid_fstr( $string )

Mnemonic: "Language and encoding identification... from string"

This function takes a $string as an argument and identifies its language and encoding. It returns a hash reference containing the results. See IDENTIFICATION RESULTS DATA STRUCTURE for details.

If an error occurs, the function returns undef. Use Lingua::Lid::errstr() to obtain an appropriate message describing the error.

lid_ffile( $file )

Mnemonic: "Language and encoding identification... from file"

This function takes a plain text $file's path as an argument and identifies its language and encoding. It returns a hash reference containing the results. See IDENTIFICATION RESULTS DATA STRUCTURE for details.

If an error occurs, the function returns undef. Use Lingua::Lid::errstr() to obtain an appropriate message describing the error.

lid_version( )

This function returns the version of the lid C library that is currently loaded (runtime version).

lid_version_ct( )

This function returns the version of the lid C library that Lingua::Lid has been compiled with (compile time version).
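The two version functions can be combined to detect a mismatch between the loaded and the compiled-against library, and to check the thread-safety condition mentioned in the DESCRIPTION. The following is a minimal sketch, not part of Lingua::Lid: the helper lid_major() is hypothetical, and it assumes that version strings look like "3.0.0", as in the sample output in EXAMPLES. The calls into Lingua::Lid are guarded so that the sketch also runs where the extension is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Lingua::Lid): extract the major
# number from a version string such as "3.0.0". Returns undef if the
# string does not start with digits (an optional leading "v" is allowed).
sub lid_major {
    my ($version) = @_;
    return unless defined $version;
    return $version =~ /^v?(\d+)/ ? $1 : undef;
}

# Only call into Lingua::Lid if it can be loaded.
if (eval { require Lingua::Lid; 1 }) {
    my $runtime  = Lingua::Lid::lid_version();     # library loaded right now
    my $compiled = Lingua::Lid::lid_version_ct();  # library used at build time

    warn "lid runtime v$runtime differs from compile-time v$compiled\n"
        if $runtime ne $compiled;

    # Assumption: a compile-time major version of 3 or above corresponds
    # to the thread-safety condition stated in the DESCRIPTION.
    my $major = lid_major($compiled);
    print "built against lid v3 or above: ",
        (defined $major && $major >= 3 ? "yes" : "no"), "\n";
}
```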

IDENTIFICATION RESULTS DATA STRUCTURE

The functions lid_fstr() and lid_ffile() return a hash reference containing the results of the language and encoding identification.

The hash reference contains the following keys:

language

The language's name (in English), e.g. "German", "French", "English".

isocode

The language's ISO 639-3 code, e.g. "deu", "fra", "eng".

encoding

The character encoding, e.g. "UTF-8", "ISO-8859-1", "UTF-32BE".

  $result = {
                'language'  =>  'English',
                'isocode'   =>  'eng',
                'encoding'  =>  'ASCII'
            };
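One possible use of the encoding key is to decode the identified bytes into Perl characters with the core Encode module. The sketch below is an illustration only: decode_with_result() is a hypothetical helper, the result hash is written by hand in the documented shape rather than produced by lid, and it is an assumption that the encoding names reported by lid are ones Encode recognizes (the helper returns undef when they are not).

```perl
use strict;
use warnings;
use Encode qw/find_encoding/;

# Hypothetical helper (not provided by Lingua::Lid): decode $bytes using
# the encoding named in a result hash reference. Returns undef if Encode
# does not know the encoding name.
sub decode_with_result {
    my ($bytes, $result) = @_;
    my $enc = find_encoding($result->{encoding}) or return;
    return $enc->decode($bytes);
}

# Hand-written result in the documented shape (not real lid output).
my $result = {
    'language' => 'German',
    'isocode'  => 'deu',
    'encoding' => 'ISO-8859-1',
};

# "Gr\xFC\xDFe" is the ISO-8859-1 byte sequence for a five-letter word.
my $text = decode_with_result("Gr\xFC\xDFe", $result);

if (defined $text) {
    printf "%s (%s): %d characters\n",
        $result->{language}, $result->{isocode}, length $text;
}
else {
    print "Encode does not recognize '$result->{encoding}'\n";
}
```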

ERROR HANDLING

The functions lid_fstr() and lid_ffile() return undef if an error occurs. Lingua::Lid::errstr() can be used to obtain an appropriate message describing the most recent error.

Have a look at lid's manual for a list of all error messages.

NOTE:

The $Lingua::Lid::errstr variable is still supported and thread-safe, too. Internally it is tied to Lingua::Lid::errstr() using Lingua::Lid::Errstr. However, as of Lingua::Lid v0.02 Lingua::Lid::errstr() is preferred and should be used in any new code. $Lingua::Lid::errstr may be removed in a future release.
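Code that prefers exceptions over checking for undef can wrap the identification functions. The following is a minimal sketch under stated assumptions: identify_or_croak() is a hypothetical helper, not part of Lingua::Lid. It takes the identification function and the error-message function as code references so that one wrapper serves both lid_fstr() and lid_ffile(); the demonstration uses stand-in subs so that it runs without lid installed.

```perl
use strict;
use warnings;
use Carp qw/croak/;

# Hypothetical wrapper (not provided by Lingua::Lid): call $identify with
# @args and croak with the message from $errstr if it returns undef.
sub identify_or_croak {
    my ($identify, $errstr, @args) = @_;
    my $result = $identify->(@args);
    croak "identification failed: " . $errstr->() unless defined $result;
    return $result;
}

# With Lingua::Lid loaded, the wrapper would be called like this:
#   my $r = identify_or_croak(\&Lingua::Lid::lid_fstr,
#                             \&Lingua::Lid::errstr, $string);

# Stand-ins for demonstration only; they mimic the documented behaviour
# (undef on error, message available afterwards).
my $fake_identify = sub { return };
my $fake_errstr   = sub { "Insufficient input length" };

my $r = eval { identify_or_croak($fake_identify, $fake_errstr, " ") };
print "caught: $@" unless defined $r;
```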

COMPARISON TO THE C INTERFACE

Lingua::Lid's functions lid_fstr() and lid_ffile() behave exactly like their lid counterparts in C.

The C functions lid_fnstr() and lid_fwstr() are not needed; use the Lingua::Lid function lid_fstr() in any Perl code instead.

The C function lid_strerror() and the per-thread pseudo-variable lid_errno are not needed. Rather than returning a NULL pointer, Lingua::Lid's lid_fstr() and lid_ffile() return undef on errors. Lingua::Lid::errstr() can be used to obtain an appropriate message describing the most recent error.

lid's function lid_version_string() is available as lid_version() in Lingua::Lid.

The C define LID_VERSION_STRING (LID_VERSION in lid v2.x.x) is not available in Lingua::Lid; use lid_version_ct() instead.

Lingua::Lid's results data structure follows the C lid_t * structure as closely as possible. See "IDENTIFICATION RESULTS DATA STRUCTURE" above.

EXAMPLES

  use strict;
  use Lingua::Lid qw/lid_fstr lid_version/;

  print "Lingua::Lid v$Lingua::Lid::VERSION, using lid v",
    lid_version(), "\n";

  my @strings =
  (
      "This is a short English sentence.",
      "Dies ist ein kurzer deutscher Satz.",
      " "
  );

  foreach my $string (@strings)
  {
      if (my $r = lid_fstr($string))
      {
          print join(" - ", $r->{language}, $r->{isocode},
                            $r->{encoding}), "\n";
      }
      else
      {
          print "lid_fstr() failed: ", Lingua::Lid::errstr(), "\n";
      }
  }

The program above produces the following output:

  Lingua::Lid v0.02, using lid v3.0.0
  English - eng - ASCII
  German - deu - ASCII
  lid_fstr() failed: Insufficient input length

BUGS

None known.

Please report bugs either using CPAN's bug tracker or to <perl@lingua-systems.com>.

SEE ALSO

AUTHOR

Alex Linke <alinke@lingua-systems.com>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2014 Lingua-Systems Software GmbH

This extension is free software. It may be used, redistributed and/or modified under the terms of the zlib license. For details, see the full text of the license in the file LICENSE.