NAME

langident - identifies the language files are written in

SYNOPSIS

  langident [OPTIONS] file1 [file2 ...]

DESCRIPTION

Identifies the language each file is written in, using the Perl module Lingua::Identify.

OPTIONS

-a

Show all results (not just the most probable language).

-c

Show the confidence level for the most probable language. It is printed as the first value right after that language.

-d

Debug (development only).

-E ENCODING

Select an input encoding. Defaults to UTF-8.

  # use ISO-8859-1 (latin1)
  langident -E ISO-8859-1 file
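The effect of -E can be pictured as decoding the raw bytes of each file with the named encoding before any analysis. The Python sketch below is only an illustration under that assumption; decode_input and read_text are hypothetical helpers, not part of langident or Lingua::Identify.

```python
from pathlib import Path

def decode_input(data, encoding="utf-8"):
    """Decode raw bytes with the chosen encoding (UTF-8 by default, like -E)."""
    # errors="strict" (the default) raises UnicodeDecodeError on bytes that are
    # invalid in the chosen encoding instead of silently replacing them
    return data.decode(encoding)

def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read a file and decode it as -E ENCODING would."""
    return decode_input(Path(path).read_bytes(), encoding)
```

For example, the byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own, which is why latin1 files need the switch.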

-e METHODS

Select the method(s) to use. There are three ways of doing this:

  # simply using a method
  langident -e ngrams3 file

  # using several methods (separate them with a comma)
  langident -e prefixes3,suffixes3 file

  # using several methods and assigning a different weight to each of them
  langident -e smallwords=2,prefixes=1,ngrams3=1.3 file
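The METHODS argument is a comma-separated list in which each method may carry an optional =WEIGHT. A minimal Python sketch of parsing it, assuming that a method given without a weight counts with weight 1 (consistent with the EXAMPLES section, where ngrams2=2 gets "double" the importance of a bare ngrams1, but not stated explicitly). parse_methods is a hypothetical name, and it does not check the names against the list of available methods.

```python
def parse_methods(spec):
    """Parse 'name[=weight],...' into a dict mapping method name -> weight."""
    weights = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas such as "ngrams3,"
        name, sep, weight = item.partition("=")
        # Assumption: a method listed without "=WEIGHT" gets weight 1.0
        weights[name] = float(weight) if sep else 1.0
    return weights
```

For instance, parse_methods("ngrams2=2,ngrams1") yields weight 2.0 for ngrams2 and 1.0 for ngrams1.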

The available methods are the following: smallwords, prefixes1, prefixes2, prefixes3, prefixes4, suffixes1, suffixes2, suffixes3, suffixes4, ngrams1, ngrams2, ngrams3 and ngrams4.
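The method names suggest the features each one compares: character n-grams of length 1 to 4, word prefixes and suffixes of length 1 to 4, and short words. The Python sketch below illustrates those features under that assumption only; Lingua::Identify's actual tokenization and its definition of a "small word" may differ, and extract_features, words, and the 4-letter cutoff for small words are hypothetical.

```python
import re
from collections import Counter

def words(text):
    # Assumption: words are runs of letters, compared in lowercase
    return re.findall(r"[^\W\d_]+", text.lower())

def extract_features(text, method):
    """Count features for a method name such as 'ngrams3', 'prefixes2' or 'smallwords'."""
    ws = words(text)
    if method == "smallwords":
        # Hypothetical definition: words of at most 4 letters
        return Counter(w for w in ws if len(w) <= 4)
    kind, n = method[:-1], int(method[-1])
    if kind == "prefixes":
        # First n letters of each word that is long enough
        return Counter(w[:n] for w in ws if len(w) >= n)
    if kind == "suffixes":
        # Last n letters of each word that is long enough
        return Counter(w[-n:] for w in ws if len(w) >= n)
    if kind == "ngrams":
        # Every run of n consecutive letters inside each word
        return Counter(w[i:i + n] for w in ws for i in range(len(w) - n + 1))
    raise ValueError(f"unknown method: {method}")
```

Comparing the counts from a file against per-language profiles of the same features is what the weights given to -e would then balance.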

-h

Display help message and exit.

-l

List all available languages and exit.

-m NUMBER

Set the maximum number of results (languages) to display: the NUMBER most probable languages are shown, in descending order of probability.

Overrides the -a switch.
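The interplay of the default output, -a and -m can be sketched as a selection over already-computed (language, score) pairs. This minimal Python sketch only mirrors the documented behaviour (one result by default, all results with -a, the top NUMBER with -m, which wins over -a); select_results and its parameters are hypothetical.

```python
def select_results(scores, show_all=False, max_results=None):
    """Return (language, score) pairs, most probable first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if max_results is not None:   # -m NUMBER overrides -a
        return ranked[:max_results]
    if show_all:                  # -a: every language
        return ranked
    return ranked[:1]             # default: only the most probable language
```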

-o LANGUAGES

Only consider the specified languages, given as a comma-separated list of language codes.

  # identify between Portuguese and English only
  langident -o pt,en *
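Restricting identification to a set of languages can be pictured as discarding every other candidate. The Python sketch below is a post-filter over computed scores, assuming the -o argument is split on commas; restrict_languages is hypothetical. It reproduces only the ranking: the real tool may instead exclude the other languages before scoring, which could change the reported percentages.

```python
def restrict_languages(scores, languages):
    """Keep only the languages named in a comma-separated list such as 'pt,en'."""
    allowed = {code.strip() for code in languages.split(",") if code.strip()}
    return {lang: score for lang, score in scores.items() if lang in allowed}
```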

-p

Also show the percentage for each language displayed.

-s SIZE

Maximum size to examine.

-v

Show version and exit.

EXAMPLES

Use the methods ngrams2 and ngrams1, giving ngrams2 twice the weight of ngrams1 (-e switch); the output includes the three most probable languages (-m switch) with their percentages (-p switch), plus the confidence level (-c switch) of the first result, printed right after the first language.

  $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README 
  README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
  $
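When langident is called from a script, an output line like the one above can be split back into fields. A minimal Python sketch, assuming the layout produced by -c -p is FILE:LANG, then the confidence level, then LANG's percentage, then LANGUAGE PERCENTAGE pairs for the remaining results; parse_line is hypothetical and not part of langident.

```python
def parse_line(line):
    """Split 'FILE:LANG CONF PCT LANG PCT ...' (output of -c -p) into fields."""
    # rpartition: the file name may itself contain ':' but the results do not
    filename, _, rest = line.rpartition(":")
    fields = rest.split()
    best, confidence, best_pct = fields[0], float(fields[1]), float(fields[2])
    # Remaining fields come in (language, percentage) pairs
    others = [(fields[i], float(fields[i + 1])) for i in range(3, len(fields) - 1, 2)]
    return {
        "file": filename,
        "confidence": confidence,
        "results": [(best, best_pct)] + others,
    }
```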

TO DO

Add a switch to ignore HTML tags (and possibly markup in other formats too).

AUTHOR

Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.