Changes for version 1.7 - 2010-05-14

  • Tests now work with the data files located at either ../data or data.
  • make test now always generates the data/data.* files; previously this did not work on Darwin and MSWin32.
  • Added a calculate() method, which returns the probabilities of all languages. identify() now just calls calculate() and returns the most probable language.
  • When neither a trigram nor a bigram is found, use the average alphabet size instead of the individual language's alphabet size, because the per-language size penalizes Asian languages.
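
The last two changes can be illustrated with a minimal sketch. It is written in Python for brevity and is not the module's Perl code: the model layout (dictionaries with "trigrams", "bigrams" and "alphabet_size" keys), the scoring loop, and the way the bigram is chosen are all assumptions made for illustration. It shows identify() delegating to calculate(), and the fallback probability being derived from the average alphabet size, so that a language with a large alphabet is not penalized when an n-gram is unseen.

```python
import math


def calculate(text, models):
    """Return (language, log-probability) pairs, most probable first.

    `models` is a hypothetical structure: language name mapped to a dict
    with "trigrams", "bigrams" (n-gram -> probability) and "alphabet_size".
    """
    # Average alphabet size over all languages, used for unseen n-grams.
    avg_alphabet = sum(m["alphabet_size"] for m in models.values()) / len(models)

    scores = {}
    for lang, model in models.items():
        logp = 0.0
        for i in range(2, len(text)):
            trigram = text[i - 2:i + 1]
            bigram = text[i - 1:i + 1]
            if trigram in model["trigrams"]:
                p = model["trigrams"][trigram]
            elif bigram in model["bigrams"]:
                p = model["bigrams"][bigram]
            else:
                # Same fallback for every language, so a language with a
                # large alphabet is not penalized for an unseen n-gram.
                p = 1.0 / avg_alphabet
            logp += math.log(p)
        scores[lang] = logp

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def identify(text, models):
    """Return only the most probable language, delegating to calculate()."""
    return calculate(text, models)[0][0]
```

With a per-language fallback of 1 / alphabet_size, a language with several thousand characters would receive a far smaller probability for every unseen n-gram than a language with a few dozen; using the shared average removes that bias in this sketch.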

Documentation

Build the transition matrix for the Lingua::Ident module

Modules

Statistical language identification