Changes for version 0.06

  • case- and mark-sensitivity control introduced, following what is used in the Perl module "Text::Levenshtein". It is controlled by a "matching level" set in new() or via the "set_eq()" method. The default remains to search case- (and mark-) insensitively.
  • pos_all method added
  • pos_ methods now accept a 'conform' argument to transliterate the file-given strings into a common code.
  • get_lang() and set_lang() methods introduced to query the datafile being used and to change it, respectively.
  • frq_count, frq_opm, cd_count and cd_pct now return 0 rather than an empty string if the looked-up string was not found in the language file.
  • all_strings now first ensures each string is non-empty, then culls duplicates with uniq(), and then sorts the strings alphabetically; some files contain empty lines, and duplicate strings for different POS.
  • method "list_strings" renamed "select_strings" to avoid confusion with "all_strings", which also returns a list of strings.
  • select_strings checks that the value is defined for a language file and, if retrieved, that it is numeric before checking its range, in case the field is empty for a file (as seems to happen for cd_count with the UK file).
  • the cv_pattern regex check in select_strings transliterates tested strings to ASCII so that, say, 'é' in the string is matched by just 'e' in the pattern.
  • frq_sum method added, and the POD for related methods now indicates that they can all be used to obtain descriptives of frq_count as well.
  • POD documentation for stats methods corrected: "raw" should have been "opm", and there is no argument named "log".
  • croak added, with the message "The requested value is not defined for the SUBTLEX-x corpus", for when a value is not defined for a particular language x.
  • Dependency on File::Slurp removed in favour of Path::Tiny.
  • croak messages expanded to include statement of the method called.
  • NL lang: the _all or _min file needs to be specified; see the table in the POD.
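
As a rough illustration of two of the changes above (the clean-up now done by all_strings, and matching 'é' with a plain 'e' pattern), here is a minimal sketch in plain Perl. It does not use this module's code: the helper names clean_strings and strip_marks are hypothetical, and stripping combining marks via Unicode::Normalize is an assumed stand-in for whatever transliteration the module actually performs.

```perl
use strict;
use warnings;
use utf8;
use List::Util qw(uniq);            # uniq() needs List::Util 1.45 or later
use Unicode::Normalize qw(NFD);

# Hypothetical helper: keep non-empty strings, drop duplicates, sort alphabetically.
sub clean_strings {
    my @strings = @_;
    my @kept = grep { defined($_) && length($_) } @strings;
    return sort { $a cmp $b } uniq(@kept);
}

# Hypothetical helper: decompose the string and remove combining marks,
# so that an accented letter reduces to its base letter.
sub strip_marks {
    my ($string) = @_;
    ( my $plain = NFD($string) ) =~ s/\p{Mn}+//g;
    return $plain;
}

my @cleaned = clean_strings( 'the', '', 'cat', 'the', 'a' );   # ('a', 'cat', 'the')
my $plain   = strip_marks('café');                              # 'cafe'
```

Filtering before uniq() keeps an empty line from surviving as a single "unique" entry, which is presumably why the changelog mentions the non-empty check first.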

Modules

Retrieve word frequencies and related values and lists from subtitles corpora