NAME

Lingua::LO::NLP - Various Lao text processing functions

SYNOPSIS

    use utf8;
    use 5.10.1;
    use open qw/ :std :encoding(UTF-8) /;
    use charnames ':full';  # needed for \N{HYPHEN} on Perl versions before 5.16
    use Lingua::LO::NLP;
    use Data::Dumper;

    my $lao = Lingua::LO::NLP->new;

    my @syllables = $lao->split_to_syllables("ສະບາຍດີ"); # qw( ສະ ບາຍ ດີ )
    print Dumper(\@syllables);

    for my $syl (@syllables) {
        my $analysis = $lao->analyze_syllable($syl);
        printf "%s: %s\n", $analysis->syllable, $analysis->tone;
        # ສະ: TONE_HIGH_STOP
        # ບາຍ: TONE_LOW
        # ດີ: TONE_LOW
    }

    say $lao->romanize("ສະບາຍດີ", variant => 'PCGN', hyphen => "\N{HYPHEN}");  # sa‐bay‐di
    say $lao->romanize("ສະບາຍດີ", variant => 'IPA');                           # saʔ baːj diː

DESCRIPTION

This module provides various functions for processing Lao text. Currently it can:

    * split Lao text (usually written without blanks between words) into syllables

    * analyze syllables with regard to core and end consonants, vowels, tone and other properties

    * romanize Lao text according to the PCGN standard or to IPA (experimental)

These functions are shortcuts to the functionality of three specialized modules: Lingua::LO::NLP::Syllabify, Lingua::LO::NLP::Analyze and Lingua::LO::NLP::Romanize. If you need only one of them, you can shave off a little overhead by using that module directly.

METHODS

new

    new(option => value, ...)

Options

normalize: passed to "split_to_syllables" and "analyze_syllable".

split_to_syllables

    my @syllables = $object->split_to_syllables( $text, %options );

Split Lao text into its syllables using a regexp modelled after PHISSAMAY, DALALOY and DURRANI: Syllabification of Lao Script for Line Breaking. Takes as its only mandatory parameter a character string to split and optionally a number of named options; see "new" in Lingua::LO::NLP::Syllabify for those. Returns a list of syllables.
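Since the syllabification regexp is modelled on a paper about line breaking, one plausible use of the returned list is to mark the positions where a line may be broken. The following is a minimal sketch, not something the module does itself: mark_break_points is a hypothetical helper that joins syllables with ZERO WIDTH SPACE (U+200B), and the demo part runs only if Lingua::LO::NLP is installed.

```perl
use strict;
use warnings;
use utf8;
use open qw/ :std :encoding(UTF-8) /;

# Hypothetical helper (not part of Lingua::LO::NLP): join syllables with
# ZERO WIDTH SPACE so that a renderer may break lines between them.
sub mark_break_points {
    my @syllables = @_;
    return join "\x{200B}", @syllables;
}

# Run the demo only when the module is available.
if ( eval { require Lingua::LO::NLP; 1 } ) {
    my $lao       = Lingua::LO::NLP->new;
    my $breakable = mark_break_points( $lao->split_to_syllables("ສະບາຍດີ") );
    printf "%d characters including break marks\n", length $breakable;
}
```

Whether U+200B is the right break marker depends on the rendering target; any other separator can be substituted in the join.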

analyze_syllable

    my $classified = $object->analyze_syllable( $syllable, %options );

Returns a Lingua::LO::NLP::Analyze object that allows you to query various syllable properties such as core consonant, tone mark, vowel length and tone. See there for details.
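As a rough illustration of working with the returned objects, the sketch below counts how often each tone occurs in a piece of text. It relies only on the tone accessor shown in the SYNOPSIS; tally_tones is a hypothetical helper, and the demo part is skipped if Lingua::LO::NLP is not installed.

```perl
use strict;
use warnings;
use utf8;
use open qw/ :std :encoding(UTF-8) /;

# Hypothetical helper: count tones over a list of objects that
# provide a ->tone method (as Lingua::LO::NLP::Analyze objects do
# according to the SYNOPSIS).
sub tally_tones {
    my @analyses = @_;
    my %count;
    $count{ $_->tone }++ for @analyses;
    return \%count;
}

# Run the demo only when the module is available.
if ( eval { require Lingua::LO::NLP; 1 } ) {
    my $lao      = Lingua::LO::NLP->new;
    my @analyses = map { $lao->analyze_syllable($_) }
                   $lao->split_to_syllables("ສະບາຍດີ");
    my $tally    = tally_tones(@analyses);
    printf "%-16s %d\n", $_, $tally->{$_} for sort keys %$tally;
}
```

Going by the tones listed in the SYNOPSIS, this should report TONE_LOW twice and TONE_HIGH_STOP once for the sample word.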

romanize

    $object->romanize( $lao, %options );

Returns a romanized version of the text passed in as $lao. See "new" in Lingua::LO::NLP::Romanize for options. The default variant is 'PCGN'.

analyze_text

    my @syllables = $object->analyze_text( $text, %options );

Split Lao text into its syllables and analyze them, returning an array of hashes. Each hash has at least a key 'analysis' with a Lingua::LO::NLP::Analyze object as a value. If the romanize option is set to a true value, it also has a 'romanization' key. In this case, the variant option (see "new" in Lingua::LO::NLP::Romanize) is also required.
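A minimal sketch of consuming the result might look as follows. It assumes that each returned element is a hash reference with the keys described above and that the analysis object offers the syllable and tone accessors used in the SYNOPSIS. describe is a hypothetical helper, and the demo part runs only if Lingua::LO::NLP is installed.

```perl
use strict;
use warnings;
use utf8;
use open qw/ :std :encoding(UTF-8) /;

# Hypothetical helper: turn one result hash into a printable line.
# 'analysis' is documented as always present; 'romanization' only
# exists when the romanize option was true.
sub describe {
    my ($entry)  = @_;
    my $analysis = $entry->{analysis};
    my $line     = sprintf "%s [%s]", $analysis->syllable, $analysis->tone;
    $line .= " -> $entry->{romanization}" if defined $entry->{romanization};
    return $line;
}

# Run the demo only when the module is available.
if ( eval { require Lingua::LO::NLP; 1 } ) {
    my $lao     = Lingua::LO::NLP->new;
    my @entries = $lao->analyze_text( "ສະບາຍດີ", romanize => 1, variant => 'PCGN' );
    print describe($_), "\n" for @entries;
}
```

Checking for the 'romanization' key with defined keeps the helper usable for calls made without the romanize option.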

AUTHOR

Matthias Bethke, <matthias@towiski.de>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available. Significant portions of the code are (C) PostgreSQL Global Development Group and The Regents of the University of California. All modified versions must retain the file COPYRIGHT included in the distribution.
