NAME

Lingua::EN::Sentence::Offsets - Finds sentence boundaries, and returns their offsets.

VERSION

version 0.01_05

SYNOPSIS

        use Lingua::EN::Sentence::Offsets qw/get_offsets get_sentences/;
         
        my $offsets = get_offsets($text);     ## Get the offsets.
        foreach my $o (@$offsets) {
                my $start  = $o->[0];
                my $length = $o->[1]-$o->[0];

                my $sentence = substr($text,$start,$length);  ## Get a sentence.
                # ...
        }

        ### or

        my $sentences = get_sentences($text);     
        foreach my $sentence (@$sentences) {
                ## do something with $sentence
        }

METHODS

get_offsets

Takes text input and returns a reference to an array containing pairs of character offsets, corresponding to each sentence's start and end positions.

get_sentences

Takes text input and splits it into sentences.

add_acronyms

Lets the user add a list of acronyms/abbreviations.

get_acronyms

Gets the defined list of acronyms.

set_acronyms

Overwrites the predefined acronym list with your own list.

remove_false_eos

split_unsplit_stuff

Finds additional split points in the middle of previously defined sentences.

adjust_offsets

Makes minor adjustments to offsets (leading/trailing whitespace, etc.).
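As an illustration of the kind of whitespace adjustment described above, here is a minimal sketch. The helper `trim_offsets` is hypothetical and is not the module's own code; it assumes each pair is `[start, end]` with an exclusive end, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): shrink each
# [start, end) pair so that it excludes surrounding whitespace.
sub trim_offsets {
    my ($text, $offsets) = @_;
    my @trimmed;
    for my $pair (@$offsets) {
        my ($start, $end) = @$pair;
        # Move the start forward past leading whitespace.
        $start++ while $start < $end && substr($text, $start, 1) =~ /\s/;
        # Move the end backward past trailing whitespace.
        $end-- while $end > $start && substr($text, $end - 1, 1) =~ /\s/;
        # Drop spans that contained only whitespace.
        push @trimmed, [$start, $end] if $end > $start;
    }
    return \@trimmed;
}

my $text    = "Hello there.  How are you? ";
my $trimmed = trim_offsets($text, [[0, 14], [14, 27]]);
for my $pair (@$trimmed) {
    my ($start, $end) = @$pair;
    printf "[%d,%d] '%s'\n", $start, $end, substr($text, $start, $end - $start);
}
```

The real method may handle more cases than whitespace; the description only says "etc.".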

initial_offsets

Performs a first, naive delimitation of sentences.
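To show what a naive first pass might look like, the sketch below treats every run of `.`, `!` or `?` as a sentence end. The function `naive_offsets` is a hypothetical stand-in, and the module's actual rules are not documented here, so they may well differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): every run of
# '.', '!' or '?' is taken as the end of a sentence.
sub naive_offsets {
    my ($text) = @_;
    my @offsets;
    my $start = 0;
    while ($text =~ /[.!?]+/g) {
        my $end = pos($text);    # offset just past the punctuation
        push @offsets, [$start, $end];
        $start = $end;
    }
    # Keep trailing text that has no terminal punctuation.
    push @offsets, [$start, length $text] if $start < length $text;
    return \@offsets;
}

my $offsets = naive_offsets("Dr. Smith arrived. He sat down");
print join(" ", map { "[$_->[0],$_->[1]]" } @$offsets), "\n";
```

A pass this simple splits after "Dr.", which suggests why a sentence splitter also needs an acronym list and later correction steps.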

offsets2sentences

Given a list of sentence boundary offsets and a text, returns an array with the text split into sentences.
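The mapping from offsets to sentences can be sketched with `substr`, following the same arithmetic as the SYNOPSIS. The function `offsets_to_sentences` below is a hypothetical stand-in written for illustration; the module's own argument order and return type may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not the module's code): extract one
# substring per [start, end) pair, with an exclusive end offset.
sub offsets_to_sentences {
    my ($text, $offsets) = @_;
    return [ map { substr($text, $_->[0], $_->[1] - $_->[0]) } @$offsets ];
}

my $text      = "One. Two!";
my $sentences = offsets_to_sentences($text, [[0, 4], [5, 9]]);
print "$_\n" for @$sentences;
```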

ACKNOWLEDGEMENTS

Based on the original module Lingua::EN::Sentence, by Shlomo Yona (SHLOMOY).

SEE ALSO

Lingua::EN::Sentence, Text::Sentence

AUTHOR

Andre Santos <andrefs@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Andre Santos.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
