Module Version: 0.1

NAME

Lingua::EN::Segmenter::TextTiling - Segment text using the TextTiling method

SYNOPSIS

  use lib '.';   # must come before the module is loaded to affect the search path
  use Lingua::EN::Segmenter::TextTiling qw(segments);
  
  my $text = <<EOT;
  Lingua::EN::Segmenter is a useful module that allows text to be split up 
  into words, paragraphs, segments, and tiles.
  
  Paragraphs are by default indicated by blank lines. Known segment breaks are
  indicated by a line with only the word "segment_break" in it.
  
  The module detects paragraphs that are unrelated to each other by comparing 
  the number of words per-paragraph that are related. The algorithm is designed
  to work only on long segments. 
  
  SOUTH OF BAGHDAD, Iraq (CNN) -- Seven U.S. troops freed Sunday after being 
  held by Iraqi forces arrived by helicopter at a base south of Baghdad and were 
  transferred to a C-130 transport plane headed for Kuwait, CNN's Bob Franken 
  reported from the scene. 
  
  EOT
    
  my $num_segment_breaks = 1;
  my @segments = segments($num_segment_breaks,$text);
  print $segments[0]; # Prints the first three paragraphs of the above text
  print "\n----------SEGMENT_BREAK----------\n";
  print $segments[1]; # Prints the last paragraph of the above text
  
  # The related Lingua::EN::Splitter module can also be used in an
  # object-oriented fashion, here to split the text into words
  use Lingua::EN::Splitter;
  my $splitter = Lingua::EN::Splitter->new;
  my @words = $splitter->words($text);
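
The sample text above describes the input conventions: paragraphs are separated by blank lines, and known segment breaks are marked by a line containing only the word "segment_break". As a rough illustration of that input format, the following plain-Perl sketch splits marked-up text into segments of paragraphs. It is not the module's own parser, and the helper name split_marked_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::EN::Segmenter: split text into
# segments at lines containing only "segment_break", then split each
# segment into paragraphs at blank lines.
sub split_marked_text {
    my ($text) = @_;
    my @segments;
    for my $chunk (split /^[ \t]*segment_break[ \t]*\n/m, $text) {
        my @paragraphs = grep { /\S/ } split /\n[ \t]*\n/, $chunk;
        s/^\s+|\s+$//g for @paragraphs;    # trim surrounding whitespace
        push @segments, \@paragraphs if @paragraphs;
    }
    return @segments;
}

my $marked = <<"EOT";
First paragraph of topic one.

Second paragraph of topic one.
segment_break
Only paragraph of topic two.
EOT

my @segments = split_marked_text($marked);
printf "%d segments\n", scalar @segments;
printf "segment %d has %d paragraph(s)\n", $_ + 1, scalar @{ $segments[$_] }
    for 0 .. $#segments;
```

Each element of the returned list is an array reference holding the paragraphs of one segment, which is a convenient shape for comparing hand-marked breaks against automatically detected ones.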

DESCRIPTION

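This module splits English text into topical segments using the TextTiling method. The exported segments function takes the number of segment breaks to place and the text to segment, and returns the resulting segments as a list. See the SYNOPSIS for a complete example.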
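
To give a feel for the general idea behind this kind of segmentation, here is a deliberately simplified sketch: it compares the vocabulary of adjacent paragraphs and places breaks at the gaps where the similarity is lowest. This is not the module's implementation (TextTiling proper compares fixed-size blocks of tokens and scores the depth of similarity valleys), and the names word_counts, cosine and naive_segments are hypothetical helpers introduced only for this illustration.

```perl
use strict;
use warnings;

# Count lower-cased word occurrences in one paragraph.
sub word_counts {
    my ($paragraph) = @_;
    my %count;
    $count{ lc $1 }++ while $paragraph =~ /([A-Za-z']+)/g;
    return \%count;
}

# Cosine similarity between two word-count hashes (0 means no shared words).
sub cosine {
    my ($x, $y) = @_;
    my ($dot, $norm_x, $norm_y) = (0, 0, 0);
    $dot    += $x->{$_} * ($y->{$_} || 0) for keys %$x;
    $norm_x += $_ ** 2 for values %$x;
    $norm_y += $_ ** 2 for values %$y;
    my $norm = sqrt($norm_x * $norm_y);
    return $norm ? $dot / $norm : 0;
}

# Hypothetical helper: place $num_breaks breaks at the paragraph gaps with
# the lowest similarity and return the resulting segments as strings.
sub naive_segments {
    my ($num_breaks, $text) = @_;
    my @paragraphs = grep { /\S/ } split /\n\s*\n/, $text;
    s/\s+\z// for @paragraphs;
    my @counts = map { word_counts($_) } @paragraphs;

    # $sim[$i] is the similarity across the gap after paragraph $i.
    my @sim    = map { cosine($counts[$_], $counts[$_ + 1]) } 0 .. $#paragraphs - 1;
    my @by_sim = sort { $sim[$a] <=> $sim[$b] || $a <=> $b } 0 .. $#sim;
    $num_breaks = @by_sim if $num_breaks > @by_sim;
    my %break_after = map { $_ => 1 } @by_sim[0 .. $num_breaks - 1];

    my @segments = ('');
    for my $i (0 .. $#paragraphs) {
        $segments[-1] .= $paragraphs[$i] . "\n\n";
        push @segments, '' if $break_after{$i};
    }
    return @segments;
}

my $text = <<"EOT";
The cat sat on the mat. The cat purred.

A cat likes the warm mat in the sun.

Stock markets fell sharply as investors sold shares.

Investors expect markets and shares to recover.
EOT

my @segments = naive_segments(1, $text);
print join("----------SEGMENT_BREAK----------\n", @segments);
```

In this toy input the second and third paragraphs share no words, so the single break falls between them. Comparing whole paragraphs is only reliable when they are reasonably long, which is consistent with the note in the SYNOPSIS that the algorithm is designed for long segments.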

EXTENDING

This module is designed to be easy to extend. Feel free to subclass it when designing alternative methods for text segmentation.

AUTHORS

David James <splice@cpan.org>

SEE ALSO

Lingua::EN::Segmenter::Baseline, Lingua::EN::Segmenter::Evaluator, http://www.cs.toronto.edu/~james

LICENSE

  Copyright (c) 2002 David James
  All rights reserved.
  This program is free software; you can redistribute it and/or
  modify it under the same terms as Perl itself.