NAME

Search::Tools::Snipper - extract keywords in context

SYNOPSIS

 use Search::Tools::Snipper;
 
 my $query = [ qw/ quick dog / ];
 my $text  = 'the quick brown fox jumped over the lazy dog';
 
 # each option is described under METHODS below
 my $s = Search::Tools::Snipper->new(
     query     => $query,
     occur     => 3,
     context   => 8,
     word_len  => 5,
     max_chars => 300,
 );
 
 print $s->snip( $text );

DESCRIPTION

Search::Tools::Snipper extracts keywords and their context from a larger block of text. The larger block may be plain text or HTML/XML.

METHODS

new( query => query )

Instantiate a new object. query must be a scalar string, a reference to an array of strings (as in the SYNOPSIS), or a Search::Tools::RegExp::Keywords object.

Many of the following methods are also available as key/value pairs to new().

occur

The number of snippets that should be returned by snip().

Available via new().

context

The number of context words to include in the snippet.

Available via new().

max_chars

The maximum number of characters (not bytes, under Perl >= 5.8) to return in a snippet. NOTE: This is only used to test whether the text is worth snipping at all, or when no keywords are found (see show()).

Available via new().
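
The difference between characters and bytes matters for any non-ASCII text. The following sketch uses only the core Encode module; the helper name char_and_byte_len is hypothetical and not part of this module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Search::Tools::Snipper):
# return the length in characters and in bytes of a UTF-8 byte string.
sub char_and_byte_len {
    my ($bytes) = @_;
    my $chars = decode( 'UTF-8', $bytes );    # decode bytes into characters
    return ( length($chars), length($bytes) );
}

# "caf" followed by the two-byte UTF-8 encoding of e-acute
my ( $n_chars, $n_bytes ) = char_and_byte_len("caf\xc3\xa9");
print "$n_chars characters, $n_bytes bytes\n";    # 4 characters, 5 bytes
```

A limit of 300 characters can therefore cover more than 300 bytes of encoded text.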

word_len

The estimated average word length used in combination with context(). You can usually ignore this value.

Available via new().
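
One plausible reading of "used in combination with context()" is that the two values are multiplied to estimate a character window around each match. The sketch below only illustrates that arithmetic; the helper context_chars is hypothetical, and the module's source is the authority on how the values are actually combined.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's API):
# estimate how many characters of context a word count corresponds to.
sub context_chars {
    my ( $context, $word_len ) = @_;
    return $context * $word_len;
}

# With the SYNOPSIS values: 8 words of context, 5 characters per word
print context_chars( 8, 5 ), "\n";    # 40
```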

show

Boolean flag indicating whether snip() should succeed no matter what, or give up when no snippets are found. Default is 1 (true).

Available via new().

escape

Boolean flag indicating whether snip() should escape any HTML/XML markup in the resulting snippet or not. Default is 0 (false).

Available via new().
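
To illustrate what escaping markup means here, this is a minimal sketch in plain Perl. The helper escape_markup is hypothetical; the module's own escaping may cover more characters or use a different routine.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's implementation):
# replace the characters that are significant in HTML/XML with entities.
sub escape_markup {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # ampersand first, so the entities below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_markup('the <b>quick</b> dog & fox'), "\n";
# the &lt;b&gt;quick&lt;/b&gt; dog &amp; fox
```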

snipper

The CODE ref used by the snip() method for actually extracting snippets. You can use your own snipper function if you want (though if you have a better snipper algorithm than the ones in this module, why not share it?). If you go this route, have a look at the source code for snip() to see how snipper() is used.

Available via new().

snipper_name

The name of the internal snipper function used. In case you're curious.

snipper_force

Boolean flag indicating whether the snipper() value should always be used, regardless of the type of query keyword. Default is 0 (false).

Available via new().

count

The number of snips made by the Snipper object.

collapse_whitespace

Boolean flag indicating whether multiple whitespace characters should be collapsed into a single space. A whitespace character is defined as anything that Perl's \s pattern matches, plus the nobreak space (\xa0). Default is 1 (true).

Available via new().
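
The definition of whitespace above can be written as a single character class. A minimal sketch, using a hypothetical helper collapse_ws that is not part of this module:

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's implementation):
# collapse runs of \s characters and no-break spaces into one space.
sub collapse_ws {
    my ($text) = @_;
    $text =~ s/[\s\xa0]+/ /g;    # \s plus the no-break space (\xa0)
    return $text;
}

print collapse_ws("the  quick\n\tbrown\xa0\xa0fox"), "\n";
# the quick brown fox
```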

snip( text )

Returns a snippet from text that matches query, plus context() words of surrounding context. Matches are case-insensitive.
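
The general idea of keywords in context can be sketched in plain Perl. This is a simplified illustration, not the algorithm used by this module: the helper kwic is hypothetical, splits on whitespace only, and counts context in whole words on each side of a match.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's algorithm):
# return each matching word with up to $context words on either side.
sub kwic {
    my ( $text, $query, $context ) = @_;
    my @words = split ' ', $text;
    my %want  = map { ( lc($_) => 1 ) } @$query;    # case-insensitive lookup
    my @snips;
    for my $i ( 0 .. $#words ) {
        next unless $want{ lc $words[$i] };
        # clamp the window to the bounds of the word list
        my $start = $i - $context < 0       ? 0       : $i - $context;
        my $end   = $i + $context > $#words ? $#words : $i + $context;
        push @snips, join ' ', @words[ $start .. $end ];
    }
    return @snips;
}

my $text = 'the quick brown fox jumped over the lazy dog';
print "$_\n" for kwic( $text, [qw/ quick dog /], 2 );
# the quick brown fox
# the lazy dog
```

A real snipper also has to deal with markup, phrases, and the occur limit, which this sketch ignores.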

rekw

Returns the internal Search::Tools::RegExp::Keywords object.

AUTHOR

Peter Karman perl@peknet.com

Based on the HTML::HiLiter regular expression building code, originally by the same author, copyright 2004 by Cray Inc.

Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.

COPYRIGHT

Copyright 2006 by Peter Karman. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

SWISH::HiLiter