Search::Tools::Snipper - extract keywords in context
    my $query = [ qw/ quick dog / ];
    my $text  = 'the quick brown fox jumped over the lazy dog';

    my $s = Search::Tools::Snipper->new(
        occur     => 3,
        context   => 8,
        word_len  => 5,
        max_chars => 300,
        query     => $query
    );

    print $s->snip( $text );
Search::Tools::Snipper extracts keywords and their context from a larger block of text. The larger block may be plain text or HTML/XML.
Instantiate a new object. query must be a scalar string, a reference to an array of strings, or a Search::Tools::RegExp::Keywords object.
Many of the following methods are also available as key/value pairs to new().
The number of snippets that should be returned by snip().
Available via new().
The number of context words to include in the snippet.
The maximum number of characters (not bytes, under Perl >= 5.8) to return in a snippet. NOTE: This is only used to test whether text is worth snipping at all, or if no keywords are found (see show()).
The estimated average word length used in combination with context(). You can usually ignore this value.
Boolean flag indicating whether snip() should succeed no matter what, or if it should give up if no snippets were found. Default is 1 (true).
Boolean flag indicating whether snip() should escape any HTML/XML markup in the resulting snippet or not. Default is 0 (false).
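To illustrate what escaping markup in a snippet means, here is a minimal pure-Perl sketch. It is not the module's own code: escape_markup() and %ENTITY are hypothetical names made up for this example, and the set of characters the module actually escapes may differ.

```perl
use strict;
use warnings;

# Assumption: these five characters are the ones worth escaping.
# The real module may use a different set or a different encoding.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper, not part of Search::Tools.
# A single pass over the string avoids double-escaping the '&'
# that the replacement entities themselves contain.
sub escape_markup {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

print escape_markup('<b>quick</b> & "lazy"'), "\n";
```

With escaping off (the default), any markup in the source text passes through to the snippet unchanged, which is usually what you want if you highlight or render the snippet as HTML yourself.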
The CODE ref used by the snip() method for actually extracting snippets. You can use your own snipper function if you want (though if you have a better snipper algorithm than the ones in this module, why not share it?). If you go this route, have a look at the source code for snip() to see how snipper() is used.
The name of the internal snipper function used. In case you're curious.
Boolean flag indicating whether the snipper() value should always be used, regardless of the type of query keyword. Default is 0 (false).
The number of snips made by the Snipper object.
Boolean flag indicating whether multiple whitespace characters should be collapsed into a single space. A whitespace character is defined as anything that Perl's \s pattern matches, plus the nobreak space (\xa0). Default is 1 (true).
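The whitespace rule described above can be sketched in a few lines of plain Perl. This is an illustration only, not the module's implementation; collapse_ws() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools: replace every run of
# whitespace (anything \s matches, plus the no-break space \xa0) with
# a single ASCII space.
sub collapse_ws {
    my ($text) = @_;
    $text =~ s/[\s\xa0]+/ /g;
    return $text;
}

print collapse_ws("the  quick\t\nbrown\xa0\xa0fox"), "\n";
```

Listing \xa0 explicitly in the character class keeps the behaviour the same whether or not Perl treats the string with Unicode semantics, since \s alone does not always match the no-break space.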
Return a snippet of text from text that matches query plus context() words of context. Matches are case insensitive.
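For readers new to the idea, the following is a deliberately simplified keyword-in-context sketch in plain Perl. It is not the algorithm this module uses: kwic() is a hypothetical function, it assumes the context words are taken from both sides of the match, it only matches whole whitespace-delimited words, and it does not merge overlapping snippets.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools. Returns up to $occur
# snippets, each holding one matched keyword with up to $context words
# on either side. Matching is case insensitive.
sub kwic {
    my ( $text, $query, $context, $occur ) = @_;
    my @words = split ' ', $text;
    my $re    = join '|', map { quotemeta } @$query;
    my @snips;
    for my $i ( 0 .. $#words ) {
        next unless $words[$i] =~ /^(?:$re)$/i;
        # Clamp the window to the bounds of the word list.
        my $from = $i - $context < 0       ? 0       : $i - $context;
        my $to   = $i + $context > $#words ? $#words : $i + $context;
        push @snips, join ' ', @words[ $from .. $to ];
        last if @snips >= $occur;
    }
    return @snips;
}

my $text = 'the quick brown fox jumped over the lazy dog';
print " ... $_ ... \n" for kwic( $text, [qw/ quick dog /], 2, 3 );
```

With a context of 2, the sketch yields one snippet around "quick" and one around "dog". The module itself handles phrases, markup, and character limits that this sketch ignores.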
Returns the internal Search::Tools::RegExp::Keywords object.
Peter Karman perl@peknet.com
Based on the HTML::HiLiter regular expression building code, originally by the same author, copyright 2004 by Cray Inc.
Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.
Copyright 2006 by Peter Karman. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SWISH::HiLiter