Module Version: 1.007


Search::Tools::TokenListUtils - mixin methods for TokenList and TokenListPP


 my $tokens = $tokenizer->tokenize( $string );
 if ( $tokens->str eq $string ) {
    print "string is same, before and after tokenize()\n";
 }
 else {
    warn "I'm filing a bug report against Search::Tools right away!\n";
 }

 my ($start_pos, $end_pos) = $tokens->get_window( 5, 20 );
 # $start_pos probably == 0
 # $end_pos probably   == 25

 my $slice = $tokens->get_window_tokens( 5, 20 );
 for my $token (@$slice) {
    print "token = $token\n";
 }


Search::Tools::TokenListUtils contains pure-Perl methods inherited by both Search::Tools::TokenList and Search::Tools::TokenListPP.



str

Returns a serialized version of the TokenList. If you haven't altered the TokenList since you got it from tokenize(), then str() returns a scalar string identical to (but not the same as) the string you passed to tokenize().

Both Search::Tools::TokenList and TokenListPP are overloaded to stringify to the str() value.
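
The round-trip and stringification behavior can be illustrated with a toy class. This is only a sketch: ToyTokenList and its split-based tokenizing are hypothetical stand-ins, not the Search::Tools implementation. It shows a str() that rejoins tokens into a copy of the original string, plus overloaded stringification that delegates to str().

```perl
use strict;
use warnings;

# Toy stand-in for a token list; NOT the Search::Tools implementation.
package ToyTokenList;
use overload '""' => sub { $_[0]->str }, fallback => 1;

sub new {
    my ( $class, $string ) = @_;

    # Keep both the words and the separators between them,
    # so that nothing is lost when the tokens are joined again.
    my @tokens = split /(\W+)/, $string;
    return bless { tokens => \@tokens }, $class;
}

# Serialize by concatenating every token in order.
sub str { return join( "", @{ $_[0]->{tokens} } ) }

package main;

my $string = "The quick, brown fox.";
my $list   = ToyTokenList->new($string);

print "round trip ok\n" if $list->str eq $string;    # same content, new scalar
print "overload ok\n"   if "$list" eq $string;       # stringifies via str()
```

The string returned by str() is a fresh scalar with equal content, which is the sense of "identical to (but not the same as)" above.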

get_window( pos [, size, as_sentence] )

Returns a list of two values: the start and end positions of a window extending size tokens on either side of pos. Like taking a slice of the TokenList.

Note that size is the number of tokens, not matches. So if you're looking for the number of "words", think about size*2.

Note too that size is the number of tokens on one side of pos. So the entire window width (length of the returned slice) is size*2 +/-1. The window is guaranteed to be bounded by matches.

If as_sentence is true, the window is shifted to try to match the first token prior to pos that returns true for is_sentence_start().
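
The basic window arithmetic can be sketched without the module. The helper window_bounds() below is hypothetical and works on a plain array; it only clamps pos +/- size to the ends of the array. It does not reproduce the real get_window() adjustments for match boundaries or as_sentence.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools: compute a window of
# $size tokens on either side of $pos, clamped to the array bounds.
sub window_bounds {
    my ( $tokens, $pos, $size ) = @_;
    my $last  = $#{$tokens};
    my $start = $pos - $size;
    $start = 0 if $start < 0;
    my $end = $pos + $size;
    $end = $last if $end > $last;
    return ( $start, $end );
}

# 100 placeholder tokens, t0 .. t99.
my @tokens = map {"t$_"} 0 .. 99;

my ( $start, $end ) = window_bounds( \@tokens, 5, 20 );
print "start=$start end=$end\n";    # start=0 end=25

# The positions can then be used to take the slice itself.
my @slice = @tokens[ $start .. $end ];
printf "slice length = %d\n", scalar @slice;    # slice length = 26
```

With pos near the start of the list the window is truncated on the left, which is why the example yields a start position of 0 rather than a negative number.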

get_window_tokens( pos [, size] )

Like get_window() but returns an array ref of a slice of the TokenList containing Tokens.


as_sentences

Returns a reference to an array of arrays, where each child array is a "sentence" worth of Token objects. You can stringify each sentence array like:

 my $sentences = $tokenlist->as_sentences;
 for my $s (@$sentences) {
     printf("sentence: %s\n", join("", map {"$_"} @$s));
 }

If you pass a single true value to as_sentences(), then the array returned will consist of plain scalar strings with whitespace normalized.
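
As a rough picture of what the stringified form might look like, the sketch below joins each sentence's tokens and tidies the whitespace. The sample data and the normalize_sentence() helper are hypothetical, and "whitespace normalized" is assumed here to mean collapsing runs of whitespace to a single space and trimming both ends; the module's actual rules may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each inner array mimics one "sentence"
# of tokens (words plus the punctuation and whitespace between them).
my $sentences = [
    [ "Hello",  ",  ", "world",    ".\n" ],
    [ "Second", "   ", "sentence", "." ],
];

# Assumed normalization: collapse whitespace runs, then trim the ends.
sub normalize_sentence {
    my ($tokens) = @_;
    my $str = join( "", @$tokens );
    $str =~ s/\s+/ /g;
    $str =~ s/^\s+|\s+$//g;
    return $str;
}

my @strings = map { normalize_sentence($_) } @$sentences;
print "sentence: $_\n" for @strings;
```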


Peter Karman


Please report any bugs or feature requests to the bug-search-tools address, or through the web interface. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


You can find documentation for this module with the perldoc command.

    perldoc Search::Tools


Copyright 2009 by Peter Karman.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
