
Search::Tools::QueryParser - convert string queries into objects

use Search::Tools::QueryParser;
my $qparser = Search::Tools::QueryParser->new(
# regex to define a query term (word)
term_re => qr/\w+(?:'\w+)*/,
# or assemble a definition from the following
word_characters => q/\w\'\-/,
ignore_first_char => q/\+\-/,
ignore_last_char => q/\+\-/,
term_min_length => 1,
# words to ignore
stopwords => [qw( the )],
# query operators
and_word => q(and),
or_word => q(or),
not_word => q(not),
phrase_delim => q("),
treat_uris_like_phrases => 1,
ignore_fields => [qw( site )],
wildcard => quotemeta(q(*)),
# language-specific settings
stemmer => \&your_stemmer_here,
charset => 'iso-8859-1',
lang => 'en_US',
locale => 'en_US.iso-8859-1',
# development help
debug => 0,
);
my $query = $qparser->parse(q(the quick color:brown "fox jumped"));
my $terms = $query->terms; # ['quick', 'brown', '"fox jumped"']
# a Search::Tools::RegEx object
my $regexp = $query->regexp_for($terms->[0]);
# the Search::Query::Dialect tree()
my $tree = $query->tree;
print "$query\n"; # the quick color:brown "fox jumped"
print $query->str . "\n"; # same thing

Search::Tools::QueryParser turns search queries into objects that can be used for highlighting, spell-checking, and extracting matching snippets from source documents.

The new() method instantiates a QueryParser object. With the exception of parse(), all the following methods can be passed as key/value pairs in new().
Called internally by new().
The parse() method parses a query and returns a Search::Tools::Query object.
The query must be a scalar string.
NOTE: All queries are converted to UTF-8. See the charset param.
The stemmer function is used to find the root 'stem' of a word. There are many stemming algorithms available, including many on CPAN. The stemmer function should expect to receive two parameters: the QueryParser object and the word to be stemmed. It should return exactly one value: the stemmed word.
Example stemmer function:
use Lingua::Stem;
my $stemmer = Lingua::Stem->new;
sub mystemfunc {
my ($parser, $word) = @_;
return $stemmer->stem($word)->[0];
}
# and pass to the new() method:
my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
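Any code reference that honors the two-argument contract described above should work. As a dependency-free sketch, the function below strips a few English suffixes. The name toy_stemmer and its suffix list are hypothetical and far cruder than a real stemming algorithm; they only illustrate the calling convention.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer: receives the parser object and the word,
# and returns exactly one value. The suffix list is illustrative only.
sub toy_stemmer {
    my ( $parser, $word ) = @_;
    ( my $stem = $word ) =~ s/(?:ing|ed|s)$//;

    # Fall back to the original word rather than return an empty string.
    return length($stem) ? $stem : $word;
}

print toy_stemmer( undef, 'jumped' ), "\n";    # jump
```

It would be passed to new() in the same way, as stemmer => \&toy_stemmer.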
The stopwords parameter is a list of common words to ignore when extracting keyword terms. It may be either a string, which is split on whitespace, or an array ref.
NOTE: If a stopword is contained in a phrase, then the phrase will be tokenized into words based on whitespace, then the stopwords removed.
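The stopword behavior described above can be pictured with a few lines of core Perl. This is only a sketch of the idea, not the module's internal code; the helper names stopword_lookup and strip_stopwords are hypothetical, and lowercasing before lookup is an assumption.

```perl
use strict;
use warnings;

# Accept either a whitespace-separated string or an array ref.
sub stopword_lookup {
    my ($stopwords) = @_;
    my @list = ref($stopwords) ? @$stopwords : split ' ', $stopwords;
    return { map { ( lc($_) => 1 ) } @list };
}

# Tokenize a phrase on whitespace, then drop any stopwords.
sub strip_stopwords {
    my ( $phrase, $lookup ) = @_;
    return grep { !$lookup->{ lc($_) } } split ' ', $phrase;
}

my @words = strip_stopwords( 'over the lazy dog', stopword_lookup('the') );
print join( ', ', @words ), "\n";    # over, lazy, dog
```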
The ignore_first_char parameter is a string of characters to strip from the beginning of all words.
The ignore_last_char parameter is a string of characters to strip from the end of all words.
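To illustrate the effect, the sketch below interpolates the two strings from the synopsis into character classes and trims a term. The helper trim_term is hypothetical, and the module may apply these settings differently internally.

```perl
use strict;
use warnings;

my $ignore_first_char = q/\+\-/;    # same values as the synopsis
my $ignore_last_char  = q/\+\-/;

# Strip ignored characters from each end of a term, leaving the middle alone.
sub trim_term {
    my ($term) = @_;
    $term =~ s/^[$ignore_first_char]+//;
    $term =~ s/[$ignore_last_char]+$//;
    return $term;
}

print trim_term('+quick-'),    "\n";    # quick
print trim_term('well-known'), "\n";    # well-known
```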
All queries are run through Perl's built-in lc() function before parsing. The default is 1 (true). Set to 0 (false) to preserve case.
The ignore_fields value may be a hash ref or array ref of field names to ignore in query parsing. For example:
ignore_fields => [qw( site )]
would reduce the following query to the single term baz:
site:foo.bar AND baz # terms = baz
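The outcome of that example can be mimicked with a deliberately naive tokenizer. This is a sketch under simplifying assumptions (whitespace tokens, a fixed operator list, field:value syntax only); the function naive_terms is hypothetical and is not how the module parses queries.

```perl
use strict;
use warnings;

my $ignore_fields = [qw( site )];

# Accept a hash ref or an array ref of field names.
my %ignore = ref($ignore_fields) eq 'HASH'
    ? %$ignore_fields
    : map { ( $_ => 1 ) } @$ignore_fields;

sub naive_terms {
    my ($query) = @_;
    my @terms;
    for my $token ( split ' ', $query ) {
        next if $token =~ /^(?:and|or|not)$/i;    # skip operators
        my $term = $token;
        if ( $token =~ /^(\w+):(.+)$/ ) {
            next if $ignore{$1};                   # skip ignored fields
            $term = $2;                            # keep only the value
        }
        push @terms, $term;
    }
    return @terms;
}

print join( ', ', naive_terms('site:foo.bar AND baz') ), "\n";    # baz
```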
Set the default field to be used in parsing the query, if no field is specified. The default is the empty string (the Search::Query::Parser default).
The treat_uris_like_phrases parameter is a boolean (default: 1, true).
If true, queries like foo@bar.com are treated as the single phrase "foo bar com" instead of being split into three separate terms.
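The transformation can be sketched as follows. Which punctuation marks count as URI separators is an assumption here (only @, . and / are handled), and uri_to_phrase is a hypothetical helper rather than part of the module.

```perl
use strict;
use warnings;

# Turn a URI-like token into a quoted phrase of its parts.
sub uri_to_phrase {
    my ($term) = @_;
    my @parts = grep { length($_) } split /[\@\.\/]+/, $term;

    # Tokens with no separators are returned unchanged.
    return @parts > 1 ? '"' . join( ' ', @parts ) . '"' : $term;
}

print uri_to_phrase('foo@bar.com'), "\n";    # "foo bar com"
print uri_to_phrase('quick'),       "\n";    # quick
```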
and_word default: and|near\d*
or_word default: or
not_word default: not
wildcard default: *
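Because the operator defaults above are regular expression fragments, one pattern can cover several spellings of an operator. The sketch below shows which tokens the defaults would match if they were anchored and applied case-insensitively; both of those choices, and the helper classify_token, are assumptions made for illustration.

```perl
use strict;
use warnings;

my $and_word = q(and|near\d*);
my $or_word  = q(or);
my $not_word = q(not);

# Classify a single token as an operator or a plain term.
sub classify_token {
    my ($token) = @_;
    return 'AND'  if $token =~ /^(?:$and_word)$/i;
    return 'OR'   if $token =~ /^(?:$or_word)$/i;
    return 'NOT'  if $token =~ /^(?:$not_word)$/i;
    return 'TERM';
}

for my $token (qw( and NEAR5 or not android )) {
    printf "%-8s => %s\n", $token, classify_token($token);
}
```

Under these assumptions NEAR5 is classified as AND, while android remains an ordinary term because the pattern is anchored at both ends.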
Set a locale explicitly. If not set, the locale is inherited from the LC_CTYPE environment variable.
Base language. If not set, extracted from locale or defaults to en_US.
Base charset used for converting queries to UTF-8. If not set, extracted from locale or defaults to iso-8859-1.
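The relationship between locale, language and charset described above, and the conversion to UTF-8, can be sketched with the core Encode module. The helper split_locale is hypothetical; the fallback values are the defaults stated above, but the module's own extraction logic may differ.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Split a locale such as en_US.iso-8859-1 into language and charset,
# falling back to the documented defaults when a part is missing.
sub split_locale {
    my ($locale) = @_;
    my ( $lang, $charset ) = split /\./, $locale, 2;
    return ( $lang || 'en_US', $charset || 'iso-8859-1' );
}

my ( $lang, $charset ) = split_locale('en_US.iso-8859-1');

# "cafe" with an acute accent on the e, as iso-8859-1 bytes.
my $bytes = "caf\xe9";

# Decode from the base charset, then encode as UTF-8 bytes.
my $utf8 = encode( 'UTF-8', decode( $charset, $bytes ) );

printf "%s %s %d bytes\n", $lang, $charset, length($utf8);
# en_US iso-8859-1 5 bytes
```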
The default is Search::Tools::Query but you can set your own to subclass the Query object.
The default is Search::Query::Dialect::Native but you can set your own. See the Search::Query::Dialect documentation.

The special HTML characters &, < and > can cause problems when regular expressions are matched against markup, so they are ignored when regular expressions are created, even if you include them in word_characters in new().
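The effect is roughly as if those three characters were removed from the character class before the pattern is built. The sketch below shows that idea only; it is not the module's implementation, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# A word_characters value that (unwisely) includes the HTML specials.
my $word_characters = q/\w\'\-&<>/;

# Delete &, < and > before building a character class from the string.
( my $safe = $word_characters ) =~ tr/&<>//d;
print "$safe\n";    # \w\'\-

# With the specials gone, & separates words instead of joining them.
my @words = 'R&D' =~ /([$safe]+)/g;
print join( ', ', @words ), "\n";    # R, D
```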

Peter Karman <karman@cpan.org>

Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

You can find documentation for this module with the perldoc command.
perldoc Search::Tools

Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
