Search::Tools::QueryParser - convert string queries into objects
    use Search::Tools::QueryParser;

    my $qparser = Search::Tools::QueryParser->new(

        # regex to define a query term (word)
        term_re                 => qr/\w+(?:'\w+)*/,

        # or assemble a definition from the following
        word_characters         => q/\w\'\-/,
        ignore_first_char       => q/\+\-/,
        ignore_last_char        => q/\+\-/,
        term_min_length         => 1,

        # words to ignore
        stopwords               => [qw( the )],

        # query operators
        and_word                => q(and),
        or_word                 => q(or),
        not_word                => q(not),
        phrase_delim            => q("),
        treat_uris_like_phrases => 1,
        ignore_fields           => [qw( site )],
        wildcard                => quotemeta(q(*)),

        # language-specific settings
        # (pass a code reference; a bare &name would call the function)
        stemmer                 => \&your_stemmer_here,
        charset                 => 'iso-8859-1',
        lang                    => 'en_US',
        locale                  => 'en_US.iso-8859-1',

        # development help
        debug                   => 0,
    );

    my $query = $qparser->parse(q(the quick color:brown "fox jumped"));
    my $terms = $query->terms;    # ['quick', 'brown', '"fox jumped"']

    # a Search::Tools::RegEx object
    my $regexp = $query->regexp_for($terms->[0]);

    # the Search::Query::Dialect tree()
    my $tree = $query->tree;

    print "$query\n";            # the quick color:brown "fox jumped"
    print $query->str . "\n";    # same thing
Search::Tools::QueryParser turns search queries into objects that can be applied for highlighting, spelling, and extracting matching snippets from source documents.
The new() method instantiates a QueryParser object. With the exception of parse(), all the following methods can be passed as key/value pairs in new().
Called internally by new().
The parse() method parses query and returns a Search::Tools::Query object.
query must be a scalar string.
NOTE: All queries are converted to UTF-8. See the charset param.
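As a rough illustration of what such a conversion involves, the sketch below uses the core Encode module. The helper name to_utf8_bytes is hypothetical, and this is not necessarily how the module performs the conversion internally.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Search::Tools): decode raw bytes from
# the declared charset into Perl characters, then serialize as UTF-8.
sub to_utf8_bytes {
    my ($bytes, $charset) = @_;
    my $chars = decode($charset, $bytes);
    return encode('UTF-8', $chars);
}

# "caf\xE9" is "cafe" with an e-acute, stored as one byte in iso-8859-1.
my $utf8 = to_utf8_bytes("caf\xE9", 'iso-8859-1');
printf "%d bytes\n", length $utf8;    # 5 bytes: the e-acute takes two in UTF-8
```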
The stemmer function is used to find the root 'stem' of a word. There are many stemming algorithms available, including many on CPAN. The stemmer function should expect to receive two parameters: the QueryParser object and the word to be stemmed. It should return exactly one value: the stemmed word.
Example stemmer function:
    use Lingua::Stem;
    my $stemmer = Lingua::Stem->new;

    sub mystemfunc {
        my ($parser, $word) = @_;
        return $stemmer->stem($word)->[0];
    }

    # and pass to the new() method:
    my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
stopwords is a list of common words to ignore when extracting keyword terms. It may be either a string, which is split on whitespace, or an array ref.
NOTE: If a phrase contains a stopword, the phrase is tokenized into words on whitespace and the stopwords are then removed.
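The phrase handling described in the note can be sketched in plain Perl. The helper strip_stopwords is hypothetical and only mirrors the documented behavior (split on whitespace, then drop stopwords). The case-insensitive comparison is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools): split a phrase on
# whitespace and drop any stopwords.
sub strip_stopwords {
    my ($phrase, $stopwords) = @_;

    # Accept either a whitespace-separated string or an array ref.
    my @stop = ref $stopwords ? @$stopwords : split ' ', $stopwords;

    # Assumption: stopwords are compared case-insensitively.
    my %is_stop = map { ( lc($_) => 1 ) } @stop;

    return grep { !$is_stop{ lc($_) } } split ' ', $phrase;
}

my @words = strip_stopwords('jumped over the fence', [qw( the )]);
print join(', ', @words), "\n";    # jumped, over, fence
```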
ignore_first_char is a string of characters to strip from the beginning of every word.
ignore_last_char is a string of characters to strip from the end of every word.
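A minimal sketch of the stripping, assuming the two strings are interpolated into regex character classes (which would explain the escaped form q/\+\-/ in the synopsis). The helper trim_term is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools). Assumes the ignore
# strings are valid inside a regex character class, e.g. q/\+\-/.
sub trim_term {
    my ($term, $ignore_first, $ignore_last) = @_;
    $term =~ s/^[$ignore_first]+//;    # strip leading characters
    $term =~ s/[$ignore_last]+$//;     # strip trailing characters
    return $term;
}

print trim_term('+quick-', q/\+\-/, q/\+\-/), "\n";    # quick
```

Characters in the middle of a word, such as the hyphen in fox-like, are left alone because both substitutions are anchored.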
All queries are run through Perl's built-in lc() function before parsing. The default is 1 (true). Set to 0 (false) to preserve case.
The ignore_fields value may be a hash ref or an array ref of field names to ignore when parsing a query. Example:
ignore_fields => [qw( site )]
would parse the query:
site:foo.bar AND baz # terms = baz
Set the default field to be used in parsing the query, if no field is specified. The default is the empty string (the Search::Query::Parser default).
treat_uris_like_phrases is a boolean (default: 1, true).
If true, a query like foo@bar.com is treated as the single phrase "foo bar com" instead of being split into three separate terms.
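The transformation can be approximated as follows. The helper uri_to_phrase is hypothetical, and splitting on non-word characters is an assumption about what counts as a separator.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools): rewrite a URI-like
# token as a quoted phrase made of its word parts.
sub uri_to_phrase {
    my ($token) = @_;

    # Assumption: any run of non-word characters separates the parts.
    my @parts = grep { length } split /\W+/, $token;

    return @parts > 1 ? '"' . join(' ', @parts) . '"' : $token;
}

print uri_to_phrase('foo@bar.com'), "\n";    # "foo bar com"
```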
Default for and_word: and|near\d*
Default for or_word: or
Default for not_word: not
Default for wildcard: *
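The operator defaults above read as regex fragments (note near\d*). The sketch below shows how such fragments could classify tokens. The classify helper, the anchoring, and the case-insensitive match are all assumptions made for illustration.

```perl
use strict;
use warnings;

# The documented defaults, treated here as anchored, case-insensitive
# regex fragments (the anchoring and /i flag are assumptions).
my %op_re = (
    and => qr/^(?:and|near\d*)$/i,
    or  => qr/^(?:or)$/i,
    not => qr/^(?:not)$/i,
);

# Hypothetical helper (not part of Search::Tools): label a token as an
# operator or as a plain term.
sub classify {
    my ($token) = @_;
    for my $op (qw( and or not )) {
        return $op if $token =~ $op_re{$op};
    }
    return 'term';
}

print "$_ => ", classify($_), "\n" for qw( AND near5 OR not fox );
```

With these patterns, near5 is classified as an and-style operator, while a word that merely starts with an operator, such as android, remains a term.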
Set a locale explicitly. If not set, the locale is inherited from the LC_CTYPE environment variable.
Function imported by the locale pragma. Documented only to satisfy POD tests.
Base language. If not set, extracted from locale or defaults to en_US.
Base charset used for converting queries to UTF-8. If not set, extracted from locale or defaults to iso-8859-1.
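The fallback from locale to lang and charset can be pictured with a short sketch. The helper lang_and_charset is hypothetical and assumes a locale string of the form shown in the synopsis (en_US.iso-8859-1). The module's own extraction may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools): split a locale string
# into language and charset, falling back to the documented defaults.
sub lang_and_charset {
    my ($locale) = @_;
    my ($lang, $charset) = split /\./, ( $locale // '' ), 2;
    return ( $lang || 'en_US', $charset || 'iso-8859-1' );
}

my ($lang, $charset) = lang_and_charset('de_DE.UTF-8');
print "$lang $charset\n";    # de_DE UTF-8
```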
The default is Search::Tools::Query but you can set your own to subclass the Query object.
The default is Search::Query::Dialect::Native but you can set your own. See the Search::Query::Dialect documentation.
The special HTML chars &, < and > can pose problems in regexps against markup, so they are ignored in creating regular expressions if you include them in word_characters in new().
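To make the effect concrete, here is a small sketch that drops the three characters before building a term pattern. The helper term_regex is hypothetical and is not the module's actual regex construction.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools): build a term regex
# from a word_characters string, ignoring the HTML specials & < >.
sub term_regex {
    my ($word_characters) = @_;
    ( my $safe = $word_characters ) =~ s/[&<>]//g;
    return qr/[$safe]+/;
}

my $re    = term_regex(q/\w\'\-&<>/);
my @terms = 'R&D <b>rock-n-roll</b>' =~ /($re)/g;
print join(', ', @terms), "\n";    # R, D, b, rock-n-roll, b
```

Because & < and > never count as word characters here, markup delimiters always act as term boundaries.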
Peter Karman <karman@cpan.org>
Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Search::Tools
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Search-Tools
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Search-Tools
CPAN Ratings
http://cpanratings.perl.org/d/Search-Tools
Search CPAN
http://search.cpan.org/dist/Search-Tools/
Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Search::Query::Parser