Module Version: 0.304

NAME

Search::Query::Parser - convert query strings into query objects

SYNOPSIS

 use Search::Query;
 my $parser = Search::Query->parser(
    term_regex  => qr/[^\s()]+/,
    field_regex => qr/\w+/,
    op_regex    => qr/==|<=|>=|!=|=~|!~|[:=<>~#]/,

    # ops that admit an empty left operand
    op_nofield_regex => qr/=~|!~|[~:#]/,

    # case insensitive
    and_regex        => qr/\&|AND|ET|UND|E/i,
    or_regex         => qr/\||OR|OU|ODER|O/i,
    not_regex        => qr/NOT|PAS|NICHT|NON/i,

    default_field  => 'myfield',  # or ['myfield', 'myfield2']
    phrase_delim   => q/"/,
    default_boolop => '+',
    query_class    => 'Search::Query::Dialect::Native',
    field_class    => 'Search::Query::Field',
    query_class_opts => {
        default_field => 'foo', # or ['foo', 'bar']
    },
    
    # a generous mode, overlooking boolean-parser syntax errors
    sloppy              => 0,
    sloppy_term_regex   => qr/[\.\w]+/,
    fixup               => 0,
    
    # if set, this special term indicates a NULL query
    null_term           => 'NULL',
 );

 my $query = $parser->parse('+hello -world now');
 print $query;

DESCRIPTION

Search::Query::Parser is a fork of Search::QueryParser that supports multiple query dialects.

The Parser class transforms a query string into a Dialect object structure to be handled by external search engines.

The query string can contain simple terms, "exact phrases", field names and comparison operators, '+/-' prefixes, parentheses, and boolean connectors.

The parser can be customized using regular expressions for specific notions of "term", "field name" or "operator" -- see the new method.

The Dialect object resulting from a parsed query is a tree of terms and operators. Each Dialect can be re-serialized as a string using the stringify() method, or simply by printing the Dialect object, since the string-related Perl operations are overloaded using stringify().

QUERY STRING

The query string is decomposed into Clause objects, where each Clause has an optional sign prefix, an optional field name and comparison operator, and a mandatory value.

Sign prefix

Prefix '+' means that the item is mandatory. Prefix '-' means that the item must be excluded. No prefix means that the item will be searched for, but is not mandatory.

See also section "Boolean connectors" below, which is another way to combine items into a query.
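
As a small illustration of the three cases, the sketch below sorts the items of a flat query by their sign prefix. The helper classify_by_sign is hypothetical and not part of Search::Query; it ignores phrases, fields and parentheses and only shows how the prefixes are read.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Query): sort the
# whitespace-separated items of a flat query by their sign prefix.
sub classify_by_sign {
    my ($query) = @_;
    my %bucket = ( mandatory => [], excluded => [], optional => [] );
    for my $item ( split ' ', $query ) {
        # strip the prefix, if any, and remember what it meant
        my $kind
            = $item =~ s/\A\+// ? 'mandatory'
            : $item =~ s/\A-//  ? 'excluded'
            :                     'optional';
        push @{ $bucket{$kind} }, $item;
    }
    return \%bucket;
}

# The query string from the SYNOPSIS
my $bucket = classify_by_sign('+hello -world now');
print "$_: @{ $bucket->{$_} }\n" for qw( mandatory excluded optional );
```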

Field name and comparison operator

Internally, each query item has a field name and comparison operator; if not written explicitly in the query, these take default values '' (empty field name) and ':' (colon operator).

Operators have a left operand (the field name) and a right operand (the value to be compared with); for example, foo:bar means "search documents containing term 'bar' in field 'foo'", whereas foo=bar means "search documents where field 'foo' has exact value 'bar'".

Here is the list of admitted operators with their intended meaning:

:

treat value as a term to be searched within field. This is the default operator.

~ or =~

treat value as a regex; match field against the regex.

Note that ~ after a phrase indicates a proximity assertion:

 "foo bar"~5

means "match 'foo' and 'bar' within 5 positions of each other."

!~

negation of above

== or =, <=, >=, !=, <, >

classical relational operators

#

Inclusion in the set of comma-separated integers supplied on the right-hand side.

Operators :, ~, =~, !~ and # admit an empty left operand (so the field name will be ''). Search engines will usually interpret this as "any field" or "the whole data record". But see the default_field feature.
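
To make the split into field name, operator and value concrete, the sketch below applies the three regexes from the SYNOPSIS to a few single clauses. The helper split_clause is hypothetical and is not how the parser itself is implemented; it only shows which operator the alternation picks and when the defaults '' and ':' apply.

```perl
use strict;
use warnings;

# Regexes copied from the SYNOPSIS.
my $field_regex = qr/\w+/;
my $op_regex    = qr/==|<=|>=|!=|=~|!~|[:=<>~#]/;
my $term_regex  = qr/[^\s()]+/;

# Hypothetical helper (not part of Search::Query): split one clause into
# (field, op, value). When no field and operator are written, fall back
# to the defaults '' (empty field name) and ':' (colon operator).
sub split_clause {
    my ($clause) = @_;
    if ( $clause =~ /\A($field_regex)($op_regex)($term_regex)\z/ ) {
        return ( $1, $2, $3 );
    }
    return ( '', ':', $clause );
}

for my $clause ( 'foo:bar', 'foo>=10', 'foo!~ba+', 'bar' ) {
    my ( $field, $op, $value ) = split_clause($clause);
    print "$clause => field='$field' op='$op' value='$value'\n";
}
```

Because the two-character operators come first in op_regex, foo>=10 is read with the operator >= rather than > followed by the value =10.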

Value

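A value (right operand to a comparison operator) can be a simple term (as matched by term_regex), a phrase enclosed in the phrase_delim character, or a subquery in parentheses.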

Boolean connectors

Queries can contain the boolean connectors 'AND', 'OR' and 'NOT' (or their equivalents in other languages; see the *_regex features in new()). These are syntactic sugar for the '+' and '-' prefixes: a AND b is equivalent to +a +b; a OR b is equivalent to (a b); NOT a is equivalent to -a. +a OR b does not make sense, but it is translated into (a b), on the assumption that the user understands "OR" better than a '+' prefix. -a OR b does not make sense either and has no meaningful approximation, so it is rejected.

Clauses that mix AND and OR must be grouped with parentheses: (a AND b) OR c and a AND (b OR c) are allowed, but a AND b OR c is not.
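
The equivalences above can be sketched in a few lines of plain Perl. The helper desugar is hypothetical and far simpler than the real parser: it handles only flat queries without parentheses, phrases or fields, and it uses the connector regexes from the SYNOPSIS. Note that with those regexes the single letters e and o also count as connectors.

```perl
use strict;
use warnings;

# Connector regexes copied from the SYNOPSIS.
my $and_regex = qr/\&|AND|ET|UND|E/i;
my $or_regex  = qr/\||OR|OU|ODER|O/i;
my $not_regex = qr/NOT|PAS|NICHT|NON/i;

# Hypothetical helper (not part of Search::Query): rewrite a flat query
# that uses connectors into the equivalent '+'/'-' prefix form.
sub desugar {
    my ($query) = @_;
    my ( @terms, %seen );
    my $negate = 0;
    for my $token ( split ' ', $query ) {
        if    ( $token =~ /\A(?:$and_regex)\z/ ) { $seen{AND} = 1 }
        elsif ( $token =~ /\A(?:$or_regex)\z/ )  { $seen{OR}  = 1 }
        elsif ( $token =~ /\A(?:$not_regex)\z/ ) { $negate    = 1 }
        else {
            # NOT applies to the term that follows it
            push @terms, $negate ? "-$token" : $token;
            $negate = 0;
        }
    }
    die "AND and OR must be grouped with parentheses\n"
        if $seen{AND} && $seen{OR};

    if ( $seen{OR} ) {
        die "an excluded term cannot be combined with OR\n"
            if grep { /\A-/ } @terms;
        s/\A\+// for @terms;    # '+a OR b' is read as '(a b)'
        return '(' . join( ' ', @terms ) . ')';
    }
    if ( $seen{AND} ) {
        # every term that is not already signed becomes mandatory
        return join ' ', map { /\A[+-]/ ? $_ : "+$_" } @terms;
    }
    return join ' ', @terms;
}

print desugar($_), "\n" for 'a AND b', 'a OR b', 'NOT a', '+a OR b';
```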

The NEAR connector is treated like the proximity phrase assertion.

 foo NEAR5 bar

is treated as if it were:

 "foo bar"~5

See the near_regex option.
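
The rewrite from the connector form to the phrase form can be pictured with a single substitution. This is only a sketch: the helper near_to_phrase is hypothetical, and the pattern "NEAR followed by digits" is an assumption made for the example, since the default value of near_regex is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Query): turn
# 'term1 NEARn term2' into the proximity phrase '"term1 term2"~n'.
# The NEAR\d+ pattern is assumed for illustration only.
sub near_to_phrase {
    my ($query) = @_;
    $query =~ s/(\S+)\s+NEAR(\d+)\s+(\S+)/"$1 $3"~$2/g;
    return $query;
}

print near_to_phrase('foo NEAR5 bar'), "\n";
```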

METHODS

new

The following attributes may be initialized in new(). These are also available as get/set methods on the returned Parser object.

default_boolop
term_regex
field_regex
op_regex
op_nofield_regex
and_regex
or_regex
not_regex
near_regex
range_regex
default_field

Applied to all terms where no field is defined. The default value is undef (no default).

default_op

The operator used when default_field is applied.

fields
phrase_delim
query_class

dialect is an alias for query_class.

field_class
clause_class
query_class_opts

Passed to the query_class new() method each time a query is parsed.

dialect_opts

Alias for query_class_opts.

croak_on_error

Default value is false (0). Set to true to automatically throw an exception via Carp::croak() if parse() would return undef.

term_expander

A function reference for transforming query terms after they have been parsed. Examples might include adding alternate spellings, synonyms, or expanding wildcards based on lexicon listings.

Example:

 my $parser = Search::Query->parser(
    term_expander => sub {
        my ($term, $field) = @_;
        return ($term) if ref $term;    # skip ranges
        return ( qw( one two three ), $term );
    }
 );

 my $query = $parser->parse("foo=bar");
 print "$query\n";  # +foo=(one OR two OR three OR bar)

The term_expander reference should expect two arguments: the term value and, if available, the term field name. It should return an array of values.

The term_expander reference is called internally during the parse() method, before any field alias expansion or validation is performed.

sloppy( 0|1 )

If the string passed to parse() has any incorrect or unsupported syntax in it, the default behavior is for parsing to stop immediately, error() to be set, and for parse() to return undef.

In certain cases (as on a web form) this is undesirable. Set sloppy mode to true to fall back to non-boolean evaluation of the string, which in most cases should still return a Dialect object.

Example:

 $parser->parse('foo -- OR bar');  # if sloppy==0, returns undef
 $parser->parse('foo -- OR bar');  # if sloppy==1, equivalent to 'foo bar'

sloppy_term_regex

The regex definition used to match a term when sloppy==1.

fixup( 0|1 )

Attempt to fix syntax errors such as a missing closing parenthesis or a missing double-quote. This differs from sloppy(), which does not attempt to fix broken syntax. If you do not care about strict syntax checking, the two should probably be used together.

null_term

If set to term, a field value of term is treated as if it were undefined. Example:

 $parser->parse('foo=');     # throws fatal error
 $parser->null_term('NULL');
 $parser->parse('foo=NULL'); # field foo has NULL value

This feature is most useful with the SQL dialect, where you might want to find NULL values. Use it like:

 my $parser = Search::Query->parser(
     dialect    => 'SQL',
     null_term  => 'NULL'
 );
 my $query = $parser->parse('foo!=NULL');
 print $query;  # prints "foo is not NULL"

BUILDARGS

Internal method for mangling constructor params.

BUILD

Called internally to initialize the object.

error

Returns the last error message.

clear_error

Sets error message to undef.

get_field( name )

Returns Field object for name or undef if there isn't one defined.

set_fields( fields )

Set the fields structure. Called internally by BUILD() if you pass a fields key/value pair to new().

The structure of fields may be one of the following:

 my $fields = {
    field1 => 1,
    field2 => { alias_for => 'field1' },
    field3 => Search::Query::Field->new( name => 'field3' ),
    field4 => { alias_for => [qw( field1 field3 )] },
 };

 # or

 my $fields = [
    'field1',
    { name => 'field2', alias_for => 'field1' },
    Search::Query::Field->new( name => 'field3' ),
    { name => 'field4', alias_for => [qw( field1 field3 )] },
 ];
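
The hash and array layouts above carry the same information. As a sketch of that correspondence, the hypothetical helper alias_map below reduces either layout to a plain map from field name to the names it stands for. It is not part of Search::Query and, as an assumption to keep the example short, it handles plain-data entries only, not Search::Query::Field objects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Query): reduce either fields
# layout to a name => [ target names ] map. A field without alias_for
# simply maps to itself.
sub alias_map {
    my ($fields) = @_;

    # Normalize both layouts to a list of [ name, spec ] pairs.
    my @entries
        = ref $fields eq 'HASH'
        ? map { [ $_, $fields->{$_} ] } sort keys %$fields
        : map { ref $_ ? [ $_->{name}, $_ ] : [ $_, 1 ] } @$fields;

    my %map;
    for my $entry (@entries) {
        my ( $name, $spec ) = @$entry;
        my $alias = ref $spec eq 'HASH' ? $spec->{alias_for} : undef;
        $map{$name}
            = !defined $alias ? [$name]      # no alias
            : ref $alias      ? [@$alias]    # alias for several fields
            :                   [$alias];    # alias for one field
    }
    return \%map;
}

my $map = alias_map(
    [   'field1',
        { name => 'field2', alias_for => 'field1' },
        { name => 'field4', alias_for => [qw( field1 field3 )] },
    ]
);
print "$_ => @{ $map->{$_} }\n" for sort keys %$map;
```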

set_field( name => field_object )

Sets field name to Field object field_object.

parse( string )

Returns a Search::Query::Dialect object of type query_class.

If there is a syntax error in string, parse() will return undef and set error().
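
Since parse() signals a syntax error by returning undef, callers need to check the return value before using it. One possible pattern is sketched below. The helper parse_or_default is hypothetical; it relies only on the parse(), error() and clear_error() methods described in this document, and it assumes croak_on_error has been left at its default false value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Query): parse a string and
# fall back to a caller-supplied default when the parser reports a
# syntax error. $parser is any object offering parse(), error() and
# clear_error().
sub parse_or_default {
    my ( $parser, $string, $default ) = @_;
    $parser->clear_error;
    my $query = $parser->parse($string);
    return $query if defined $query;

    # parse() returned undef, so error() holds the reason
    warn "query '$string' rejected: " . $parser->error . "\n";
    return $default;
}
```

With the real module, $parser would be the object returned by Search::Query->parser(...). If croak_on_error is true, parse() throws an exception instead of returning undef, so the call would need to be wrapped in eval.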

AUTHOR

Peter Karman, <karman at cpan.org>

BUGS

Please report any bugs or feature requests to bug-search-query at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Query. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Search::Query

You can also look for information at:

ACKNOWLEDGEMENTS

This module started as a fork of Search::QueryParser by Laurent Dami.

COPYRIGHT & LICENSE

Copyright 2010 Peter Karman.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
