
NAME

Text::Query::Simple - Match text against simple query expression and return relevance value for ranking

SYNOPSIS

    use Text::Query::Simple;
    
    # Constructor
    $query = Text::Query::Simple->new([QSTRING] [OPTIONS]);

    # Methods
    $query->prepare(QSTRING [OPTIONS]);
    $query->match([TARGET]);
    $query->matchscalar([TARGET]);

DESCRIPTION

This module provides an object that tests a string or list of strings against a query expression similar to an AltaVista "simple query" and returns a "relevance value." Elements of the query expression may be regular expressions or literal text, and may be assigned weights.

Query expressions are compiled into an internal form when a new object is created or the prepare method is called; they are not recompiled on each match.

Query expressions consist of words (sequences of non-whitespace), regexps or phrases (quoted strings) separated by whitespace. Words or phrases prefixed with a + must be present for the expression to match; words or phrases prefixed with a - must be absent for the expression to match.

A successful match returns a count of the number of times any of the words (except ones prefixed with -) appeared in the text. This type of result is useful for ranking documents according to relevance.

Words or phrases may optionally be followed by a number in parentheses (no whitespace is allowed between the word or phrase and the parenthesized number). This number specifies the weight given to the word or phrase; it will be added to the count each time the word or phrase appears in the text. If a weight is not given, a weight of 1 is assumed.
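
The scoring rules described above can be sketched in plain Perl. This is only an illustration of the documented behavior, not the module's implementation: score_text and the hand-built term list are hypothetical, no query string is parsed, and matching is assumed to be case-insensitive on literal text.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Query::Simple. Each term is a hash
# with keys text, weight, and sign ('+' required, '-' excluded, '' optional).
sub score_text {
    my ($text, @terms) = @_;
    my $score = 0;
    for my $term (@terms) {
        # Count case-insensitive literal occurrences of the term.
        my $hits = () = $text =~ /\Q$term->{text}\E/gi;
        if ($term->{sign} eq '-') {
            return 0 if $hits;    # an excluded term is present: no match
            next;                 # excluded terms never add to the score
        }
        return 0 if $term->{sign} eq '+' && !$hits;    # required term missing
        my $weight = defined $term->{weight} ? $term->{weight} : 1;
        $score += $hits * $weight;
    }
    return $score;
}

# Terms modeled by hand: "information" is required and has weight 2,
# "retrieval" is optional with the default weight, "spam" must be absent.
my @terms = (
    { text => 'information', weight => 2, sign => '+' },
    { text => 'retrieval',   sign   => ''  },
    { text => 'spam',        sign   => '-' },
);

print score_text('Information retrieval and information theory', @terms), "\n";    # prints 5
print score_text('retrieval only', @terms), "\n";                                  # prints 0
```

The first text scores 5 because "information" appears twice at weight 2 and "retrieval" once at weight 1; the second scores 0 because the required term is missing.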

EXAMPLES

  use Text::Query::Simple;
  my $q = Text::Query::Simple->new('+hello world');
  die "bad query expression" if not defined $q;
  my $count = $q->match;    #matches against $_
  ...
  $q->prepare('goodbye adios -"ta ta"', -litspace=>1);
  #requires single space between the two ta's
  if ($q->match($line, -case=>1)) {
      #doesn't match "Goodbye"
      ...
  }
  $q->prepare('\\bintegrate\\b', -regexp=>1);
  #won't match "disintegrated"
  ...
  $q->prepare('information(2) retrieval');
  #information has twice the weight of retrieval

CONSTRUCTOR

new ([QSTRING] [OPTIONS])

This is the constructor for a new Text::Query::Simple object. If a QSTRING is given it will be compiled to internal form.

OPTIONS are passed in a hash-like fashion, as key and value pairs. Possible options are:

-case - If true, do case-sensitive match.

-litspace - If true, match spaces (except between operators) in QSTRING literally. If false, match spaces as \s+.

-regexp - If true, treat patterns in QSTRING as regular expressions rather than literal text.

-whole - If true, match whole words only, not substrings of words.

The constructor will return undef if a QSTRING was supplied and had illegal syntax.
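
To make the options concrete, here is a minimal sketch of how each one could translate into a regular expression for a single term. term_to_regex is a hypothetical helper written for this illustration; the module's actual compilation step is not shown in this document and may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Query::Simple: builds a regex for
# one query term from options named after the ones documented above.
sub term_to_regex {
    my ($term, %opt) = @_;
    my $pat;
    if ($opt{-regexp}) {
        $pat = $term;                 # the term is already a regular expression
    }
    elsif ($opt{-litspace}) {
        $pat = quotemeta $term;       # spaces must match exactly as written
    }
    else {
        # Any run of whitespace in the text matches a space in the term.
        $pat = join '\s+', map { quotemeta($_) } split ' ', $term;
    }
    $pat = "\\b(?:$pat)\\b" if $opt{-whole};       # whole words only
    return $opt{-case} ? qr/$pat/ : qr/$pat/i;     # case-insensitive by default
}

print "two spaces accepted\n" if "ta  ta" =~ term_to_regex('ta ta');
print "two spaces rejected\n" if "ta  ta" !~ term_to_regex('ta ta', -litspace => 1);
print "case mismatch rejected\n" if 'Goodbye' !~ term_to_regex('goodbye', -case => 1);
print "substring rejected\n" if 'disintegrated' !~ term_to_regex('integrate', -whole => 1);
```

All four print statements fire, which mirrors the comments in the EXAMPLES section.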

METHODS

prepare (QSTRING [OPTIONS])

Compiles the query expression in QSTRING to internal form and sets any options (same as in the constructor). prepare may be used to change the query expression and options for an existing query object. If OPTIONS are omitted, any options set by a previous call to the constructor or prepare remain in effect.

This method returns a reference to the query object if the syntax of the expression was legal, or undef if not.

match ([TARGET])

If TARGET is a scalar, match returns the number of words in the string specified by TARGET that match the query object's query expression. If TARGET is not given, the match is made against $_.

If TARGET is an array, match returns a list of references to anonymous arrays consisting of each element followed by its match count. The list is sorted in descending order by match count. If the elements of TARGET were anonymous arrays, the match count is appended to each element. This allows arbitrary information (such as a filename) to be associated with each element.

If TARGET is a reference to an array, match returns a reference to a sorted list of matching items, with counts, for all elements.
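
The shape of the ranked result can be illustrated without the module. In this sketch count_word is a stand-in scorer and rank is a hypothetical helper; it assumes that the first item of an anonymous array is the text to score and that elements with a count of zero are kept, neither of which is confirmed by the text above.

```perl
use strict;
use warnings;

# Stand-in scorer (an assumption, not the module's matcher): counts
# case-insensitive occurrences of one literal word.
sub count_word {
    my ($word, $text) = @_;
    my $n = () = $text =~ /\Q$word\E/gi;
    return $n;
}

# Hypothetical helper: pairs each element with its count and sorts the pairs
# by count, highest first. For elements that are array references, the count
# is appended to a copy and the first item is taken as the text to score.
sub rank {
    my ($word, @targets) = @_;
    my @pairs = map {
        my @item = ref($_) eq 'ARRAY' ? @$_ : ($_);
        [ @item, count_word($word, $item[0]) ];
    } @targets;
    return sort { $b->[-1] <=> $a->[-1] } @pairs;
}

# Each element carries a filename alongside its text.
my @ranked = rank('perl',
    [ 'Perl is fun; perl is terse', 'a.txt' ],
    [ 'no match here',              'b.txt' ],
    [ 'perl',                       'c.txt' ],
);
printf "%s: %d\n", $_->[1], $_->[-1] for @ranked;
# prints "a.txt: 2", then "c.txt: 1", then "b.txt: 0"
```

Carrying the filename inside each element is what lets a caller map a count back to its source after sorting.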

matchscalar ([TARGET])

Behaves just like match when TARGET is a scalar or is not given, and is slightly faster than match in those cases.

RESTRICTIONS

This module requires Perl 5.005 or higher due to the use of evaluated expressions in regexes.

AUTHOR

Eric Bohlman (ebohlman@netcom.com)

CREDITS

The parse_tokens routine was adapted from the parse_line routine in Text::ParseWords.

COPYRIGHT

Copyright (c) 1998 Eric Bohlman. All rights reserved. This program is free software; you can redistribute and/or modify it under the same terms as Perl itself.