Nick Wellnhofer > Lucy > Lucy::Search::Compiler

Module Version: 0.006000004

NAME ^

Lucy::Search::Compiler - Query-to-Matcher compiler.

SYNOPSIS ^

    # (Compiler is an abstract base class.)
    package MyCompiler;
    use base qw( Lucy::Search::Compiler );

    sub make_matcher {
        my $self = shift;
        return MyMatcher->new( @_, compiler => $self );
    }

DESCRIPTION ^

The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Matcher object that can do real work.

The simplest Compiler subclasses – such as those associated with constant-scoring Query types – might simply implement a make_matcher() method which passes along information verbatim from the Query to the Matcher’s constructor.

However, it is common for the Compiler to perform some calculations which affect its “weight” – a floating point multiplier that the Matcher will factor into each document’s score. If that is the case, then the Compiler subclass may wish to override get_weight(), sum_of_squared_weights(), and apply_norm_factor().

Compiling a Matcher is a two-stage process.

The first stage takes place during the Compiler’s construction, which is where the Query object meets a Searcher object for the first time. Searchers operate on a specific document collection and can report statistical information about it – such as the total number of documents in the collection, or the number of documents that contain a particular term. Lucy’s core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler’s weight; custom subclasses might do something similar.
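
As a rough illustration of this first stage, the stand-alone sketch below derives a weight from collection statistics. It does not use Lucy at all: the helper names idf() and term_weight(), and the smoothed formula 1 + ln(total_docs / (1 + doc_freq)), are assumptions chosen for illustration and should not be read as a description of what Lucy’s core classes compute.

```perl
use strict;
use warnings;

# Hypothetical helper: a smoothed inverse document frequency.
#   $doc_freq   - number of documents containing the term
#   $total_docs - number of documents in the collection
sub idf {
    my ( $doc_freq, $total_docs ) = @_;
    return 1 if $total_docs == 0;    # nothing to weigh in an empty collection
    return 1 + log( $total_docs / ( 1 + $doc_freq ) );
}

# Hypothetical helper: fold the IDF and the query's boost into one weight.
sub term_weight {
    my (%args) = @_;
    return idf( $args{doc_freq}, $args{total_docs} ) * $args{boost};
}

# The same collection size, but one term is rare and the other is common.
my $rare   = term_weight( doc_freq => 9,   total_docs => 1000, boost => 1.0 );
my $common = term_weight( doc_freq => 499, total_docs => 1000, boost => 1.0 );

printf "rare=%.3f common=%.3f\n", $rare, $common;
```

Under these assumptions the rarer term ends up with the larger weight, which is the effect a TF/IDF-style Compiler is after when it consults the Searcher.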

The second stage of compilation is the make_matcher() method, which is where the Compiler meets a SegReader object. SegReaders are associated with a single segment within a single index on a single machine, and are thus lower-level than Searchers, which may represent a document collection spread out over a search cluster (comprising several indexes and many segments). The Compiler object can use new information supplied by the SegReader – such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searcher – when figuring out what to feed to the Matcher’s constructor, or whether make_matcher() should return a Matcher at all.
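
The per-segment decision described above can be sketched without Lucy. In the toy model below, ToyCompiler is a hypothetical class, the “reader” is a plain hash mapping terms to document ids (standing in for a SegReader), and the “matcher” is a plain hash reference; none of these are Lucy APIs.

```perl
use strict;
use warnings;

package ToyCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { term => $args{term}, weight => $args{weight} }, $class;
}

# Second stage: consult per-segment data before building a matcher.
sub make_matcher {
    my ( $self, %args ) = @_;
    my $doc_ids = $args{reader}{ $self->{term} };

    # The term may exist in the wider collection yet be absent from this
    # segment; in that case there is nothing to match, so return undef.
    return unless $doc_ids && @$doc_ids;

    return {
        doc_ids => [@$doc_ids],
        weight  => $args{need_score} ? $self->{weight} : undef,
    };
}

package main;

my $compiler = ToyCompiler->new( term => 'lucy', weight => 2.5 );

# Two stand-in segments; only the first one contains the term.
my @segments = ( { lucy => [ 3, 7 ] }, { other => [1] } );

for my $segment (@segments) {
    my $matcher
        = $compiler->make_matcher( reader => $segment, need_score => 1 );
    next unless defined $matcher;    # skip segments with no possible matches
    print "matched docs: @{ $matcher->{doc_ids} }\n";
}
```

The caller checks for undef before using the result, which mirrors the documented contract that make_matcher() may decline to return a Matcher.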

CONSTRUCTORS ^

new

    my $compiler = MyCompiler->SUPER::new(
        parent     => $my_query,
        searcher   => $searcher,
        similarity => $sim,        # default: undef
        boost      => undef,       # default: see below
    );

Abstract constructor.

ABSTRACT METHODS ^

make_matcher

    my $matcher = $compiler->make_matcher(
        reader     => $reader,      # required
        need_score => $need_score,  # required
    );

Factory method returning a Matcher.

Returns: a Matcher, or undef if the Matcher would have matched no documents.

METHODS ^

get_weight

    my $float = $compiler->get_weight();

Return the Compiler’s numerical weight, a scoring multiplier. By default, returns the object’s boost.

get_similarity

    my $similarity = $compiler->get_similarity();

Accessor for the Compiler’s Similarity object.

get_parent

    my $query = $compiler->get_parent();

Accessor for the Compiler’s parent Query object.

sum_of_squared_weights

    my $float = $compiler->sum_of_squared_weights();

Compute and return a raw weighting factor. (This quantity is used by normalize().) By default, simply returns 1.0.

apply_norm_factor

    $compiler->apply_norm_factor($factor);

Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.

The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.
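
The difference between a leaf and a combining class can be shown with a small stand-alone model. ToyTermCompiler and ToyORCompiler below are hypothetical plain-Perl classes that only borrow the method name; they do not inherit from Lucy::Search::Compiler.

```perl
use strict;
use warnings;

package ToyTermCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { weight => $args{weight} }, $class;
}

sub get_weight { $_[0]{weight} }

# Leaf node: scale its own weight by the supplied factor.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $self->{weight} *= $factor;
    return;
}

package ToyORCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { children => $args{children} }, $class;
}

# Combining node: hand the same factor down to every child.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_->apply_norm_factor($factor) for @{ $self->{children} };
    return;
}

package main;

my @terms = map { ToyTermCompiler->new( weight => $_ ) } ( 2, 4 );
my $or = ToyORCompiler->new( children => \@terms );

$or->apply_norm_factor(0.5);
print join( ", ", map { $_->get_weight } @terms ), "\n";    # prints "1, 2"
```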

normalize

    $compiler->normalize();

Take a newly minted Compiler object and apply query-specific normalization factors. Should be invoked by Query subclasses during make_compiler() for top-level nodes.

For a TermQuery, the scoring formula is approximately:

    (tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)

normalize() is theoretically concerned with applying the second half of that formula to the Compiler’s weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
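
One plausible way these pieces fit together is sketched below as a stand-alone toy. It assumes that normalization means gathering sum_of_squared_weights(), converting it to a factor of 1 / sqrt(sum), and passing that factor to apply_norm_factor(). ToyCompiler, query_norm() and get_weights() are hypothetical names, and the actual behavior in Lucy depends on the Compiler and Similarity implementations involved.

```perl
use strict;
use warnings;

package ToyCompiler;

# Holds one raw weight per query term (for example idf * boost).
sub new {
    my ( $class, %args ) = @_;
    return bless { weights => [ @{ $args{weights} } ] }, $class;
}

sub get_weights { @{ $_[0]{weights} } }

sub sum_of_squared_weights {
    my ($self) = @_;
    my $sum = 0;
    $sum += $_**2 for @{ $self->{weights} };
    return $sum;
}

sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_ *= $factor for @{ $self->{weights} };    # scales in place via aliasing
    return;
}

# Assumed normalization rule: 1 / sqrt(sum of squared weights).
sub query_norm {
    my ($sum) = @_;
    return $sum > 0 ? 1 / sqrt($sum) : 1;
}

sub normalize {
    my ($self) = @_;
    my $factor = query_norm( $self->sum_of_squared_weights );
    $self->apply_norm_factor($factor);
    return;
}

package main;

my $compiler = ToyCompiler->new( weights => [ 3, 4 ] );
$compiler->normalize;
printf "%.2f %.2f\n", $compiler->get_weights;
```

With raw weights 3 and 4 the sum of squares is 25, so each weight is multiplied by 1/5. After normalization the weights form a unit-length vector, which keeps scores from differently sized queries on a comparable scale.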

INHERITANCE ^

Lucy::Search::Compiler isa Lucy::Search::Query isa Clownfish::Obj.
