# ***********************************************
# 
# !!!! DO NOT EDIT !!!!
# 
# This file was auto-generated by Build.PL.
# 
# ***********************************************
# 
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

=encoding utf8

=head1 NAME

Lucy::Index::Similarity - Judge how well a document matches a query.

=head1 SYNOPSIS

    package MySimilarity;
    use base qw( Lucy::Index::Similarity );

    sub length_norm { return 1.0 }    # disable length normalization

    package MyFullTextType;
    use base qw( Lucy::Plan::FullTextType );

    sub make_similarity { MySimilarity->new }

=head1 DESCRIPTION

After determining whether a document matches a given query, a score must be
calculated which indicates how I<well> the document matches the query.  The
Similarity class is used to judge how “similar” the query and the document
are to each other; the closer the resemblance, the higher the document
scores.

The default implementation uses Lucene’s modified cosine similarity
measure.  Subclasses might tweak the existing algorithms, or might be used
in conjunction with custom Query subclasses to implement arbitrary scoring
schemes.

Most of the methods operate on single fields, but some are used to combine
scores from multiple fields.

=head1 CONSTRUCTORS

=head2 new

    my $sim = Lucy::Index::Similarity->new;

Constructor. Takes no arguments.

=head1 METHODS

=head2 length_norm

    my $float = $similarity->length_norm($num_tokens);

Dampen the scores of long documents.

After a field is broken up into terms at index-time, each term must be
assigned a weight.  One of the factors in calculating this weight is
the number of tokens that the original field was broken into.

Typically, we assume that the more tokens in a field, the less
important any one of them is – so that, e.g., 5 mentions of “Kafka” in
a short article are given more heft than 5 mentions of “Kafka” in an
entire book.  The default implementation of C<length_norm> expresses this
using an inverted square root of the number of tokens.
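
As a rough illustration of that curve, the sketch below assumes the default
behaves like C<1 / sqrt($num_tokens)>.  The helper name
C<inverse_sqrt_norm> and the token counts are hypothetical, chosen only to
make the arithmetic easy to follow; values actually stored in an index may
differ slightly.

    use strict;
    use warnings;

    # Assumed shape of the default curve: 1 / sqrt(number of tokens).
    sub inverse_sqrt_norm {
        my ($num_tokens) = @_;
        return 0 if $num_tokens == 0;    # guard against division by zero
        return 1 / sqrt($num_tokens);
    }

    my $article = inverse_sqrt_norm(400);        # a short article
    my $book    = inverse_sqrt_norm(160_000);    # an entire book

    printf "article: %.4f  book: %.4f  ratio: %.0f\n",
        $article, $book, $article / $book;

Under that assumption, each token in the 400-token article carries 20 times
the weight of a token in the 160,000-token book.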

However, the inverted square root has a tendency to reward very short
fields highly, which isn’t always appropriate for fields you expect to
have a lot of tokens on average.
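
One possible way to soften that effect is to override C<length_norm> so
that every field shorter than some floor is treated as if it had exactly
that many tokens.  The following is a minimal sketch, not a recommended
setting: the package name C<LongFieldSim> and the floor of 100 tokens are
assumptions made for illustration.

    package LongFieldSim;    # hypothetical name
    use strict;
    use warnings;

    # To plug this into Lucy, inherit from the real base class:
    #     use base qw( Lucy::Index::Similarity );
    # It is commented out here so the arithmetic runs without Lucy installed.

    my $MIN_TOKENS = 100;    # assumed floor; tune for your own corpus

    sub length_norm {
        my ( $self, $num_tokens ) = @_;

        # Treat any field below the floor as if it had $MIN_TOKENS tokens,
        # so very short fields no longer receive an outsized weight.
        $num_tokens = $MIN_TOKENS if $num_tokens < $MIN_TOKENS;
        return 1 / sqrt($num_tokens);
    }

    package main;

    printf "%4d tokens -> %.3f\n", $_, LongFieldSim->length_norm($_)
        for ( 1, 25, 100, 400 );

With this floor, fields of 1, 25, and 100 tokens all receive the same
weight (0.100), while longer fields are still dampened as before (0.050 at
400 tokens).  A subclass like this would be returned from
C<make_similarity>, as in the SYNOPSIS.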

=head1 INHERITANCE

Lucy::Index::Similarity isa Clownfish::Obj.

=cut
