NAME

Lucy::Analysis::Token - Unit of text.

SYNOPSIS

        my $token = Lucy::Analysis::Token->new(
            text         => 'blind',
            start_offset => 8,
            end_offset   => 13,
        );

        $token->set_text('mice');

DESCRIPTION

Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses. Each Token has 5 attributes: text, start_offset, end_offset, boost, and pos_inc.

The text attribute is a Unicode string encoded as UTF-8.

start_offset is the start point of the token text, measured in Unicode code points from the top of the stored field; end_offset delimits the corresponding closing boundary. start_offset and end_offset locate the Token within a larger context, even if the Token’s text attribute gets modified – by stemming, for instance. The Token for “beating” in the text “beating a dead horse” begins life with a start_offset of 0 and an end_offset of 7; after stemming, the text is “beat”, but the start_offset is still 0 and the end_offset is still 7. This allows “beating” to be highlighted correctly after a search matches “beat”.
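The offset behaviour described above can be illustrated without Lucy itself. The following is a minimal sketch in plain Perl, assuming a hash as a stand-in for a Token and a one-line regex as a stand-in for a stemmer (both are hypothetical and are not Lucy's API). The text changes, the offsets do not, so the original span can still be recovered for highlighting.

```perl
use strict;
use warnings;

my $source = 'beating a dead horse';

# Hypothetical stand-in for a Token: a plain hash, not a Lucy object.
my %token = (
    text         => 'beating',
    start_offset => 0,
    end_offset   => 7,
);

# Crude illustrative "stemmer": strip a trailing "ing".
( my $stemmed = $token{text} ) =~ s/ing\z//;
$token{text} = $stemmed;

# The offsets still describe the original span in $source.
my $length    = $token{end_offset} - $token{start_offset};
my $original  = substr( $source, $token{start_offset}, $length );
my $highlight = $source;
substr( $highlight, $token{start_offset}, $length ) = "<b>$original</b>";

print "$token{text}\n";    # the stemmed text
print "$highlight\n";      # the source with the original span marked up
```

Because substr counts characters on a decoded Perl string, the same arithmetic carries over to offsets expressed in Unicode code points.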

boost is a per-token weight. Use this when you want to assign more or less importance to a particular token, as you might for emboldened text within an HTML document, for example. (Note: The field this token belongs to must be spec’d to use a posting of type RichPosting.)

pos_inc is the POSition INCrement, measured in Tokens. This attribute, which defaults to 1, is an advanced tool for manipulating phrase matching. Ordinarily, Tokens are assigned consecutive position numbers: 0, 1, and 2 for "three blind mice". However, if you set the position increment for “blind” to, say, 1000, then the three tokens will end up assigned to positions 0, 1, and 1001 – and will no longer produce a phrase match for the query "three blind mice".
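The position arithmetic in this example can be reproduced with a short plain-Perl sketch. It assumes that each token takes the current running position and that its own pos_inc then determines where the next token lands, which is one reading consistent with the numbers given here. It is a model only, not Lucy's indexing code, and the array-of-hashes layout is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical token list; only text and pos_inc matter for this model.
my @tokens = (
    { text => 'three', pos_inc => 1 },
    { text => 'blind', pos_inc => 1000 },
    { text => 'mice',  pos_inc => 1 },
);

# Assumed rule: assign the running position, then advance by pos_inc.
my $pos = 0;
my @positions;
for my $token (@tokens) {
    push @positions, $pos;
    $pos += $token->{pos_inc};
}

# In this model, a phrase matches only if its terms sit at
# consecutive positions.
my $is_phrase = 1;
for my $i ( 1 .. $#positions ) {
    $is_phrase = 0 if $positions[$i] != $positions[ $i - 1 ] + 1;
}

print join( ', ', @positions ), "\n";
print $is_phrase ? "phrase match\n" : "no phrase match\n";
```

With every pos_inc left at its default of 1, the same loop yields consecutive positions and the phrase check succeeds.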

CONSTRUCTORS

new

    my $token = Lucy::Analysis::Token->new(
        text         => $text,          # required
        start_offset => $start_offset,  # required
        end_offset   => $end_offset,    # required
        boost        => 1.0,            # optional
        pos_inc      => 1,              # optional
    );

text - A string.
start_offset - Start offset into the original document in Unicode code points.
end_offset - End offset into the original document in Unicode code points.
boost - Per-token weight.
pos_inc - Position increment for phrase matching.

METHODS

get_text

    my $text = $token->get_text;

Get the token's text.

set_text

    $token->set_text($text);

Set the token's text.

get_start_offset

    my $int = $token->get_start_offset();

get_end_offset

    my $int = $token->get_end_offset();

get_boost

    my $float = $token->get_boost();

get_pos_inc

    my $int = $token->get_pos_inc();

get_len

    my $int = $token->get_len();

INHERITANCE

Lucy::Analysis::Token isa Clownfish::Obj.

