Add many tokens to the batch, by supplying the string to be tokenized, and arrays of token starts and token ends (specified in bytes).

NAME

KinoSearch1::Analysis::TokenBatch - a collection of tokens

SYNOPSIS

    while ( $batch->next ) {
        $batch->set_text( lc( $batch->get_text ) );
    }

EXPERIMENTAL API

TokenBatch's API should be considered experimental and is likely to change.

DESCRIPTION

A TokenBatch is a collection of Tokens which you can add to, then iterate over.

METHODS

new

    my $batch = KinoSearch1::Analysis::TokenBatch->new;

Constructor.

append

    $batch->append( $text, $start_offset, $end_offset, $pos_inc );

Add a Token to the end of the batch. Accepts either three or four arguments: text, start_offset, end_offset, and an optional position increment which defaults to 1 if not supplied. For a description of what these arguments mean, see the docs for Token.
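As a rough illustration of both calling forms, the sketch below tokenizes a string with a regex and appends each match along with its offsets. The sample text and the `\w+` pattern are assumptions made for the example; the string is ASCII only, so byte and character offsets coincide.

```perl
use strict;
use warnings;
use KinoSearch1::Analysis::TokenBatch;

# Illustrative input (an assumption for this sketch). ASCII only, so
# byte offsets and character offsets are the same.
my $source = "quick brown fox";
my $batch  = KinoSearch1::Analysis::TokenBatch->new;

# Three-argument form: one token per run of word characters.
# $-[1] and $+[1] are the start and end offsets of capture group 1.
while ( $source =~ /(\w+)/g ) {
    $batch->append( $1, $-[1], $+[1] );    # pos_inc omitted, defaults to 1
}

# Four-argument form: the same kind of call with the position
# increment spelled out explicitly.
$batch->append( 'jumps', 16, 21, 1 );
```

Taking the offsets from the regex match variables avoids computing them by hand; what a position increment other than 1 means is covered in the docs for Token.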

next

    while ( $batch->next ) {
        # ...
    }

Advance to the next token in the TokenBatch. Returns true if the TokenBatch is then located at a valid token, and false once the batch is exhausted.

ACCESSOR METHODS

All of TokenBatch's accessor methods affect the current Token. Calling any of these methods when the TokenBatch is not located at a valid Token will trigger an exception.
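One way to guard against that exception is to wrap the accessor call in `eval`. This is only a sketch: it assumes a batch on which `next` has not yet been called counts as "not located at a valid Token", which the description above implies but does not state outright.

```perl
use strict;
use warnings;
use KinoSearch1::Analysis::TokenBatch;

my $batch = KinoSearch1::Analysis::TokenBatch->new;
$batch->append( 'lone', 0, 4 );

# Assumption: before the first call to next(), there is no current
# Token, so the accessor is expected to throw.
my $text  = eval { $batch->get_text };
my $error = $@;
warn "accessor refused: $error" if $error;

# Once next() has returned true, the same accessor is safe to call.
if ( $batch->next ) {
    $text = $batch->get_text;
}
```

In practice, calling the accessors only inside a `while ( $batch->next )` loop sidesteps the problem entirely.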

set_text get_text

Set/get the text of the current Token.

set_start_offset get_start_offset

Set/get the start_offset of the current Token.

set_end_offset get_end_offset

Set/get the end_offset of the current Token.

set_pos_inc get_pos_inc

Set/get the position increment of the current Token.
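The following sketch pulls the accessors together: it lowercases each token in place and records the other fields as it goes. The token texts, offsets, and the `@report` array are made up for the example.

```perl
use strict;
use warnings;
use KinoSearch1::Analysis::TokenBatch;

my $batch = KinoSearch1::Analysis::TokenBatch->new;
$batch->append( 'Hello', 0, 5 );     # illustrative tokens and offsets
$batch->append( 'World', 6, 11 );

my @report;
while ( $batch->next ) {
    # Rewrite the current Token's text; its offsets are left untouched.
    $batch->set_text( lc $batch->get_text );

    # Read back every field of the current Token.
    push @report, sprintf(
        "%s start=%d end=%d pos_inc=%d",
        $batch->get_text,
        $batch->get_start_offset,
        $batch->get_end_offset,
        $batch->get_pos_inc,
    );
}

print "$_\n" for @report;
```

Because only the text is changed, the offsets still describe where each token came from in the original input.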

COPYRIGHT

Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.01.
