
NAME

Lucy::Index::DataWriter - Write data to an index.

SYNOPSIS

    # Abstract base class.

DESCRIPTION

DataWriter is an abstract base class for writing index data, generally in segment-sized chunks. Each component of an index – e.g. stored fields, lexicon, postings, deletions – is represented by a DataWriter/DataReader pair.

Components may be specified per index by subclassing Architecture.

CONSTRUCTORS

new

    my $writer = MyDataWriter->new(
        snapshot   => $snapshot,      # required
        segment    => $segment,       # required
        polyreader => $polyreader,    # required
    );

Abstract constructor.

  • snapshot - The Snapshot that will be committed at the end of the indexing session.

  • segment - The Segment in progress.

  • polyreader - A PolyReader representing all existing data in the index. (If the index is brand new, the PolyReader will have no sub-readers.)

ABSTRACT METHODS

add_segment

    $data_writer->add_segment(
        reader  => $reader,   # required
        doc_map => $doc_map,  # default: undef
    );

Add content from an existing segment into the one currently being written.

  • reader - The SegReader containing content to add.

  • doc_map - An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.
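
As a rough illustration of the doc_map convention, the sketch below uses a plain Perl array reference as a stand-in for the map. The real doc_map may be a Lucy object rather than a plain array, and the helper name remap_docs is invented for this example. It also assumes document ids start at 1, so element 0 is ignored. Index N holds the new id for old document N; a value of 0 marks a deleted document, which is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [old_id, new_id] pairs for surviving docs.
# $doc_map is a plain array ref standing in for the real doc_map;
# element 0 is unused because this sketch assumes doc ids start at 1.
sub remap_docs {
    my ($doc_map) = @_;
    my @pairs;
    for my $old_id ( 1 .. $#$doc_map ) {
        my $new_id = $doc_map->[$old_id];
        next if $new_id == 0;    # deleted document: skip it
        push @pairs, [ $old_id, $new_id ];
    }
    return \@pairs;
}

# Old docs 1..4; doc 2 was deleted, and the survivors are renumbered
# to follow 10 docs already present in the segment being written.
my $pairs = remap_docs( [ 0, 11, 0, 12, 13 ] );
print "$_->[0] => $_->[1]\n" for @$pairs;
```

An add_segment() implementation would typically walk the source segment in this way, copying each surviving document's data under its new id.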

finish

    $data_writer->finish();

Complete the segment: close all streams, store metadata, etc.

format

    my $int = $data_writer->format();

Every writer must specify a file format revision number, which should increment each time the format changes. Responsibility for revision checking is left to the companion DataReader.
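
One way a companion reader might perform that revision check is sketched below. Both packages are hypothetical stand-ins that do not inherit from any Lucy class; the names FormatSketchWriter, FormatSketchReader and check_format are assumptions made for this example, not part of the Lucy API.

```perl
use strict;
use warnings;

package FormatSketchWriter;
sub new    { return bless {}, shift }
sub format { return 2 }    # bump whenever the on-disk layout changes

package FormatSketchReader;

# Highest revision this (hypothetical) reader knows how to parse.
my $MAX_SUPPORTED = 2;

sub check_format {
    my ( $class, $metadata ) = @_;
    my $format = $metadata->{format};
    die "Missing 'format' in metadata\n" unless defined $format;
    die "Unsupported format revision: $format\n"
        if $format > $MAX_SUPPORTED;
    return $format;
}

package main;

# The writer records its revision; the reader validates it later.
my $writer   = FormatSketchWriter->new;
my $metadata = { format => $writer->format };
my $ok       = FormatSketchReader->check_format($metadata);

# A revision newer than the reader understands is rejected.
my $too_new = eval { FormatSketchReader->check_format( { format => 3 } ); 1 };

print "accepted revision $ok\n";
print "revision 3 rejected\n" unless $too_new;
```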

METHODS

delete_segment

    $data_writer->delete_segment($reader);

Remove a segment’s data. The default implementation is a no-op, as all files within the segment directory will be automatically deleted. Subclasses which manage their own files outside of the segment system should override this method and use it as a trigger for cleaning up obsolete data.

  • reader - The SegReader for the segment being removed, which must represent a segment that is part of the current snapshot.
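
A minimal sketch of such an override follows, under loud assumptions: FakeReader stands in for a SegReader and is assumed to expose its segment name through get_seg_name(), and a plain hash stands in for whatever storage the subclass keeps outside the segment directory. Neither package inherits from Lucy classes.

```perl
use strict;
use warnings;

package FakeReader;    # hypothetical stand-in for a SegReader

sub new {
    my ( $class, $name ) = @_;
    return bless { name => $name }, $class;
}
sub get_seg_name { return $_[0]{name} }

package ExternalStoreWriter;    # hypothetical writer with external data

# %external stands in for data kept outside the segment directory,
# keyed by segment name.
sub new {
    my ( $class, %external ) = @_;
    return bless { external => {%external} }, $class;
}

sub delete_segment {
    my ( $self, $reader ) = @_;

    # The segment is going away, so drop its external data as well.
    delete $self->{external}{ $reader->get_seg_name };
    return;
}

package main;

my $writer = ExternalStoreWriter->new(
    seg_1 => 'obsolete data',
    seg_2 => 'live data',
);
$writer->delete_segment( FakeReader->new('seg_1') );
print join( ', ', sort keys %{ $writer->{external} } ), "\n";
```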

merge_segment

    $data_writer->merge_segment(
        reader  => $reader,   # required
        doc_map => $doc_map,  # default: undef
    );

Move content from an existing segment into the one currently being written.

The default implementation calls add_segment() then delete_segment().

  • reader - The SegReader containing content to merge, which must represent a segment that is part of the current snapshot.

  • doc_map - An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.
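
The default behaviour described above can be pictured with the following sketch. MergeSketchWriter is a hypothetical stand-in class, not the Lucy implementation; its add_segment() and delete_segment() only record that they were called so the ordering is visible.

```perl
use strict;
use warnings;

package MergeSketchWriter;

sub new { return bless { calls => [] }, shift }

# Stand-ins that only record that they were called.
sub add_segment {
    my ( $self, %args ) = @_;
    push @{ $self->{calls} }, 'add_segment';
    return;
}

sub delete_segment {
    my ( $self, $reader ) = @_;
    push @{ $self->{calls} }, 'delete_segment';
    return;
}

# Mirrors the documented default: copy the content, then drop the source.
sub merge_segment {
    my ( $self, %args ) = @_;
    $self->add_segment(%args);
    $self->delete_segment( $args{reader} );
    return;
}

package main;

my $writer = MergeSketchWriter->new;
$writer->merge_segment( reader => 'fake-reader', doc_map => undef );
print join( ', ', @{ $writer->{calls} } ), "\n";
```

A subclass that can move data more cheaply than copying and deleting it would override merge_segment() itself rather than rely on this two-step default.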

metadata

    my $hashref = $data_writer->metadata();

Arbitrary metadata to be serialized and stored by the Segment. The default implementation supplies a hash with a single key-value pair for “format”.
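
A subclass that wants to store more than the format revision might extend the default hash rather than replace it. The sketch below assumes stand-in classes (MetaSketchBase, MetaSketchWriter) that do not inherit from Lucy, and the doc_count key is purely hypothetical.

```perl
use strict;
use warnings;

package MetaSketchBase;
sub new    { return bless {}, shift }
sub format { return 1 }

# Stand-in for the documented default: a hash holding only "format".
sub metadata {
    my $self = shift;
    return { format => $self->format };
}

package MetaSketchWriter;
our @ISA = ('MetaSketchBase');

sub metadata {
    my $self     = shift;
    my $metadata = $self->SUPER::metadata();
    $metadata->{doc_count} = 42;    # hypothetical extra key
    return $metadata;
}

package main;

my $meta = MetaSketchWriter->new->metadata;
print "$_ = $meta->{$_}\n" for sort keys %$meta;
```

Calling the parent method first keeps the "format" entry intact, which matters because the companion DataReader relies on it for revision checking.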

get_snapshot

    my $snapshot = $data_writer->get_snapshot();

Accessor for “snapshot” member var.

get_segment

    my $segment = $data_writer->get_segment();

Accessor for “segment” member var.

get_polyreader

    my $poly_reader = $data_writer->get_polyreader();

Accessor for “polyreader” member var.

get_schema

    my $schema = $data_writer->get_schema();

Accessor for “schema” member var.

get_folder

    my $folder = $data_writer->get_folder();

Accessor for “folder” member var.

INHERITANCE

Lucy::Index::DataWriter isa Clownfish::Obj.