NAME

Elastic::Model::Index - Create and administer indices in Elasticsearch

VERSION

version 0.50

SYNOPSIS

    $index = $model->namespace('myapp')->index;
    $index = $model->namespace('myapp')->index('index_name');

    $index->create( settings => \%settings );

    $index->reindex( 'old_index' );

See also "SYNOPSIS" in Elastic::Model::Role::Index.

DESCRIPTION

Elastic::Model::Index objects are used to create and administer indices in an Elasticsearch cluster.

See Elastic::Model::Role::Index for more about usage. See Elastic::Manual::Scaling for more about how indices can be used in your application.

METHODS

create()

    $index = $index->create();
    $index = $index->create( settings => \%settings, types => \@types );

Creates an index whose name is given by the name attribute (which defaults to $namespace->name).

The type mapping is automatically generated from the attributes of your doc classes listed in the namespace. Similarly, any custom analyzers required by your classes are added to the index \%settings that you pass in:

    $index->create( settings => {number_of_shards => 1} );

To create an index with a subset of the types known to the namespace, pass in a list of @types:

    $index->create( types => ['user','post'] );

reindex()

    # reindex $domain_name to $index->name
    $index->reindex( $domain_name );

    # more options
    $index->reindex(
        $domain,

        repoint_uids    => 1,
        size            => 1000,
        bulk_size       => 1000,
        scan            => '2m',
        quiet           => 0,

        transform       => sub {...},

        on_conflict     => sub {...},   # or 'IGNORE'
        on_error        => sub {...},   # or 'IGNORE'
        uid_on_conflict => sub {...},   # or 'IGNORE'
        uid_on_error    => sub {...},   # or 'IGNORE'
    );

While you can add to the mapping of an index, you can't change what is already there. To change existing mappings (which happens often during development), you need to reindex your data into a new index.

"reindex()" reindexes your data from domain $domain_name into an index called $index->name. The new index is created if it doesn't already exist.

See Elastic::Manual::Reindex for more about reindexing strategies. The documentation below explains what each parameter does:

size

The size parameter defaults to 1,000 and controls how many documents are pulled from $domain in each request. See "size" in Elastic::Model::View.

Note: documents are pulled from the domain/view using "scan()" in Elastic::Model::View, which can pull a maximum of size * number_of_primary_shards documents in a single request. If you have large docs or underpowered servers, you may want to reduce the size parameter.
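As a rough worked example of the note above, the upper bound on documents returned by one scan request is size multiplied by the number of primary shards. The helper name max_docs_per_scroll and the shard count of 5 are hypothetical; the sketch only restates that arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper: upper bound on the number of docs a single
# scan request can return, i.e. size * number_of_primary_shards.
sub max_docs_per_scroll {
    my ( $size, $primary_shards ) = @_;
    return $size * $primary_shards;
}

# With the default size of 1,000 and an index with 5 primary shards,
# one request may return up to 5,000 docs.
printf "%d\n", max_docs_per_scroll( 1000, 5 );    # prints 5000

# Lowering size to 200 caps each request at 1,000 docs instead.
printf "%d\n", max_docs_per_scroll( 200, 5 );     # prints 1000
```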

bulk_size

The bulk_size parameter defaults to size and controls how many documents are indexed into the new index in a single bulk-indexing request.

scan

scan is the same as "scan" in Elastic::Model::View - it controls how long Elasticsearch should keep the "scroll" alive between requests. Defaults to '2m'. Increase this if the reindexing process is slow and you get scroll timeouts.

repoint_uids

If true (the default), "repoint_uids()" will be called automatically to update any UIDs (which point at the old index) in indices other than the ones currently being reindexed.

transform

If you need to change the structure/data of your doc while reindexing, you can pass a transform coderef. This will be called before any changes have been made to the doc, and should return the new doc. For instance, to convert the single-value tag field to an array of tags:

    $index->reindex(
        'new_index',
        'transform' => sub {
            my $doc = shift;
            $doc->{_source}{tags} = [ delete $doc->{_source}{tag} ];
            return $doc;
        }
    );

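To see what the transform above does, it can be run on its own against a plain hashref shaped like a raw Elasticsearch document. The sample doc is made up for illustration; during a real reindex the coderef is called by "reindex()" itself.

```perl
use strict;
use warnings;

# The same transform as above, stored in a variable so it can be
# exercised outside of reindex().
my $transform = sub {
    my $doc = shift;
    # Replace the single-value 'tag' field with a 'tags' array
    $doc->{_source}{tags} = [ delete $doc->{_source}{tag} ];
    return $doc;
};

# A made-up raw doc, as it might be pulled from the old index
my $doc = {
    _index  => 'myapp_v1',
    _type   => 'post',
    _id     => 1,
    _source => { title => 'Hello', tag => 'perl' },
};

my $new = $transform->($doc);
print join( ',', @{ $new->{_source}{tags} } ), "\n";    # prints "perl"

my $state = exists $new->{_source}{tag} ? 'kept' : 'removed';
print "tag $state\n";                                    # prints "tag removed"
```

Note that a doc without a tag field would end up with tags => [undef]; a production transform might check exists $doc->{_source}{tag} first.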
on_conflict / on_error

If you are indexing to the new index at the same time as you are reindexing, you may get document conflicts. You can handle the conflicts with a coderef callback, or ignore them by setting on_conflict to 'IGNORE':

    $index->reindex( 'myapp_v2', on_conflict => 'IGNORE' );

Similarly, you can pass an on_error handler which will handle other errors, or all errors if no on_conflict handler is defined.

See "Using-callbacks" in Search::Elasticsearch::Bulk for more.
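A minimal sketch of coderef handlers, assuming they are passed through to Search::Elasticsearch::Bulk unchanged and so receive the arguments described there: ($action, $response, $i, $version) for on_conflict and ($action, $response, $i) for on_error. The counters and the made-up responses are illustrative only; the shape of $response depends on your Elasticsearch version.

```perl
use strict;
use warnings;

my ( $conflicts, $errors ) = ( 0, 0 );

# Assumed argument list, following the Search::Elasticsearch::Bulk callbacks
my $on_conflict = sub {
    my ( $action, $response, $i, $version ) = @_;
    $conflicts++;
    warn "Conflict on $action #$i (current version: $version)\n";
};

my $on_error = sub {
    my ( $action, $response, $i ) = @_;
    $errors++;
    # Assumption: the response may carry an 'error' entry
    my $msg = ref $response eq 'HASH' && defined $response->{error}
        ? $response->{error}
        : 'unknown error';
    warn "Error on $action #$i: $msg\n";
};

# In a real run these would be passed as:
#   $index->reindex( 'myapp_v2', on_conflict => $on_conflict, on_error => $on_error );
# Here they are called directly with made-up arguments to show the flow.
$on_conflict->( 'index', { status => 409 }, 0, 3 );
$on_error->( 'index', { status => 400, error => 'mapper_parsing_exception' }, 1 );

print "$conflicts conflict(s), $errors error(s)\n";    # prints "1 conflict(s), 1 error(s)"
```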

uid_on_conflict / uid_on_error

These work in the same way as the on_conflict and on_error handlers, but are passed to "repoint_uids()" if repoint_uids is true.

quiet

By default, "reindex()" prints out progress information. To silence this, set quiet to true:

    $index->reindex( 'myapp_v2', quiet => 1 );

repoint_uids()

    $index->repoint_uids(
        uids        => { myapp_v1 => { user => { 10 => 1, 12 => 1 } } },
        exclude     => ['myapp_v2'],
        scan        => '2m',
        size        => 1000,
        bulk_size   => 1000,
        quiet       => 0,

        on_conflict => sub {...},   # or 'IGNORE'
        on_error    => sub {...},   # or 'IGNORE'
    );

The purpose of "repoint_uids()" is to update stale UID attributes to point to a new index. It is called automatically from "reindex()".

Parameters:

uids

uids is a hash ref of the stale UIDs which should be updated, keyed by index, then by type, then by ID.

For instance, suppose you have reindexed myapp_v1 to myapp_v2, but domain other has documents with UIDs which still point to myapp_v1. You can update these by passing in the old UIDs, as follows:

    $index = $namespace->index('myapp_v2');
    $index->repoint_uids(
        uids    => {
            myapp_v1 => {                   # index
                user => {                   # type
                    1 => 1,                 # ids
                    2 => 1,
                }
            }
        }
    );

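If you have the stale UIDs as a flat list of [index, type, id] triples, they can be folded into the nested hash that uids expects. The uids_from_triples helper below is hypothetical and not part of Elastic::Model.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [index, type, id] triples into the
# { index => { type => { id => 1 } } } structure used by repoint_uids().
sub uids_from_triples {
    my %uids;
    for my $triple (@_) {
        my ( $index, $type, $id ) = @$triple;
        $uids{$index}{$type}{$id} = 1;
    }
    return \%uids;
}

my $uids = uids_from_triples(
    [ 'myapp_v1', 'user', 1 ],
    [ 'myapp_v1', 'user', 2 ],
);

# $uids is now { myapp_v1 => { user => { 1 => 1, 2 => 1 } } }, ready for:
#   $index->repoint_uids( uids => $uids );
print join( ',', sort keys %{ $uids->{myapp_v1}{user} } ), "\n";    # prints "1,2"
```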
exclude

By default, all indices known to the model are updated. You can exclude indices with:

    $index->repoint_uids(
        uids    => \%uids,
        exclude => ['index_1', ...]
    );

size

This is the same as the size parameter to "reindex()".

bulk_size

This is the same as the bulk_size parameter to "reindex()".

scan

This is the same as the scan parameter to "reindex()".

quiet

This is the same as the quiet parameter to "reindex()".

on_conflict / on_error

These are the same as the uid_on_conflict and uid_on_error handlers in "reindex()".

doc_updater()

    $coderef = $index->doc_updater( $doc_updater, $uid_updater );

"doc_updater()" is used by "reindex()" and "repoint_uids()" to update the top-level doc and any UID attributes with callbacks.

The $doc_updater receives the $doc as its only argument, and should return the $doc after making any changes:

    $doc_updater = sub {
        my ($doc) = @_;
        $doc->{_index} = 'foo';
        return $doc
    };

The $uid_updater receives the UID as its only argument:

    $uid_updater = sub {
        my ($uid) = @_;
        $uid->{index} = 'foo'
    };
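For illustration only, here is one way a combined updater could be built from the two callbacks: apply $doc_updater to the whole doc, then walk its _source and hand each UID to $uid_updater. It assumes UIDs appear in _source as hashrefs under a uid key with index, type and id entries; that layout and the make_doc_updater name are assumptions for this sketch, not a description of how "doc_updater()" is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for doc_updater(): returns a coderef that applies
# $doc_updater to the doc, then calls $uid_updater on every nested UID.
sub make_doc_updater {
    my ( $doc_updater, $uid_updater ) = @_;
    return sub {
        my $doc = shift;
        $doc = $doc_updater->($doc) if $doc_updater;
        return $doc unless $uid_updater;

        # Iterative walk over the hashes and arrays inside _source
        my @stack = ( $doc->{_source} );
        while (@stack) {
            my $val = shift @stack;
            if ( ref $val eq 'ARRAY' ) {
                push @stack, @$val;
            }
            elsif ( ref $val eq 'HASH' ) {
                my $uid = $val->{uid};
                # Assumed UID layout: { uid => { index, type, id } }
                if (   ref $uid eq 'HASH'
                    && exists $uid->{index}
                    && exists $uid->{type}
                    && exists $uid->{id} )
                {
                    $uid_updater->($uid);
                    next;
                }
                push @stack, values %$val;
            }
        }
        return $doc;
    };
}

my $updater = make_doc_updater(
    sub { my ($doc) = @_; $doc->{_index} = 'myapp_v2'; return $doc },
    sub { my ($uid) = @_; $uid->{index} = 'myapp_v2' },
);

my $doc = {
    _index  => 'myapp_v1',
    _source => {
        title  => 'Hello',
        author => { uid => { index => 'myapp_v1', type => 'user', id => 10 } },
    },
};
$updater->($doc);
print "$doc->{_index} $doc->{_source}{author}{uid}{index}\n";    # prints "myapp_v2 myapp_v2"
```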

IMPORTED ATTRIBUTES

Attributes imported from Elastic::Model::Role::Index

namespace

name

IMPORTED METHODS

Methods imported from Elastic::Model::Role::Index

close()

open()

refresh()

delete()

update_analyzers()

update_settings()

delete_mapping()

is_alias()

is_index()

SEE ALSO

AUTHOR

Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Clinton Gormley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.