NAME

ElasticSearch::Transport - Base class for communicating with ElasticSearch

DESCRIPTION

ElasticSearch::Transport is a base class for the modules which communicate with the ElasticSearch server.

It handles failover to the next node in case the current node closes the connection.

Requests are distributed round-robin across all live servers, as returned by /_cluster/nodes. The server list is shuffled when it is retrieved, so that multiple instances do not all send their first request to the same server.
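The shuffle-then-rotate idea can be sketched in plain Perl. This is only an illustration, not the module's actual code: make_round_robin() is a hypothetical helper and the addresses are made up.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not part of ElasticSearch::Transport): returns an
# iterator that cycles through a shuffled copy of the server list.
sub make_round_robin {
    my @servers = shuffle @_;
    die "No servers given\n" unless @servers;
    my $i = 0;
    return sub {
        my $server = $servers[ $i % @servers ];
        $i++;
        return $server;
    };
}

# Made-up addresses, for illustration only.
my $next = make_round_robin(
    '192.168.1.1:9200', '192.168.1.2:9200', '192.168.1.3:9200'
);
my @picked = map { $next->() } 1 .. 6;
print join( ', ', @picked ), "\n";
```

Because each process shuffles its own copy, two processes started with the same list will usually begin on different nodes, while each still visits every node in turn.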

On the first request and every max_requests after that (default 10,000), the list of live nodes is automatically refreshed. This can be disabled by setting max_requests to 0.
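As a rough sketch of that schedule (the real bookkeeping inside the module may differ), a request counter can decide when a refresh is due. The should_refresh() function and the request_count key are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical counter logic: refresh on the first request, then once
# every max_requests requests. A max_requests of 0 disables the periodic
# refresh but not the initial one.
sub should_refresh {
    my ($state) = @_;
    my $count = $state->{request_count}++;
    return 1 if $count == 0;
    return 0 unless $state->{max_requests};
    return $count % $state->{max_requests} == 0 ? 1 : 0;
}

my %state = ( request_count => 0, max_requests => 3 );
my @refreshes = map { should_refresh( \%state ) } 1 .. 7;
print "@refreshes\n";    # prints "1 0 0 1 0 0 1"
```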

Regardless of the max_requests setting, a list of live nodes will still be retrieved on the first request. This may not be desirable behaviour if, for instance, you are connecting to remote servers which use internal IP addresses, or which don't allow remote nodes() requests.

If you want to disable this behaviour completely, set no_refresh to 1, in which case the transport module will round-robin through the servers list only. Failed nodes will be removed from the list, but are added back every max_requests requests or when all nodes have failed.
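The removal and reset of failed nodes can be pictured like this. It is a simplified sketch under the assumption that the seed list is restored once the live list is empty; mark_failed() and both arrays are hypothetical stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the seed list and the current live list.
my @default_servers = ( '10.0.0.1:9200', '10.0.0.2:9200' );
my @live            = @default_servers;

# Drop a failed node; if nothing is left, fall back to the seed list.
sub mark_failed {
    my ($failed) = @_;
    @live = grep { $_ ne $failed } @live;
    @live = @default_servers unless @live;
    return scalar @live;
}

mark_failed('10.0.0.1:9200');
my @after_first = @live;     # only 10.0.0.2:9200 remains
mark_failed('10.0.0.2:9200');
my @after_second = @live;    # all nodes failed, so the seed list is restored
print "after first failure:  @after_first\n";
print "after second failure: @after_second\n";
```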

The HTTP clients check that the length of the request body is not greater than max_content_length, which defaults to 104,857,600 bytes (100MB) - the default that is configured in Elasticsearch. From version 0.19.12, when no_refresh is set to false, the HTTP transport clients will auto-detect the minimum max_content_length from the cluster.
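A check of this kind might look like the following sketch. The check_content_length() function and its error message are hypothetical, and the sketch assumes the body is already an encoded byte string, so that length() counts bytes.

```perl
use strict;
use warnings;

# Hypothetical check: refuse to send a body larger than the limit.
# Assumes $body is already encoded to bytes.
sub check_content_length {
    my ( $body, $max ) = @_;
    my $length = length $body;
    die "Content length ($length) is greater than max_content_length ($max)\n"
        if $length > $max;
    return $length;
}

my $ok_length = check_content_length( '{ "foo": "bar"}', 104_857_600 );
print "body of $ok_length bytes accepted\n";

# A deliberately tiny limit to show the failure path.
my $error = eval { check_content_length( 'x' x 101, 100 ); 1 } ? '' : $@;
print "rejected: $error";
```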

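Currently, the available backends include the HTTP backends (http, httplite, httptiny and curl) and thrift, which are compared in "WHICH TRANSPORT SHOULD YOU USE" below.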

You shouldn't need to talk to the transport modules directly - everything happens via the main ElasticSearch class.

SYNOPSIS

    use ElasticSearch;
    my $e = ElasticSearch->new(
        servers            => 'search.foo.com:9200',
        transport          => 'httplite',
        timeout            => '10',
        no_refresh         => 0 | 1,
        deflate            => 0 | 1,
        max_content_length => 104_857_600,
    );

    my $t = $e->transport;

    $t->max_requests(5)             # refresh_servers every 5 requests
    $t->protocol                    # eg 'http'
    $t->next_server                 # next node to use
    $t->current_server              # eg '127.0.0.1:9200' ie last used node
    $t->default_servers             # seed servers passed in to new()

    $t->servers                     # eg ['192.168.1.1:9200','192.168.1.2:9200']
    $t->servers(@servers);          # set new 'live' list

    $t->refresh_servers             # refresh list of live nodes

    $t->clear_clients               # clear all open clients

    $t->no_refresh(0|1)             # don't retrieve the live node list
                                    # instead, use just the nodes specified

    $t->deflate(0|1);               # should ES deflate its responses
                                    # useful if ES is on a remote network.
                                    # ES needs compression enabled with
                                    #     http.compression: true

    $t->max_content_length(1000);   # set the max HTTP body content length

    $t->register('foo',$class)      # register new Transport backend

WHICH TRANSPORT SHOULD YOU USE

Although the thrift interface has the right buzzwords (binary, compact, sockets), the generated Perl code is very slow. Until that is improved, I recommend one of the http backends instead.

The HTTP backends in increasing order of speed are:

  • http - LWP based

  • httplite - HTTP::Lite based, about 30% faster than http

  • httptiny - HTTP::Tiny based, about 1% faster than httplite

  • curl - WWW::Curl based, about 60% faster than httptiny!

See also: http://www.elasticsearch.org/guide/reference/modules/http.html and http://www.elasticsearch.org/guide/reference/modules/thrift.html

SUBCLASSING TRANSPORT

If you want to add a new transport backend, then these are the methods that you should subclass:

init()

    $t->init($params)

By default, a no-op. Receives a HASH ref with the parameters passed in to new(), less servers, transport and timeout.

Any parameters specific to your module should be deleted from $params.
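For illustration, an init() that consumes its own option could look like the sketch below. The package name, the pool_size option and its default are all invented for this example, and the stand-alone new() is there only so the sketch runs without ElasticSearch installed; a real backend would inherit from ElasticSearch::Transport instead.

```perl
package My::Transport::InitSketch;    # hypothetical package name
use strict;
use warnings;

# Stand-alone constructor so that the sketch is self-contained.
sub new { return bless {}, shift }

sub init {
    my ( $self, $params ) = @_;

    # 'pool_size' is a made-up, backend-specific option. Deleting it
    # from $params marks it as handled by this backend.
    my $pool_size = delete( $params->{pool_size} );
    $self->{pool_size} = defined $pool_size ? $pool_size : 5;
    return $self;
}

package main;

my $params = { pool_size => 10, deflate => 1 };
my $t      = My::Transport::InitSketch->new->init($params);
print "pool_size: $t->{pool_size}\n";
print "left in params: ", join( ',', sort keys %$params ), "\n";
```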

send_request()

    $json = $t->send_request($server,$params)

    where $params = {
        method  => 'GET',
        cmd     => '/_cluster',
        qs      => { pretty => 1 },
        data    => '{ "foo": "bar"}',
    }

This must be overridden in the subclass - it is the method called to actually talk to the server.

See ElasticSearch::Transport::HTTP for an example implementation.
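To make the shape of $params more concrete, here is a sketch of how an HTTP backend might turn it into a method, URL and body. It stops short of sending anything. The build_request() helper is hypothetical; www_form_urlencode() is a real HTTP::Tiny method, used here only to encode the query string.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: map the $params structure onto the pieces of an
# HTTP request, without performing the request.
sub build_request {
    my ( $server, $params ) = @_;
    my $url = 'http://' . $server . $params->{cmd};
    my $qs  = HTTP::Tiny->new->www_form_urlencode( $params->{qs} || {} );
    $url .= '?' . $qs if length $qs;
    return {
        method  => $params->{method} || 'GET',
        url     => $url,
        content => $params->{data},
    };
}

my $request = build_request(
    '127.0.0.1:9200',
    {   method => 'GET',
        cmd    => '/_cluster',
        qs     => { pretty => 1 },
        data   => '{ "foo": "bar"}',
    }
);
print "$request->{method} $request->{url}\n";
```

A real send_request() would go on to issue the request through the object returned by client() and return the JSON response.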

protocol()

    $t->protocol

This must return the protocol in use, eg "http" or "thrift". It is used to extract the list of bound addresses from ElasticSearch, eg http_address or thrift_address.
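The lookup can be sketched as follows. The node data is made up, and the "inet[/host:port]" form of the address values is an assumption about what the server returns, not something this document specifies; live_servers() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the "<protocol>_address" entry from each
# node and pull out host:port. The "inet[/host:port]" format is an
# assumption made for this sketch.
sub live_servers {
    my ( $protocol, $nodes ) = @_;
    my $key = $protocol . '_address';
    my @servers;
    for my $id ( sort keys %$nodes ) {
        my $address = $nodes->{$id}{$key} or next;
        push @servers, $1 if $address =~ m{inet\[[^/]*/([^\]]+)\]};
    }
    return @servers;
}

# Made-up sample data.
my $nodes = {
    node_a => {
        http_address   => 'inet[/192.168.1.1:9200]',
        thrift_address => 'inet[/192.168.1.1:9500]',
    },
    node_b => { http_address => 'inet[/192.168.1.2:9200]' },
};

my @http   = live_servers( 'http',   $nodes );
my @thrift = live_servers( 'thrift', $nodes );
print "http:   @http\n";
print "thrift: @thrift\n";
```

Nodes that do not expose an address for the protocol in use are simply skipped.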

client()

    $client = $t->client($server)

Returns the client object used in "send_request()". The server param will look like "192.168.5.1:9200". The method should store its clients in a PID-specific slot in $t->{_client}, as clear_clients() deletes this key.

See "client()" in ElasticSearch::Transport::HTTP and "client()" in ElasticSearch::Transport::Thrift for an example implementation.
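The caching pattern described for client() might be sketched like this. The package is hypothetical and self-contained, and a plain hash ref stands in for a real client object. Only the use of $t->{_client} and its deletion by clear_clients() come from the description above.

```perl
package My::Transport::ClientSketch;    # hypothetical package name
use strict;
use warnings;

sub new { return bless { _client => {} }, shift }

# Cache one client per process ID and server, so that a forked child
# does not reuse a connection opened by its parent.
sub client {
    my ( $self, $server ) = @_;
    return $self->{_client}{$$}{$server}
        ||= { server => $server, pid => $$ };    # stand-in for a real client
}

# Mirrors the described behaviour: drop the whole _client key.
sub clear_clients {
    my ($self) = @_;
    delete $self->{_client};
    return;
}

package main;

my $t      = My::Transport::ClientSketch->new;
my $first  = $t->client('192.168.5.1:9200');
my $second = $t->client('192.168.5.1:9200');    # same cached object
$t->clear_clients;
my $third = $t->client('192.168.5.1:9200');     # rebuilt after clearing
print "cached client reused\n"   if $first == $second;
print "new client after clear\n" if $first != $third;
```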

Registering your Transport backend

You can register your Transport backend as follows:

    BEGIN {
        ElasticSearch::Transport->register('mytransport',__PACKAGE__);
    }

LICENSE AND COPYRIGHT

Copyright 2010 - 2011 Clinton Gormley.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.