The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

Build Status Coverage Status

NAME

Elastijk - A specialized ElasticSearch client.

SYNOPSIS

use Elastijk;

my ($status, $response) = Elastijk::request({
    host => "localhost",
    port => "9200",
    method => "GET",

    index => "blog",
    type => "article",
    command => "_search",

    uri_param => { search_type => "dfs_query_then_fetch" }
    body => {
        query => { match => { "body" => "cpan" } }
    }
});

if ($status eq "200") {
    for my $hit (@{ $response->{hits}{hits} }) {
        say $hit->{url};
    }
}

DESCRIPTION

Elastijk isn a ElasticSearch client library. It uses Hijk, a HTTP client that implements a tiny subset of HTTP/1.1 just enough to talk to ElasticSearch via HTTP.

Elastijk provided low-level functions that are almost identical as using HTTP client, and an object-oriented sugar layer to make it a little bit easier to use. The following documentation describe the low-level function first.

FUNCTIONS

Elastijk::request( $args :HashRef ) : ($status :Int, $response :HashRef)

Making a request to the ElasticSearch server specified in $args. It returns 2 values. $status is the HTTP status code of the response, and the $response decoded as HashRef. ElasticSearch API always respond a single HashRef as JSON text, this might or might not be changed in the future, if it is changed then this function will be adjusted accordingly.

The $args is a HashRef takes contains the following key-value pairs:

host  => Str
port  => Str
index => Str
type  => Str
id    => Str
command => Str
uri_param => HashRef
body  => HashRef | ArrayRef | Str
method => "GET" | "POST" | "HEAD" | "PUT" | "DELETE"

The 4 values of index, type, id, command are used to form the URI path following ElasticSearch's routing convention:

/${index}/${type}/${id}/${command}

All these path parts are optional, when that is the case, Elstaijk properly remove / in between to form the URL that makes sense, for example:

/${index}/${type}/${id}
/${index}/${command}

The value of uri_param is used to form the query_string part in the URI, some common ones for ElasticSearch are q, search_type, and timeout. But the accepted list is different for different commands.

The value of method corresponds to HTTP verbs, and is hard-coded to match ElasticSearch API. Users generally do not need to provide this value, unless you are calling request directly, in which case, the default value is 'GET'.

For all cases, Elastijk simply bypass the value it recieve to the server without doing any parameter validation. If that generates some errors, it'll be on server side.

Elastijk::request_raw( $args :HashRef ) : ($status :Int, $response :Str)

Making a request to the ElasticSearch server specified in $args. The main difference between this function and Elastijk::request is that $args-{body}> s expected to be a String scalar, rather then a HashRef. And the $response is not decoded from JSON. This function can be used if users wish to use their own JSON parser to parse response, or if they wish to delay the parsing to be done latter in some bulk-processing pipeline.

OBJECT

PROPERTIES

An Elastijk object is constructed like this:

my $es = Elastijk->new(
    host => "es1.example.com",
    port => "9200"
);

Under the hood, it is only a blessed hash, while all key-value pairs in the hash are the properties. Users could break the pacaking and modify those values, but it is fine. All key-value pairs are shallow-copied from `new` method.

Here's a full list of key-value pairs that are consumed:

host  => Str "localhost"
port  => Str "9200"
index => Str (optional)
type  => Str (optional)

The values for index and type act like a "default" value and they are only used in methods that could use them. Which is handy to save some extra typing. Given objects constructed with different default of index attribute:

$es0 = Elastijk->new();
$es1 = Elastijk->new( index => "foo" );

... calling the same search method with the same arugments will generate different request:

my @args = (uri_param => { q => "nihao" });
$es0->search( @args  ); # GET /_search?q=nihao
$es1->search( @args  ); # GET /foo/_search?q=nihao

This behaviour is consistent for all methods.

METHODS

All methods takes the same key-value pair HashRef as Elastijk::request function, and returns 2 values that are HTTP status code, and the body hashref.

Many of of methods are named after an server command. For eaxmple, the command _search corresponds to method search, the command _bulk corresponds to method bulk.

Elastijk does as little data transformation as possible to keep it a stupid, thin client.

All methods return 2 values that are HTTP status code, and the body hashref:

my ($status, $res) = $es->search(...)
if (substr($status,0,1) eq '2') { # 2xx = successful
    ...
}

The status code is used for error-checking purposes. ElastiSearch should respond with status 4XX when the relevant thing is missing, and 5XX when there are some sort of errors. To check if a request is successful, test if it is 200 or 201.

Due to the fact the value of a lists is the last value of element, it is a little bit shorter if status check could be ignored:

my $res = $es->search(...);
for (@{ $res->{hits}hits} }) {
    ...
}

count and exists method modified $res to be a scalar (instead of HashRef) to allow these intuitive use cases:

if ($es->exists(...)) { ... }
if ($es->count(...) > 10) { ... }

... the original response body are discarded.

request( ... )

This is a low-level method that just bypass things, but it is useful when, say, newer ElasticSearch versoin introduce a new command, and there are no corresponding method in the Client yet. The only difference between using this method and calling Elasijk::request directly, is that the values of host,port,index, and ind the object context are consumed.

head(...), get(...), put(...), post(...), delete(...)

Shorthands for the HTTP verbs. All these are just direct delegate to request method.

search( body => {...}, uri_param => {...} )

This method invokes the search api.

The arguments are key-value pairs from the API documents.

count( body => {...}, uri_param => {...} )

This method corresponds to the search count api

exists( index => Str, type => Str, id => Str )

Check if the given thing exists. Which can be a document, a type, and an index. Due to the nature of their dependency, here's the combination you would need to check the existence of different things:

document: index => "foo", type => "bar", id => "beer"
type:     index => "foo", type => "bar"
index:    index => "foo"

See also https://www.elastic.co/guide/guide/en/elasticsearch/reference/current/indices-exists.html , https://www.elastic.co/guide/guide/en/elasticsearch/reference/current/indices-types-exists.html#indices-types-exists , and https://www.elastic.co/guide/guide/en/elasticsearch/guide/current/doc-exists.html

scan_scroll( ..., on_response => sub {} )

A way to perform scan and scroll.

The boilerplate to use it is something like:

$es->scan_scroll(
    index => "tweet",
    body => { query => { match_all => {} }, size => 1000 },
    on_response => sub {
        my ($status,$res) = @_;
        for my $hit (@{ $res->{hits}{hits} }) {
            ...
        }
    }
);

The "search_type" is forced to be "scan" in this method.

The very last value to the on_response key is a callback subroutine that is called after each HTTP request. The arguments are HTTP status code and response body hash just like other methods.

AUTHORS

COPYRIGHT

Copyright (c) 2013,2014 Kang-min Liu <gugod@gugod.org>.

LICENCE

The MIT License

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.