ElasticSearchX::Model::Tutorial - Tutorial for ElasticSearchX::Model
version 0.1.7
In this tutorial we are going to walk through the ElasticSearch example on http://www.elasticsearch.org/. Go ahead and read it first; it gives you good insight into how ElasticSearch works and how ElasticSearchX::Model can help to make it even more elastic.
ElasticSearch is a document-based storage system. Even though it states that it is schema free, it is not recommended to use ElasticSearch without defining a proper schema, or mapping as ElasticSearch calls it.
ElasticSearchX::Model::Document takes care of that. The ElasticSearch example consists of two types: tweet and user. The tweet type contains the properties user, post_date and message. The user type contains only the name property. Using ElasticSearchX::Model::Document this looks like:
    package MyModel::Tweet;
    use Moose;
    use ElasticSearchX::Model::Document;

    has id        => ( is => 'ro', id => [qw(user post_date)] );
    has user      => ( is => 'ro', isa => 'Str' );
    has post_date => (
        is       => 'ro',
        isa      => 'DateTime',
        required => 1,
        default  => sub { DateTime->now }
    );
    has message => ( is => 'rw', isa => 'Str', index => 'analyzed' );

    package MyModel::User;
    use Moose;
    use ElasticSearchX::Model::Document;

    has nickname => ( is => 'ro', isa => 'Str', id => 1 );
    has name     => ( is => 'ro', isa => 'Str' );
You might be wondering why there is an additional id attribute and a nickname. The id attribute in the Tweet class is built dynamically by concatenating the values of user and post_date. This value is digested using SHA1 and used as the id for the document. If you want to change the message of the tweet, you don't have to delete the old record and add a new one; you simply change the message and reindex the document. Since the id stays the same, the new record will overwrite the old one. Also, you don't have to keep track of incrementing numerical document ids.
In the User class, the nickname attribute acts as id. Since it does not depend on the value of any other attribute, the id matches the nickname.
ElasticSearch will assign a random id to the document if there is no id attribute.
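The idea behind the dynamically built id can be sketched in plain Perl. This is only an illustration of why a digest of the id fields gives a stable document id; the helper name build_id, the NUL join character and the hex encoding are assumptions made for this sketch and are not necessarily what ElasticSearchX::Model does internally.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: derive a stable document id from the id fields.
# The separator and the hex encoding are assumptions for illustration;
# the digest format used by ElasticSearchX::Model may differ.
sub build_id {
    my (@values) = @_;
    return sha1_hex( join( "\0", @values ) );
}

my $first  = build_id( 'mo', '2013-01-01T12:00:00' );
my $second = build_id( 'mo', '2013-01-01T12:00:00' );
my $other  = build_id( 'mo', '2013-01-01T12:00:01' );

# The same user and post_date always yield the same id, so reindexing
# a changed message overwrites the existing document.
print "same fields, same id\n" if $first eq $second;

# A different post_date yields a different id, i.e. a new document.
print "different fields, different id\n" if $first ne $other;
```

Because the message is not part of the digest, changing it never changes the id.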
Each document belongs to a type. Think of it as a table in a relational database. And each type belongs to an index, which corresponds to a database.
Modeling indices and types with ElasticSearchX::Model is pretty easy and the types have actually already been built: the meta objects of the document classes describe the types. They include all the necessary information to build a type mapping. You can even use MooseX::Types::Structured to build deeply nested structures that will be translated to object properties in ElasticSearch. DateTime attributes become a Date type and so on.
Indices are defined in a model class:
    package MyModel;
    use Moose;
    use ElasticSearchX::Model;

    index twitter => ( namespace => 'MyModel' );
This is all you need to define the index and its types. The namespace option of the index twitter will load all classes in the MyModel namespace and add them to the twitter index. Actually, you don't even have to define the namespace in this case, since the namespace defaults to the name of the model class. You can also load types explicitly by defining a types option:
    index twitter => ( types => [qw(MyModel::Tweet MyModel::User)] );
Make sure that the classes are loaded. See ElasticSearchX::Model::Index for all the available options.
To deploy the indices and mappings to ElasticSearch, simply call:

    my $model = MyModel->new;
    $model->deploy;
This will try to connect to an ElasticSearch instance on 127.0.0.1:9200. See "es" in ElasticSearchX::Model for more information.
Indexing describes the process of adding documents to types.
    use DateTime;

    my $twitter   = $model->index('twitter');
    my $timestamp = DateTime->now;
    my $tweet     = $twitter->type('tweet')->put(
        {
            user      => 'mo',
            post_date => $timestamp,
            message   => 'Elastic baby!',
        },
        { refresh => 1 }
    );
    $twitter->type('tweet')->count;    # 1
The first parameter contains the property/value pairs. The post_date property is special because it is a DateTime object. Objects are deflated prior to insertion. This is handled by MooseX::Attribute::Deflator and is configured in ElasticSearchX::Model::Document::Types. You can easily add deflators for other objects.
Since the post_date property is required and has a default, you don't even have to pass it to put. ElasticSearchX::Model will automatically build values for required attributes. If there is no builder or default, it will throw an exception.
The second parameter to "put" in ElasticSearchX::Model::Document::Set tells ElasticSearch to refresh the index immediately. Otherwise it can take up to one second for the server to refresh and the subsequent call to "count" in ElasticSearchX::Model::Document::Set will return 0.
If you index large numbers of documents, it is advised to call "refresh" in ElasticSearchX::Model::Index once you are finished and not on every put.
Documents can be retrieved either with their id or by providing the properties that define the id:
    my $tweet_copy = $twitter->type('tweet')->get($tweet->id);

    # or

    my $tweet_copy = $twitter->type('tweet')->get({
        user      => 'mo',
        post_date => $timestamp,
    });
Objects that have been deflated (i.e. post_date) will be inflated again. Thus, $tweet_copy->post_date is a DateTime object again.
If you don't really care about objects or need extra speed, you can set "inflate" in ElasticSearchX::Model::Document::Set to 0. This will return the raw response from ElasticSearch.
    $twitter->type('tweet')->raw->get($tweet->id);
ElasticSearch is, as its tagline says, "You know, for Search". ElasticSearchX::Model::Document::Set tries to help you with ElasticSearch's very verbose query syntax.
    my @tweets = $twitter->type('tweet')
        ->filter({ term => { user => 'mo' } })
        ->query({ field => { 'message.analyzed' => 'baby' } })
        ->size(100)
        ->all;
If you need to retrieve large amounts of data, you probably want to scroll through the results, which is much faster and safer than paging manually using "from" in ElasticSearchX::Model::Document::Set.
    my $iterator = $twitter->type('tweet')->scroll;
    while ( my $tweet = $iterator->next ) {
        # do something with $tweet
    }
For extra speed use $twitter->type('tweet')->raw->scroll which will skip the object inflation and give you the raw HashRef.
ElasticSearch allows you to create aliases for each index. This makes it easy to reindex data into a new index and, once the reindexing is done, point the alias at the new index. This is how you do it with ElasticSearchX::Model:

    package MyModel;
    use Moose;
    use ElasticSearchX::Model;

    index twitter => ( namespace => 'MyModel', alias_for => 'twitter_v1' );
This will create an index called twitter_v1 in ElasticSearch and an alias twitter. To reindex data, you simply add a second index with a different name but the same document classes:
    index twitter_v2 => ( namespace => 'MyModel' );
Now deploy the new index and start reindexing your data to the new index:
    $model->deploy;

    my $old = $model->index('twitter');
    my $new = $model->index('twitter_v2');

    my $iterator = $old->type('tweet')->size(1000)->scroll;
    while ( my $tweet = $iterator->next ) {
        $tweet->message('something else');
        $tweet->index($new);
        $tweet->put;
    }
Afterwards, you simply remove the twitter_v2 index definition from the model class and set the alias_for attribute on index twitter to twitter_v2. You have to call $model->deploy again, which will automatically update the aliases.
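As a sketch of the end state just described, the model class might look like this once reindexing is finished. It only uses the index keyword, the alias_for option and the deploy method already shown in this tutorial; the deploy call assumes an ElasticSearch instance is reachable at the default address.

```perl
package MyModel;
use Moose;
use ElasticSearchX::Model;

# twitter is now an alias for the reindexed twitter_v2 index;
# the separate twitter_v2 index definition has been removed.
index twitter => ( namespace => 'MyModel', alias_for => 'twitter_v2' );

package main;

# Deploy again so that the alias is updated
# (requires a reachable ElasticSearch instance).
MyModel->new->deploy;
```

Code that looks up $model->index('twitter') does not need to change, since it keeps using the alias name.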
Moritz Onken
This software is Copyright (c) 2013 by Moritz Onken.
This is free software, licensed under:
The (three-clause) BSD License