ElasticSearchX::Sequence - Fast integer ID sequences with ElasticSearch
version 0.01
use ElasticSearch(); use ElasticSearchX::Sequence(); my $es = ElasticSearch->new(); my $seq = ElasticSearchX::Sequence->new( es => $es ); $seq->bootstrap(); my $it = $seq->sequence('mail_id'); my $mail_id = $it->next;
ElasticSearchX::Sequence gives you a sequence of auto-incrementing integers (eg to use as IDs) that are guaranteed to be unique across your application.
It is similar in spirit to DBIx::Sequence, but uses ElasticSearch as a backend.
ElasticSearch already has built in unique IDs, but they look like this: KpSb_Jd_R56dH5Qx6TtxVA.
KpSb_Jd_R56dH5Qx6TtxVA
If you are migrating from an RDBM where you are using (eg) an auto-increment column to give you unique IDs, your application may depend on these IDs being integers. Or you may just prefer integer IDs.
Either way, this module makes it easy to get these unique auto-incrementing IDs without needing an RDBM to provide them.
And it is fast! Given the performance, if you are already using ElasticSearch, you may want to move your ticket servers from your database to ElasticSearch instead.
This module is blazing fast, especially when ElasticSearch uses the ElasticSearch::Transport::Curl backend.
You can try out the benchmark yourself, in the benchmark folder in this distribution.
benchmark
The script compares:
MySQL, using the ticket method described by Flickr
this module, using the httptiny backend
this module, using the curl backend
this module, using the curl backend but only requesting blocks of 10 IDs at a time
The results I get when running this on my laptop are:
Rate es_curl_10 db_ticket es_tiny es_curl es_curl_10 38760/s -- -48% -55% -72% db_ticket 74627/s 93% -- -13% -47% es_tiny 85470/s 121% 15% -- -39% es_curl 140845/s 263% 89% 65% --
Plus, with ElasticSearch, you get distributed and high-availability thrown in for free.
my $seq = ElasticSearchX::Sequence->new( es => $es, # ElasticSearch instance, required index => 'index', # defaults to 'sequence', type => 'type', # defaults to 'sequence', );
new() returns a new instance of ElasticSearchX::Sequence. By default, your sequences will be stored in index sequence, type sequence, but you can change those values to whatever suits your application.
new()
sequence
By default, the index is optimised for serving sequences, and has different settings than those you would typically use in your main index, so rather than storing your sequences in the main index for your application(s), you may prefer to store all of your sequences in the single index sequence.
The type (default sequence) could be used to separate sequences for different applications. For instance, you could store the sequences for your personal blog in type personal and for your work blog in type work.
personal
work
See "bootstrap()" for how to initiate your index/type.
my $it = $seq->sequence('mail_id'); my $it = $seq->sequence( name => 'mail_id', size => 100 );
The sequence() method returns a new sequence iterator identified by the name.
sequence()
name
New IDs/values are generated in blocks of size (default 100), as this is much faster than requesting them individually.
size
This does mean that, if you have several instances of the iterator mail_id, then the next ID won't always be the highest number available. For instance:
mail_id
$i_1 = $seq->('mail_id'); $i_2 = $seq->('mail_id'); say $i_1->next; say $i_2->next; say $i_1->next; # 1 # 101 # 2
See also ElasticSearchX::Sequence::Iterator.
$seq->bootstrap( %settings );
This method will create the index, if it doesn't already exist, and will setup the type. This can be called even if the index and type have already been setup. It won't fail unless the type already exists and has a different mapping / definition.
By default, the index is setup with the following %settings:
%settings
( number_of_shards => 1, auto_expand_replicas => '0-all', )
In other words, it will have only a single primary shard (instead of the ElasticSearch default of 5), and a replica of that shard on every ElasticSearch node in your cluster.
If you pass in any %settings then the defaults will not be used at all.
See Index Settings for more.
$seq->delete_index()
Deletes the index associated with the sequence. You will lose your data!
$seq->delete_type()
Deletes the type associated with the sequence. You will lose your data!
$index = $seq->index
Read-only getter for the index value
$type = $seq->type
Read-only getter for the type value
$es = $seq->es
Read-only getter for the ElasticSearch instance.
ElasticSearch, http://www.elasticsearch.org
If you have any suggestions for improvements, or find any bugs, please report them to https://github.com/clintongormley/ElasticSearchX-Sequence/issues. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
Clinton Gormley <drtech@cpan.org>
This software is copyright (c) 2011 by Clinton Gormley.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install ElasticSearchX::Sequence, copy and paste the appropriate command in to your terminal.
cpanm
cpanm ElasticSearchX::Sequence
CPAN shell
perl -MCPAN -e shell install ElasticSearchX::Sequence
For more information on module installation, please visit the detailed CPAN module installation guide.