NAME

Elastic::Model::View - Views to query your docs in Elasticsearch

VERSION

version 0.28

SYNOPSIS

    $view    = $model->view();         # all domains and types known to the model
    $view    = $domain->view();        # just $domain->name, and its types
    $posts   = $view->type( 'post' );  # just type post

10 most relevant posts containing 'perl' or 'moose'

    $results = $posts->queryb( content => 'perl moose' )->search;

10 most relevant posts containing 'perl' or 'moose' published since 1 Jan 2012, sorted by timestamp, with highlighted snippets from the content field:

    $results = $posts
                ->queryb    ( 'content' => 'perl moose'            )
                ->filterb   ( 'created' => { gte => '2012-01-01' } )
                ->sort      ( 'timestamp'                          )
                ->highlight ( 'content'                            )
                ->search;

The same as the above, but in one step:

    $results = $domain->view(
        type             => 'post',
        sort             => 'timestamp',
        queryb           => { content => 'perl moose' },
        filterb          => { created => { gte => '2012-01-01' } },
        highlight        => 'content',
    )->search;

Efficiently retrieve all posts, unsorted:

    $results = $posts->size(100)->scan;

    while (my $result = $results->shift_result) {
        do_something_with($result);
    }

Cached results:

    $cache   = CHI->new(....);
    $view    = $view->cache( $cache )->cache_opts( expires_in => '2 min');

    $results = $view->queryb( 'perl' )->cached_search();
    $results = $view->queryb( 'perl' )->cached_search( expires_in => '30 sec');

DESCRIPTION

Elastic::Model::View is used to query your docs in Elasticsearch.

Views are "chainable". In other words, you get a clone of the current view every time you set an attribute. For instance, you could do:

    $all_types      = $domain->view;
    $users          = $all_types->type('user');
    $posts          = $all_types->type('post');
    $recent_posts   = $posts->filterb({ published => { gt => '2012-05-01' }});

Alternatively, you can set all or some of the attributes when you create a view:

    $recent_posts   = $domain->view(
        type    => 'post',
        filterb => { published => { gt => '2012-05-01' }}
    );

Views are also reusable. They only query Elasticsearch when you call one of the methods listed under METHODS, eg:

    $results        = $recent_posts->search;    # retrieve $size results
    $scroll         = $recent_posts->scroll;    # keep pulling results
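
The clone-on-set behaviour described above can be pictured with a toy class. This is only a sketch: ToyView and its methods are hypothetical stand-ins written for illustration, not Elastic::Model's implementation, and nothing here talks to Elasticsearch.

```perl
use strict;
use warnings;

# ToyView is a hypothetical stand-in, NOT Elastic::Model::View itself.
package ToyView;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Getter when called without values; with values, return a modified
# shallow clone and leave the invocant untouched.
sub _attr {
    my ( $self, $name, @vals ) = @_;
    return $self->{$name} unless @vals;
    return ( ref $self )->new( %$self, $name => [@vals] );
}

sub type { my $self = shift; return $self->_attr( 'type', @_ ) }
sub size { my $self = shift; return $self->_attr( 'size', @_ ) }

package main;

my $all_types = ToyView->new;
my $posts     = $all_types->type('post');    # new object
my $big_posts = $posts->size(100);           # another new object

print defined $all_types->type ? "changed\n" : "original view unchanged\n";
print "posts type: @{ $posts->type }\n";
print "big_posts size: @{ $big_posts->size }\n";
```

Because every setter returns a new object, a base view can be kept around and specialised as often as needed without the variants interfering with each other.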

METHODS

Calling one of the methods listed below executes your query and returns the results. Your view is unchanged and can be reused later.

See Elastic::Manual::Searching for a discussion about when and how to use "search()", "scroll()" or "scan()".

search()

    $results = $view->search();

Executes a search and returns an Elastic::Model::Results object with at most "size" results.

This is useful for returning finite results, ie where you know how many results you want. For instance: "give me the 10 best results".

cached_search()

NOTE: Think carefully before you cache data outside of Elasticsearch. Elasticsearch already has smart filter caches, which are updated as your data changes. Most of the time, you will be better off using those directly, instead of an external cache.

    $results = $view->cache( $cache )->cached_search( %opts );

If a "cache" attribute has been specified for the current view, then "cached_search()" tries to retrieve the search results from the "cache". If it fails, then a "search()" is executed, and the results are stored in the "cache". An Elastic::Model::Results::Cached object is returned.

Any %opts that are passed in override any default "cache_opts", and are passed to CHI's get() or set() methods.

    $view    = $view->cache_opts( expires_in => '30 sec' );

    $results = $view->cached_search;                            # 30 seconds
    $results = $view->cached_search( expires_in => '2 min' );   #  2 minutes

Given the near-real-time nature of Elasticsearch, you sometimes want to invalidate a cached result in the near future. For instance, if you have cached a list of comments on a blog post, but then you add a new comment, you want to invalidate the cached comments list. However, the new comment will only become visible to search sometime within the next second, so invalidating the cache immediately may or may not be useful.

Use the special argument force_set to bypass the cache get() and to force the cached version to be updated, along with a new expiry time:

    $results = $view->cached_search( force_set => 1, expires_in => '2 sec');
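
The lookup order described above (try the cache, fall back to a search, store the result, and skip the cache read when force_set is true) can be sketched with a plain hash. Everything here is illustrative: cached_lookup and the hash-based cache are hypothetical stand-ins for CHI and for the module's internals, and expiry is not modelled.

```perl
use strict;
use warnings;

my %cache;           # stand-in for a CHI cache (no expiry handling)
my $searches = 0;    # counts how often the "real" search runs

# Hypothetical helper, not part of Elastic::Model.
sub cached_lookup {
    my ( $key, $search, %opts ) = @_;

    # force_set skips the cache read, so the stored value is refreshed.
    if ( !$opts{force_set} && exists $cache{$key} ) {
        return $cache{$key};
    }
    my $results = $search->();
    $cache{$key} = $results;
    return $results;
}

# Fake search: records that it ran and returns a labelled result set.
my $search = sub { $searches++; return ["result set $searches"] };

cached_lookup( 'perl', $search );                    # miss: runs the search
cached_lookup( 'perl', $search );                    # hit: served from cache
cached_lookup( 'perl', $search, force_set => 1 );    # forced refresh

print "searches run: $searches\n";
print "cached value: $cache{perl}[0]\n";
```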

scroll()

    $scroll_timeout = '1m';
    $scrolled_results = $view->scroll( $scroll_timeout );

Executes a search and returns an Elastic::Model::Results::Scrolled object which will pull "size" results from Elasticsearch as required until either (1) no more results are available or (2) more than $scroll_timeout (default 1 minute) elapses between requests to Elasticsearch.

Scrolling allows you to return an unbounded result set. Useful if you're not sure whether to expect 2 results or 2000.
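
As a rough mental model of how a scrolled result set pulls "size" results at a time, the following sketch pages through an in-memory list in batches. The make_scroller helper is hypothetical and stands in for the round trips to Elasticsearch; it does not model the scroll timeout.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that hands out one item at a
# time but refills its buffer in batches of $size, fetching more only
# when the buffer runs dry. Also returns a sub reporting the batch count.
sub make_scroller {
    my ( $docs, $size ) = @_;
    my $offset  = 0;
    my $fetches = 0;
    my @buffer;

    my $next = sub {
        if ( !@buffer && $offset < @$docs ) {
            my $last = $offset + $size - 1;
            $last = $#$docs if $last > $#$docs;
            @buffer = @{$docs}[ $offset .. $last ];
            $offset = $last + 1;
            $fetches++;
        }
        return shift @buffer;    # undef once everything is consumed
    };
    return ( $next, sub { $fetches } );
}

my @docs = map { "doc_$_" } 1 .. 25;
my ( $next, $fetch_count ) = make_scroller( \@docs, 10 );

my $seen = 0;
while ( defined( my $doc = $next->() ) ) {
    $seen++;
}
printf "%d docs in %d batches\n", $seen, $fetch_count->();
```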

scan()

    $timeout = '1m';
    $scrolled_results = $view->scan($timeout);

"scan()" is a special type of "scroll()" request, intended for efficient handling of large numbers of unsorted docs (eg when you want to reindex all of your data).

first()

    $result = $view->first();
    $object = $view->first->object;

Executes the search and returns just the first result. All other metadata is thrown away.

total()

    $total = $view->total();

Executes the search and returns the total number of matching docs. All other metadata is thrown away.

delete()

    $results = $view->delete();

Deletes all docs matching the query and returns a hashref indicating success. Any docs that are stored in a live scope or are cached somewhere are not removed. Any unique keys are not removed.

This should really only be used once you are sure that the matching docs are out of circulation. Also, it is more efficient to just delete a whole index (if possible), rather than deleting large numbers of docs.

Note: The only attributes relevant to "delete()" are "domain", "type", "query", "routing", "consistency" and "replication".

CORE ATTRIBUTES

domain

    $new_view = $view->domain('my_index');
    $new_view = $view->domain('index_one','alias_two');

    \@domains = $view->domain;

Specify one or more domains (indices or aliases) to query. By default, a view created from a domain will query just that domain's name. A view created from the model will query all the main domains (ie the "name" in Elastic::Model::Namespace) and fixed domains known to the model.

type

    $new_view = $view->type('user');
    $new_view = $view->type('user','post');

    \@types   = $view->type;

By default, a view will query all types known to all the domains specified in the view. You can specify one or more types.

query

queryb

    # native query DSL
    $new_view = $view->query( text => { title => 'interesting words' } );

    # SearchBuilder DSL
    $new_view = $view->queryb( title => 'interesting words' );

    \%query   = $view->query;

Specify the query to run in the native Elasticsearch query DSL or use queryb() to specify your query with the more Perlish Elastic::Model::SearchBuilder query syntax.

By default, the query will match all docs.

filter

filterb

    # native query DSL
    $new_view = $view->filter( term => { tag => 'perl' } );

    # SearchBuilder DSL
    $new_view = $view->filterb( tag => 'perl' );

    \%filter  = $view->filter;

You can specify a filter to apply to the query results using the native Elasticsearch query DSL, or use filterb() to specify your filter with the more Perlish Elastic::Model::SearchBuilder DSL. If a filter is specified, it will be combined with the "query" as a filtered query, or (if no query is specified) as a constant score query.
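
The combination rule can be written out in the native DSL. The sketch below is an assumption about the shape of the resulting request body, based on the Elasticsearch 0.90 "filtered" and "constant_score" queries; combine_query_filter is a hypothetical helper, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a query and/or filter (native DSL hashrefs)
# in the way the paragraph above describes.
sub combine_query_filter {
    my ( $query, $filter ) = @_;
    my $match_all = { match_all => {} };    # default: match all docs

    if ( !$filter ) {
        return $query ? $query : $match_all;
    }
    if ( !$query ) {
        return { constant_score => { filter => $filter } };
    }
    return { filtered => { query => $query, filter => $filter } };
}

my $query  = { text  => { content => 'perl moose' } };
my $filter = { range => { created => { gte => '2012-01-01' } } };

my $both        = combine_query_filter( $query, $filter );
my $filter_only = combine_query_filter( undef,  $filter );
my $neither     = combine_query_filter();

print "query + filter: ", join( ',', keys %$both ),        "\n";
print "filter only:    ", join( ',', keys %$filter_only ), "\n";
print "neither:        ", join( ',', keys %$neither ),     "\n";
```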

post_filter

post_filterb

    # native query DSL
    $new_view = $view->post_filter( term => { tag => 'perl' } );

    # SearchBuilder DSL
    $new_view = $view->post_filterb( tag => 'perl' );

    \%filter  = $view->post_filter;

Post-filters filter the results AFTER any "facets" have been calculated. In the above example, the facets would be calculated on all values of tag, but the results would then be limited to just those docs where tag == perl.

You can specify a post_filter using the native Elasticsearch query DSL, or use post_filterb() to specify it with the more Perlish Elastic::Model::SearchBuilder DSL.
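
The order of operations matters here, so a small in-memory illustration may help. The docs and field values below are made up, and the code only mimics the sequence "count facets first, then filter the hits"; it does not use Elasticsearch.

```perl
use strict;
use warnings;

# Toy in-memory "index"; the docs and field values are invented.
my @docs = (
    { title => 'Moose basics',  tag => 'perl'   },
    { title => 'Regex tricks',  tag => 'perl'   },
    { title => 'Gems overview', tag => 'ruby'   },
    { title => 'Decorators',    tag => 'python' },
);

# Step 1: the facet is counted over every doc matched by the query
# (here: all docs), before the post-filter is applied.
my %tag_facet;
$tag_facet{ $_->{tag} }++ for @docs;

# Step 2: the post-filter then narrows the returned hits only.
my @hits = grep { $_->{tag} eq 'perl' } @docs;

printf "facet %s => %d\n", $_, $tag_facet{$_} for sort keys %tag_facet;
printf "hits returned: %d\n", scalar @hits;
```

The facet still reports counts for ruby and python even though only the perl docs are returned as hits, which is the behaviour a post-filter is meant to give you.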

sort

    $new_view = $view->sort( '_score'                ); # _score desc
    $new_view = $view->sort( 'timestamp'             ); # timestamp asc
    $new_view = $view->sort( { timestamp => 'asc' }  ); # timestamp asc
    $new_view = $view->sort( { timestamp => 'desc' } ); # timestamp desc

    $new_view = $view->sort(
        '_score',                                       # _score desc
        { timestamp => 'desc' }                         # then timestamp desc
    );

    \@sort    = $view->sort;

By default, results are sorted by "relevance" (_score => 'desc'). You can specify multiple sort arguments, which are applied in order, and can include scripts or geo-distance. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-sort.html for more information.

Note: Sorting cannot be combined with "scan()".
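
To make the default sort orders explicit, a small normaliser can expand plain field names into the hashref form. This is a sketch based only on the comments in the examples above; explicit_sort is a hypothetical helper and the "votes" field is invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: spell out the default order for plain field names
# ('_score' sorts descending, any other field ascending). Hashrefs are
# passed through untouched.
sub explicit_sort {
    my @clauses;
    for my $arg (@_) {
        if ( ref $arg ) {
            push @clauses, $arg;
        }
        else {
            my $order = $arg eq '_score' ? 'desc' : 'asc';
            push @clauses, { $arg => $order };
        }
    }
    return @clauses;
}

my @sort = explicit_sort( '_score', 'timestamp', { votes => 'desc' } );

for my $clause (@sort) {
    my ($field) = keys %$clause;
    print "$field => $clause->{$field}\n";
}
```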

from

    $new_view = $view->from( 10 );

    $from     = $view->from;

By default, results are returned from the first result. Think of it as "the number of docs to skip", so setting from to 0 would start from the first result. Setting from to 10 would skip the first 10 results and return docs from result number 11 onwards.

size

    $new_view = $view->size( 100 );

    $size     = $view->size;

The number of results returned in a single "search()", which defaults to 10.

Note: See "scan()" for a slightly different application of the "size" value.
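
Together, "from" and "size" give simple pagination. The helper below is a hypothetical convenience, not part of Elastic::Model; it converts a 1-based page number into the "from" offset for a given page size.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number into the "from"
# offset (the number of docs to skip) for a given page size.
sub from_for_page {
    my ( $page, $size ) = @_;
    die "page must be >= 1\n" if $page < 1;
    return ( $page - 1 ) * $size;
}

my $size = 10;
for my $page ( 1 .. 3 ) {
    my $from = from_for_page( $page, $size );
    printf "page %d: from=%d, results %d-%d\n",
        $page, $from, $from + 1, $from + $size;
}

# With a real view this would be applied as (not run here):
#   $results = $view->from( from_for_page( 3, $size ) )->size($size)->search;
```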

facets

    $new_view = $view->facets(
        facet_one => {
            terms   => {
                field => 'field.to.facet',
                size  => 10
            },
            facet_filterb => { status => 'active' },
        },
        facet_two => {....}
    );

    $new_view = $view->add_facet( facet_three => {...} );
    $new_view = $view->remove_facet('facet_three');

    \%facets  = $view->facets;
    \%facet   = $view->get_facet('facet_one');

Facets allow you to aggregate data from a query, for instance: most popular terms, number of blog posts per day, average price etc. Facets are calculated from the query generated from "query" and "filter". If you want to filter your query results down further after calculating your facets, you can use "post_filter".

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-facets.html for an explanation of what facets are available.

highlight

    $new_view = $view->highlight(
        'field_1',
        'field_2' => \%field_2_settings,
        'field_3'
    );

Specify which fields should be used for highlighted snippets in your search results. You can pass just a list of fields, or fields with their field-specific settings. These values are used to set the fields parameter in "highlighting".

highlighting

    $new_view = $view->highlighting(
        pre_tags    =>  [ '<em>',  '<b>'  ],
        post_tags   =>  [ '</em>', '</b>' ],
        encoder     => 'html',
        ...
    );

The "highlighting" attribute is used to pass any highlighting parameters which should be applied to all of the fields set in "highlight" (although you can override these settings for individual fields by passing field settings to "highlight").

See http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-highlighting.html for more about how highlighting works, and "highlight" in Elastic::Model::Result for how to retrieve the highlighted snippets.

OTHER ATTRIBUTES

fields

    $new_view = $view->fields('title','content');

By default, searches will return the _source field which contains the whole document, allowing Elastic::Model to inflate the original object without having to retrieve the document separately. If you would like to just retrieve a subset of fields, you can specify them in "fields". See http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/search-request-fields.html.

Note: If you do specify any fields, and you DON'T include '_source' then the _source field won't be returned, and you won't be able to retrieve the original object without requesting it from Elasticsearch in a separate (but automatic) step.

script_fields

    $new_view = $view->script_fields(
        distance => {
            script  => q{doc['location'].distance(lat,lon)},
            params  => { lat => $lat, lon => $lon }
        },
        $name    => \%defn,
        ...
    );

    $new_view = $view->add_script_field( $name => \%defn );
    $new_view = $view->remove_script_field($name);

    \%fields  = $view->script_fields;
    \%defn    = $view->get_script_field($name);

Script fields can be generated using the MVEL scripting language. (You can also use JavaScript, Python and Java.)

include_paths / exclude_paths

    $new_view    = $view->include_paths('foo.*')
                        ->exclude_paths('foo.bar.*','baz.*');

    $results     = $new_view->search->as_partials;
    $partial_obj = $results->next;

If your objects are large, but you only need access to a few attributes to eg display search results, you may want to retrieve only the relevant parts of each object. You can specify which parts of the object to include or exclude using include_paths and exclude_paths. If either of these is set then the full _source field will not be loaded (unless you specify it explicitly using "fields").

The partial objects returned when "as_partials()" in Elastic::Model::Results is in effect function exactly like real objects, except that they cannot be saved.

See Partial fields on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#partial.

routing

    $new_view = $view->routing( 'routing_val' );
    $new_view = $view->routing( 'routing_1', 'routing_2' );

Search queries are usually directed at all shards. If you are using routing (eg to store related docs on the same shard) then you can limit the search to just the relevant shard(s). Note: if you are searching on aliases that have routing configured, then specifying a "routing" manually will override those values.

See Elastic::Manual::Scaling for more.

index_boosts

    $new_view = $view->index_boosts(
        index_1 => 4,
        index_2 => 2
    );

    $new_view = $view->add_index_boost( $index => $boost );
    $new_view = $view->remove_index_boost( $index );

    \%boosts  = $view->index_boosts;
    $boost    = $view->get_index_boost( $index );

Make results from one index more relevant than those from another index.

min_score

    $new_view  = $view->min_score( 2 );
    $min_score = $view->min_score;

Exclude results whose score (relevance) is less than the specified number.

preference

    $new_view = $view->preference( '_local' );

Control which node should return search results. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html for more.

timeout

    $new_view = $view->timeout( 10 );         # 10 ms
    $new_view = $view->timeout( '10s' );      # 10 sec

    $timeout  = $view->timeout;

Sets an upper limit on the time to wait for search results. When the limit is reached, the search returns whatever results it has managed to receive up until that point.

track_scores

    $new_view = $view->track_scores( 1 );
    $track    = $view->track_scores;

By default, if you sort on a field other than _score, Elasticsearch does not return the calculated relevance score for each doc. If "track_scores" is true, these scores will be returned regardless.

CACHING ATTRIBUTES

Bounded searches (those returned by calling "search()") can be stored in a CHI-compatible cache.

cache

    $cache    = CHI->new(...);
    $new_view = $view->cache( $cache );

Stores an instance of a CHI-compatible cache, to be used with "cached_search()".

cache_opts

    $new_view = $view->cache_opts( expires_in => '20 sec', ...);

Stores the default options that should be passed to CHI's get() or set(). These can be overridden by passing options to "cached_search()".

DEBUGGING ATTRIBUTES

explain

    $new_view = $view->explain( 1 );
    $explain  = $view->explain;

Set "explain" to true to return debugging information explaining how each document's score was calculated. See "explain" in Elastic::Model::Result to view the output.

stats

    $new_view = $view->stats( 'group_1', 'group_2' );
    \@groups  = $view->stats;

The statistics for each search can be aggregated by group. These stats can later be retrieved using "index_stats()" in Search::Elasticsearch::Compat.

search_builder

    $new_view = $view->search_builder( $search_builder );
    $builder  = $view->search_builder;

If you would like to use a different search builder than the default Elastic::Model::SearchBuilder for "queryb", "filterb" or "post_filterb", then you can set a value for "search_builder".

DELETE ATTRIBUTES

These parameters are only used with "delete()".

consistency

    $new_view    = $view->consistency( 'quorum' | 'all' | 'one' );
    $consistency = $view->consistency;

At least one, all or a quorum (default) of nodes must be present for the delete to take place.

replication

    $new_view    = $view->replication( 'sync' | 'async' );
    $replication = $view->replication;

Should a delete be done synchronously (ie wait until all nodes within the replication group have run the delete) or asynchronously (return immediately, and perform the delete in the background)?

AUTHOR

Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Clinton Gormley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
