NAME

XAO::DO::Data::Index - XAO Indexer storable index object

SYNOPSIS

 my $keywords=$cgi->param('keywords');
 my $cn_index=$odb->fetch('/Indexes/customer_names');
 my $sr=$cn_index->search_by_string('name',$keywords);

DESCRIPTION

XAO::DO::Data::Index is based on the XAO::FS Hash object and provides wrapper methods for the most useful XAO Indexer functions.

METHODS

build_structure ()

If called without arguments, creates the initial structure in the object that is required for it to function properly. Safe to call on already existing data.

Will create a certain number of data fields that are then used to store specifically ordered object IDs according to the get_orderings() method of the corresponding indexer. The number is taken from the site configuration's '/indexer/max_orderings' parameter and defaults to 10.

Should be called from site config's build_structure() method in a way similar to this:

 $odb->fetch('/Indexes')->get_new->build_structure;

Where '/Indexes' is a container object with class 'Data::Index'. It does not have to be named 'Indexes'.
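
For illustration only, the container itself could be declared in the site's build_structure() along these lines. This is a sketch assuming the usual XAO::FS list declaration; the names 'Indexes' and 'index_id' are arbitrary examples:

 # Hypothetical site configuration code; adjust names to your project
 $odb->fetch('/')->build_structure(
     Indexes => {
         type  => 'list',
         class => 'Data::Index',
         key   => 'index_id',
     },
 );

 # Then create the index structure itself
 $odb->fetch('/Indexes')->get_new->build_structure;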

data_structure (;$$)

Returns the data structure of the Index data object; it can be used directly in the build_structure() method.

The first optional argument is the number of fields to hold orderings. If it is not given, the site configuration's '/indexer/common/max_orderings' parameter is used, which defaults to 10.

The second parameter sets the maximum size of a single keyword data chunk that lists all the places where the word was found and their positions. If it is not given, the value is taken from the '/indexer/common/max_kwdata_length' configuration parameter and defaults to 65000.

It depends highly on the type of text you index, but as a rough estimate, for every 1,000 allowed words you need about 20,000 for the data. So, if your ignore_limit is set to 50,000 you might want to set max_kwdata_length to 1,000,000. If you are using MySQL you might also need to adjust max_allowed_packet accordingly, setting it slightly higher than max_kwdata_length. With compression these values can be reduced, as the limit is applied after compression.
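
As a sketch only, explicit values could be passed when building the structure by hand; the values 20 and 1_000_000 below are purely illustrative:

 # Hypothetical example: 20 ordering fields, 1,000,000 byte keyword chunks
 my $new_index=$odb->fetch('/Indexes')->get_new;
 my $structure=$new_index->data_structure(20,1_000_000);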

get_collection_object ()

A shortcut to the indexer's get_collection_object() method. If there is no such method, it is emulated with a call to get_collection(), which is usually slower (kept for compatibility).
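
For example, assuming the $cn_index object from the SYNOPSIS:

 my $collection=$cn_index->get_collection_object;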

get_collection ()

Simply a shortcut to the indexer's get_collection() method.

indexer (;$)

Returns the corresponding indexer object, with its name taken from the 'indexer_objname' property.
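
For example, again assuming the $cn_index object from the SYNOPSIS:

 my $indexer=$cn_index->indexer;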

search_by_string ($)

The most widely used method. Parses the given string into keywords and performs a search on them. Honors double quotes to mark words that have to appear together in a specific order.

Returns a reference to the list of collection IDs. The IDs are not checked against the real collection; if the index is not in sync with the content of the actual data collection, IDs of objects that no longer exist can be returned, as well as irrelevant results.

Example:

 my $keywords=$cgi->param('keywords');
 my $cn_index=$odb->fetch('/Indexes/customer_names');
 my $sr=$cn_index->search_by_string('name',$keywords);

An optional third argument can be a reference to a hash. If it is present, the hash will be filled with some internal information, the most useful of which is the list of words ignored in the query, stored under the 'ignored_words' key.

Example:

 my %sd;
 my $sr=$cn_index->search_by_string('name',$keywords,\%sd);
 if(keys %{$sd{ignored_words}}) {
     print "Ignored words:\n";
     foreach my $word (sort keys %{$sd{ignored_words}}) {
         print " * $word ($sd{ignored_words}->{$word})\n";
     }
 }

search_by_string_oid ($)

The same as the search_by_string() method, but translates the results from collection IDs to object IDs. Use it with care; on large result sets it may take significant time!
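
A usage sketch, with the same 'name' ordering and $keywords assumptions as in the SYNOPSIS:

 my $ids=$cn_index->search_by_string_oid('name',$keywords);
 foreach my $object_id (@$ids) {
     print "$object_id\n";
 }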

suggest_alternative ($$$)

Returns an alternative search string by trying words found during search_by_string and stored in the returned data array.

EXPERIMENTAL UNSTABLE API.

update ($)

Updates the index with the current data. Exactly what data it is based on depends entirely on the corresponding indexer object.

With drivers that support transactions, the update is wrapped into a transaction so that the index data remains consistent while being updated.
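
A typical use is a periodic re-index run, for instance from a cron driven script; the index path below is only an example:

 my $cn_index=$odb->fetch('/Indexes/customer_names');
 $cn_index->update;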

build_dictionary (%)

Updates the dictionary of words stored in this index. The actual implementation depends on the specific spellchecker configured for the project.
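
A minimal sketch, assuming the configured spellchecker needs no extra arguments:

 $cn_index->build_dictionary();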

AUTHORS

Copyright (c) 2003 XAO Inc.

Andrew Maltsev <am@xao.com>.

SEE ALSO

Recommended reading: XAO::Indexer, XAO::DO::Indexer::Base, XAO::FS, XAO::Web.