
WE_Frontend::Indexer::Htdig - interface to the htdig search engine

use WE_Frontend::Indexer::Htdig;
my $results = WE_Frontend::Indexer::Htdig::search(-words => "word");

This module is an interface to the htdig search engine. The search function returns a Perl hash reference containing the results.

Arguments are:
A string with the words to search. Multiple words are space-separated. This argument is required.
Specify a different htdig configuration file; otherwise the default htdig.conf is used.
(Optional) Specify a language. The configuration parameter given by conf may contain %{lang} placeholders which are substituted by the value of this argument.
Output some diagnostics to stderr.
Set to a true value if operating on an https server. htdig does not handle SSL, so a parallel http server should be set up for the indexing. With the https hack, the URLs in the search result list are translated at template display time.
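To illustrate the idea behind the https hack: the translation amounts to rewriting the scheme of each result URL before it is displayed. The following is only a sketch of that idea, not the module's actual code; the helper name to_https and the sample URL are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite the scheme of a result URL from http to https.
# URLs that do not start with http:// are returned unchanged.
sub to_https {
    my ($url) = @_;
    (my $translated = $url) =~ s{^http://}{https://}i;
    return $translated;
}

print to_https("http://www.example.com/page.html"), "\n";
```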
The result is a hash reference with the following keys:
Holds an array with the search results. See below.
This variable is set to a true value if the search produces no results. Also detectable by an empty result list.
A list of URLs for result pages 1 to 10.
The page numbers corresponding to pageurllist. Note that Perl and Template arrays start at index 0, which corresponds to page 1.
Hold the URLs of the previous and next result pages.
Usually not needed: the numbers of the previous and next result pages. In practice you would label these links "Prev"/"Next" or "<"/">".
There are more keys. For a complete list refer to the htdig documentation at http://www.htdig.org, htsearch, Templates. Note that the original template variable names are converted to lowercase.
The value of list is an array reference with the matches. Each match is a hash reference with the following keys:
The URL of the page. See also the -httpshack option above.
The title of the page, as specified by the <title> html tag.
The first lines of text in the document.
The date and time the document was last modified. See also the documentation of the iso_8601 config variable in htdig.conf.
The complete list is also in the htdig documentation at http://www.htdig.org, htsearch, Templates.
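As a rough illustration of how such a result might be consumed, the sketch below walks a hand-built hash reference instead of calling search(), so it runs without htdig. The key list is described above; the per-match keys url and title are assumptions based on the lowercased htdig template variables URL and TITLE, and format_matches is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash reference returned by search().
# "url" and "title" are assumed key names (lowercased URL and TITLE).
my $results = {
    list => [
        { url => "http://www.example.com/a.html", title => "Page A" },
        { url => "http://www.example.com/b.html", title => "Page B" },
    ],
};

# Hypothetical helper: format every match as one "title <url>" line.
# An empty result list yields a short "no matches" message instead.
sub format_matches {
    my ($res) = @_;
    my @matches = @{ $res->{list} || [] };
    return "No matches found.\n" if !@matches;
    return join "", map { "$_->{title} <$_->{url}>\n" } @matches;
}

print format_matches($results);
```

Checking for an empty list, rather than relying on a separate flag, follows the remark above that a search without results is also detectable by an empty result list.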

It is best to use the original conf/htdig.tpl.conf file found in the webeditor distribution. The indexing program in webeditor uses this template file and fills it with the configuration found in WEsiteinfo. For a first-time installation and configuration, see also htdig.txt in the webeditor/doc directory.
To override the searchindexer path (default is "rundig" without a path):
$searchengine->searchindexer("/usr/local/bin/rundig");
To set the htdig configuration template and the target htdig configuration file (these settings are highly recommended):
$searchengine->htdigconftemplate($paths->uprootdir . "/conf/htdig.tpl.conf");
$searchengine->htdigconf($paths->uprootdir . "/conf/htdig.%{lang}.conf");
where $paths is the WEsiteinfo::Paths object documented in WE_Frontend::Info. If the configuration file should not be language dependent, then use
$searchengine->htdigconf($paths->uprootdir . "/conf/htdig.conf");
instead.
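The effect of the %{lang} placeholder can be pictured as a plain string substitution. This is a minimal sketch under that assumption, not the module's actual implementation; expand_lang is a hypothetical helper and the path is only an example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every %{lang} placeholder in a
# configuration path with the given language code.
sub expand_lang {
    my ($path, $lang) = @_;
    (my $expanded = $path) =~ s/%\{lang\}/$lang/g;
    return $expanded;
}

# Single quotes keep the placeholder literal.
print expand_lang('/conf/htdig.%{lang}.conf', 'de'), "\n";
```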
If you decide to make your own htdig.conf, put at least the following lines into the configuration file:
template_map: Long long ${common_dir}/long.html \
Short short ${common_dir}/short.html \
Perl perl ${common_dir}/perl/match.pl
template_name: perl
search_results_header: ${common_dir}/perl/header.pl
search_results_footer: ${common_dir}/perl/footer.pl
nothing_found_file: ${common_dir}/perl/nomatch.pl
${common_dir}/perl should be a link to the directory .../lib/WE_Frontend/Indexer/htdig_common.

htdig is available e.g. from this location: http://www.htdig.org/files/snapshots/htdig-3.2.0b5-20040404.tar.gz.
To compile and install htdig from scratch, the following configure line could be used to create a path layout similar to the RedHat one:
sh configure --prefix=/usr --with-search-dir=/usr/share/htdig --with-image-dir=/usr/share/htdig --with-cgi-bin-dir=/usr/bin --with-config-dir=/etc --with-database-dir=/usr/share/htdig

Many. Mind the permissions. In particular, rundig may use the default database directory (/usr/local/share/htdig/database or similar) as the temporary directory for sorting, which fails if the apache user (usually nobody or www) has no permission to write to this directory. In this case, change the TMPDIR definition in rundig or set appropriate write permissions.

Slaven Rezic - slaven@rezic.de
