Module Version: 0.01_01

NAME

Helios::Service::SolrIndexer - a demonstration indexing application for the Helios job processing framework

DESCRIPTION

Helios::Service::SolrIndexer (SolrIndexer for short) is a simple example application to demonstrate the typical Helios application design pattern in the context of a search engine index update (in this case, Apache Solr).

HELIOS CONFIG PARAMETERS

SolrIndexer requires several config parameters to be defined in your Helios collective in order to function correctly. These can be set either in helios.ini or in the Helios Ctrl Panel (the Ctrl Panel method is recommended):

index_endpoint

The URI endpoint of the Solr index (e.g., http://localhost:8983/solr).

source_dsn

The DBI data source name (DSN) of the database that contains the table to be indexed.

source_user

Username to use to connect to the source database.

source_password

Password to use to connect to the source database.

source_tb

Name of the table to be indexed in the source database.

source_fields

A comma-delimited string specifying which of the source table's fields should be selected and given to Solr to index. These fields must already be defined in the Solr index's schema; otherwise Solr will return an error when an update is attempted.

source_id_field

The name of the primary key field in the source table. The value of this field will be passed in via the job arguments, and a SQL WHERE clause will be built around it to uniquely identify the record in the database table. The contents of this field will also become the id of the document in the Solr index.
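
As a rough illustration: run() works from a configuration hashref, so the parameters above end up as plain hash keys. The sketch below checks such a hashref for missing parameters before any work is attempted. The helper name missing_params and every value other than the index_endpoint example are hypothetical; none of this is taken from the SolrIndexer source.

```perl
use strict;
use warnings;

# The seven parameters documented above.
my @REQUIRED = qw(
    index_endpoint source_dsn source_user source_password
    source_tb source_fields source_id_field
);

# Hypothetical helper: returns the names of required parameters that are
# not defined in the given config hashref.
sub missing_params {
    my ($config) = @_;
    return grep { !defined $config->{$_} } @REQUIRED;
}

# Illustrative values only; substitute your own database and table.
my $config = {
    index_endpoint  => 'http://localhost:8983/solr',
    source_dsn      => 'dbi:mysql:database=content;host=localhost',
    source_user     => 'indexer',
    source_password => 'secret',
    source_tb       => 'articles',
    source_fields   => 'title,body',
    source_id_field => 'article_id',
};

my @missing = missing_params($config);
print @missing ? "missing: @missing\n" : "config looks complete\n";
```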

JOB ARGUMENTS

Job arguments for this service should be specified in the form:

 <params>
   <id>1234</id>
 </params>

where the <id> element contains the primary key value of the record in the source table that is to be indexed.
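
To make the argument format concrete, here is a minimal sketch that pulls the id out of such an argument string. It uses a regular expression only to stay dependency-free; inside a real Helios service the framework parses the job arguments for you, and a proper XML parser is the better choice for anything less trivial. The function name extract_id is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <id> element,
# or undef if the argument string contains none.
sub extract_id {
    my ($arg_xml) = @_;
    my ($id) = $arg_xml =~ m{<id>\s*([^<]+?)\s*</id>};
    return $id;
}

my $args = "<params>\n  <id>1234</id>\n</params>";
my $id   = extract_id($args);
print defined $id ? "id = $id\n" : "no id found\n";
```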

METHODS

run($helios_job)

As is typical for Helios services, run() is the main workhorse of SolrIndexer. It will be called by Helios workers to service a job. The $helios_job passed to it will be a Helios::Job object.

Once run() has pulled in its configuration hashref and parsed the Helios::Job object's argument XML, it performs four tasks to accomplish a job:

  1. Generates the SQL to retrieve the record from the database
  2. Executes the SQL with the id given to it in the job arguments
  3. Reformats the retrieved database record into a UTF-8 encoded XML stream for Solr (Solr requires UTF-8 encoding)
  4. Sends the XML stream to Solr to be added to the index

If all these steps are successful, run() calls Helios::Service->completedJob() to mark the job as completed successfully. If an error occurs, it calls Helios::Service->logMsg() to log the error message and Helios::Service->failedJob() to mark the job as failed.
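
The control flow described above can be sketched without Helios installed. In the sketch below, SketchService is a stand-in class: its logMsg(), completedJob() and failedJob() methods only mimic the roles of the Helios::Service methods of the same names and take simplified arguments, and the four steps are passed in as hypothetical code references. It illustrates the pattern only and is not the module's actual run().

```perl
use strict;
use warnings;

# Stand-in for a Helios service. A real service would inherit these three
# methods from Helios::Service; the argument lists here are simplified.
package SketchService;

sub new          { return bless { log => [], status => undef }, shift }
sub logMsg       { my ($self, $msg) = @_; push @{ $self->{log} }, $msg }
sub completedJob { $_[0]{status} = 'completed' }
sub failedJob    { $_[0]{status} = 'failed' }

# Run the four steps in order; any die() lands in the failure branch.
sub run {
    my ($self, $id, $steps) = @_;
    my $ok = eval {
        my $sql    = $steps->{generate_sql}->();
        my $record = $steps->{retrieve}->($sql, $id);
        die "no record found for id $id\n" unless $record;
        my $xml    = $steps->{generate_xml}->($record);
        $steps->{update_index}->($xml);
        1;
    };
    if ($ok) {
        $self->completedJob();
    }
    else {
        my $error = $@ || 'unknown error';
        $self->logMsg("indexing failed for id $id: $error");
        $self->failedJob();
    }
    return $self->{status};
}

package main;

# Trivial stand-in steps, just to exercise the flow.
my %steps = (
    generate_sql => sub { 'SELECT title FROM articles WHERE article_id = ?' },
    retrieve     => sub { +{ title => 'Hello' } },
    generate_xml => sub { '<add><doc/></add>' },
    update_index => sub { '200 OK' },
);
print SketchService->new->run(1234, \%steps), "\n";
```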

generateSQL()

Generates the SQL necessary to retrieve the database record. This method determines the correct SQL by looking at the configuration parameters defined in Helios.
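
A minimal sketch of how such a statement could be assembled from the config parameters follows. The function name generate_sql, the identifier check, and the use of a ? placeholder for the id are assumptions made for illustration; the module's real generateSQL() may build its statement differently.

```perl
use strict;
use warnings;

# Build a parameterized SELECT from the documented config parameters.
# The record id is left as a placeholder to be bound at execution time.
sub generate_sql {
    my ($config) = @_;

    # Split the comma-delimited field list and trim stray whitespace.
    my @fields = split /,/, $config->{source_fields};
    s/^\s+|\s+$//g for @fields;

    # Table and column names cannot be bound as placeholders, so accept
    # only simple identifiers (a precaution added for this sketch).
    for my $name ($config->{source_tb}, $config->{source_id_field}, @fields) {
        die "invalid identifier '$name'\n" unless $name =~ /\A\w+\z/;
    }

    return 'SELECT ' . join(', ', @fields)
         . " FROM $config->{source_tb}"
         . " WHERE $config->{source_id_field} = ?";
}

print generate_sql({
    source_tb       => 'articles',
    source_fields   => 'title, body',
    source_id_field => 'article_id',
}), "\n";
```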

retrieveFromDb($sql, $id)

Given a SQL SELECT statement and a unique id, retrieveFromDb() retrieves the record identified by the $id using the supplied $sql. It returns the record in the form of a hashref.
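
With DBI, this kind of lookup can be a single call. The sketch below assumes an already-connected database handle (for example, one obtained from DBI->connect() using the source_dsn, source_user and source_password parameters) and a statement containing one ? placeholder for the id. The function name retrieve_record is hypothetical.

```perl
use strict;
use warnings;

# Fetch one row as a hashref keyed by column name.
# $dbh is a connected DBI database handle; $sql contains one placeholder.
# Returns undef when no row matches the id.
sub retrieve_record {
    my ($dbh, $sql, $id) = @_;
    # selectrow_hashref() prepares, executes, and fetches the first row.
    # The undef is the (unused) attribute hash; $id is the bind value.
    my $row = $dbh->selectrow_hashref($sql, undef, $id);
    return $row;
}
```

Binding the id as a placeholder value, rather than interpolating it into the SQL string, leaves quoting to the database driver.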

generateXML($hashref)

Given a hashref, generateXML() takes the hashref's keys and values and turns them into an XML stream to be passed to Solr. It returns this string of XML to the calling routine.
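
For illustration, the sketch below turns a hashref into a Solr <add> message with one <field> element per key. It escapes values by hand and uses only the core Encode module so that it runs anywhere; this is a simplification, and a dedicated module such as XML::Writer (listed under SEE ALSO) is the more robust route. The function names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Escape the characters that are special in XML text and attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a Solr add message from a record hashref and return it as
# UTF-8 encoded bytes. Undefined (NULL) values are skipped.
sub generate_solr_xml {
    my ($record) = @_;
    my $xml = '<add><doc>';
    for my $name (sort keys %$record) {
        next unless defined $record->{$name};
        $xml .= sprintf '<field name="%s">%s</field>',
            xml_escape($name), xml_escape($record->{$name});
    }
    $xml .= '</doc></add>';
    return encode('UTF-8', $xml);
}

print generate_solr_xml({ id => 1234, title => 'Solr & Helios' }), "\n";
```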

updateIndex($xml)

Given a Solr XML document addition stream, updateIndex() builds an HTTP::Request object using the stream and the Solr endpoint URI. It then uses LWP::UserAgent to POST the request to Solr. If the document update is successful, updateIndex() returns the successful status (usually '200 OK') to the calling routine. If the request was not successful, the method will throw a Helios::Error::Fatal exception with the erroneous status as the message.
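
The request-building step might look roughly like the sketch below. Several details are assumptions: that the update handler lives at the endpoint URI plus /update, that the XML is already UTF-8 encoded, and that a plain die() stands in for the Helios::Error::Fatal exception. The function name update_index and the optional user agent argument are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# POST an XML add message to Solr and return the status line on success.
# $ua is optional; passing one in makes the function easy to test.
sub update_index {
    my ($endpoint, $xml, $ua) = @_;
    $ua ||= LWP::UserAgent->new(timeout => 30);

    my $request = HTTP::Request->new(
        POST => "$endpoint/update",    # assumed location of the update handler
        [ 'Content-Type' => 'text/xml; charset=utf-8' ],
        $xml,                          # assumed to be UTF-8 bytes already
    );

    my $response = $ua->request($request);

    # The module is documented to throw Helios::Error::Fatal here;
    # a plain die() keeps this sketch free of Helios dependencies.
    die $response->status_line . "\n" unless $response->is_success;

    return $response->status_line;
}
```

Note that the sketch only sends the add message; depending on how the Solr instance is configured, added documents may not become searchable until a commit occurs.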

SEE ALSO

Helios, LWP::UserAgent, HTTP::Request, XML::Writer, DBI

AUTHOR

Andrew Johnson, <lajandy at cpan.org>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.

WARRANTY

This software comes with no warranty of any kind.