# 2012-01-22: Greatly expanded tutorial. Only the pieces marked (C) CEB
# Toolbox were kept from the original version. Changed all line endings so
# lines end at column 80.
# [2014-03-07] Updated tutorial for Helios 2.80, including using new command
# utilities. Removed references to directly accessing the collective database
# tables using SQL (yuck).
=pod
=head1 NAME
Helios::Tutorial - a tutorial for getting started with Helios
=head1 DESCRIPTION
This is a short tutorial to introduce the Helios system's basic concepts and to
show some quick examples of how to get started working with Helios.
=head1 HELIOS CONCEPTS
There are a few basic concepts you need to learn in order to understand the way
Helios works. Once you understand these concepts, it will be simple for you to
create Helios applications and manage a Helios collective.
=cut
# the following section is Copyright (C) 2008-9 by CEB Toolbox, Inc.
=head2 Jobs
B<Jobs> are simply a set of parameters for services (see below) that represent
a discrete unit of work. Jobs are represented by XML-style markup and can be
submitted either programatically via the Helios API, via the command line
helios_job_submit.pl program, or via HTTP request to the submitJob.pl CGI
program.
=head2 Services
B<Services> are Perl classes that define how jobs of a certain type should be
processed. Service classes are subclasses of Helios::Service, and implement
a run() method to perform a job's operations. The run() method marks the
job as successful or failed just before it ends. Services can be configured
across the collective (see below) using Helios's built-in configuration
subsystem, which can be accessed via the Helios::Panoptes web interface or by
using the helios_config_* shell commands.
Services are loaded into memory by the helios.pl service daemon program. When
jobs are submitted to Helios for a particular service, worker processes (see
below) are launched to actually perform the work.
=head2 Workers
B<Workers> are processes launched by helios.pl service daemons to actually
perform jobs. A worker will instantiate its associated service class, do some
preparation, and call the service object's run() method. In normal operation,
a worker process performs one job and then exits, but in "OVERDRIVE" mode a
worker process will stay in memory and perform as many jobs as possible, until
1) there are no more jobs in the queue, 2) it is told to HOLD or HALT job
processing, or 3) it encounters an error processing a job that causes it to
exit.
=head2 Collective
A B<collective> is a group of servers running helios.pl daemons connected to
the same Helios database. Services in a collective can be centrally
administered using the Helios::Panoptes web interface.
In addition to these basics, there are a couple of other Helios concepts that
will not be dealt with in this tutorial but is worth knowing:
# End of section covered by CEB Toolbox, Inc. copyright.
=head2 Jobtypes
Every job in the Helios system has a B<jobtype>, which is sort of an
abstraction of a queue. For now, all you need to know is every Helios service
has a corresponding jobtype with the same name. When you submit a job to
Helios, you will set the jobtype to the name of the service you want to run
the job.
# the following section is Copyright (C) 2008-9 by CEB Toolbox, Inc.
=head2 Metajobs
B<Metajobs> are large batches of jobs submitted together to Helios.
Bound together by XML, a metajob will be burst apart into its constituent jobs
when first serviced by Helios. Metajobs can greatly decrease the time it takes
to submit large batches of jobs into the Helios job queue. Also, in
conjunction with worker OVERDRIVE mode, metajobs allow workers to achieve
maximum system throughput.
=cut
# End of section covered by CEB Toolbox, Inc. copyright.
=head1 A BASIC HELIOS SERVICE
Writing a Helios service involves writing a B<service class>, a Perl class that
subclasses Helios::Service. Your service class will need to implement the
service's run() method. The run() method will be passed a Helios::Job object
representing the job to be performed.
Here's a very simple sample class as an example:
package TestService;
use strict;
use warnings;
use base qw(Helios::Service);
sub run
{
my $self = shift;
my $job = shift;
my $config = $self->getConfig();
my $args = $self->getJobArgs($job);
foreach my $arg (keys %$args)
{
$self->logMsg($job, "param:".$arg." value:".$args->{$arg});
print '*** JOBID: '.$job->getJobid().' param: '.$arg.' value: '.$args->{$arg}." ***\n";
}
$self->completedJob($job);
}
1;
This service is extremely simple; all it does is pick up the service's
configuration and the given job's arguments, and logs the job's arguments in
the Helios log. It will also print the arguments to the terminal. Then it
calls the completedJob() method to mark the job as finished successfully.
Despite its simplicity, all Helios services ultimately follow this same basic
pattern.
Let's take a closer look at this simple example. First, let's look at the
package declaration and modules:
package TestService;
use strict;
use warnings;
use base qw(Helios::Service);
In addition to declaring the service's name with the B<package> declaration,
we've also enabled the B<strict> and B<warnings> pragmas. We declare our
service to be a subclass Helios::Service by using the B<use base> pragma.
Next, we have the run() method. This is the only required method in your
service class. It starts by pulling in config parameters and job arguments
from the Helios system:
sub run {
my $self = shift;
my $job = shift;
my $config = $self->getConfig();
my $args = $self->getJobArgs($job);
The only parameter directly passed to run() is a Helios::Job object that
represents the job the service needs to run. After stashing the service in the
$self variable and the Helios::Job object in the $job variable, the run()
method does two more things before the actual job processing starts. First, it
grabs the service's configuration using the getConfig() method, and then gets
the job's arguments using the getJobArgs() method. Both the service
configuration and job arguments are returned as hashrefs, so it will be easy
to work with them later in the run() method.
Next we have the rest of the run() method:
foreach my $arg (keys %$args)
{
$self->logMsg($job, "param:".$arg." value:".$args->{$arg});
print '*** JOBID: '.$job->getJobid().' param: '.$arg.' value: '.$args->{$arg}." ***\n";
}
$self->completedJob($job);
}
The foreach block is just looping through all the arguments in the job
argument hashref and using the logMsg() method to log them in the Helios
system log. It then also prints them to the terminal. In reality, this part
of the run() method could be anything: a mathematical computation, the
processing of a file, a call to another function or method in another Perl
module. What work you actually do in your run() method is entirely up to you!
I<Note: one thing you don't normally do in Helios services is print to the
terminal, since usually there is no terminal to print to. But we'll be running
this service later in debug mode, and it will be helpful for you to see the
job do something on the screen.>
What is important, however, is what happens when your work is done. The last
thing in this run() method (and indeed, all run() methods) is the call to
mark the job as completed successfully or failed. This run() method is very,
very simple, so in this case we are going to assume the job is successful and
mark it as such by calling the completedJob() method. The only parameter for
completedJob() is the Helios::Job object that run() was passed. If we had
decided instead that the $job had failed, we would have used the failedJob()
method:
$self->failedJob($job,"It failed!");
The failedJob() method works like completedJob() except it marks the job as
failed rather than succeeded in the system. In addition, you may also specify
an error message that will be recorded with the job so you can see I<why> the
job failed.
Once we've marked the job as completed or failed, the run() method is over.
So that, in a nutshell, is the basics of creating a Helios service class. All
Helios service classes ultimately use this design pattern. This makes creating
new Helios services easy, either by writing new code or adapting existing code.
=head1 STARTING A HELIOS SERVICE AND SUBMITTING A JOB
Having read through the last section, you may ask, "But how do I actually get
this TestService thing to run a job?" If you've got your helios.ini configured
and ready, you're almost ready to go.
Make sure the path to your helios.ini is set in the HELIOS_INI environment
variable, and that the variable is exported. At the command line:
export HELIOS_INI=/path/to/helios.ini
Also make sure it is an absolute path; relative paths will confuse the Helios
service loader/daemon. Also, for this tutorial, go ahead and enable debug mode by
setting the HELIOS_DEBUG environment variable:
export HELIOS_DEBUG=1
This will allow you to see some extra Helios debugging messages and prevent the
service daemon from daemonizing, allowing you to stop it from the command line.
First, we'll go ahead and submit the job we want to run by using the
helios_job_submit.pl program at the command line:
helios_job_submit.pl -v TestService "<job><params><myarg1>This is a test</myarg1></params></job>"
This will submit a job with a I<jobtype> of TestService, meaning it is meant to
be run by the service named "TestService". In the XML arguments for the job,
there is actually only one argument, named 'myarg1', that has the value "This
is a test". Of course, you can have a large number of arguments; the limit in
the default Helios MySQL database schema is about 16MB, though you really
should not be submitting that much data as job arguments, at least while you
are learning the system.
The -v option tells helios_job_submit.pl to return the jobid of the new job.
If you use the -v option or you enabled HELIOS_DEBUG, you should
receive a message if your Helios setup is functioning properly:
Job submit successful. JOBID: 9
(The jobid will vary depending on how many jobs you have submitted to the
system previously.) If you received an error, there is most likely a problem
with your Helios configuration; go back to the install instructions, fix the
problem, and try again.
So now that you have submitted a job to Helios, how do you make it run? If you
saved the service we discussed above in a file called TestService.pm in the
current directory, you can start the service using the helios.pl service
loader/daemon:
helios.pl TestService
If you enabled HELIOS_DEBUG, you'll see a lot of messages scroll on the screen
as helios.pl does some setup, attempts to load your service class, and parses
the configuration for the service in helios.ini and in the Helios database. If
that all goes well, the service daemon will look for jobs, see the job you
submitted earlier, and launch a worker process to run the job. The worker
process will call the run() method you defined, logging the job arguments to
the Helios log and marking the job as completed. You'll see the job arguments
printed on the screen:
*** JOBID: 9 param: myarg1 value: This is a test! ***
Once all that is done, you'll see a "0 waiting TestService jobs." message. At
that point you can press Ctrl-C to exit the service daemon. You can also open
another terminal session and submit another job and watch it being processed
if you like.
(If you didn't enable HELIOS_DEBUG, the service daemon will still do all the
things described, but you'll only see a message that your TestService class was
loaded, and then helios.pl will daemonize, disconnecting from your terminal in
the process.)
If you want to check the log messages your service wrote to the log while
processing the job, you can use the helios_job_info command to find out a job's
start and complete times, whether it ran successfully, and any log messages it
recorded. If you have the jobid from the job submitted earlier, issue a
command like this:
helios_job_info --jobid=9 --args --logs
to see a full report on the job like the one below:
Jobid: 9
Jobtype: Helios::TestService
Submit Time: Fri Mar 7 17:12:08 2014
Complete Time: Fri Mar 7 17:13:00 2014
Exitstatus: 0
Args:
<job><params><myarg1>This is a test</myarg1></params></job>
Logs:
Fri Mar 7 17:13:00 2014 [localhost:13432] INFO Helios::TestService says, "Hello World!"
Fri Mar 7 17:13:00 2014 [localhost:13432] INFO JOBARG=myarg1 VALUE=This is a test
You can also use the Helios::Panoptes web application to view and search the
Helios system log. In addition to messages related to specific jobs,
Helios::Panoptes will also show log messages that the helios.pl
service daemon logged about starting up, seeing jobs, and launching processes
to handle those jobs. It is worth becoming familiar with these messages so
will be able to understand what is happening to your jobs and services as you
develop, deploy, and manage services in your Helios collective.
=head1 SUBMITTING JOBS
In the previous section, you saw that you can submit jobs to Helios using the
helios_job_submit.pl command line program. There are actually 3 ways to
submit jobs to Helios:
=over 4
=item
helios_job_submit.pl, a shell program
=item
over HTTP with the included submitJob.pl CGI script
=item
in your own Perl programs, using the Helios::Job class
=back
If you want to submit jobs via the shell or over HTTP, check the perldoc for
helios_job_submit.pl and submitJob.pl for more information.
Sometimes you need more integration than a shell or CGI script can provide,
especially if you're running in a persistent environment like FastCGI or
mod_perl. In those cases, you should use the Helios job submission API
directly.
To use the Helios job submission API, you initialize Helios using the
Helios::Service class, create a Helios::Job object, and submit it to the
system.
For example:
use strict;
use warnings;
use Helios::Service;
use Helios::Job;
# create a Helios::Service object, initialize it with prep()
# then get the $config hash with getConfig()
my $service = Helios::Service->new();
$service->prep() or die($service->errstr);
my $config = $service->getConfig();
# create your job arguments in XML
# then instantiate a Helios::Job object
# give it the Helios $config with setConfig()
# tell it the service class that should process the job with setJobType()
# set your job arguments with setArgString()
my $jobxml = '<job><params><filename>Rise.mp3</filename></params></job>';
my $job = Helios::Job->new();
$job->setConfig($config);
$job->setJobType('MP3IndexerService');
$job->setArgString($jobxml);
# finally, submit the job to the system
my $jobid = $job->submit();
The first thing to do is to instantiate a Helios::Service object, call the
prep() method to parse the configuration and initialize a connection to the
Helios collective database, and get the basic Helios configuration by calling
the getConfig() method.
Once you have the Helios configuration, you're ready to create your job.
Create an XML string specifying the job arguments in XML. Then instantiate
the Helios::Job object with the new() method. Give your job object the
Helios configuration you retrieved earlier (with setConfig()) and the name of
the service class you want to service the job (with setJobType()). Finally,
set the job's arguments by using the setArgString() method.
Then submit the job to Helios using the submit() method. If the job
submission was successful, submit() will return the jobid of the newly
submitted job. If something goes wrong, submit() will throw an exception.
Once the job is submitted, it goes into the Helios collective's job queue
marked for the service you specified. When a service with that name starts,
the helios.pl daemon will see jobs for that service are available, and will
launch worker processes to process them. The worker processes will pull the
jobs from the queue and call your service's run() method, passing it the
Helios::Job object. Once your run() method has marked the job as a
success or failure and returned, the worker process will end or, if the
OVERDRIVE configuration parameter has been set, the worker process will
pull another job from the queue and call your service's run() method again.
=head2 JOB ARGUMENT XML
Helios job arguments are normally specified in XML-like markup that follow a
relatively simple format:
<job>
<params>
<argument_tag>argumentValue</argument_tag>
...
</params>
</job>
While the markup language is definitely XML-like and must be well-formed like
XML, in reality there is no DTD to validate against, and the tags in the
<params> section are left entirely up to the user to define. This gives you
maximal flexibility in determining the names and values of your job
arguments, and also makes it simple to parse the arguments into the job
argument hash for Helios services to use. Take the following job arguments,
for example:
<job>
<params>
<id>456</id>
<type>blog</type>
<email>hanse@davion.gov</email>
</params>
</job>
In the run() method of a service, calling the getJobArgs() method with a job
with the above arguments will yield a reference to a hash like this:
{
'id' => '456',
'type' => 'blog',
'email' => 'hanse@davion.gov'
}
So the tag names become the keys of the hash, and the enclosed strings become
the hash values.
Keep in mind that although job argument XML can be flexible, the XML parser is
set up to do things relatively simply, so complex XML structures should be
avoided. In Helios, "jobs" are really only parameters to "services," so job
arguments are best kept simple. The logic of your application should go in
your Helios service class.
=head1 CONFIGURING SERVICES
In the previous simple TestService example, you saw that the service's
configuration is available via the getConfig() method. But how is that
configuration set up? The Helios configuration system provides the ability to
centrally configure services across an entire collective and, if necessary,
tailor a service's configuration on a per host basis.
The first piece of the Helios configuration system is the helios.ini file.
All of the configuration parameters set in the [global] section of helios.ini
are available not just to the helios.pl service daemon, but to all Helios
services running in a particular collective. You may also put configuration
parameters specific to your service in helios.ini by creating a section named
the same as the service:
[global]
dsn=dbi:mysql:host=hostname;db=helios_db
user=helios
password=password
[TestService]
loggers=HeliosX::Logger::File
logfile_path=/var/log/helios/
logfile_priority_threshold=6
The [TestService] section here would set up the logging configuration
specifically for the TestService service (see below for more about the Helios
logging system). While all Helios services will see the configuration
options set in the [global] section, only the TestService service will see the
configuration options set in the TestService section.
While you can set the configuration options for your service in helios.ini and
distribute the helios.ini between all of your hosts, that is very tedious and
unwieldly way to manage a service's configuration. In addition to the
helios.ini file, configuration parameters for a service can also be set using
the helios_config_set command. The helios_config_set command takes 4 arguments:
=over 4
=item --service
The service you are setting the config parameter for.
=item --hostname
The host you are setting the config parameter for. A parameter can be set to
affect a service on a single host or every host in the collective. If you
do not specify a --hostname, helios_config_set will assume the parameter
should only affect the specified service on the current host. If you want the
parameter to affect the service running on any host, set the hostname an
asterisk ("*").
=item --param
The name of the config parameter to set.
=item --value
The actual value of the parameter to set.
=back
For example, if you want to Helios to run up to 5 TestService workers at a
time on the current host, you can issue the following command to set the
MAX_WORKERS config parameter:
helios_config_set --service=TestService --param=MAX_WORKERS --value=5
To enable OVERDRIVE mode on TestService workers on every host in your Helios
collective, use the --hostname parameter and set it to '*':
helios_config_set --service=TestService --hostname=* --param=OVERDRIVE --value=1
If you want to check your work, you can use the helios_config_get command, with
the same options:
helios_config_get --service=TestService --param=MAX_WORKERS
You can also use the helios_config_unset command to delete a parameter from the
collective database entirely.
You can also use the Helios::Panoptes web application to set config parameters
for your services. Also, remember that though Helios defines a lot of special
configuration parameters itself, you can use the Helios configuration subsystem
to specify other parameters your service might need. For example, if you have
a Helios service called Indexer, which has a landing directory where it stores
incoming files, you can specify a "landing_zone" parameter available to all of
Indexer instances running on every host of your collective:
helios_config_set --service=Indexer --hostname=* --param=landing_zone --value=/mnt/SAN1/incoming"
Regardless of how you set configuration parameters, when your service class
calls the getConfig() method, a hashref will be returned that will contain the
configuration options specific to the service running on that particular host.
The hash keys will be the parameter name, while the hash values will be the
values specified for that particular parameter. The hash
will contain:
=over 4
=item
any parameters set in the helios.ini [global] section,
=item
any parameters set in helios.ini with section name matching the service's name,
=item
any parameters in Helios collective database matching the service's
name and a hostname set to '*'
=item
any parameters in collective database with the service's name and a hostname
set to the current host.
=back
Each of the above items will override the config options set by the previous
ones. For example, if you set a 'log_priority_threshold' option for a service
for the current host, it will override any 'log_priority_threshold' options
set for the service globally (hostname = '*') or in helios.ini. In this way
you can set configuration options for services running across the collective
but isolate specific instances of a service on particular hosts if necessary.
=head1 LOGGING
You will note in the TestService example the use of the logMsg() method to send
messages to the Helios logging system. The Helios logging system is an
extensible system to keep track of what goes on in the Helios system and
during job processing.
Inside of your service, the logMsg() method is what you need to log messages to
the Helios logging system. The logMsg() method takes 3 parameters:
=over 4
=item
the Helios::Job object of representing the current job (optional)
=item
the priority level of the message (optional)
=item
a string with the message you want to add to the log
=back
If you pass a Helios::Job object in your call to logMsg(), the jobid will be
recorded along with the message.
The message priority levels of messages are defined in
Helios::LogEntry::Levels. If you import these levels with the ':all' tag at
the beginning of your service:
use Helios::LogEntry::Levels ':all';
you can use symbols rather than integers to specify the severity of your log
entry. If you don't specify a priority level, the message will default to
LOG_INFO priority.
The default, internal Helios logging system records messages in a table in the
Helios collective database. You can access log messages for a specific job
using the helios_job_info command. You can also use the Helios::Panoptes
application to view log messages for particular jobs and more system-level
messages recorded by the helios.pl daemon. Helios::Panoptes will also allow
you to filter and search for messages matching certain criteria.
You can check the L<Helios::Service> man page entry for the logMsg() method for
information on logging, and the L<Helios::Configuration> page for more
information about logging configuration. If you want to configure your Helios
collective to use some other logging system, check the L<Helios::Logger> man
page for information about creating your own Helios interfaces to other
logging systems.
=head1 A MORE USEFUL EXAMPLE
Included in the eg/ directory of your Helios distribution is a simple sample
Helios application called MP3IndexerService. Unlike the TestService service
class discussed in this tutorial, MP3IndexerService actually does something
useful: given a list of filenames of MP3s, MP3IndexerService will parse the
ID3 and other useful information and store it in a database table. It can be
useful for finding duplicate copies of tracks or just reviewing the different
artists, albums, etc. that you have on your hard drive. A look at its code
will reveal it uses all the major Helios subsystems (job queuing,
configuration, logging) in some way or another. Though it remains a very
simple application, it demonstrates how easily a useful Helios application can
be written.
=head1 SEE ALSO
L<helios.pl>, L<Helios::Service>, L<Helios::Job>, L<Helios::Panoptes>
=head1 AUTHOR
Andrew Johnson, E<lt>lajandy at cpan dotorgE<gt>
=head1 COPYRIGHT AND LICENSE
Copyright (C) 2012-4 by Andrew Johnson.
Portions of this document, where noted, are
Copyright (C) 2008-9 by CEB Toolbox, Inc.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.0 or,
at your option, any later version of Perl 5 you may have available.
=head1 WARRANTY
This software comes with no warranty of any kind.
=cut