
NAME

Net::OpenSSH::Parallel - Run SSH jobs in parallel

SYNOPSIS

  use Net::OpenSSH::Parallel;

  my $pssh = Net::OpenSSH::Parallel->new();
  $pssh->add_host($_) for @hosts;

  $pssh->push('*', scp_put => '/local/file/path', '/remote/file/path');
  $pssh->push('*', command => 'gurummm',
              '/remote/file/path', '/tmp/output');
  $pssh->push($special_host, command => 'prumprum', '/tmp/output');
  $pssh->push('*', scp_get => '/tmp/output', 'logs/%HOST%/output');

  $pssh->run;

DESCRIPTION

Run this here, that there, etc.

  ***
  *** Note: This is an early release!
  ***
  *** The module design and particularly the public API has not yet
  *** stabilized. Future versions of the module are not guaranteed to
  *** remain compatible with this one.
  ***

Net::OpenSSH::Parallel is a scheduler that can run commands in parallel on a set of hosts through SSH. It tries to find a compromise between being simple to use, being efficient and covering a good part of the problem space of parallel process execution via SSH.

Obviously, it is built on top of Net::OpenSSH!

Common usage of the module is as follows:

  • create a Net::OpenSSH::Parallel object

  • register the hosts where you want to run commands with the "add_host" method

  • queue the actions you want to run (commands, file copy operations, etc.) using the "push" method.

  • call the "run" method and let the parallel scheduler take care of everything!

Labelling hosts

Every host is identified by a unique label that is given when the host is registered into the parallel scheduler. Usually, the host name is also used as the label, but this is not required by the module.

The rationale behind using labels is that a hostname does not necessarily identify a unique "remote processor" (for instance, your logical "remote processors" may sometimes be user accounts distributed over a set of hosts: foo1@bar1, foo2@bar1, foo3@bar2, ...; or a set of hosts that are accessible behind a single IP address, listening on different ports; etc.)
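A minimal sketch of the label/hostname split described above, using a hypothetical label => target map for both scenarios (several accounts on the same host, and one IP address reached on different ports; the 10.0.0.1 address and port numbers are made up). The targets use the user@host:port syntax accepted by Net::OpenSSH. The bad_labels helper is also hypothetical: it assumes that, since selectors use ',' and '*' as metacharacters, labels containing them are best avoided.

```perl
use strict;
use warnings;

# Hypothetical label => target map: labels are short and unique, while the
# targets carry the real connection details (user@host:port).
my %targets = (
    foo1 => 'foo1@bar1',
    foo2 => 'foo2@bar1',
    foo3 => 'foo3@bar2',
    gw1  => '10.0.0.1:2201',
    gw2  => '10.0.0.1:2202',
);

# Hypothetical check: ',' and '*' have a special meaning in selectors, so a
# label containing them could not be selected unambiguously.
sub bad_labels {
    my (@labels) = @_;
    return grep { /[,*]/ } @labels;
}

my @bad = bad_labels(keys %targets);
die "invalid labels: @bad\n" if @bad;

# With the module loaded, registration would then be:
#   $pssh->add_host($_, $targets{$_}) for sort keys %targets;
```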

Selecting hosts

Several of the methods of this module (well, currently, just push) accept a selector string to determine which of the registered hosts should be affected by the operation.

For instance, in...

  $pssh->push('*', command => 'ls')

the first argument is the selector. The one used here, *, selects all the registered hosts.

Other possible selectors are:

  'bar*'                # selects everything beginning with 'bar'
  'foo1,foo3,foo6'      # selects the hosts of the given names
  'bar*,foo1,foo3,foo6' # both
  '*doz*'               # everything containing 'doz'

Note: I am still considering what the selector mini-language should look like; don't hesitate to send your suggestions!

Local resource usage

When the number of hosts managed by the scheduler is too high, the local node can become overloaded.

Roughly, every SSH connection requires two local ssh processes (one to run the SSH connection and another one to launch the remote command), which results in around 5MB of RAM usage per host.
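The figure above turns into a quick back-of-the-envelope estimate. This is a sketch only: estimate_ram_mb is a hypothetical helper, the 5MB constant is the rough number quoted above, and the optional cap stands for the maximum_connections limit discussed below.

```perl
use strict;
use warnings;

# Rough figure from the text: about 5MB of local RAM per connected host.
use constant MB_PER_CONNECTION => 5;

# Hypothetical helper: estimated local RAM (MB) for $hosts hosts when at most
# $max_connections SSH connections are open at once (undef = unlimited).
sub estimate_ram_mb {
    my ($hosts, $max_connections) = @_;
    my $active = $hosts;
    $active = $max_connections
        if defined $max_connections && $max_connections < $hosts;
    return $active * MB_PER_CONNECTION;
}

printf "%d MB\n", estimate_ram_mb(1000);        # prints "5000 MB"
printf "%d MB\n", estimate_ram_mb(1000, 100);   # prints "500 MB"
```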

CPU usage varies greatly depending on the tasks carried out. The most expensive are short remote tasks (because of the local process creation and destruction overhead) and tasks that transfer big amounts of data through SSH (because of the encryption going on).

In practice, CPU usage doesn't matter too much (mostly because the OS is able to manage it, but also because there is not much we can do to reduce it), and RAM is usually what we should be more concerned about.

The module accepts two parameters to limit resource usage:

  • maximum_workers

    is the maximum number of remote commands that can be running concurrently (set with the workers option of "new").

  • maximum_connections

    is the maximum number of SSH connections that can be active concurrently (set with the connections option of "new").

In practice, limiting maximum_connections indirectly limits RAM usage and limiting maximum_workers indirectly limits CPU usage.

The module requires maximum_connections to be greater than or equal to maximum_workers, and it is recommended that maximum_connections >= 2 * maximum_workers (otherwise the scheduler will not be able to reuse connections efficiently).
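The rules above can be checked before building the scheduler. A minimal sketch, assuming a hypothetical limit_opts helper that applies the recommended 2 * workers ratio when no connection limit is given, rejects connections < workers, and returns options in the form accepted by "new".

```perl
use strict;
use warnings;

# Hypothetical helper: derive and validate the workers/connections options.
sub limit_opts {
    my ($workers, $connections) = @_;
    # Default to the recommended ratio.
    $connections = 2 * $workers unless defined $connections;
    die "connections ($connections) must be >= workers ($workers)\n"
        if $connections < $workers;
    warn "connections < 2 * workers: connections may not be reused efficiently\n"
        if $connections < 2 * $workers;
    return (workers => $workers, connections => $connections);
}

my %opts = limit_opts(10);
print "$_ => $opts{$_}\n" for sort keys %opts;
# The result is meant to be passed to the constructor:
#   my $pssh = Net::OpenSSH::Parallel->new(%opts);
```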

You will have to experiment to find out which combinations give the best results in your particular scenarios.

Also, for small sets of hosts you can just leave these parameters unlimited.

Variable expansion

This module activates Net::OpenSSH variable expansion by default. That way, it is easy to customize the actions executed on every host based on some of its properties.

For instance:

  $pssh->push('*', scp_get => "/var/log/messages", "messages.%HOST%");

copies the log file from every host, appending the name of the remote host to the local file name.

The variables HOST, USER, PORT and LABEL are predefined.
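As a simplified illustration of what the expansion does, the sketch below substitutes %NAME% placeholders from a per-host table. It is not Net::OpenSSH's own implementation; expand_vars here is a hypothetical helper, the host values are made up, and leaving unknown variables untouched is a choice made only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every %NAME% with the value from $vars.
sub expand_vars {
    my ($template, $vars) = @_;
    $template =~ s/%(\w+)%/exists $vars->{$1} ? $vars->{$1} : "%$1%"/ge;
    return $template;
}

# Made-up values for one host.
my %vars = (HOST => 'bar1', USER => 'foo1', PORT => 22, LABEL => 'foo1');
print expand_vars('messages.%HOST%', \%vars), "\n";   # prints "messages.bar1"
```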

Error handling

When something goes wrong (for instance, some host is unreachable, some connection dies, some command fails, etc.) the module can handle the error in several predefined ways as follows:

Error policies

To set the error handling policy, the "new", "add_host" and "push" methods support an optional on_error argument that can take the following values (these constants are available from Net::OpenSSH::Parallel::Constants):

OSSH_ON_ERROR_IGNORE

Ignores the error and continues executing tasks in the host queue as if it had never happened.

OSSH_ON_ERROR_ABORT

Aborts the processing on the corresponding host. The error will be propagated to other hosts joining it at any later point once the join is reached.

In other words, this policy aborts the queued jobs for this host and for any other that has a dependency on it.

OSSH_ON_ERROR_DONE

Similar to OSSH_ON_ERROR_ABORT but will not propagate errors to other hosts via joins.

OSSH_ON_ERROR_ABORT_ALL

Not implemented yet!

Causes all the hosts to abort as soon as possible (which usually means after they finish their currently running tasks).

OSSH_ON_ERROR_REPEAT

The module will try to perform the current task again and again until it succeeds. This policy can lead to an infinite loop, so its direct use is discouraged (but see the following point about setting the policy dynamically).

The default policy is OSSH_ON_ERROR_ABORT.

Setting the policy dynamically

When a subroutine reference is used as the policy instead of any of the constants previously described, the given subroutine will be called on error conditions as follows:

  $on_error->($pssh, $label, $error, $task)

$pssh is a reference to the Net::OpenSSH::Parallel object, $label is the label associated with the host where the error happened, $error is the error type as defined in Net::OpenSSH::Parallel::Constants and $task is a reference to the task that was being carried out.

The return value of the subroutine must be one of the described constants and the corresponding policy will be applied.
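A dynamic policy is a convenient way to use OSSH_ON_ERROR_REPEAT without the risk of an infinite loop. The sketch below is hypothetical (make_retry_policy is not part of the module): it returns a callback with the signature described above that repeats a failing task a bounded number of times per host label, counted over the whole run, and then gives up. The policy values are passed in as parameters so the sketch runs without the module installed; in real code they would be constants from Net::OpenSSH::Parallel::Constants.

```perl
use strict;
use warnings;

# Hypothetical helper: build an on_error callback that returns $repeat for
# the first $max_retries failures seen on each host label, then $give_up.
sub make_retry_policy {
    my ($max_retries, $repeat, $give_up) = @_;
    my %failures;   # failures seen so far, keyed by host label
    return sub {
        my ($pssh, $label, $error, $task) = @_;
        return ++$failures{$label} <= $max_retries ? $repeat : $give_up;
    };
}
```

It would be used as, for instance, `$pssh->add_host('foo', on_error => make_retry_policy(3, OSSH_ON_ERROR_REPEAT, OSSH_ON_ERROR_ABORT));`.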

Retrying connection errors

If the module fails when trying to establish a new SSH connection, or an existing connection dies unexpectedly, the reconnections option can be used to instruct the module to retry the connection until it succeeds or the given maximum is reached.

reconnections is accepted by both the "new" and "add_host" methods.

Example:

  $pssh->add_host('foo', reconnections => 3);

Note that the reconnections maximum is not per host but per queued task.
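Because reconnections counts additional attempts (the default being zero), a value of N allows at most N + 1 connection attempts for a queued task. The simulation below only illustrates that counting; connect_with_retries is hypothetical and $try_connect stands in for a real SSH connection attempt.

```perl
use strict;
use warnings;

# Simulation: returns the number of the successful attempt, or nothing if
# all 1 + $reconnections attempts fail.
sub connect_with_retries {
    my ($reconnections, $try_connect) = @_;
    for my $attempt (1 .. $reconnections + 1) {
        return $attempt if $try_connect->();
    }
    return;   # gave up
}

# Hypothetical connection that only succeeds on the third attempt.
my $calls = 0;
my $attempts = connect_with_retries(3, sub { ++$calls >= 3 });
print defined $attempts ? "connected after $attempts attempts\n" : "gave up\n";
```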

API

These are the available methods:

$pssh = Net::OpenSSH::Parallel->new(%opts)

creates a new object.

The accepted options are:

workers => $maximum_workers

sets the maximum number of operations that can be carried out in parallel (see "Local resource usage").

connections => $maximum_connections

sets the maximum number of SSH connections that can be established simultaneously (see "Local resource usage").

$maximum_connections must be greater than or equal to $maximum_workers.

reconnections => $maximum_reconnections

when connecting to some host fails, this argument tells the module the maximum number of additional connection attempts that it should perform before giving up. The default value is zero.

See also "Retrying connection errors".

on_error => $policy

Sets the error handling policy (see "Error handling").

$pssh->add_host($label, %opts)
$pssh->add_host($label, $host, %opts)

registers a new host into the $pssh object.

$label is the name used to refer to the registered host afterwards.

When the hostname argument is omitted, the label is also used as the hostname.

The accepted options are:

on_error => $policy

Sets the error handling policy (see "Error handling").

reconnections => $maximum_reconnections

See "Retrying connection errors".

Any additional option will be passed verbatim to the Net::OpenSSH constructor later.
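The two add_host forms can be told apart by counting arguments: after the label, an odd number of remaining arguments means an explicit hostname comes first. The sketch below is hypothetical (parse_add_host_args is not a module function) and only mirrors the documented signatures; the user option is an ordinary Net::OpenSSH constructor option.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the hostname and the option hash.
sub parse_add_host_args {
    my ($label, @rest) = @_;
    # Odd count => the first remaining argument is the hostname.
    my $host = @rest % 2 ? shift @rest : $label;
    my %opts = @rest;
    return ($host, \%opts);
}

my ($host, $opts) = parse_add_host_args('web1', 'www.example.com', user => 'admin');
print "$host user=$opts->{user}\n";   # prints "www.example.com user=admin"
```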

$pssh->push($selector, $action, \%opts, @action_args)
$pssh->push($selector, $action, @action_args)

pushes a new action into the queues selected by $selector.

The supported actions are:

command => @cmd

queues the given shell command on the selected hosts.

Example:

  $pssh->push('*', command =>
              { stdout_fh => $find_fh, stderr_to_stdout => 1 },
              'find', '/my/dir');

scp_get => @remote, $local
scp_put => @local, $remote

These actions queue an SCP file copy operation on the selected hosts.

sub { ... }

Queues a call to a Perl subroutine that will be executed locally.

When given, %opts can contain the following options:

on_error => $fail_mode
on_error => sub { }

See "Error handling".

timeout => $seconds

not implemented yet!

on_done => sub { }

not implemented yet!

Any other option will be passed to the corresponding Net::OpenSSH method (spawn, scp_put, etc.).
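The two push forms differ only in the optional hash reference right after the action. A minimal sketch of how that could be detected, using a hypothetical parse_push_args helper (not the module's own code):

```perl
use strict;
use warnings;

# Hypothetical helper: normalize both push forms to
# ($selector, $action, \%opts, @action_args).
sub parse_push_args {
    my ($selector, $action, @args) = @_;
    my $opts = (@args && ref $args[0] eq 'HASH') ? shift @args : {};
    return ($selector, $action, $opts, @args);
}

my (undef, $action, $opts, @args) =
    parse_push_args('*', command => { stderr_to_stdout => 1 }, 'find', '/my/dir');
print "action=$action opts=", join(',', sort keys %$opts), " args=@args\n";
# prints "action=command opts=stderr_to_stdout args=find /my/dir"
```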

$pssh->run

Runs the queued operations.

It returns a true value on success and false otherwise.

$pssh->get_error($label)

Returns the last error associated with the host of the given label.
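When "run" returns false, "get_error" can be queried host by host. Since no method for listing the registered labels is described here, the sketch below assumes the caller keeps its own list of labels, and it treats a false return value from get_error as "no error" (also an assumption). failed_hosts and the FakePSSH stand-in are hypothetical; FakePSSH only exists so the sketch runs without real hosts.

```perl
use strict;
use warnings;

# Hypothetical helper: map each failed label to its last error.
sub failed_hosts {
    my ($pssh, @labels) = @_;
    my %errors;
    for my $label (@labels) {
        my $error = $pssh->get_error($label);
        $errors{$label} = $error if $error;
    }
    return \%errors;
}

# Minimal stand-in object; with the module, pass the
# Net::OpenSSH::Parallel object instead.
package FakePSSH;
sub new { my ($class, %err) = @_; my $self = { %err }; return bless $self, $class }
sub get_error { my ($self, $label) = @_; return $self->{$label} }
package main;

my $errors = failed_hosts(FakePSSH->new(bar1 => 'connection refused'), qw(foo1 bar1));
print "$_: $errors->{$_}\n" for sort keys %$errors;   # prints "bar1: connection refused"
```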

TODO

  • run N processes per host concurrently

    allow running more than one process per remote server concurrently

  • delay before reconnect

    when connecting fails, do not try to reconnect immediately but after some predefined period

  • rationalize debugging

    currently it is a mess

  • add logging support

    log the operations performed in a given file

  • stdio redirection

    add support for better handling of the Net::OpenSSH stdio redirection facilities

  • configurable valid return codes

    Non zero exit code is not always an error.

BUGS AND SUPPORT

This is a very, very, very early release of the module, lots of bugs should be expected!!!

If you find any, report it via http://rt.cpan.org or by email (to sfandino@yahoo.com), please.

Feedback and comments are also welcome!

Reporting bugs

In order to report a bug, write a minimal program that triggers it and place the following line at the beginning:

  $Net::OpenSSH::Parallel::debug = -1;

Then, send me (via RT or email) the debugging output you get when you run it. Also include the source code of the script, a description of what is going wrong and the details of your OS and the versions of Perl, Net::OpenSSH and Net::OpenSSH::Parallel you are using.

Development version

The source code for this module is hosted at GitHub: http://github.com/salva/p5-Net-OpenSSH-Parallel.

Commercial support

Commercial support, professional services and custom software development around this module are available through my current company. Drop me an email with a rough description of your requirements and we will get back to you ASAP.

My wishlist

If you like this module and you're feeling generous, take a look at my Amazon Wish List: http://amzn.com/w/1WU1P6IR5QZ42

SEE ALSO

Net::OpenSSH is used to manage the SSH connections to the remote hosts.

SSH::Batch has a similar focus to this module. In my opinion it is simpler to use but rather more limited.

GRID::Machine allows running Perl code distributed across a cluster via SSH.

If your application requires orchestrating workflows more complex than those supported by Net::OpenSSH::Parallel, you should probably consider some POE-based solution (check POE::Component::OpenSSH).

App::MrShell is another module for running the same command on several hosts in parallel.

COPYRIGHT AND LICENSE

Copyright © 2009-2010 by Salvador Fandiño (sfandino@yahoo.com).

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.