Przemysław Wesołek >
Data-Pareto-0.05 >
Data::Pareto

Module Version: 0.05
Data::Pareto - Computing Pareto sets in Perl

Version 0.05

use Data::Pareto; # only first and third columns are used in comparison # the others are simply descriptive my $set = new Data::Pareto( { columns => [0, 2] } ); $set->add( [ 5, "pareto", 10, 11 ], [ 5, "dominated", 11, 9 ], [ 4, "pareto2", 12, 12 ] ); # this returns [ [ 5, "pareto", 10, 11 ], [ 4, "pareto2", 12, 12 ] ], # the other one is dominated on selected columns $set->get_pareto_ref;

This module makes calculation of Pareto set. Given a set of vectors (i.e. arrays of simple scalars), Pareto set is all the vectors from the given set which are not dominated by any other vector of the set. A vector `X`

is said to be dominated by `Y`

, iff `X[i] >= Y[i]`

for all `i`

and `X[i] > Y[i]`

for at least one `i`

.

Pareto sets play an important role in multiobjective optimization, where each non-dominated (i.e. Pareto) vector describes objectives value of "optimal" solution to the given problem.

This module allows occurrence of duplicates in the set - this makes it rather a bag than a set, but is useful in practice (e.g. when we want to preserve two solutions giving the same objectives value, but structurally different). This assumption influences dominance definition given above: two duplicates never dominate each other and hence can be present in the Pareto set. This is controlled by `duplicates`

option passed to new(): if set to `true`

value, duplicates are allowed in Pareto set; otherwise, only the first found element of the subset of duplicated vectors is preserved in Pareto set.

The values are allowed to be invalid. The meaning of 'invalid' is 'the worst possible'. It's different concept than 'unknown'; unknown value make the definition of domination less clear.

By default, the comparison of column values is numerical and the smaller value dominates the larger one. If you want to override this behaviour, pass your own dominator sub in arguments to new().

By default, a vector is passed around as a ref to array of consecutive column values. This means you shouldn't mess with it after passing to `add`

method.

Creates a new object for calculating Pareto set.

The first argument passed is a hashref with options; the recognized options are:

`columns`

Arrayref containing column numbers which should be used for determining domination and duplication. Column numbers are

`0`

-based array indexes to data vectors.Only values at those positions will be ever compared between vectors. Any other data in the vectors may be present and is not used in any way.

At least one column number should be passed, for obvious reasons.

`duplicates`

If set to

`true`

value, duplicated vectors are all put in Pareto set (if they are Pareto, of course). If set to`false`

, duplicates of vectors already in the Pareto set are discarded.`invalid`

The value considered invalid in pareto set. Such value is dominated by any value and dominates only invalid value.

However, computations of domination in presence of invalid values can be considerably slower, as much as 5 times. So it probably will be faster to first parse the data and replace invalid markers with some huge-and-surely-dominated values.

`column_dominator`

The sub(s) used to compare specific column values and determining domination between them. Scalar, sub ref or hash ref. If not set, the default is that the numerically smaller value dominates the other one.

When the scalar is passed, it is assumed to be the name of a predefined dominator. This is a much faster option to specifying the sub of your own. Recognized dominators are:

`min`

numerically smaller value dominates`max`

numerically greater value dominates`lexi`

earlier in collation order value dominates (lexicographical order)`lexi_rev`

later in collation order value dominates (reversed lexicographical order)`std`

standard, i.e.`min`

dominator

During creation of Pareto set, the dominator sub is called with three arguments: column number, first vector's value, second vector's value, and should return

`true`

, when the second value dominates the first one, assuming they appeared in the specified column.Make sure that your sub returns

`true`

when two passed values are the same. This is necessary to obey the whole Pareto set domination contract.There are two approaches possible when the values in different columns are of different types, in the sense of domination. First, you can use passed column number to decide the domination check function. Alternatively, you can pass a hash ref with mapping from the column number to the sub ref used to compare the given column:

my $lexi_dominator = sub { my ($col, $dominated, $by) = @_; return ($dominated ge $by); }; my $min_dominator = sub { my ($col, $dominated, $by) = @_; return ($dominated >= $by); } my $set = new Data::Pareto({ columns => [0, 2], column_dominator => { 0 => $lexi_dominator, 2 => $min_dominator } }); $set->add(['a', 'label 1', 12], ['b', 'label 2', 9]);

The rest of arguments are assumed to be vectors, and passed to add() method.

Tests vectors passed as arguments and adds the non-dominated ones to the Pareto set.

Returns the current content of Pareto set as a list of vectors.

Returns the current content of Pareto set as a ref to array with vectors. The return value references the original array, so treat it as read-only!

Checks if the first vector passed is dominated by the second one. The comparison is made based on the values in vectors' columns, which were passed to new().

The vectors passed are never duplicates of each other when this method is called from inside this module.

Returns `true`

, when the first vector from arguments list is dominated by the other one, and `false`

otherwise.

Checks if the given value is considered invalid for the current object. Every value is valid by default.

Allow specifying built-in dominators inside dominator hash.

For large data sets calculations become time-intensive. There are a couple of techniques which might be applied to improve the performance:

- defer the phase of removing vectors dominated by newly added vectors to get_pareto() call; this results in smaller number of arrays rewritings.
- split the set of vectors being added into smaller subsets, calculate Pareto sets for such subsets, and then apply insertion of resulting Pareto subsets to the main set; this results in smaller number of useless tries of adding dominated vectors into the set.

Przemyslaw Wesolek, `<jest at go.art.pl>`

Please report any bugs or feature requests to `bug-data-pareto at rt.cpan.org`

, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Pareto. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

You can find documentation for this module with the perldoc command.

perldoc Data::Pareto

You can also look for information at:

- RT: CPAN's request tracker
- AnnoCPAN: Annotated CPAN documentation
- CPAN Ratings
- Search CPAN

Copyright 2009 Przemyslaw Wesolek

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0. For details, see the full text of the license in the file LICENSE.

syntax highlighting: