The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Set::IntSpan::Island - extension for Set::IntSpan to handle islands, holes and covers

SYNOPSIS

  use Set::IntSpan::Island;

  # inherits normal behaviour from Set::IntSpan
  $set = Set::IntSpan::Island->new( $set_spec );
  # special pair input creates a span a-b
  $set = Set::IntSpan::Island->new( $a,$b );

  # equivalent to $set->cardinality($another_set)->size;
  if ($set->overlap( $another_set )) { ... }

  # distance between spans is negative if spans overlap, positive if not
  $distance = $set->distance( $another_set );

  # remove islands whose size is smaller than $minsize
  $new_set = $set->excise( $minsize );

  # remove islands whose size is found in the set $sizes_set,
  $new_set = $set->excise( $sizes_set );
  # all islands sized <= 10 removed
  $new_set = $set->excise( Set::IntSpan( "(-10" ) );
  # all islands sized >= 10 removed
  $new_set = $set->excise( Set::IntSpan( "10-)" ) );
  # all islands of size between 2-5 removed
  $new_set = $set->excise( Set::IntSpan( "2-5" ) );

  # remove islands larger than $maxlength
  $set = $set->excise_large( $minlength );

  # fill holes up to $maxsize
  $set = $set->fill( $maxsize );

  # fill holes whose size is found in the set $sizes_set
  $set = $set->fill( $sizes_set);
  # all holes sizes <= 10 filled
  $set = $set->fill( Set::IntSpan( "(-10" ) );
  # all holes sizes >= 10 filled
  $set = $set->fill( Set::IntSpan( "10-)" ) );
  # all holes sizes 2-5 filled
  $set = $set->fill( Set::IntSpan( "2-5" ) );

  # return a set composed of islands of $set that overlap $another_set
  $set = $set->find_island( $another_set );

  # return a set composed of the nearest non-overlapping island(s) to $another_set
  $set = $set->nearest_island( $another_set );

  # construct a list of covers by exhaustively intersecting all sets
  @covers = Set::IntSpan::Island->extract_covers( { id1=>$set1, id2=>set2, ... } );
  for $cover (@covers) {
    ($coverset,@ids) = ($cover->[0], @{$cover->[1]});
    print "cover",$coverset->run_list,"contains sets",join(",",@ids);
  }

DESCRIPTION

This module extends the Set::IntSpan module by Steve McDougall. It implementing methods that are specific to islands, holes and covers. Set::IntSpan::Island inherits from Set::IntSpan.

Terminology

An integer set, as represented by Set::IntSpan, is a collection of islands (or spans) on the number line

  ...-----xxxx----xxxxxxxx---xxxxxxxx---xx---x----....

Holes are regions not in the set that fall between adjacent spans. For example, the integer set above is composed of 5 islands and 4 holes. The two infinite regions on either side of the set are not counted as holes within the context of this module.

METHODS

$set = Set::IntSpan::Island->new( $set_spec )

Constructs a set using the set specification as supported by Set::IntSpan.

$set = Set::IntSpan::Island->new( $a, $b )

Extension to Set::IntSpan new method, this double-argument version creates a set formed by the range a-b. This is equivalent to

  $set = Set::IntSpan::Island->new("$a-$b")

but permits initialization from a list instead of a string. The arguments $a and $b are expected to be integers - any decimal component will be truncated.

  new(1.2,2.9) equivalent to new(1,2)

$set_copy = $set->clone()

Creates a copy of $set. Also accessible using $set-duplicate()>;

$set_copy = $set->duplicate()

Same as clone().

$olap = $set->overlap( $another_set );

Returns the size of intersection of two sets. Equivalent to

  $set->intersect( $another_set )->size;

The returned value is either 0 (if the sets do not overlap) or positive (if they do).

$d = $set->distance( $another_set )

Returns the distance between sets, measured as follows. If the sets overlap, then the distance is negative and given by

  $d = -$set->overlap( $another_set )

If the sets abut, $d is 1. Here $d can be interpreted as the difference between the closest edges of the two sets.

The above generalizes to 1+size(hole) if the sets do not overlap and are composed of multiple islands. The hole used is the one between two closest islands of the sets.

Returns undef if $another_set is not defined, or either $set or $another_set is empty.

Here are some examples of how the distance is calculated.

   A ----xxxx---xxx-----xx--
   B ------xxx------xx--x---
           !!           !    d=-3

   A ----xxxx---xxx-----xx--
   B ----xxxx---xxx---------
         !!!!   !!!          d=-7

   A ----xxxx---xxx-----xx--
   B --------------x--------
                  ><         d=1

   A ----xxxx---xxx-----xx--
   B ---------------x-------
                  > <        d=2

   A ----xxxx---xxx-----xx--
   B ---------------xx------
                  > <        d=2

   A ----xxxx---xxx-----xx--
   B ---------------xxxx----
                       ><    d=1

$d = $set->sets()

Returns all spans in $set as Set::IntSpan::Island objects. This method overrides the sets method in Set::IntSpan in order to return sets as Set::IntSpan::Island objects.

$new_set = $set->excise( $minlength | $size_set )

Removes all islands smaller than $minlength. If $minlength < 1 then no elements are removed and a copy of the set is returned. Since only islands smaller than $minlength are removed, the smallest useful value for $minlength is 2.

If passed a set $size_set, removes all islands whose size is found in $size_set. This extended functionality allows you to pass in arbitrary size cutoffs. For example, to remove islands of size <=10

  $new_set = $set->excise( Set::IntSpan->( "(-10" ) )

or to remove islands of size 2-10

  $new_set = $set->excise( Set::IntSpan->( "2-10" ) )

Since size of an island must be non-zero and positive, any negative elements in the size set will be ignored. The two are therefore equivalent

  $new_set = $set->excise( Set::IntSpan->( "2-10" ) )
  $new_set = $set->excise( Set::IntSpan->( "(--1,2-10" ) )

Using a size set allows you to excise islands larger than a certain size. For example, to remove all islands 10 or bigger,

  $new_set = $set->excise( Set::IntSpan->( "10-)" ) )

Regardless of input, if all islands are excised (i.e. all elements from $set are removed), this function will return an empty set.

Contrast excise() to keep(). Use excise() when you have a set of island sizes you want to remove. Use keep() when you have a set of island sizes you want to keep. In other words, these are equivalent:

  $set->excise( $size_set )
  $set->keep( $size_set->complement )

Strictly speaking, you can pass in any object as a size limiter, as long as it implements a member() function which returns 1 if the size is in the cutoff set and 0 otherwise.

  $filter = Some::Other::Module->new();
  # set $filter parameters according to Some::Other::Module API...
  ...
  # $filter must implement "member" function
  $filter->can("member")
  if($filter->member(10)) {
    print "islands of size 10 will be removed";
  } else {
    print "islands of size 10 will be kept";
  }
  $set->excise($filter);

$new_set = $set->keep( $maxlength | $size_set )

If passed an integer $maxlength, removes all islands larger than $maxlength.

If passed a set $size_set, removes all islands whose size is not found in $size_set. For example, to keep all islands sized 10 or larger,

  $new_set = $set->keep( Set::IntSpan->( "10-)" ) )

or keep all islands sized 2-10

  $new_set = $set->excise( Set::IntSpan->( "2-10" ) )

Returns an empty set if no islands are kept.

Since size of an island must be non-zero and positive, any negative elements in the size set will be ignored. The two are therefore equivalent

  $new_set = $set->keep( Set::IntSpan->( "2-10" ) )
  $new_set = $set->keep( Set::IntSpan->( "(--1,2-10" ) )

Contrast keep() to excise(). Use keep() when you have a set of island sizes you want to keep. Use excise() when you have a set of island sizes you want to remove. In other words, these are equivalent:

  $set->keep( $size_set )
  $set->excise( $size_set->complement )

Strictly speaking, you can pass in any object as a size limiter, as long as it implements a member() function which returns 1 if the size is in the cutoff set and 0 otherwise. See the description of excise() for details.

$set = $set->fill( $maxsize | $size_set )

If passed an integer $maxsize, fills in all holes in $set smaller than $maxsize.

If passed a set $size_set, fills in all holes whose size appears in $size_set.

Strictly speaking, you can pass in any object as a size limiter, as long as it implements a member() function which returns 1 if the size is in the cutoff set and 0 otherwise. See the description of excise() for details.

$island_set = $set->find_islands( $integer | $another_set )

Returns a set composed of islands from $set that overlap with $integer or $another_set.

If an integer is passed and $integer is not in $set, an empty set is returned.

If a set is passed and $set and $another_set have an empty intersection, an empty set is returned.

           set ----xxxx---xxx-----xx--
   another_set ------------x----------
    island_set -----------xxx---------

           set ----xxxx---xxx-----xx--
   another_set ------------xxxxx------
    island_set -----------xxx---------

           set ----xxxx---xxx-----xx--
   another_set ------------xxxxx---xx-
    island_set -----------xxx-----xx--

Contrast this to nearest_island() which returns the closest island(s) that do not overlap with $integer or $another_set.

$island_set = $set->nearest_island( $integer | $another_set)

Returns the island(s) in $set closest (but not overlapping) to $integer or $another_set. If $integer or $another_set lie exactly between two islands, then the returned set contains these two islands.

If no non-overlapping islands in $set are found, an empty set is returned.

           set ----xxxx---xxx-----xx--
   another_set ------------x----------
    island_set ----xxxx---------------

           set ----xxxx---xxx-----xx--
   another_set ------------xxxxx------
    island_set -------------------xx--

           set ----xxxx---xxx-----xx--
   another_set ----------xxxxxxx------
    island_set ----xxxx-----------xx--

If $another_set contains multiple islands, such as below, $island_set may also contain multiple islands.

           set ----xxxx---xxx-----xx--
   another_set ---x----xxx------------
    island_set ----xxxx---xxx---------

Contrast this to find_islands() which returns the island(s) that overlap with $integer or $another_set.

$num_islands = $set->num_islands()

Returns the number of islands in the set. If the set is empty, 0 is returned.

$island = $set->at_island( $island_index )

Returns the island indexed by $island_index. Islands are 0-indexed. For a set with N islands, the first island (ordered left-to-right) has index 0 and the last island has index N-1.

If $island_index is negative, counting is done back from the last island.

If $island_index is beyond the last island, undef is returned.

$island = $set->first_island()

Returns the first island of the set. As a side-effect, sets the iterator to the first island.

If the set is empty, returns undef.

$island = $set->last_island()

Returns the last island of the set. As a side-effect, sets the iterator to the last island.

If the set is empty, returns undef.

$island = $set->next_island()

Advances the iterator forward by one island, and returns the next island. If the iterator is undefined, the first island is returned.

Returns undef if the set is empty or if no more islands are available.

$island = $set->prev_island()

Reverses the iterator backward by one island, and returns the previous island. If the iterator is undefined, the last island is returned.

Returns undef if the set is empty or if no more islands are available.

$island = $set->current_island()

Returns the island at the current iterator position.

Returns undef if the set is empty or if the iterator is not defined.

$cover_data = Set::IntSpan::Island->extract_covers( $set_hash_ref )

Given a $set_hash reference

  { id1=>$set1, id2=>$set2, ..., idn=>$setn}

where $setj is a finite Set::IntSpan::Island object and idN is a unique key, extract_covers performs an exhaustive intersection of all sets and returns a list of all covers and set memberships. For example, given the id/runlist combination

  a 10-15 
  b 12 
  c 14-20 
  d 25

The covers are

  10-11 a
  12    a b
  13    a
  14-15 a c
  16-20 c
  21-24 -
  25    d

The cover data is returned as an array reference and its structure is

  [ [ $cover_set1, [ id11, id12, id13, ... ] ],
    [ $cover_set2, [ id21, id22, id23, ... ] ],
    ...
  ]

If a cover contains no elements, then its entry is

  [ $cover_set, [ ] ]

AUTHOR

Martin Krzywinski <martink@bcgsc.ca>

ACKNOWLEDGMENTS

  • Steve McDougall <swmcd@theworld.com> (Set::IntSpan)

  • Adam Janin (testing)

HISTORY

v0.10 3 Mar 2010

Now inherits from Set::IntSpan vis parent pragma.

Added clone() as an alias to duplicate().

On error, now confess is used instead of croak.

Expanded testing, now with Test::More.

Minor style adjustments in documentation.

v0.05 22 Sep 2008

Minor cosmetic fixes.

v0.04 17 Sep 2008

Modified excise(), distance() and fill(). Added keep().

v0.03 10 April 2007

More comprehensive extract_cover testing after bug in v0.01 was reported.

v0.02 12 Mar 2007

Added island iterator.

v0.01 5 Mar 2007

Release.

SEE ALSO

Set::IntSpan by Steven McDougall

COPYRIGHT

Copyright (c) 2007 by Martin Krzywinski. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2 POD Errors

The following errors were encountered while parsing the POD:

Around line 793:

'=item' outside of any '=over'

Around line 797:

You forgot a '=back' before '=head1'