The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Bio::ConnectDots::SimpleGraph

SYNOPSIS

  use SimpleGraph;

  my $graph=new Bio::ConnectDots::SimpleGraph;
  # read pairs of nodes from STDIN
  while (<>) {
    my($node1,$node2)=split;
    $graph->add_edge($node1,$node2);
  }
  my @nodes=graph->nodes;           # get list of nodes
  my @edges=graph->edges;           # get list of edges
  for each $node (@nodes) {
    my @neighbors=$node->neighbors; # get list of neighboring nodes
  }

DESCRIPTION

This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken. The package implements a small and eclectic assortment of standard graph algorithms that we happened to need for our applications.

This module is a subclass of Class::AutoClass (available at CPAN). AutoClass auotgenerates simple accessor and mutator methods (aka get and set methods). It also automates class initialization.

Nodes can be any Perl values, including object references. Edges are pairs of nodes.

(Caveat: be careful with values that contain embedded instances of $; (the character Perl uses to separate components of multi-dimensional subscripts), because we use this in the text representation of edges.

The main data structures are:

  An edge (x,y) is represented canonically as a two element list in
  which the lexically smaller value is first.  Eg, the node ('b','a')
  is represented as ['a','b'].  

  The graph contains 

  1) A hash mapping the text representation of a node to the node
  itself.  This is mostly relevant when the node is a reference.

  2) A hash mapping the text representation of a node to a list of the node's neighbors.

  3) A hash mapping the text representation of an edge to the edge itself.

KNOWN BUGS AND CAVEATS

This is still a work in progress.

AUTHOR - Nat Goodman

Email natg@shore.net

COPYRIGHT

Copyright (c) 2003 Institute for Systems Biology (ISB). All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

APPENDIX

Conventions for nodes and edges

A node can be any Perl values, including an object, ARRAY, or HASH reference. When nodes are references, the software often works with the text representaion of the reference, ie, what you get if you print the reference. This can be confusing. Sorry. For example if a node is the HASH

  {name=>'caspase-9',symbol=>'CASP9'}

The text representation would be something like

  HASH(0x804c830)

When nodes are scalar values, eg, a string, the value and the text representation are the same. This is a common case in test programs and examples, but less common in real applications.

An edge is represented internally as an ARRAY ref of two nodes, in which the lexically smaller value is first. Actually, the first node is the one whose text representation is lexically smaller.

When passing edges as arguments to SimpleGraph methods, the edge can be represented in several ways.

  1) An ARRAY ref of the nodes, eg, ['a','b'].

  2) A list of the two nodes, eg, ('a','b')

  3) Form (1) or (2) using the text represention of the node instead of the node itself

You needn't worry about which node is lexically smaller. SimpleGraph performs this calculation internally.

When SimpleGraph returns edges as results, they are always in form (1), ie, as ARRAY refs of nodes in correct lexical order.

General conventions for methods

When methods return lists, we generally check the context (via wantarray) and return an ARRSY or ARRAY ref as appropriate. We're not 100% consistent in this (sorry), so check the code if you have doubts.

We often define singular and plural forms of methods, eg, node and nodes. These differ in how they behave in a scalar context. The singular form assumes you want one answer and returns that, while the plural form assumes you want a list of answers are returns it as an ARRAY ref. We're not 100% consistent in this (sorry), so check the code if you have doubts.

The rest of the documentation describes the methods.

Constructors

 Title   : new (inherited from Class::AutoClass)
 Usage   : my $graph=new Bio::ConnectDots::SimpleGraph;
 Function: Create new Bio::ConnectDots::SimpleGraph object
 Returns : Newly created object
 Args    : (optional)
           nodes=>ARRAY of nodes, eg, ['a','b','c']
           edges=>ARRAY of edges, see add_edges for details

Basic node and edge operations

 Title   : add_nodes, add_node
 Usage   : $graph->add_nodes('a','b');
           $graph->add_node('a')) {
 Function: Add nodes to graph. Nodes that are already in graph
           are ignored.
 Args    : ARRAY of nodes.
 Returns : Nothing useful
 Note    : Singular and plural forms are synonymous

 Title   : add_edges, add_edge
 Usage   : $graph->add_edges('a','b',['b','c']);
           $graph->add_edge('c','d')) {
 Function: Add edges to graph. 
           Edges that are already in graph are not added again, but
           are placed in a separate 'duplicate edges' list.
           Automatically adds any nodes that are not yet in the graph.
 Args    : ARRAY of edges in any of the forms described in the
           previous section.  The forms can be mixed as shown in
           the Usage here.
 Returns : Nothing useful
 Note    : Singular and plural forms are synonymous

 Title   : nodes
 Usage   : my @nodes=$graph->nodes;
           if (@{$graph->nodes('a','b')}==2) {
             print "a, b are both nodes\n";
           }
 Function: Return all nodes or the given ones.  
           With no args returns all nodes.  
           With args, returns the nodes corresponding to each arg, or
           undef if the arg is not a node.  Useful for testing whether
           a given value is a node in the graph.
 Args    : (optional)
           ARRAY of nodes or text representations of nodes
 Returns : ARRAY or ARRAY ref of nodes (for args that correspond to
           nodes), or undef (for args that are not nodes)

 Title   : edges
 Usage   : my @edges=$graph->edges;
           if (@{$graph->edges('a','b',['b','c'])}==2) {
             print "[a,b] and [b,c] are both edges\n";
           }
 Function: Return all edges or the given ones.  
           With no args returns all edges.  
           With args, returns the edges corresponding to each arg, or
           undef if the arg is not a edge.  Useful for testing whether
           a given value is a edge in the graph.
 Args    : (optional)
           One or more edges in any of the forms described in the
           previous section.  The forms can be mixed as shown in
           the Usage here.
 Returns : ARRAY or ARRAY ref of edges for args that correspond to
           edges), or undef (for args that are not edges)

 Title   : node
 Usage   : if ($graph->node('a')) {
             print "a is a node\n";
           }
 Function: Test whether a value is a node in the graph, or map the
           text representation of a node to the node itself.  The
           method can also be fed a list of values (like the 'nodes'
           method) and it will test all of them.
 Args    : Usually, a single node.
           The function also accepts a list of nodes.
 Returns : In scalar context (the usual case): the node corresponding
           to the arg (if there's just one), or the node corresponding
           to the first arg (if a list of args were provided, which is
           kind of dumb in this case), or undef if the arg is not a
           node.

           In array context, it behaves just like 'nodes', returning
           an ARRAY of nodes (for args that correspond to nodes), or
           undef (for args that are not nodes)

 Title   : edge
 Usage   : if ($graph->edge('a','b')) {
             print "a,b is a edge\n";
           }
           if ($graph->edge(['a','b'])) {
             print "[a,b] is a edge\n";
           }
 Function: Test whether a value is a edge in the graph, or map the
           text representation of a edge to the edge itself.  The
           method can also be fed a list of edges (like the 'edges'
           method) and it will test all of them.
 Args    : Usually, a single edge.  Same format as 'edges'
           The function also accepts a list of edges, exactly like 
           'edges'
 Returns : In scalar context (the usual case): the edge corresponding
           to the arg (if there's just one), or the or the edge
           corresponding to the first arg (if a list of args were
           provided, which is kind of dumb in this case), or undef if
           the arg is not a edge.

           In array context, it behaves just like 'edge's, returning
           an ARRAY of edges (for args that correspond to edges), or
           undef (for args that are not edges)

 Title   : has_nodes, has_node
 Usage   : if ($graph->has_nodes('a','b')) {
             print "a, b are both nodes\n";
           }
           if ($graph->has_node('a')) {
             print "a is a node\n";
           }
 Function: Return true is all args are nodes.
 Args    : ARRAY of nodes or text representations of nodes
 Returns : Boolean
 Note    : Singular and plural forms are synonymous

 Title   : has_edges
 Usage   : if ($graph->has_edges('a','b',['b','c'])) {
             print "[a,b] and [b,c] are both edges\n";
           }
           if ($graph->has_edge('a','b')) {
             print "[a,b] is an edge\n";
           }
 Function: Return true is all args are edges.
 Args    : ARRAY of edges in the forms described in the section above
 Returns : Boolean
 Note    : Singular and plural forms are synonymous

 Title   : neighbors, neighbor
 Usage   : my @nodes=$graph->neighbors($node)
           my @nodes=$graph->neighbors($node,'node')
           my @edges=$graph->neighbors($edge,'edge');
 Function: Return the node or edge neighbors of a given node or edge.
 Args    : (mandatory)
           $source: node or edge whose neighbors are sought
           (optional)
           $what: the word 'node' or 'edge' (actually, anything starting
                  with 'n' or 'e' will do)
                  default: 'node'
 Returns : ARRAY or ARRAY ref of nodes or edges
 Note    : Singular and plural forms are synonymous. This may not be
           right.

 Title   : dup_edges
 Usage   : my @dups=$graph->dup_edges;
 Function: Return duplicate edges
 Args    : None
 Returns : ARRAY or ARRAY ref of edges that have been added more than
           once.

Graph properties

 Title   : is_connected
 Usage   : if ($graph->is_connected) {
             print "graph has only one connected component\n";
           }
 Function: Return true if the graph is connected
 Args    : None
 Returns : Boolean

 Title   : is_empty
 Usage   : if ($graph->is_empty) {
             print "graph has no nodes or edges\n";
           }
 Function: Return true if the graph is empty, ie, has no nodes or edges
 Args    : None
 Returns : Boolean

 Title   : is_tree
 Usage   : if ($graph->is_tree) {
             print "graph is a tree\n";
           }
 Function: Return true if the graph is a tree, ie, it's connected and
           has no cycles
 Args    : None
 Returns : Boolean

 Title   : is_forest
 Usage   : if ($graph->is_forest) {
             print "graph is a forest\n";
           }
 Function: Return true if the graph is a forest, ie, it has no cycles
           but may not be connected
 Args    : None
 Returns : Boolean

 Title   : is_cyclic
 Usage   : if ($graph->is_cyclic) {
             print "graph contains at least one cycle\n";
           }
 Function: Return true if the graph is a cyclic.
 Args    : None
 Returns : Boolean

 Title   : density
 Usage   : my $density=$graph->density
 Function: Compute graph 'density' which is the number of edges
           divided by the maximum possible number of edges
 Args    : None
 Returns : Number

Graph operations

 Title   : subgraph
 Usage   : my $subgraph=$graph->subgraph('a','b','c');
 Function: Compute node subgraph. Constructs a new graph whose nodes
           are the arguments, and whose edges are the edges of the
           original graph that only involve the given nodes.
 Args    : ARRAY of nodes or text representations of nodes
 Returns : New graph

 Title   : neighbor_subgraph
 Usage   : my $subgraph=$graph->subgraph('a');
 Function: Construct node subgraph graph whose nodes are the given
           node and its neighbors.  are the arguments, and whose edges
           are the edges of the original graph that only involve the
           given nodes.
 Args    : Node or text representations of node
 Returns : New graph

 Title   : union
 Usage   : my $union=$graph->union($other_graph);
 Function: Construct new graph whose nodes are the union of the nodes
           of the current graph and $other_graph, and whose edges are
           the union of the edges of the current graph and
           $other_graph.
 Args    : $other_graph: a graph
 Returns : New graph

 Title   : intersection
 Usage   : my $intersection=$graph->intersection($other_graph);
 Function: Construct new graph whose nodes are the intersection of the
           nodes of the current graph and $other_graph, and whose
           edges are the intersection of the edges of the current
           graph and $other_graph.
 Args    : $other_graph: a graph
 Returns : New graph

Graph algorithms

 Title   : traversal
 Usage   : my $traversal=$graph->traversal('a','depth first','node');
           my @nodes;
           while (my $node=$traversal->get_next) {
             push(@nodes,$node);
           }
           my $traversal=$graph->traversal('a','depth first','node');
           my @nodes=$traversal->get_all;
 Function: Do node or edge traversal in depth or breadth first order.
 Args    : (optional)
           $start: starting node or edge for traversal
                   default: software picks arbitrary start
           $order: 'depth first' or 'breadth first' (actually,
                   anything starting with 'd' or 'b' will do)
                   default: 'depth first'
           $what: 'node' or 'edge' (actually, anything starting
                  with 'n' or 'e' will do)
                  default: 'node'
 Returns : SimpleGraph::Traversal object
           This is an iterator with the following methods:

           get_next: get next item in traversal or undef if 
                     traversal is exhausted
           get_this: get current item in traversal
           get_all : get all remaining items in traversal as
                     ARRAY (in array context) or ARRAY ref
           has_next: return true if there are more items in
                     traversal, else undef
           reset   : restart traversal

 Note    : It's also possible, and perhaps easier, to perform a
           traversal by creating a SimpleGraph::Traversal object
           directly.  The constructor is

           new Bio::ConnectDots::SimpleGraph::Traversal(-graph=>$graph,-start=>$start,
                                      -order=>$order,-what=>$what)

 Title   : node_traversal
 Usage   : my $traversal=$graph->node_traversal('a','depth first');
           my @nodes;
           while (my $node=$traversal->get_next) {
             push(@nodes,$node);
           }
           my $traversal=$graph->node_traversal('a','depth first');
           my @nodes=$traversal->get_all;
 Function: Do node traversal in depth or breadth first order.
           Wrapper for 'traversal' method. See above.
 Args    : (optional)
           $start: starting node for traversal
                   default: software picks arbitrary start
           $order: 'depth first' or 'breadth first' (actually,
                   anything starting with 'd' or 'b' will do)
                   default: 'depth first'
 Returns : SimpleGraph::Traversal object
 
 Title   : edge_traversal
 Usage   : my $traversal=$graph->edge_traversal(['a','b'],'depth first');
           my @edges;
           while (my $edge=$traversal->get_next) {
             push(@edges,$edge);
           }
           my $traversal=$graph->edge_traversal(['a','b'],'depth first');
           my @edges=$traversal->get_all;
 Function: Do edge traversal in depth or breadth first order.
           Wrapper for 'traversal' method. See above.
 Args    : (optional)
           $start: starting edge for traversal
                   default: software picks arbitrary start
           $order: 'depth first' or 'breadth first' (actually,
                   anything starting with 'd' or 'b' will do)
                   default: 'depth first'
 Returns : SimpleGraph::Traversal object

 Title   : components
 Usage   : my @components=$graph->components;
           for my $component (@components) {
             my @nodes=$component->nodes;
             my @edges=$component->edges;
           }
 Function: Compute the connected components of the graph.  A connected
           component is a maximal connected subgraph.  'Connected'
           means you can get from any node of the component to any
           other by following a path.  'Maximal' means that every node
           you can reach from the component is in the component.
 Args    : None
 Returns : ARRAY or ARRAY ref of SimpleGraphs
 Note    : The software caches the components once computed, so it's efficient
           to call this repeatedly.

 Title   : shortest_paths
 Usage   : my $paths=$graph->shortest_paths;
           while(my($endpoints,$path)=each %$paths) {
             my($start,$stop)=split($;,$endpoints);
             my @nodes_on_path=@$path;
             print "Path from $start to $stop: @$nodes_on_path\n";
           }
           -- OR --
           my ($paths,$distances)=$graph->shortest_paths(-want_distances=>1);
           while(my($endpoints,$path)=each %$paths) {
             my $distance=$distances->{$endpoints};
             my($start,$stop)=split($;,$endpoints);
             my @nodes_on_path=@$path;
             print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
           }
 Function: Compute shortest path between each pair of nodes.
 Args    : (optional)
           -max=>Maximum path length to compute
                 Paths longer than this are not found
           -want_distances=>Boolean. If true, distances are also returned
 Returns : HASH whose keys are pairs of nodes (encoded in the standard
           Perl manner as two strings joined by $;) and whose values
           are paths. Each path is an ARRAY ref of nodes starting at
           the first endpoint and ending at the second.

           The first endpoint is always lexically smaller (lt) than
           the second.

           If -want_distances is specified, the result is a pair of
           HASHes, one containing paths and the other containing
           distances.

 Title   : distances
 Usage   : my $distances=$graph->distances;
           while(my($endpoints,$distance)=each %$distances) {
             my($start,$stop)=split($;,$endpoints);
             print "Path from $start to $stop has distance $distance\n";
           }
           -- OR --
           my ($paths,$distances)=$graph->distances(-want_paths=>1);
           while(my($endpoints,$distance)=each %$distances) {
             my $path=$paths->{$endpoints};
             my($start,$stop)=split($;,$endpoints);
             my @nodes_on_path=@$path;
             print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
           }
 Function: Compute distance between each pair of nodes. This is the
           length of the shortest path
 Args    : (optional)
           -max=>Maximum path length to compute
                 Distances longer than this are not found
           -want_paths=>Boolean. If true, paths are also returned
 Returns : HASH whose keys are pairs of nodes (encoded in the standard
           Perl manner as two strings joined by $;) and whose values
           are distances.

           The first endpoint is always lexically smaller (lt) than
           the second.

           If -want_paths is specified, the result is a pair of
           HASHes, one containing paths and the other containing
           distances.

 Title   : connected_nodesets
 Usage   : my @nodesets=$graph->connected_nodesets;
           for my $nodeset (@nodesets) {
             my @nodes=@$nodeset;
           }
 Function: Compute all sets of nodes that form connected subgraphs. 
           A connected nodeset is a set of nodes such that it's
           possible to get from any node to any other by following a
           path that only includes nodes in the set. 
 Args    : None
 Returns : ARRAY or ARRAY ref of nodeset, where each nodeset is an ARRAY
           ref of nodes.  
 Note    : Use with caution.  The number of nodesets is very
           large for graphs that are highly connected.

 Title   : connected_subgraphs
 Usage   : my @subgraphs=$graph->connected_subgraphs;
 Function: Compute all connected subgraphs of the current graph.
 Args    : None
 Returns : ARRAY or ARRAY ref of subgraphs
 Note    : Use with caution.  The number of connected subgraphs is
           very large for graphs that are highly connected.