David Burdick >
Bio-ConnectDots >
Bio::ConnectDots::SimpleGraph

Bio::ConnectDots::SimpleGraph

use SimpleGraph; my $graph=new Bio::ConnectDots::SimpleGraph; # read pairs of nodes from STDIN while (<>) { my($node1,$node2)=split; $graph->add_edge($node1,$node2); } my @nodes=graph->nodes; # get list of nodes my @edges=graph->edges; # get list of edges for each $node (@nodes) { my @neighbors=$node->neighbors; # get list of neighboring nodes }

This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken. The package implements a small and eclectic assortment of standard graph algorithms that we happened to need for our applications.

This module is a subclass of Class::AutoClass (available at CPAN). AutoClass auotgenerates simple accessor and mutator methods (aka get and set methods). It also automates class initialization.

Nodes can be any Perl values, including object references. Edges are pairs of nodes.

(Caveat: be careful with values that contain embedded instances of $; (the character Perl uses to separate components of multi-dimensional subscripts), because we use this in the text representation of edges.

The main data structures are:

An edge (x,y) is represented canonically as a two element list in which the lexically smaller value is first. Eg, the node ('b','a') is represented as ['a','b']. The graph contains 1) A hash mapping the text representation of a node to the node itself. This is mostly relevant when the node is a reference. 2) A hash mapping the text representation of a node to a list of the node's neighbors. 3) A hash mapping the text representation of an edge to the edge itself.

This is still a work in progress.

Email natg@shore.net

Copyright (c) 2003 Institute for Systems Biology (ISB). All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

A node can be any Perl values, including an object, ARRAY, or HASH reference. When nodes are references, the software often works with the text representaion of the reference, ie, what you get if you print the reference. This can be confusing. Sorry. For example if a node is the HASH

{name=>'caspase-9',symbol=>'CASP9'}

The text representation would be something like

HASH(0x804c830)

When nodes are scalar values, eg, a string, the value and the text representation are the same. This is a common case in test programs and examples, but less common in real applications.

An edge is represented internally as an ARRAY ref of two nodes, in which the lexically smaller value is first. Actually, the first node is the one whose text representation is lexically smaller.

When passing edges as arguments to SimpleGraph methods, the edge can be represented in several ways.

1) An ARRAY ref of the nodes, eg, ['a','b']. 2) A list of the two nodes, eg, ('a','b') 3) Form (1) or (2) using the text represention of the node instead of the node itself

You needn't worry about which node is lexically smaller. SimpleGraph performs this calculation internally.

When SimpleGraph returns edges as results, they are always in form (1), ie, as ARRAY refs of nodes in correct lexical order.

When methods return lists, we generally check the context (via wantarray) and return an ARRSY or ARRAY ref as appropriate. We're not 100% consistent in this (sorry), so check the code if you have doubts.

We often define singular and plural forms of methods, eg, node and nodes. These differ in how they behave in a scalar context. The singular form assumes you want one answer and returns that, while the plural form assumes you want a list of answers are returns it as an ARRAY ref. We're not 100% consistent in this (sorry), so check the code if you have doubts.

The rest of the documentation describes the methods.

Title : new (inherited from Class::AutoClass) Usage : my $graph=new Bio::ConnectDots::SimpleGraph; Function: Create new Bio::ConnectDots::SimpleGraph object Returns : Newly created object Args : (optional) nodes=>ARRAY of nodes, eg, ['a','b','c'] edges=>ARRAY of edges, see add_edges for details

Title : add_nodes, add_node Usage : $graph->add_nodes('a','b'); $graph->add_node('a')) { Function: Add nodes to graph. Nodes that are already in graph are ignored. Args : ARRAY of nodes. Returns : Nothing useful Note : Singular and plural forms are synonymous Title : add_edges, add_edge Usage : $graph->add_edges('a','b',['b','c']); $graph->add_edge('c','d')) { Function: Add edges to graph. Edges that are already in graph are not added again, but are placed in a separate 'duplicate edges' list. Automatically adds any nodes that are not yet in the graph. Args : ARRAY of edges in any of the forms described in the previous section. The forms can be mixed as shown in the Usage here. Returns : Nothing useful Note : Singular and plural forms are synonymous Title : nodes Usage : my @nodes=$graph->nodes; if (@{$graph->nodes('a','b')}==2) { print "a, b are both nodes\n"; } Function: Return all nodes or the given ones. With no args returns all nodes. With args, returns the nodes corresponding to each arg, or undef if the arg is not a node. Useful for testing whether a given value is a node in the graph. Args : (optional) ARRAY of nodes or text representations of nodes Returns : ARRAY or ARRAY ref of nodes (for args that correspond to nodes), or undef (for args that are not nodes) Title : edges Usage : my @edges=$graph->edges; if (@{$graph->edges('a','b',['b','c'])}==2) { print "[a,b] and [b,c] are both edges\n"; } Function: Return all edges or the given ones. With no args returns all edges. With args, returns the edges corresponding to each arg, or undef if the arg is not a edge. Useful for testing whether a given value is a edge in the graph. Args : (optional) One or more edges in any of the forms described in the previous section. The forms can be mixed as shown in the Usage here. Returns : ARRAY or ARRAY ref of edges for args that correspond to edges), or undef (for args that are not edges) Title : node Usage : if ($graph->node('a')) { print "a is a node\n"; } Function: Test whether a value is a node in the graph, or map the text representation of a node to the node itself. The method can also be fed a list of values (like the 'nodes' method) and it will test all of them. Args : Usually, a single node. The function also accepts a list of nodes. Returns : In scalar context (the usual case): the node corresponding to the arg (if there's just one), or the node corresponding to the first arg (if a list of args were provided, which is kind of dumb in this case), or undef if the arg is not a node. In array context, it behaves just like 'nodes', returning an ARRAY of nodes (for args that correspond to nodes), or undef (for args that are not nodes) Title : edge Usage : if ($graph->edge('a','b')) { print "a,b is a edge\n"; } if ($graph->edge(['a','b'])) { print "[a,b] is a edge\n"; } Function: Test whether a value is a edge in the graph, or map the text representation of a edge to the edge itself. The method can also be fed a list of edges (like the 'edges' method) and it will test all of them. Args : Usually, a single edge. Same format as 'edges' The function also accepts a list of edges, exactly like 'edges' Returns : In scalar context (the usual case): the edge corresponding to the arg (if there's just one), or the or the edge corresponding to the first arg (if a list of args were provided, which is kind of dumb in this case), or undef if the arg is not a edge. In array context, it behaves just like 'edge's, returning an ARRAY of edges (for args that correspond to edges), or undef (for args that are not edges) Title : has_nodes, has_node Usage : if ($graph->has_nodes('a','b')) { print "a, b are both nodes\n"; } if ($graph->has_node('a')) { print "a is a node\n"; } Function: Return true is all args are nodes. Args : ARRAY of nodes or text representations of nodes Returns : Boolean Note : Singular and plural forms are synonymous Title : has_edges Usage : if ($graph->has_edges('a','b',['b','c'])) { print "[a,b] and [b,c] are both edges\n"; } if ($graph->has_edge('a','b')) { print "[a,b] is an edge\n"; } Function: Return true is all args are edges. Args : ARRAY of edges in the forms described in the section above Returns : Boolean Note : Singular and plural forms are synonymous Title : neighbors, neighbor Usage : my @nodes=$graph->neighbors($node) my @nodes=$graph->neighbors($node,'node') my @edges=$graph->neighbors($edge,'edge'); Function: Return the node or edge neighbors of a given node or edge. Args : (mandatory) $source: node or edge whose neighbors are sought (optional) $what: the word 'node' or 'edge' (actually, anything starting with 'n' or 'e' will do) default: 'node' Returns : ARRAY or ARRAY ref of nodes or edges Note : Singular and plural forms are synonymous. This may not be right. Title : dup_edges Usage : my @dups=$graph->dup_edges; Function: Return duplicate edges Args : None Returns : ARRAY or ARRAY ref of edges that have been added more than once.

Title : is_connected Usage : if ($graph->is_connected) { print "graph has only one connected component\n"; } Function: Return true if the graph is connected Args : None Returns : Boolean Title : is_empty Usage : if ($graph->is_empty) { print "graph has no nodes or edges\n"; } Function: Return true if the graph is empty, ie, has no nodes or edges Args : None Returns : Boolean Title : is_tree Usage : if ($graph->is_tree) { print "graph is a tree\n"; } Function: Return true if the graph is a tree, ie, it's connected and has no cycles Args : None Returns : Boolean Title : is_forest Usage : if ($graph->is_forest) { print "graph is a forest\n"; } Function: Return true if the graph is a forest, ie, it has no cycles but may not be connected Args : None Returns : Boolean Title : is_cyclic Usage : if ($graph->is_cyclic) { print "graph contains at least one cycle\n"; } Function: Return true if the graph is a cyclic. Args : None Returns : Boolean Title : density Usage : my $density=$graph->density Function: Compute graph 'density' which is the number of edges divided by the maximum possible number of edges Args : None Returns : Number

Title : subgraph Usage : my $subgraph=$graph->subgraph('a','b','c'); Function: Compute node subgraph. Constructs a new graph whose nodes are the arguments, and whose edges are the edges of the original graph that only involve the given nodes. Args : ARRAY of nodes or text representations of nodes Returns : New graph Title : neighbor_subgraph Usage : my $subgraph=$graph->subgraph('a'); Function: Construct node subgraph graph whose nodes are the given node and its neighbors. are the arguments, and whose edges are the edges of the original graph that only involve the given nodes. Args : Node or text representations of node Returns : New graph Title : union Usage : my $union=$graph->union($other_graph); Function: Construct new graph whose nodes are the union of the nodes of the current graph and $other_graph, and whose edges are the union of the edges of the current graph and $other_graph. Args : $other_graph: a graph Returns : New graph Title : intersection Usage : my $intersection=$graph->intersection($other_graph); Function: Construct new graph whose nodes are the intersection of the nodes of the current graph and $other_graph, and whose edges are the intersection of the edges of the current graph and $other_graph. Args : $other_graph: a graph Returns : New graph

Title : traversal Usage : my $traversal=$graph->traversal('a','depth first','node'); my @nodes; while (my $node=$traversal->get_next) { push(@nodes,$node); } my $traversal=$graph->traversal('a','depth first','node'); my @nodes=$traversal->get_all; Function: Do node or edge traversal in depth or breadth first order. Args : (optional) $start: starting node or edge for traversal default: software picks arbitrary start $order: 'depth first' or 'breadth first' (actually, anything starting with 'd' or 'b' will do) default: 'depth first' $what: 'node' or 'edge' (actually, anything starting with 'n' or 'e' will do) default: 'node' Returns : SimpleGraph::Traversal object This is an iterator with the following methods: get_next: get next item in traversal or undef if traversal is exhausted get_this: get current item in traversal get_all : get all remaining items in traversal as ARRAY (in array context) or ARRAY ref has_next: return true if there are more items in traversal, else undef reset : restart traversal Note : It's also possible, and perhaps easier, to perform a traversal by creating a SimpleGraph::Traversal object directly. The constructor is new Bio::ConnectDots::SimpleGraph::Traversal(-graph=>$graph,-start=>$start, -order=>$order,-what=>$what) Title : node_traversal Usage : my $traversal=$graph->node_traversal('a','depth first'); my @nodes; while (my $node=$traversal->get_next) { push(@nodes,$node); } my $traversal=$graph->node_traversal('a','depth first'); my @nodes=$traversal->get_all; Function: Do node traversal in depth or breadth first order. Wrapper for 'traversal' method. See above. Args : (optional) $start: starting node for traversal default: software picks arbitrary start $order: 'depth first' or 'breadth first' (actually, anything starting with 'd' or 'b' will do) default: 'depth first' Returns : SimpleGraph::Traversal object Title : edge_traversal Usage : my $traversal=$graph->edge_traversal(['a','b'],'depth first'); my @edges; while (my $edge=$traversal->get_next) { push(@edges,$edge); } my $traversal=$graph->edge_traversal(['a','b'],'depth first'); my @edges=$traversal->get_all; Function: Do edge traversal in depth or breadth first order. Wrapper for 'traversal' method. See above. Args : (optional) $start: starting edge for traversal default: software picks arbitrary start $order: 'depth first' or 'breadth first' (actually, anything starting with 'd' or 'b' will do) default: 'depth first' Returns : SimpleGraph::Traversal object Title : components Usage : my @components=$graph->components; for my $component (@components) { my @nodes=$component->nodes; my @edges=$component->edges; } Function: Compute the connected components of the graph. A connected component is a maximal connected subgraph. 'Connected' means you can get from any node of the component to any other by following a path. 'Maximal' means that every node you can reach from the component is in the component. Args : None Returns : ARRAY or ARRAY ref of SimpleGraphs Note : The software caches the components once computed, so it's efficient to call this repeatedly. Title : shortest_paths Usage : my $paths=$graph->shortest_paths; while(my($endpoints,$path)=each %$paths) { my($start,$stop)=split($;,$endpoints); my @nodes_on_path=@$path; print "Path from $start to $stop: @$nodes_on_path\n"; } -- OR -- my ($paths,$distances)=$graph->shortest_paths(-want_distances=>1); while(my($endpoints,$path)=each %$paths) { my $distance=$distances->{$endpoints}; my($start,$stop)=split($;,$endpoints); my @nodes_on_path=@$path; print "Path from $start to $stop, distance $distance: @$nodes_on_path\n"; } Function: Compute shortest path between each pair of nodes. Args : (optional) -max=>Maximum path length to compute Paths longer than this are not found -want_distances=>Boolean. If true, distances are also returned Returns : HASH whose keys are pairs of nodes (encoded in the standard Perl manner as two strings joined by $;) and whose values are paths. Each path is an ARRAY ref of nodes starting at the first endpoint and ending at the second. The first endpoint is always lexically smaller (lt) than the second. If -want_distances is specified, the result is a pair of HASHes, one containing paths and the other containing distances. Title : distances Usage : my $distances=$graph->distances; while(my($endpoints,$distance)=each %$distances) { my($start,$stop)=split($;,$endpoints); print "Path from $start to $stop has distance $distance\n"; } -- OR -- my ($paths,$distances)=$graph->distances(-want_paths=>1); while(my($endpoints,$distance)=each %$distances) { my $path=$paths->{$endpoints}; my($start,$stop)=split($;,$endpoints); my @nodes_on_path=@$path; print "Path from $start to $stop, distance $distance: @$nodes_on_path\n"; } Function: Compute distance between each pair of nodes. This is the length of the shortest path Args : (optional) -max=>Maximum path length to compute Distances longer than this are not found -want_paths=>Boolean. If true, paths are also returned Returns : HASH whose keys are pairs of nodes (encoded in the standard Perl manner as two strings joined by $;) and whose values are distances. The first endpoint is always lexically smaller (lt) than the second. If -want_paths is specified, the result is a pair of HASHes, one containing paths and the other containing distances. Title : connected_nodesets Usage : my @nodesets=$graph->connected_nodesets; for my $nodeset (@nodesets) { my @nodes=@$nodeset; } Function: Compute all sets of nodes that form connected subgraphs. A connected nodeset is a set of nodes such that it's possible to get from any node to any other by following a path that only includes nodes in the set. Args : None Returns : ARRAY or ARRAY ref of nodeset, where each nodeset is an ARRAY ref of nodes. Note : Use with caution. The number of nodesets is very large for graphs that are highly connected. Title : connected_subgraphs Usage : my @subgraphs=$graph->connected_subgraphs; Function: Compute all connected subgraphs of the current graph. Args : None Returns : ARRAY or ARRAY ref of subgraphs Note : Use with caution. The number of connected subgraphs is very large for graphs that are highly connected.

syntax highlighting: