
Bio::Network::ProteinNet - a representation of a protein interaction graph.

# Read in from file
my $graphio = Bio::Network::IO->new(-file   => 'human.xml',
                                    -format => 'psi');
my $graph = $graphio->next_network();
my @edges = $graph->edges;
for my $edge (@edges) {
    for my $node ($edge->[0], $edge->[1]) {
        my @proteins = $node->proteins;
        for my $protein (@proteins) {
            print $protein->display_id, " ";
        }
    }
    print "\n";
}

The bioperl-network package uses Perl's Graph module and it's essential that version .80 or greater be installed.
A Node object represents either a protein or a protein complex. Nodes can be retrieved through their identifiers:
# Get a node (represented by a sequence object) from the graph.
my $node = $graph->get_nodes_by_id('UniProt:P12345');
# A node that's a protein can be treated just like a Sequence object
print $node->seq;
# Remove a node by specifying its identifier
$graph->remove_nodes($graph->get_nodes_by_id('UniProt:P12345'));
# How many nodes are there?
my $ncount = $graph->nodes();
# Get interactors of your favourite protein
my $node = $graph->get_nodes_by_id('RefSeq:NP_023232');
my @neighbors = $graph->neighbors($node);
print " NP_023232 interacts with ";
print join ", ", map { $_->primary_id() } @neighbors;
print "\n";
# Annotate your sequences with interaction info
my @seq_objects = ($seq1, $seq2, $seq3);
for my $seq (@seq_objects) {
    if ( $graph->get_nodes_by_id($seq->accession_number) ) {
        my $node = $graph->get_nodes_by_id($seq->accession_number);
        my @neighbors = $graph->neighbors($node);
        for my $n (@neighbors) {
            my $ft = Bio::SeqFeature::Generic->new(
                -primary_tag => 'Interactor',
                -tag         => { id => $n->accession_number }
            );
            $seq->add_SeqFeature($ft);
        }
    }
}
# Get proteins with > 10 interactors
my @nodes = $graph->nodes();
my @hubs;
for my $node (@nodes) {
    if ($graph->neighbors($node) > 10) {
        push @hubs, $node;
    }
}
print "the following proteins have > 10 interactors:\n";
print join "\n", map {$_->primary_id()} @hubs;
# Get clustering coefficient of a given node.
my $id = "RefSeq:NP_023232";
my $cc = $graph->clustering_coefficient($graph->get_nodes_by_id($id));
if ($cc != -1) { ## result is -1 if cannot be calculated
    print "CC for $id is $cc";
}
# How many edges are there?
my $ecount = $graph->edges;
# Get all the paired nodes, or edges, in the graph as an array
my @edges = $graph->edges;
# How many interactions are there?
my $icount = $graph->interactions;
# Retrieve all interactions
my @interx = $graph->interactions;
# Let's get interactions above a threshold confidence score.
for my $interx (@interx) {
    if ($interx->weight > 0.6) {
        print $interx->primary_id, "\t", $interx->weight, "\n";
    }
}
# Get graph density
my $density = $graph->density();
# Get connected sub-graphs
my @graphs = $graph->connected_components();
# Copy interactions from one graph to another
$graph1->add_interactions_from($graph2);
If you have interaction data in your own format, e.g.
<interaction id> <protein id 1> <protein id 2> <score>
A simple approach would look something like this:
my $io = Bio::Root::IO->new(-file => 'mydata');
my $graph = Bio::Network::ProteinNet->new(refvertexed => 1);
while (my $l = $io->_readline() ) {
    my ($id, $nid1, $nid2, $sc) = split /\s+/, $l;
    my $prot1 = Bio::Seq->new(-accession_number => $nid1);
    my $prot2 = Bio::Seq->new(-accession_number => $nid2);
    # create new Interaction object based on an id and weight
    my $interaction = Bio::Network::Interaction->new(-id     => $id,
                                                     -weight => $sc );
    $graph->add_interaction(-nodes       => [($prot1,$prot2)],
                            -interaction => $interaction );
}

A ProteinNet is a representation of a protein interaction network. Its functionality comes from Perl's Graph module and from BioPerl; the nodes, or vertices, in the network are Sequence objects.
A node is one or more BioPerl sequence objects, each a Bio::Seq or Bio::Seq::RichSeq object. Essentially the graph can use any objects that implement the Bio::AnnotatableI and Bio::IdentifiableI interfaces, since these objects hold useful identifiers. This is relevant because the identity of nodes is determined by their identifiers.
Since bioperl-network is built on top of Perl's Graph and Graph::Undirected modules, it uses their formal model as well. An Edge corresponds to a pair of nodes, and there is only one Edge per pair. An Interaction is an attribute of an Edge, and there can be one or more Interactions per Edge. So
$ecount = $network->edges
tells you how many paired nodes there are and
$icount = $network->interactions
tells you how many node-node interactions there are. An Interaction is equivalent to one experiment or one experimental observation.
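The difference between the edge count and the interaction count can be illustrated without the library. What follows is only a minimal sketch in plain Perl, assuming a hash keyed by an unordered node pair; the names %edge_interx, edge_key and add_interaction, and the sample ids, are hypothetical and are not part of bioperl-network.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the graph: one entry per node pair (Edge),
# each holding the ids of the Interactions recorded for that pair.
my %edge_interx;

# Build an order-independent key so that (P1,P2) and (P2,P1) are one Edge.
sub edge_key {
    my ($n1, $n2) = @_;
    my @pair = sort { $a cmp $b } ($n1, $n2);
    return join '|', @pair;
}

# Record one Interaction on the Edge joining the two nodes.
sub add_interaction {
    my ($interx_id, $n1, $n2) = @_;
    push @{ $edge_interx{ edge_key($n1, $n2) } }, $interx_id;
    return;
}

add_interaction('E1', 'P1', 'P2');   # first observation of P1-P2
add_interaction('E2', 'P2', 'P1');   # second observation, same Edge
add_interaction('E3', 'P1', 'P3');   # a different Edge

# Edges are counted per node pair, Interactions per observation.
my $ecount = scalar keys %edge_interx;
my $icount = 0;
$icount += scalar @{$_} for values %edge_interx;

print "edges: $ecount, interactions: $icount\n";
```

Here two of the three interactions share a pair of nodes, so the sketch reports fewer edges than interactions.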

In this module, the nodes or vertexes are represented by Bio::Seq objects containing all possible database identifiers but no sequence, as parsed from the interaction files.
Interactions should be Bio::Network::Interaction objects, which are Bio::IdentifiableI implementing objects. At present Interactions only have an identifier and a weight() method, to hold confidence data.
A ProteinNet object has the following internal data, aside from the data structures of Graph itself:
Look-up hash ('_id_map') for finding a node by any of its ids. The keys are standard identifiers (e.g. "GenBank:A12345") and the values are memory addresses used by Graph (e.g. "Bio::Network::Node=HASH(0x1bc53e4)").
Look-up hash for Interactions ('_interx_id_map'), used for retrieving an Interaction object using an identifier. The keys are primary ids of the Interaction (e.g. "DIP:2341E") and the values are addresses of Interactions (e.g. "Bio::Network::Interaction=HASH(0x1bc46f2)").
The function of these hashes is either to facilitate fast lookups or cache data temporarily.
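The idea behind such a look-up hash can be sketched in plain Perl. This is an illustration only, under the assumption that several namespaced ids map to one object; the hash %id_map, the function node_by_id, the node contents and the id 'DIP:0000N' are hypothetical, and the sketch stores references directly rather than reproducing the module's internal layout.

```perl
use strict;
use warnings;

# Hypothetical node: a plain hash reference standing in for a node object.
my $node = { name => 'example protein' };

# Several namespaced identifiers all point at the same node reference.
my %id_map;
$id_map{$_} = $node for ('UniProt:P12345', 'GenBank:A12345');

# Look-up by any id returns the same reference, or undef if the id is unknown.
sub node_by_id {
    my ($id) = @_;
    return $id_map{$id};
}

my $by_uniprot = node_by_id('UniProt:P12345');
my $by_genbank = node_by_id('GenBank:A12345');

# References compare equal numerically when they point at the same object.
print "same node\n" if $by_uniprot == $by_genbank;
print "unknown id\n" unless defined node_by_id('DIP:0000N');
```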

These modules were first released as part of the core BioPerl package and were called Bio::Graph. Bio::Graph was copied to a separate package, bioperl-network, and renamed Bio::Network. All of the modules were revised and a new module, Interaction.pm, was added. The functionality of the PSI MI parser, IO/psi.pm, was significantly enhanced.
Graph manipulation in Bio::Graph was based on the Bio::Graph::SimpleGraph module by Nat Goodman. The first release as a separate package, bioperl-network, replaced SimpleGraph with the Perl Graph package. Other API changes were also made, partly to keep nomenclature consistent with BioPerl, partly to use the terms used by the interaction databases, and partly to accommodate the differences between Graph and Bio::Graph::SimpleGraph.
The advantages to using Graph are that Bioperl developers are not responsible for maintaining the code that actually handles graph manipulation and there is more functionality in Graph than in SimpleGraph.
The disadvantage is that we now rely on others to keep the package bug-free, and there are some bugs in Graph. You should use version .80 or greater but even this version is not free of bugs (a list of known bugs can be found in the BUGS file in this package).
Bio::Graph::Edge has been replaced by Bio::Network::Interaction and Bio::Network::Edge
This method has been replaced by next_network().
The union() method has been removed since it was not performing a true union. It has been replaced by add_interactions_from
remove_nodes() is now an alias to Graph::delete_vertices
_get_ids_by_db() has been renamed get_ids_by_node
add_node() is now an alias to Graph::add_vertex
components() is now an alias to Graph::connected_components
edge_count() is now an alias to Graph::edges
node_count() is now an alias to Graph::vertices
nodes_by_id() is now an alias to get_nodes_by_id
This method has been removed since edges no longer have identifiers, Interactions do. Use get_interaction_by_id
unconnected_nodes() is now an alias to Graph::isolated_vertices
object_id() is now an alias to Interaction::primary_id()

To use this module you need Graph.pm, version .80 or greater. To read XML data (e.g. PSI XML) you will need XML::Twig.

Bio::Network::IO
Bio::Network::Edge
Bio::Network::Node
Bio::Network::Interaction
Bio::Network::IO::dip
Bio::Network::IO::psi

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:
http://bugzilla.open-bio.org/

Richard Adams richard.adams@ed.ac.uk
Brian Osborne bosborne at alum.mit.edu
Maintained by Brian Osborne
The first version of this package was based on the Bio::Graph::SimpleGraph module written by Nat Goodman.
Name : get_interaction_by_id
Purpose : Get an interaction using an id
Usage : $interx = $g->get_interaction_by_id($id)
Returns : One or more Interactions
Arguments : One or more Interaction identifiers, the primary id
Name : get_nodes_by_id
Purpose : Get node using an id
Usage : $node = $g->get_nodes_by_id($id)
Returns : One node
Arguments : One or more protein identifiers
Name : get_interactions
Purpose : Get 1 or more Interaction objects given a pair of nodes
Usage : @interx = $g->get_interactions($n1,$n2)
Returns : A hash of Interaction objects where the key is the primary
id of the Interaction and the value is the Interaction
Arguments : 2 nodes
Notes :
Name : add_id_to_interaction
Purpose : Store identifiers in an internal hash that is used to look
up interactions by id - this does not add ids to Interaction
objects.
Usage : $g->add_id_to_interaction($id,$interaction)
Arguments : Identifier and Interaction object.
Returns :
Notes : The identifier should be concatenated
with a database or namespace name in order to make
accurate comparisons when you are merging data from different
formats. Examples: DIP:3455E.
Use _get_standard_name() to find a standardized name.
Name : add_id_to_node
Purpose : Store identifiers in an internal hash that is used to look
up nodes by id - this does not add ids to Node objects
or their associated Annotation objects.
Usage : $g->add_id_to_node($id,$node) or
$g->add_id_to_node(\@ids,$node)
Arguments : Identifier (or reference to an array of identifiers), node.
Returns :
Notes : The identifier should be concatenated
with a database or namespace name in order to make
accurate comparisons when you are merging data from different
formats. Examples: DIP:3455N, UniProt:Q45772, GenBank:7733911.
Use _get_standard_name() to find a standardized name.
Name : add_interactions_from
Purpose : To copy interactions from one graph to another
Usage : $graph1->add_interactions_from($graph2)
Returns : void
Arguments : A Graph object of the same class as the calling object.
Description : This method copies interactions from the graph passed as the
argument to the calling graph. To take account of
differing IDs identifying the same protein, all ids are
compared. The following rules are used:
1. If a pair of nodes exist in both graphs then:
a. No Interactions with the same primary id will be copied
from $graph2 to $graph1.
b. All other Interactions from $graph2 will be copied
to $graph1, even if these nodes do not interact in $graph1.
2. Nodes are never copied from $graph2 to $graph1. This is rather
conservative but prevents the problem of having duplicated,
identical nodes in $graph1 due to the same protein being identified
by different ids in the 2 graphs.
So, for example
Interaction N1 N2 Comment
Graph 1: E1 P1 P2
E2 P3 P4
E3 P1 P4
Graph 2: E1 P1 P2 E1 will not be copied to Graph1
X2 P1 P3 X2 will be copied to Graph 1
X3 P1 P4 X3 will be copied to Graph 1
X4 Z4 Z5 Nothing copied to Graph1
There are measures one could take to allow copying nodes from $graph2
to $graph1, currently unimplemented:
1. Use sequence, if available, and some threshold measure of similarity,
or length, to prove that proteins are not identical and can be copied.
2. Use species information. For example, if $graph1 is entirely composed
of human proteins then any non-human proteins could be copied to
$graph1 without risk (and cross-species interactions are fairly common
due the nature of interaction experiments).
3. Use namespace or dataspace when assessing identity. For example, assume
that all nodes in $graph1 are identified by Swissprot ids. Assume a
protein in $graph2 is also identified by a Swissprot id, not found in
$graph1. This could be reasonable grounds for allowing the protein in
$graph2 to be copied to $graph1.
4. Some combination of the above.
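The copying rules and the example table above can be traced with a small stand-alone sketch. It is not the module's implementation: each graph is assumed to be a plain hash mapping an interaction id to a pair of node ids, every node is assumed to carry a single id (the real method compares all ids of a node), and the name copy_interactions is hypothetical.

```perl
use strict;
use warnings;

# Each graph is a hypothetical hash: interaction id => [node1, node2].
my %graph1 = (
    E1 => ['P1', 'P2'],
    E2 => ['P3', 'P4'],
    E3 => ['P1', 'P4'],
);
my %graph2 = (
    E1 => ['P1', 'P2'],
    X2 => ['P1', 'P3'],
    X3 => ['P1', 'P4'],
    X4 => ['Z4', 'Z5'],
);

# Copy interactions from $src into $dest following rules 1 and 2 above.
sub copy_interactions {
    my ($dest, $src) = @_;

    # Collect the nodes already present in the destination graph.
    my %known_node;
    $known_node{$_} = 1 for map { @{$_} } values %{$dest};

    my @copied;
    for my $id (sort keys %{$src}) {
        my ($n1, $n2) = @{ $src->{$id} };
        # Rule 2: nodes are never copied, so both must already exist.
        next unless $known_node{$n1} && $known_node{$n2};
        # Rule 1a: skip Interactions whose primary id is already present.
        next if exists $dest->{$id};
        # Rule 1b: copy everything else, even if the pair had no Edge yet.
        $dest->{$id} = [$n1, $n2];
        push @copied, $id;
    }
    return @copied;
}

my @copied = copy_interactions(\%graph1, \%graph2);
print "copied: @copied\n";
```

With the data from the table, only X2 and X3 are copied: E1 is already present in the first graph and X4 joins two nodes that the first graph does not contain.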
Name : subgraph
Purpose : Construct a subgraph of nodes from another network, including
all Interactions.
Usage : my $subgraph = $graph->subgraph(@nodes).
Returns : A subgraph composed of nodes, edges, and Interactions from the
original graph.
Arguments : A list of nodes.
Name : get_ids_by_node
Purpose : Gets all ids for a node
Arguments: A Bio::SeqI object
Returns : A hash: Keys are db ids, values are identifiers
Usage : my %ids = $gr->get_ids_by_node($seqobj);
Name : add_interaction
Purpose : Adds an Interaction to a graph.
Usage : $gr->add_interaction(-interaction => $interx,
-nodes => \@nodes );
Arguments : An Interaction object and a reference to an array holding
a pair of nodes
Returns :
Description : This is the method to use to add an interaction to a graph.
Name : add_edge
Purpose :
Usage : $gr->add_edge(@nodes)
Arguments : A pair of nodes
Returns :
Description :
Name : add_vertex
Purpose : Adds a node to a graph.
Usage : $gr->add_vertex($n)
Arguments : A Bio::Network::Node object
Returns :
Description :
Name : add_node
Purpose : Alias to add_vertex
Usage : $gr->add_node($node)
Arguments : A Bio::Network::Node object
Returns :
Description :
Name : clustering_coefficient
Purpose : Determines the clustering coefficient of a node, a number
in range 0-1 indicating the extent to which the neighbors of
a node are interconnected.
Arguments : A Node or a text identifier
Returns : The clustering coefficient. 0 is a valid result.
If the CC is not calculable (if the node has <2 neighbors),
returns -1.
Usage : my $node = $gr->get_nodes_by_id('P12345');
my $cc = $gr->clustering_coefficient($node);
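For readers unfamiliar with the measure, here is a minimal stand-alone sketch in plain Perl. It assumes the usual definition (links among a node's neighbors divided by the number of possible links among them) and the -1 convention described above; the module's own implementation may differ in detail. The adjacency hash %adj and the helper add_edge are hypothetical and do not use the Graph API.

```perl
use strict;
use warnings;

# Hypothetical undirected graph as an adjacency hash: node => { neighbor => 1 }.
my %adj;

sub add_edge {
    my ($n1, $n2) = @_;
    $adj{$n1}{$n2} = 1;
    $adj{$n2}{$n1} = 1;
    return;
}

# CC = (links among the neighbors) / (possible links among the neighbors).
sub clustering_coefficient {
    my ($node) = @_;
    my @nbrs = sort keys %{ $adj{$node} || {} };
    my $k = scalar @nbrs;
    return -1 if $k < 2;    # not calculable with fewer than 2 neighbors

    # Count each linked pair of neighbors once.
    my $links = 0;
    for my $i (0 .. $#nbrs - 1) {
        for my $j ($i + 1 .. $#nbrs) {
            $links++ if $adj{ $nbrs[$i] }{ $nbrs[$j] };
        }
    }
    return $links / ($k * ($k - 1) / 2);
}

add_edge('A', 'B');
add_edge('A', 'C');
add_edge('A', 'D');
add_edge('B', 'C');
add_edge('D', 'E');

for my $node (sort keys %adj) {
    printf "%s: %.2f\n", $node, clustering_coefficient($node);
}
```

In this toy graph, node B has two neighbors that are linked to each other, node D has two neighbors that are not, and node E has a single neighbor, which covers the three kinds of result described above.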
Name : remove_nodes
Purpose : Alias to Graph::delete_vertices
Usage : $graph2 = $graph1->remove_nodes($node);
Arguments : A single Node object or a list of Node objects
Returns : A Graph with the given nodes deleted
Notes :
Name : is_forest
Purpose : Determine if a graph is a forest (2 or more trees)
Usage : if ($gr->is_forest){ ..... }
Arguments : none
Returns : 1 or ""
Name : is_tree
Purpose : Determine if the graph is a tree
Usage : if ($gr->is_tree){ ..... }
Arguments : None
Returns : 1 or ""
Name : is_empty
Purpose : Determine if graph has no nodes
Usage : if ($gr->is_empty){ ..... }
Arguments : None
Returns : 1 or ""
Name : articulation_points
Purpose : Find nodes in a graph that if removed will fragment
the graph into sub-graphs.
Usage : my @nodes = $gr->articulation_points
or
my $count = $gr->articulation_points
Arguments : None
Returns : An array or a count of the array of nodes that will fragment
the graph if deleted.
Notes : This method is currently broken due to bugs in Graph v. .69
Name : is_articulation_point
Purpose : Determine if a given node is an articulation point or not.
Usage : if ($gr->is_articulation_point($node)) {....}
Arguments : A node (Sequence object)
Returns : 1 if node is an articulation point, 0 if it is not
Notes : This method is currently broken due to bugs in Graph v. .69
Name : nodes
Purpose : Alias to Graph::vertices()
Arguments:
Returns : An integer
Usage : my $count = $graph->nodes;
Name : has_node
Purpose : Alias to Graph::has_vertex
Arguments:
Returns : True if the node exists
Usage : if ( $graph->has_node($node) ){ ... }
Name : nodes_by_id
Purpose : Alias to get_nodes_by_id
Notes : Deprecated
Name : edge_count
Purpose : Alias to edges()
Notes : Deprecated, use edges()
Name : interactions
Purpose : Count the total number of Interactions in the network (an Edge can
have one or more Interactions) or retrieve all the Interactions in
the network as an array
Usage : my $count = $gr->interactions or
my @interx = $gr->interactions
Arguments:
Returns : A number or an array of Interactions
Notes :
Name : neighbor_count
Purpose : Alias to Graph::neighbors
Usage : my $count = $gr->neighbor_count($node)
Arguments : A node
Returns : An integer
Notes : Deprecated
Name : node_count
Purpose : Alias to Graph::vertices()
Notes : Deprecated, use nodes()
Name : components
Purpose : Alias to Graph::connected_components
Usage : my @components = $gr->components
Arguments :
Returns :
Notes : Deprecated
Name : unconnected_nodes
Purpose : Alias to Graph::isolated_vertices
Arguments : None
Returns : An array of unconnected nodes
Notes : Deprecated
Name : _all_pairs
Purpose : Find unique set of all pairwise combinations
Usage : my @pairs = $self->_all_pairs(@arr)
Arguments : An array
Returns : An array of array references, each array in the 2nd dimension
is a 2-element array
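The return structure described for _all_pairs can be pictured with a small stand-alone function. This is a sketch of the general technique only, assuming the input items are already unique; the name all_pairs is hypothetical and the code is not taken from the module.

```perl
use strict;
use warnings;

# Return every unordered pair of items from the input list
# as a list of 2-element array references.
sub all_pairs {
    my @items = @_;
    my @pairs;
    for my $i (0 .. $#items - 1) {
        for my $j ($i + 1 .. $#items) {
            push @pairs, [ $items[$i], $items[$j] ];
        }
    }
    return @pairs;
}

my @pairs = all_pairs(qw(P1 P2 P3));
print join(' ', map { "$_->[0]-$_->[1]" } @pairs), "\n";
```

For n items this yields n*(n-1)/2 pairs, so a list with fewer than two items produces an empty result.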
Name : _ids
Purpose :
Usage :
Arguments :
Returns :
Name : next_interaction
Purpose : Retrieve Interactions using an edge
Usage : while (my $interx = $edge->next_interaction){ ... }
Returns : Interactions, one by one.
Arguments :
Name : next_edge
Purpose : Retrieve all edges
Usage : while (my $edge = $graph->next_edge){ ... }
Returns : Edges, one by one.
Arguments :
Name : next_node
Purpose : Retrieve all nodes
Usage : while (my $node = $graph->next_node){ ... }
Returns : Nodes, one by one.
Arguments :