Bio-Homology-InterologWalk version 0.01
=======================================

This document refers to version 0.01 of Bio::Homology::InterologWalk.
This version was released August 31st, 2010.

INSTALLATION-------------------------------------------------------------------------

To install this module on your system, place the tarball archive file in a
temporary directory and run the following:

% gunzip Bio-Homology-InterologWalk-0.01.tar.gz
% tar xf Bio-Homology-InterologWalk-0.01.tar
% cd Bio-Homology-InterologWalk-0.01
% perl Makefile.PL
% make
% make test
% make install

DEPENDENCIES-------------------------------------------------------------------------

This module requires the following modules and libraries:

===============
1.  Ensembl API
===============
    
The Ensembl project is currently split into two sub-projects:

    The Ensembl Vertebrates project
        This is of interest to you if you work with vertebrate genomes
        (although it also includes data from a few non-vertebrate common
        model organisms). See http://www.ensembl.org/index.html for further
        details.

    The Ensembl Genomes project
        This utilises the Ensembl software infrastructure (originally
        developed in the Ensembl Core project) to provide access to
        genome-scale data from non-vertebrate species. This is of interest
        to you if your species is a non-vertebrate, or if your species is a
        vertebrate but you *also want to obtain results mapped from
        non-vertebrates*. "Bio::Homology::InterologWalk" currently only
        supports the metazoa sub-site from the Ensembl Genomes Project. See
        http://metazoa.ensembl.org/index.html for further details.

    IMPORTANT: You need to decide which Ensembl DB set you will use before
    installing "Bio::Homology::InterologWalk". The module requires
    that

    Ensembl API Version == Ensembl-DB set version.

    This means that if you install e.g. API V.58, you will only be able to
    get data from Ensembl Vertebrates / Metazoa databases V. 58. As the
    EnsemblGenomes DB releases are one version behind the Ensembl Vertebrate
    DB release, if you install the bleeding-edge Ensembl Vertebrate API, *a
    matching EnsemblGenomes DB release might not be available yet*: you will
    still be able to use "Bio::Homology::InterologWalk" to run an orthology
    walk using exclusively Ensembl Vertebrate DBs, but you will get an error
    if you try to choose metazoan databases. See "setup_ensembl_adaptor" for
    further information.
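As a minimal sketch of the version rule above (not part of "Bio::Homology::InterologWalk"), the hypothetical shell helper "ensembl_versions_match" refuses to continue when the API release and the DB set release differ. For EnsemblGenomes, compare against the corresponding Ensembl release, e.g. EnsemblGenomes 5 corresponds to 58. How you read the installed API release is an assumption: checkouts that ship the "Bio::EnsEMBL::ApiVersion" module expose it through "software_version()".

```shell
#!/bin/sh
# Minimal sketch: check that the Ensembl API release equals the release of
# the Ensembl DB set you intend to query. ensembl_versions_match is a
# hypothetical helper, not part of Bio::Homology::InterologWalk.
ensembl_versions_match() {
  api_version=$1   # release of the installed Ensembl API, e.g. 58
  db_version=$2    # release of the DB set to query (EnsemblGenomes 5 -> 58)
  if [ "$api_version" -eq "$db_version" ]; then
    echo "OK: API $api_version matches DB set $db_version"
  else
    echo "MISMATCH: API $api_version vs DB set $db_version" >&2
    return 1
  fi
}

# Assumption: if your API checkout provides Bio::EnsEMBL::ApiVersion, the
# installed release can be printed with:
#   perl -MBio::EnsEMBL::ApiVersion -le 'print software_version()'
ensembl_versions_match 58 58
```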

    Therefore, before installing "Bio::Homology::InterologWalk", you are
    faced with the following choice:

    a)  If you are exclusively interested in vertebrates (plus the few
        non-vertebrate model organisms still present in Ensembl Vertebrates)
        then obtain the APIs and set up the environment by following the
        steps described on the Ensembl Vertebrates API installation pages:

        http://www.ensembl.org/info/docs/api/api_installation.html

        or alternatively

        http://www.ensembl.org/info/docs/api/api_cvs.html

        This option gives you the most recent datasets provided by
        Ensembl Core. However, you might not be able to query EnsemblGenomes
        (Metazoa) Compara data.

    b)  If you want to query and retrieve data from both vertebrate and
        metazoan genomes (a wider selection of taxa), obtain the APIs and
        set up the environment by following the steps described on the
        Ensembl Metazoa API installation pages:

        http://metazoa.ensembl.org/info/docs/api/api_installation.html

        or alternatively

        http://metazoa.ensembl.org/info/docs/api/api_cvs.html

        This option will probably not use the most recent API+DBs, but will
        guarantee functionality across both Vertebrate and Metazoan genomes.

    Option (b) is the recommended one.

==========
2. Bioperl
==========

    Ensembl provides a customised BioPerl installation tailored to its
    API, v. 1.2.3. If version 1.2.3 is no longer available through
    Ensembl, obtain release 1.6.x from CPAN instead: while not officially
    supported by the Ensembl Project, it works fine when the API is used
    within the scope of this module.

    =================================================

    NOTE 1: All the API components ("ensembl", "ensembl-compara",
    "ensembl-variation", "ensembl-functgenomics") are required. 

    NOTE 2: The module has been tested on Ensembl Vertebrates API & DB
    v. 58 and v. 59, and on EnsemblGenomes API & DB v. 5 (58).

EXAMPLE==========================================
For example, to install the Ensembl Core API v. 58, do the following.

Log into the Ensembl CVS server at Sanger (password: CVSUSER):

$ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl login
Logging in to :pserver:cvsuser@cvs.sanger.ac.uk:2401/cvsroot/ensembl
CVS password: CVSUSER

Then check out the Ensembl Core Perl API for version 58:

$ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl checkout -r branch-ensembl-58 ensembl
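Since NOTE 1 requires all four API components, the checkout above has to be repeated for each of them. A minimal sketch, assuming all components live under the same CVS root and branch naming as the Core API, and that each keeps its Perl code in a "modules" directory under a "$HOME/src" prefix (the layout assumed here follows the Ensembl installation pages; adjust to yours). It only prints the CVS commands unless DRY_RUN=0, and you must log in first as shown above.

```shell
#!/bin/sh
# Sketch: check out every Ensembl API component for one release and print a
# PERL5LIB value. Assumes <PREFIX>/<component>/modules holds the Perl code.
# DRY_RUN=1 (the default) only prints the CVS commands.
RELEASE=${RELEASE:-58}
ENSEMBL_CVSROOT=":pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl"
COMPONENTS="ensembl ensembl-compara ensembl-variation ensembl-functgenomics"
PREFIX=${PREFIX:-$HOME/src}

for c in $COMPONENTS; do
  cmd="cvs -d $ENSEMBL_CVSROOT checkout -r branch-ensembl-$RELEASE $c"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    mkdir -p "$PREFIX" && (cd "$PREFIX" && $cmd)
  fi
done

# Join the <PREFIX>/<component>/modules directories with ':'
perl5lib=""
for c in $COMPONENTS; do
  perl5lib="${perl5lib:+$perl5lib:}$PREFIX/$c/modules"
done
echo "PERL5LIB=$perl5lib"
```

Append the BioPerl directory from the next section to the printed PERL5LIB before exporting it.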

=====================
3. EXTRA PERL MODULES
=====================
You will also need to install the following modules (including all dependencies) from CPAN:

1. REST::Client
2. GO::Parser
3. DBD::CSV (requires Perl DBI)
4. String::Approx
5. List::Util
6. File::Glob
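A quick way to see which of these modules your perl can already load is to try each one with perl's "-M" switch. The helper name "check_perl_module" below is hypothetical; List::Util and File::Glob ship with core Perl, so they are normally reported as present. Missing modules can be installed with the standard CPAN client, e.g. "cpan REST::Client".

```shell
#!/bin/sh
# Sketch: report which of the extra CPAN modules can be loaded by the
# perl found on your PATH. check_perl_module is a hypothetical helper.
check_perl_module() {
  if perl -M"$1" -e 1 2>/dev/null; then
    echo "ok      $1"
  else
    echo "MISSING $1"
    return 1
  fi
}

missing=0
for m in REST::Client GO::Parser DBD::CSV String::Approx List::Util File::Glob; do
  check_perl_module "$m" || missing=$((missing + 1))
done
echo "$missing module(s) missing"
```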

SAMPLE SCRIPTS-------------------------------------------------------------------------------
The scripts/Code sub-directory provides examples of how to use
the module. The files are as follows:

-doInterologWalk.pl:        example usage of the core methods: given a flat file containing a list of stable FlyBase IDs, this script uses Bio::Homology::InterologWalk
                            to build a TSV file containing the putative interactors of those IDs according to the interolog mapping method.

-getDirectInteractions.pl:  generates a dataset of direct PPIs (protein-protein interactions) based on the input ID list.

-doScores.pl:               given a TSV file obtained with doInterologWalk.pl, this script computes an aggregated score for each (ID, putative interactor) pair, as a measure
                            of the reliability of the interaction. The output is a new TSV file with an additional compound-score column.

                            REQUIRES: doInterologWalk.pl getDirectInteractions.pl

-doNets.pl:                 given a TSV file obtained from doInterologWalk.pl (optionally processed by doScores.pl to add a compound-score column), this script produces a .sif network file and
                            a .noa network attribute file, suitable for import into the Cytoscape (http://www.cytoscape.org/) network visualisation program. The two files follow the format definitions
                            at http://cytoscape.org/cgi-bin/moin.cgi/Cytoscape_User_Manual/Network_Formats and have been tested on Cytoscape v. 2.6.2 / 2.6.3.

                            REQUIRES: doInterologWalk.pl
                            OPTIONAL: doScores.pl

scripts/Data      contains the PSI-MI ontology in OBO format (used by doScores.pl for interaction types and
                  interaction detection methods) and a small sample Mus musculus dataset.
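The REQUIRES / OPTIONAL notes above define a run order for the sample scripts. As a sketch, each note can be written as a pair "A B" meaning "run A before B" and handed to the POSIX "tsort" utility, which prints an order respecting every pair. This assumes you want the optional compound score in the network files, so the doScores.pl -> doNets.pl pair is included; the helper name "run_order" is hypothetical, and script arguments are deliberately left out.

```shell
#!/bin/sh
# Sketch: derive a valid run order for the sample scripts from their
# dependencies. Each line "A B" means "A must run before B".
# Includes the optional doScores.pl -> doNets.pl dependency.
run_order() {
  tsort <<'EOF'
doInterologWalk.pl doScores.pl
getDirectInteractions.pl doScores.pl
doInterologWalk.pl doNets.pl
doScores.pl doNets.pl
EOF
}

run_order
```

doInterologWalk.pl and getDirectInteractions.pl may come out in either order, since neither depends on the other; doScores.pl and doNets.pl always come last.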
		


COPYRIGHT AND LICENSE------------------------------------------------------------------------

Original author:  Giuseppe Gallone
CPAN ID: GGALLONE
G.Gallone@sms.ed.ac.uk

Copyright (C) 2010 by Giuseppe Gallone
This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.