<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>GO Database Perl Module</title>
    <link rel="stylesheet" type="text/css" href="../../doc/stylesheet.css">
  </head>

  <body>
    <h1>GO Database Perl Module</h1>

    <h2>Introduction</h2>

    <p>
      The go-db-perl module is an object/relational API for querying
      the GO database and retrieving the results as Perl objects. Using
      the API you can write Perl scripts to:

      <ul>
	<li>
	  Search on GO terms from the database
	</li>

	<li>
	  Fetch and traverse subgraphs of the GO ontology
	</li>

	<li>
	  Fetch gene products from the database; for example, <b>find
	    all gene products attached to <i>transmembrane
	      receptor</i> and its children</b>
	</li>

	<li>
	  Fetch terms and graphs from the database by gene product
	</li>

	<li>
	  Filter and constrain searches by GO evidence codes, species
	  and other criteria
	</li>

	<li>
	  ...and much more!
	</li>

      </ul>

      go-db-perl comes with a number of applications for loading and
      querying the GO database, including a Tk graphical user
      interface and a powerful command line interface, GOshell.

    </p>
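    <p>
      To make the "term and its children" query above concrete, here is
      a minimal, language-neutral sketch in Python. It is not the
      go-db-perl API: the term names, edges and annotations are
      hypothetical toy data held in dictionaries, whereas a real query
      would read them from the GO database. Because GO is a directed
      acyclic graph, a term can be reached by more than one path, so the
      traversal records the terms it has already visited.
    </p>

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its direct children.
# T:grandchild has two parents, as is common in a DAG such as GO.
CHILDREN = {
    "T:parent": ["T:child_a", "T:child_b"],
    "T:child_a": ["T:grandchild"],
    "T:child_b": ["T:grandchild"],
    "T:grandchild": [],
    "T:unrelated": [],
}

# Hypothetical direct annotations: term -> gene products attached to it.
ANNOTATIONS = {
    "T:parent": {"gp1"},
    "T:grandchild": {"gp2", "gp3"},
    "T:unrelated": {"gp4"},
}


def descendants_and_self(term, children):
    """Return the term plus every term reachable by following child edges."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # skip terms already reached by another path
                seen.add(child)
                queue.append(child)
    return seen


def deep_products(term, children, annotations):
    """Gene products attached to the term or to any of its descendants."""
    products = set()
    for t in descendants_and_self(term, children):
        products |= annotations.get(t, set())
    return products


print(sorted(deep_products("T:parent", CHILDREN, ANNOTATIONS)))
```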
    <p>
      You should have a solid understanding of object-oriented Perl
      before using the go-db-perl modules.
    </p>


    <h2>Installation</h2>

    <p>
      You should have a local copy of the GO MySQL database. You will
      also need the go-perl modules.
    </p>

    <p>
      Please follow the installation instructions for <a
      href="../../go-perl/doc/go-perl-doc.html">go-perl</a>.
    </p>

    <h2>Scripts</h2>
    <p>
      go-db-perl comes with scripts in the <a
      href="../scripts">scripts</a> directory to help with querying and loading
      databases.
    </p>
    <h3>Command Line Options</h3>
    <p>
      The following command line options are available to all database
      scripts. If you are connecting to a local MySQL installation
      that is not password protected, you should only need the
      <b>-d</b> option:
    </p>
    <ul>
      <li><b>-dbname (-d)</b> <i>database-name</i>: the name of the database
      </li>
      <li><b>-dbhost (-h)</b> <i>server-name</i>: the host running the database server
      </li>
      <li><b>-dbuser (-u)</b> <i>user-name</i>: the database user name
      </li>
      <li><b>-dbauth (-p)</b> <i>password</i>: the password for that user
      </li>
      <li><b>-port</b> <i>port-number</i>: the port the database server listens on
      </li>
      <li><b>-dbms</b> <i>DBMS-type</i>: the type of database management system
      </li>
      <li><b>-dsn</b> <i>DBI-DSN</i>: a complete DBI data source name
      </li>
    </ul>
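    <p>
      The sketch below, in Python, shows one way options like these can
      be combined into a DBI-style data source name. It is an
      illustration, not code from go-db-perl: the <b>mysql</b> default
      for <b>-dbms</b> and the rule that <b>-dsn</b> overrides the other
      options are assumptions. The DSN layout follows the DBD::mysql
      convention; the user name and password are not part of a DBI DSN
      and are passed to DBI-&gt;connect separately.
    </p>

```python
import argparse


def parse_db_options(argv):
    """Parse the connection options listed above (single-dash long forms)."""
    # add_help=False because -h is taken by the host option.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-dbname", "-d", dest="dbname")
    parser.add_argument("-dbhost", "-h", dest="dbhost")
    parser.add_argument("-dbuser", "-u", dest="dbuser")
    parser.add_argument("-dbauth", "-p", dest="dbauth")
    parser.add_argument("-port", dest="port", type=int)
    parser.add_argument("-dbms", dest="dbms", default="mysql")  # assumed default
    parser.add_argument("-dsn", dest="dsn")
    return parser.parse_args(argv)


def build_dsn(opts):
    """Assemble a DBI-style DSN such as dbi:mysql:database=go;host=..."""
    if opts.dsn:  # assumption: an explicit -dsn takes precedence
        return opts.dsn
    if not opts.dbname:
        raise ValueError("-dbname (-d) is required unless -dsn is given")
    parts = ["database=" + opts.dbname]
    if opts.dbhost:
        parts.append("host=" + opts.dbhost)
    if opts.port:
        parts.append("port=" + str(opts.port))
    return "dbi:" + opts.dbms + ":" + ";".join(parts)


opts = parse_db_options(["-d", "go", "-h", "db.example.org", "-port", "3307"])
print(build_dsn(opts))
```

    <p>
      With only <b>-d go</b> on the command line, this sketch produces
      the DSN dbi:mysql:database=go, which matches the local,
      password-free case described above.
    </p>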
    <h3>Data access scripts</h3>
    <ul>
      <li>
        <a href="/dev/pod/scripts/GOshell.html">GOshell.pl</a> -- an
        interactive shell for easy object-oriented querying of the
        database, with online help
      </li>
    </ul>
    <h3>Data loading scripts</h3>
    <ul>
      <li>
        <a href="/dev/pod/scripts/load-go-into-db.html">load-go-into-db.pl</a>
        -- loads a GO or OBO ontology file, or a gene association file,
        into a database
      </li>
      <li>
        <a
        href="/dev/pod/scripts/go-prepare-release.html">go-prepare-release.pl</a>
        -- handles the entire database release process, from creating an
        empty database, through loading it, to generating exported data
      </li>
      <li>
        <a href="/dev/pod/scripts/go-manager.html">go-manager.pl</a>
        -- similar to go-prepare-release.pl, but allows interactive
        control over individual parts of the release process
      </li>
    </ul>
    <h2>GO::AppHandle</h2>

    <p>
      The core class in the API is <a
	href="../../pod/GO/AppHandle.html">GO::AppHandle</a>; it
      mediates between your code and the database.
    </p>
    <p>
      After downloading go-db-perl, consult the POD documentation,
      either with the <b>perldoc</b> command:
    </p>
    <div class="codeblock">
      <pre>
perldoc GO/AppHandle.pm
      </pre>
    </div>
    <p>
      or in the <a href="../../pod/GO/AppHandle.html">online
        documentation</a>.
    </p>

    <h3>Fetching Objects from the DB</h3>

    <p>
      The AppHandle takes requests, queries the database, and turns
      the results into Perl objects. See the <a
	href="../../go-perl/doc/go-perl-doc.html">go-perl</a>
      documentation for a description of the object model.
    </p>
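    <p>
      The following Python sketch illustrates this mediating role, with
      an in-memory SQLite database standing in for MySQL: callers ask the
      handle for terms, and the handle writes the SQL and converts each
      row into an object. The class and method names are hypothetical
      and are not the GO::AppHandle method signatures (see the POD for
      those), and the <b>term</b> table here is a simplified subset of
      the GO database schema.
    </p>

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Term:
    acc: str
    name: str
    term_type: str


class ToyAppHandle:
    """Hypothetical stand-in for the AppHandle idea: callers request objects,
    and the handle issues the SQL and turns rows into Term instances."""

    def __init__(self, conn):
        self.conn = conn

    def get_term(self, acc):
        """Return the Term with this accession, or None if it is absent."""
        row = self.conn.execute(
            "SELECT acc, name, term_type FROM term WHERE acc = ?", (acc,)
        ).fetchone()
        return Term(*row) if row else None

    def get_terms_by_name(self, substring):
        """Return all Terms whose name contains the substring, sorted by acc."""
        rows = self.conn.execute(
            "SELECT acc, name, term_type FROM term WHERE name LIKE ? ORDER BY acc",
            ("%" + substring + "%",),
        ).fetchall()
        return [Term(*row) for row in rows]


# Simplified subset of the term table, filled with a few sample GO terms.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT, term_type TEXT)")
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?)",
    [
        ("GO:0005634", "nucleus", "cellular_component"),
        ("GO:0003677", "DNA binding", "molecular_function"),
        ("GO:0006281", "DNA repair", "biological_process"),
    ],
)

apph = ToyAppHandle(conn)
print(apph.get_term("GO:0005634"))
print([t.name for t in apph.get_terms_by_name("DNA")])
```

    <p>
      Code that uses the handle never sees SQL or result rows, which is
      what allows complex queries to sit behind a simple interface.
    </p>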


    <h2>Database Loading</h2>
    <h3>How database loading works</h3>
    <p>
      First, a file (in any supported ontology format, or a gene
      association file) is parsed using <a
      href="/dev/pod/GO/Parser.html">GO::Parser</a>, which generates
      an obo-xml stream. This stream is <i>transformed</i>
      using an <a href="../../xml/xsl/oboxml_to_godb_prestore.xsl">XSLT
      stylesheet</a> into a different kind of XML (godb-xml) that is
      isomorphic to the GO database schema. The godb-xml is then
      loaded into the database using a generic loader.
    </p>
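    <p>
      The transformation step can be sketched in Python as follows.
      Python's standard library has no XSLT processor, so ElementTree
      code stands in for the stylesheet here; it reshapes a small
      obo-xml-like tree so that element names mirror table and column
      names (<b>term</b>, <b>acc</b>, <b>term_type</b>,
      <b>term2term</b>). The output element names, such as
      <b>parent_acc</b> and <b>child_acc</b>, are simplified assumptions
      and do not reproduce the exact output of
      oboxml_to_godb_prestore.xsl.
    </p>

```python
import xml.etree.ElementTree as ET


def make_obo_xml(terms):
    """Build a small obo-xml-like tree from (id, name, namespace, parents)."""
    obo = ET.Element("obo")
    for acc, name, namespace, parents in terms:
        term = ET.SubElement(obo, "term")
        ET.SubElement(term, "id").text = acc
        ET.SubElement(term, "name").text = name
        ET.SubElement(term, "namespace").text = namespace
        for parent in parents:
            ET.SubElement(term, "is_a").text = parent
    return obo


def obo_to_godb(obo):
    """Stand-in for the XSLT step: reshape each obo-xml term into a term row
    plus one term2term row per is_a link (simplified, assumed layout)."""
    out = ET.Element("godb_prestore")
    for term in obo.iter("term"):
        acc = term.findtext("id")
        row = ET.SubElement(out, "term")
        ET.SubElement(row, "acc").text = acc
        ET.SubElement(row, "name").text = term.findtext("name")
        ET.SubElement(row, "term_type").text = term.findtext("namespace")
        for parent in term.findall("is_a"):
            link = ET.SubElement(out, "term2term")
            ET.SubElement(link, "relationship_type").text = "is_a"
            ET.SubElement(link, "parent_acc").text = parent.text
            ET.SubElement(link, "child_acc").text = acc
    return out


obo = make_obo_xml([
    ("GO:0000001", "mitochondrion inheritance", "biological_process", ["GO:0048308"]),
])
print(ET.tostring(obo_to_godb(obo), encoding="unicode"))
```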

    <h3>Database loading components</h3>
      <ul>
        <li>
          <a href="/dev/pod/scripts/go-prepare-release.html">go-prepare-release.pl</a>
          -- this script is a wrapper for the components below
        </li>
        <li>
          <a href="/dev/pod/GO/Parser.html">GO::Parser</a> -- the parser
          is created and used by the go-prepare-release.pl script to
          generate obo-xml
        </li>
        <li>
          <a
          href="../../xml/xsl/oboxml_to_godb_prestore.xsl">oboxml_to_godb_prestore.xsl</a>
          -- a 'program' written in the XSLT language, specifying how to
          transform obo-xml into godb-xml; it is applied using the
          <b>xsltproc</b> command
        </li>
        <li>
          <a
          href="/dev/pod/GO/Handlers/godb.html">GO::Handlers::godb</a>
          -- this module takes the godb-xml stream and loads it into the
          database; it is a wrapper for the generic
          <a
          href="http://search.cpan.org/perldoc?DBIx::DBStag">DBIx::DBStag</a>
          loader
        </li>
      </ul>
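    <p>
      The idea behind a generic loader is that it needs no GO-specific
      code: the XML itself says which table and column each value
      belongs to. A minimal Python sketch of that idea, assuming flat
      rows where each element's tag names a table and its leaf children
      name the columns, is shown below with SQLite standing in for MySQL.
      The real DBIx::DBStag loader is more capable, handling nested
      structures that span related tables; this sketch deliberately does
      not.
    </p>

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_rows(root, conn):
    """Insert each child of root as one row: its tag is the table name and
    its leaf sub-elements are the columns."""
    for row in root:
        columns = [c.tag for c in row]
        values = [c.text for c in row]
        placeholders = ", ".join("?" for _ in columns)
        # Table and column names come from the XML, so the input must be
        # trusted and already match the schema.
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            row.tag, ", ".join(columns), placeholders
        )
        conn.execute(sql, values)


def element(tag, **fields):
    """Build one flat row element, e.g. a term with acc/name/term_type."""
    node = ET.Element(tag)
    for column, value in fields.items():
        ET.SubElement(node, column).text = value
    return node


root = ET.Element("godb_prestore")
root.append(element("term", acc="GO:0005634", name="nucleus",
                    term_type="cellular_component"))
root.append(element("term", acc="GO:0003677", name="DNA binding",
                    term_type="molecular_function"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT, term_type TEXT)")
load_rows(root, conn)
print(conn.execute("SELECT acc, name FROM term ORDER BY acc").fetchall())
```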
    <p>
      See also the <a
        href="../../xml/doc/xml-doc.html">XML</a> documentation.
    </p>

    <h2>Future Directions</h2>

    <p>
      We're currently looking at alternatives to the object/relational
      approach to querying the database from Perl and other
      languages. On the one hand, the API allows us to reuse code and
      provide a simplified interface to some complex queries. On the
      other hand, it requires a lot of hard-to-maintain code. And
      whilst the API approach works well for queries that follow
      certain set patterns, it is poorly suited to arbitrary queries;
      for those you need to fall back on the full expressive power of
      a query language such as SQL.
    </p>

    <h3>DBStag</h3>
    <p>
      We are developing a library called DBStag (see the <a
      href="http://stag.sf.net">Stag project page</a> for details),
      which transforms the results of multi-join SQL queries into
      nested XML. It also allows SQL reuse in the form of Stag SQL
      templates. We have provided a number of these templates for GO
      in the <a href="../../sql/stag-templates">stag-templates
      directory</a>.
    </p>
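    <p>
      The core of turning join results into a hierarchy can be sketched
      in a few lines of Python: a multi-join query returns flat rows in
      which parent values (here, the term) are repeated, and grouping
      those rows by the parent key recovers the nesting. This sketch
      builds nested Python dictionaries rather than XML, uses SQLite with
      a simplified subset of the term, gene_product and association
      tables, and its gene products (gp1, gp2) are hypothetical. It is
      not DBStag's implementation.
    </p>

```python
import sqlite3


def nest_rows(rows):
    """Group flat (term_acc, term_name, product) join rows into one record
    per term, each holding the list of its gene products."""
    nested = {}
    for acc, name, product in rows:
        record = nested.setdefault(acc, {"acc": acc, "name": name, "products": []})
        if product is not None:  # LEFT JOIN yields None for unannotated terms
            record["products"].append(product)
    return list(nested.values())


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
CREATE TABLE gene_product (id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE association (term_id INTEGER, gene_product_id INTEGER);
INSERT INTO term VALUES (1, 'GO:0006281', 'DNA repair'), (2, 'GO:0005634', 'nucleus');
INSERT INTO gene_product VALUES (10, 'gp1'), (11, 'gp2');
INSERT INTO association VALUES (1, 10), (1, 11);
""")

# One flat row per (term, product) pair; the term columns repeat.
rows = conn.execute("""
SELECT term.acc, term.name, gene_product.symbol
FROM term
LEFT JOIN association ON association.term_id = term.id
LEFT JOIN gene_product ON gene_product.id = association.gene_product_id
ORDER BY term.acc, gene_product.symbol
""").fetchall()

for record in nest_rows(rows):
    print(record)
```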

    <p>
      We expect to stop development on GO::AppHandle by 2005 and
      switch to an approach such as DBStag, which combines the
      expressive power of a language such as SQL with hierarchical XML
      query results.
    </p>

    <p>
      DBStag also allows the GO database to be queried using SQL
      templates; see the <a href="../../sql/doc/godb-sql-doc.html">GODB
        SQL</a> documentation.
    </p>
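    <p>
      One way a SQL template can be made reusable is to keep a fixed
      query body and include each constraint only when a value is
      supplied for it. The Python sketch below illustrates that idea with
      SQLite named parameters. It does not use the Stag template syntax,
      and the template structure shown is a hypothetical example rather
      than one of the templates in the stag-templates directory.
    </p>

```python
import sqlite3

# Hypothetical template: a fixed query body plus optional constraints, each
# included only when a value for its named parameter is supplied.
TERM_TEMPLATE = {
    "select": "SELECT acc, name, term_type FROM term",
    "constraints": {
        "acc": "acc = :acc",
        "name": "name LIKE :name",
        "term_type": "term_type = :term_type",
    },
    "order_by": "acc",
}


def run_template(conn, template, **bindings):
    """Fill in the template with whichever constraints were bound, then run it."""
    used = {k: v for k, v in bindings.items() if k in template["constraints"]}
    sql = template["select"]
    if used:
        clauses = [template["constraints"][k] for k in used]
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY " + template["order_by"]
    return conn.execute(sql, used).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT, term_type TEXT)")
conn.executemany(
    "INSERT INTO term VALUES (?, ?, ?)",
    [
        ("GO:0005634", "nucleus", "cellular_component"),
        ("GO:0003677", "DNA binding", "molecular_function"),
        ("GO:0006281", "DNA repair", "biological_process"),
    ],
)

# The same template answers different questions depending on the bindings.
print(run_template(conn, TERM_TEMPLATE, term_type="molecular_function"))
print(run_template(conn, TERM_TEMPLATE, name="DNA%"))
```

    <p>
      Values are always passed as bound parameters rather than pasted
      into the SQL text, so only the shape of the query varies between
      calls.
    </p>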

    <hr>
    <address><a href="mailto:cjm@fruitfly.org">Chris Mungall</a></address>
<!-- Created: Fri Jan 23 14:30:13 PST 2004 -->
<!-- hhmts start -->
Last modified: Thu Feb 10 15:30:54 PST 2005
<!-- hhmts end -->
  </body>
</html>