<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html> <head>
<title>GBrowse Administration Tutorial</title>
<link rel="stylesheet" href="tutorial.css">
</head>

<body>
<h1>Generic Genome Browser Version 2: A Tutorial for Administrators</h1>

<h2>Author: Lincoln Stein, 20 January 2010</h2>

<p>

This is an extensive tutorial that takes you through the main features
and gotchas of configuring GBrowse as a server.  This tutorial assumes
that you have successfully set up Perl, GD, BioPerl and the other
GBrowse dependencies. If you haven't, please see the <a
href="http://gmod.org/wiki/GBrowse_2.0_HOWTO">GBrowse HOWTO</a>. During
most of the tutorial, we will be using the "in-memory" GBrowse
database (no relational database required!).  Later we will show how to
set up a genome-sized database using the berkeleydb and MySQL adaptors.

<p>

<b>Important:</b> This tutorial is designed for GBrowse 2.00 and will
not work with earlier versions of the software.

<h2>Table of Contents</h2>

<ol>
  <li><a href="#basics">The Basics</a>
      <ol>
	<li><a href="#data_file">The Data File</a>
	<li><a href="#basic_conf">Defining Tracks</a>
	<li><a href="#descriptions">Adding Descriptions to a Feature</a>
	<li><a href="#naming">Adjusting GBrowse Name Searches</a>
	<li><a href="#linking">Linking</a>
	<li><a href="#balloon">Adding Popup Balloons to Tracks</a>
	    <ol>
	      <li><a href="#customizing_balloons">Customizing Balloons</a>
	    </ol>
      </ol>
  <li><a href="#feature_types">Displaying Common Types of Features</a>
      <ol>
	<li><a href="#segmented_features">Multi-segmented features</a>
	<li><a href="#canonical_gene">Protein-Coding Genes</a>
	    <ol>
	      <li><a href="#simple_gene">Simpler Genes</a>
	    </ol>
	<li><a href="#cds">Reading Frames</a>
	<li><a href="#grouping">Grouped Features</a>
	<li><a href="#graph">Quantitative Data (basic)</a>
	<li><a href="#wiggle">Quantitative Data (advanced)</a>
	<li><a href="#dna">DNA and 3-frame translations</a>
	<li><a href="#multiple_alignments">ESTs and Other Alignments</a>
	    <ol>
	      <li><a href="#adding_dna_to_alignments">Adding DNA to Alignments</a>
	    </ol>
	<li><a href="#trace">Trace Data</a>
      </ol>

  <li><a href="#enhancements">GBrowse Enhancements</a>
      <ol>
	<li><a href="#region">Adding a "Region" Panel</a>
	<li><a href="#overview">Putting Features into the Overview &amp; Regionview</a>
	<li><a href="#semantic_zooming">Semantic Zooming</a>
	<li><a href="#grouping_tracks">Grouping Tracks</a>
	<li><a href="#group_tables">Grouping Tracks into a Table</a>
	<li><a href="#plugins">Using Plugins</a>
      </ol>

  <li><a href="#external">Adding Features from External Sources</a>
      <ol>
	<li><a href="#upload">Uploading an Annotation File</a>
	<li><a href="#sharing">Sharing an Annotation File</a>
<!--  DAS SERVER SUPPORT NOT SUPPORTED AS OF JANUARY 2010	   
	<li><a href="#DAS">Using GBrowse as a DAS Server or Client</a>
	    <ol>
	      <li><a href="#das_combining">Combining Databases with
      DAS</a>
	      <li><a href="#das_exporting">Exporting DAS Tracks to
      Ensembl and other Genome Browsers<a>
	      <li><a href="#das_entire">Running GBrowse off DAS Entirely<a>
	    </ol>
-->
      </ol>

  <li><a href="#other_backends">Using Other Backends</a>
      <ol>
	<li><a href="#berkeleydb">The Berkeleydb Backend</a>
	    <ol>
	      <li><a href="#bp_seqfeature_load">The bp_seqfeature_load.pl script</a>
	    </ol>
	<li><a href="#mysql">The MySQL Backend</a>
	<li><a href="#other_backends">Other Backends</a>
      </ol>
  <li><a href="#multiple_databases">Multiple Database Backends</a>
  <li><a href="#conclusion">Conclusion</a>
</ol>

<h2><a name="basics">1. The Basics</a></h2>

<p>

We will be working with simulated Volvox genome annotation data.  The
database will be named "volvox" and GBrowse will be invoked with this
URL:

<blockquote
class="example"><pre>http://localhost/cgi-bin/gb2/gbrowse/volvox</pre></blockquote>

<p>

These directories contain data files used during the tutorial:

<dl>
  <dt><a href="data_files/">data_files</a>
  <dd>DNA and features files to load into the local database.
      <p>
  <dt><a href="conf_files/">conf_files</a>
  <dd>GBrowse configuration files for you to take and modify.
</dl>

<p>

To introduce you to the system we will be using a file-based database
which allows GBrowse to run directly off text files.  To prepare this
database for use, find the GBrowse databases directory which was
created in your Apache web server directory at the time of
installation.  It should be located at
<b>/var/lib/gbrowse2/databases</b>, but check to make sure.

<p>

Similarly, check that you can find the GBrowse configuration
directory.  It should be located at <b>/etc/gbrowse2</b> and contain the main
configuration file "GBrowse.conf" and the example yeast genome
datasource file "yeast_simple.conf" (among several others).

<p>

Now you will change the permissions of the database and configuration
directories so that you can write to them without root privileges.
This is only an issue on Unix systems, and Windows users can safely
ignore this step.

<blockquote class="example"><pre>
% <b>su</b>
Password: <b>*********</b>
# <b>chown my_user_name /var/lib/gbrowse2/databases</b>
# <b>chown my_user_name /etc/gbrowse2</b>
# <b>exit</b>
%
</pre></blockquote>

<p>

(Be sure to replace "my_user_name" with your login name!)

<p>

Now look around inside the databases directory.  There should be a
single subdirectory named "yeast_chr1+2". The yeast subdirectory is
where the example yeast chromosomes 1 and 2 data set is stored.

<p>

You will create an empty volvox subdirectory, and make it world
writable.  On Unix systems:

<blockquote class="example"><pre>
% <b>cd /var/lib/gbrowse2/databases</b>
% <b>mkdir volvox</b>
% <b>chmod go+rwx volvox</b>
</pre></blockquote>

<p>

<blockquote> <i>NOTE: The "%" sign in these examples is the
command-line prompt.  On Windows systems, the command-line prompt is
something like C:\Program Files\Apache
Group\Apache2\htdocs\databases&gt;.  Unix systems are more
variable, but the prompt usually ends with a "%" or a "#".  In all the
examples in this tutorial, what you type is rendered in
<b>boldface</b>, while prompts and command-line results are shown in
medium typeface.</i> </blockquote>

<p>

On Windows systems, use the file manager ("Explorer") to create a new
folder named "volvox."  If you are using Windows NT, 2000 or XP, right
click on the new folder and grant write privileges to all.

<p>

You'll now put the first of several data files into the volvox
database directory.  In the <a href="data_files/">data_files</a>
subdirectory of this tutorial you will find the file <a
href="data_files/volvox_remarks.gff3">volvox_remarks.gff3</a>.  Copy this into the
volvox database directory.  On Unix systems:

<blockquote class="example">
<pre>
% <b>cd /var/www/gbrowse2</b>
% <b>cp tutorial/data_files/volvox_remarks.gff3 /var/lib/gbrowse2/databases/volvox</b>
</pre>
</blockquote>

<p>

On Windows systems, use Explorer to copy the file into the volvox
database directory.

<p>

Now we will need a GBrowse config file to tell GBrowse how to render
this data set.  In the subdirectory <a
href="conf_files">/var/www/gbrowse2/tutorial/conf_files</a>, you will find a sample configuration
file named <a href="conf_files/volvox.conf">volvox.conf</a>.  Copy
this into your GBrowse configuration directory (/etc/gbrowse2).

<p>

You'll now edit the main GBrowse.conf configuration file to tell it
about the new data source. Open <b>/etc/gbrowse2/GBrowse.conf</b> with a text
editor and add this stanza at the bottom of the file:

<blockquote><pre>

[volvox]
description  = Tutorial database
path         = volvox.conf
</pre></blockquote>

<p>

Be sure to leave a blank line between the bottom of the previous
stanza and the top of the new one (i.e., there should be a blank line
above "[volvox]").
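
<p>

As a sketch, the bottom of GBrowse.conf would then look something like
this.  The [yeast] stanza is only a stand-in for whatever stanza
currently ends your file, and its description text is an assumption;
the point is the blank line separating it from the new [volvox] stanza.

<blockquote class="example"><pre>
[yeast]
description  = Yeast chromosomes 1+2 (basic)
path         = yeast_simple.conf

[volvox]
description  = Tutorial database
path         = volvox.conf
</pre></blockquote>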

<p>

You should now be able to view the data set.  Point your web browser
at <a
href="/cgi-bin/gb2/gbrowse/volvox">http://localhost/cgi-bin/gb2/gbrowse/volvox</a>
and type "ctgA" into the search box.  The result is shown in Figure
1.

<p>
<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/basics1.gif"><br>
<i>Figure 1: volvox_remarks.gff3 data with volvox.conf config file.</i>
</blockquote>

<h3>If You are Having Problems...</h3>

<p>

If for some reason you get a blank page or an "Internal server error,"
there are a couple of things to check.  First, open the file
volvox.conf with a text editor ("Notepad" on Windows systems, emacs,
pico or vi on Unix systems) and confirm that the path to the volvox
database directory in this section is correct:

<pre>
[GENERAL]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir     '/var/lib/gbrowse2/databases/volvox'
</pre>

<p>

If the path to the volvox database directory contains a space, you must
put single quotes around the path as shown in the example above.

<p>

Next check that the volvox_remarks.gff3 file does exist inside the volvox
database directory and that it is readable by all users on your
system.  Similarly, check that the volvox.conf configuration file is
in the same directory as yeast_simple.conf, and that it is readable by
all users on your system.

<p>

Microsoft Windows has an unpleasant tendency to add a ".txt" extension
to files without warning.  If something seems to be wrong with the
config or GFF file and you can't figure out what, check that the file
extension hasn't been modified.  To avoid this phenomenon, I suggest
that you select "All File Types" from the popup menu in the File Save
dialog.  You might also want to configure your Folder display to show
known file extensions.

<p>

If you're still having no luck, check the bottom of the Apache server
error log for error messages.  This file is located in various places
depending on how Apache is installed.  Look for the file
<b>error_log</b>, typically located in /usr/local/apache/logs,
C:\Program Files\Apache Group\Apache2\logs, /var/log/www, or
/var/log/httpd.  The error message will usually point you in the right
direction.

<p>

<blockquote style="background-color:orange"> If this doesn't fix the problem,
please stop the tutorial and send an e-mail to GBrowse support at <a
href="mailto:gmod-gbrowse@lists.sourceforge.net">gmod-gbrowse@lists.sourceforge.net</a>.
Someone will be happy to assist you.  </blockquote>

<h3><a name="data_file">1.1. The Data File</a></h3>

<p>


Let's look at the data file we loaded in detail now.  If you open the
<a href="data_files/volvox_remarks.gff3">volvox_remarks.gff3</a> file in a text
editor, you will see that it contains a series of 15 genome "features"
that look like this:

<blockquote class="example"><pre>
ctgA example contig 1     50000 . . . Name=ctgA
ctgA example remark 1659  1984  . + . Name=f07;Note=This is an example
ctgA example remark 3014  6130  . + . Name=f06;Note=This is another example
ctgA example remark 4715  5968  . - . Name=f05;Note=Ok! Ok! I get the message.
ctgA example remark 13280 16394 . + . Name=f08
...
</pre></blockquote>

<p>

Each feature has a "source" of "example", a type of "remark", and
occupies a short range (roughly 1.5k) on a contig named "ctgA."  In
addition to the features themselves, there is an entry for the contig
itself (type "contig").  This entry is needed to tell GBrowse what the
length of ctgA is.

<p>

The load file uses a standard known as <a
href="http://www.sequenceontology.org/gff3.shtml">GFF3 (General
Feature Format version 3)</a>.  Each line of the file corresponds to a
feature on the genome, and the nine columns are separated by tabs.

<p>

The 9 columns are as follows:

<ol>
  <li><b>reference sequence</b><br>
      This is the name of the feature that will be used to establish the
      coordinate system for the annotation.  This is usually the name of
      a chromosome, a clone, or a contig.  In our example, the
      reference sequence is "ctgA".  A single GFF file can refer to
      multiple reference sequences.</li><br>
  <li><b>source</b><br>
      The source of the annotation.  This field describes how the
      feature was derived.  In the example, the source is
      "example" for want of a better description.  Many people use
      the source as a way of distinguishing between similar features
      that were derived by different methods, for example, gene
      calls derived from different prediction software.  You can
      leave this column blank by replacing the source with a single
      dot (".").</li><br>
  <li><b>type</b><br>
      This column describes the feature type. Although you can choose anything
      you like to describe the feature type, you are strongly encouraged to use
      well-recognized Sequence Ontology (SO) terms such as "gene", "repeat_region", "exon",
      and "CDS."  You can find a list of the recognized SO terms at
      <a
      href="http://song.cvs.sourceforge.net/song/ontology/sofa.ontology?rev=HEAD&content-type=text/vnd.viewcvs-markup">the Sequence Ontology Project web site</a>. For
      lack of a better name, the features in the volvox example are of
      type "remark."</li><br>
  <li><b>start position</b><br>
      The position that the feature starts at, relative to the
      reference sequence.  The first base of the reference sequence
      is position 1.</li><br>
  <li><b>end position</b><br>
      The end of the feature, again relative to the reference
      sequence.  End is always greater than or equal to start.</li><br>
  <li><b>score</b><br>
      For features that have a numeric score, such as sequence
      similarities, this field holds the score.  Score units are
      arbitrary, but most people use the expectation value for
      similarity features.  You can leave it blank by replacing
      the column with a dot.</li><br>
  <li><b>strand</b><br>
      For features that are strand-specific, this field is the
      strand on which the annotation resides.  It is "+" for the forward
      strand, "-" for the reverse strand, or "." for annotations that are
      not stranded.  If you are unsure of whether a feature is
      stranded, it won't hurt to use a "+" here.</li><br>
  <li><b>phase</b><br>
      For CDS features that encode proteins, this field describes
      where the next codon starts.
      The phase is one of the integers 0, 1, or 2, indicating the
      number of bases that should be removed from the beginning of
      this feature in order to reach the first base of the next codon. In other
      words, a phase of "0" indicates that the next codon begins at
      the first base of the region described by the current line, a
      phase of "1" indicates that the next codon begins at the second
      base of this region, and a phase of "2" indicates that the next codon
      begins at the third base of this region. This
      information is used by the "cds" glyph to show how the reading
      frame changes across splice sites.  For all other feature types,
      use a dot here.</li><br>
  <li><b>attributes</b><br>
      A list of feature attributes in the format tag=value.  Multiple
      tag=value pairs are separated by semicolons.  URL escaping rules are
      used for tags or values containing the following characters: ",=;".
      Spaces are allowed in this field, but tabs must be replaced with the
      %09 URL escape.

      <br><br>
      These tags have predefined meanings:
      <dl>
	<dt>ID</dt>
	<dd>Gives the feature a unique identifier. Useful when grouping features
	    together (such as all the exons in a transcript).</dd>
	    
	<dt>Name</dt>
	<dd>Display name for the feature.  This is the name to be
	    displayed to the user.</dd>

	<dt>Alias</dt>
	<dd>
	    A secondary name for the feature.  It is suggested that
	    this tag be used whenever a secondary identifier for the
	    feature is needed, such as locus names and
	    accession numbers.</dd>
	    
	<dt>Note</dt>
	<dd>A descriptive note to be attached to the feature. This will be displayed
	    as the feature's description.</dd>
      </dl>

      Alias and Note fields can have multiple values separated by commas.
      For example:
      <blockquote>Alias=M19211,gna-12,GAMMA-GLOBULIN</blockquote>
      Other good stuff can go into the attributes field, as we shall see later.</li>
</ol>
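
<p>

To tie the column rules together, here is a hypothetical fragment of a
load file that annotates two reference sequences.  The second contig
"ctgB", its length, the feature "g01", the Alias values and the Note
text on the last line are all invented for this sketch; only ctgA and
f07 come from volvox_remarks.gff3.  Each reference sequence has its own
full-length line, the Alias attribute carries two comma-separated
values, and the comma inside the Note is written as the URL escape %2C
so that it is not read as a value separator.  Columns are shown
separated by spaces for readability; in a real file they must be
separated by tabs.

<blockquote class="example"><pre>
ctgA example contig 1    50000 . . . Name=ctgA
ctgB example contig 1    20000 . . . Name=ctgB
ctgA example remark 1659 1984  . + . Name=f07;Note=This is an example
ctgB example remark 1000 2000  . + . Name=g01;Alias=rem01,r1;Note=Escaped comma here%2C as promised
</pre></blockquote>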

<p>

<b>It is very important to have a full-length entry (such as the one
for ctgA) for each reference sequence mentioned in the first column of
the GFF3 file. However, the reference sequence can have any source and
type you choose. Commonly used types are "clone", "chromosome" and
"contig."</b>

<h3><a name="basic_conf">1.2. Defining Tracks</a></h3>

<p>

Now we'll look at the configuration file in more detail.  Using a text
editor, open the volvox.conf file from its location in the
GBrowse configuration directory.  (If you mess up, you can always
copy a fresh version from <a
href="conf_files/volvox.conf">volvox.conf</a> in the tutorial
directory.)

<p>

At the top is a [GENERAL] section which defines basic things such as
the database backend to use, the path to the database files, which
plugins to activate, and which tracks to show by default. In the
volvox.conf example:

<blockquote class="example"><pre>
[GENERAL]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox'

# just the basic track dumper plugin
plugins     = TrackDumper

# list of tracks to turn on by default
default features = ExampleFeatures

# size of the region
region segment         = 10000

# examples to show in the introduction
examples = ctgA

# feature to show on startup
initial landmark = ctgA:5000..10000
</pre></blockquote>

We'll discuss how to customize these options later. For now, focus on
the section at the bottom of the file, which starts with the line
<code>### TRACK CONFIGURATION ###</code>:

<blockquote class="example"><pre>
[ExampleFeatures]
feature      = remark
glyph        = generic
stranded     = 1
bgcolor      = blue
height       = 10
key          = Example Features
</pre></blockquote>

<p>

This is a "stanza" that describes one of the tracks displayed by
GBrowse.  The track has an internal name of "ExampleFeatures" which
you can use in the URL to turn the track on.  The internal name is
enclosed by square brackets. Names can be any combination of printable
characters and whitespace, but can <b>not</b> contain the hyphen ("-")
character (you can use the underscore "_" character instead).

<p>

Following the track name are a series of
options that configure the track.  The "feature" option indicates what
feature type(s) to display inside the track.  It's currently set to
display the "remark" feature type.  The "glyph" option specifies
the shape of the rendered feature.  The default is "generic", which is
a simple filled box, but there are dozens of glyphs to choose from.
The "stranded" option tells the generic glyph to try to display the
strandedness of the feature -- this is what creates the little arrow
at the end of the box.  "bgcolor" and "height" control the background
color and height of the glyph respectively, and "key" assigns the
track a human-readable label.

<p>

Let's experiment with changing the track definition.  First, let's
change the color of the glyph.  With your text editor, change the
bgcolor option from "blue" to "orange", save the file, and reload the page.
The features should change immediately as shown in Figure 2.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/basic_conf1.gif"><br>
<i>Figure 2: A Feature of a Different Color</i>
</blockquote>

<p style="font-size:9pt;font-style:italic">Note: Many of the
screenshots in this tutorial are from earlier versions of GBrowse and
may not look exactly the same as the current version.</p>

<p>

Please experiment with other changes!   Try changing the height to 5, the
key to "Skinny features" and the stranded option to 0 (which means
"false").  Just by changing a few options, you can create a very
distinctive track.
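
<p>

With all of the changes suggested so far applied, the stanza would read
roughly as follows (a sketch; your own experiments may differ):

<blockquote class="example"><pre>
[ExampleFeatures]
feature      = remark
glyph        = generic
stranded     = 0
bgcolor      = orange
height       = 5
key          = Skinny features
</pre></blockquote>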

<p>

Now let's try changing the glyph.  One of the standard glyphs was
designed to show PCR primer pairs and is called "primers".  Change
"glyph = generic" to "glyph = primers" and reload the page.  Depending
on other changes that you might have made earlier, the result will
look something like Figure 3.

<p>


<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/basic_conf2.gif"><br>
<i>Figure 3: Using the primers Glyph</i>
</blockquote>

<p>

We'll see other examples of glyphs later on.  To get a list of the
most popular glyphs and the options that are available for them, see
the file CONFIGURE_HOWTO.txt, located in the docs/ subdirectory of the
GBrowse distribution.  Or for the gory and bleeding edge details, run
the command:

<blockquote class="example"><pre>
 % <b>glyph_help.pl -l</b>
</pre></blockquote>

<p>

This will give you a list of all the glyphs. Running the command with
the name of the glyph will give you copious documentation on all the
options the glyph recognizes.

<h3><a name="descriptions">1.3. Adding Descriptions to a Feature</a></h3>

<p>

By default, GBrowse will display the name of the feature above its
glyph provided that there is sufficient space to do this.  Optionally,
you can also attach some descriptive text to the feature.  This text
will be displayed below the feature, and can also be searched.

<p>

You can place descriptions, notes and other comments into the ninth
column of the GFF load file.  The example file <a
href="data_files/volvox_domains.gff3">volvox_domains.gff3</a> shows how this is done.
An excerpt from the top of the file looks like this:

<blockquote class="example"><pre>
ctgA example polypeptide_domain 11911 15561 . + . Name=m11;Note=kinase
ctgA example polypeptide_domain 13801 14007 . - . Name=m05;Note=helix loop helix
ctgA example polypeptide_domain 14731 17239 . - . Name=m14;Note=kinase
ctgA example polypeptide_domain 15396 16159 . + . Name=m03;Note=zinc finger
</pre></blockquote>

<p>

This defines several new features of type "polypeptide_domain".  In
addition to giving each of the motifs a name, the ninth column adds a
"Note" attribute to each feature. As described earlier, attributes are
tag=value pairs separated by semicolons.

<p>

The attribute named <i>Note</i> is automatically displayed and made
searchable.  To see this work, add <a
href="data_files/volvox_domains.gff3">volvox_domains.gff3</a> to the volvox database
directory.  You can do this just by copying the file into
<b>/var/lib/gbrowse2/databases/volvox</b> so that the directory contains both
the original volvox_remarks.gff3 and the new volvox_domains.gff3 files.

<p>

To display this newly-loaded data set, open up volvox.conf and add the
following new stanza to the config file:

<blockquote class="example"><pre>
[Motifs]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
key          = Example motifs
</pre></blockquote>

<p>

This defines a new track whose internal name is "Motifs."  The
corresponding feature type is "polypeptide_domain" and it uses the "span" glyph, a
graphic that displays a horizontal line capped by vertical endpoints.
The height is set to five pixels, and the human-readable key is set to
"Example motifs."  A new option, "description", is a flag that tells
GBrowse to display the Note attribute, if any.  Any non-zero value
means true.

<p>

After updating the configuration file, you will need to reload the
browser page and turn on the "Example motifs" checkbox below the main
image. The result is shown in Figure 4.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/descriptions1.gif"><br>
<i>Figure 4: Showing the Note attribute</i>
</blockquote>

<p>

A copy of this config file is also available for you to use in
<a href="conf_files/volvox2.conf">volvox2.conf</a>.

<p>

To show that GBrowse will search the notes for keyword matches, try
typing in "kinase."  You will be presented with a list of all the
motifs whose Note attribute contains the word "kinase."

<h3><a name="naming">1.4. Adjusting GBrowse Name Searches</a></h3>

<p>

GBrowse has a very flexible search feature.  You can type in the name
of a reference sequence, such as "ctgA", and it will display the
entire thing, or you can type in a range in the format
"ctgA:start..stop".  Try "ctgA:5000..8000" to see this at work.

<p>

In addition, GBrowse can search for features by name. Anything that
has a Name or Alias attribute in the GFF3 file can be searched for by
name. For example, try searching for "f10" or even "f1*".

<p>

The only drawback to this is that you may have name collisions. For
example, some research communities distinguish genes from their
products using differences in capitalization, such as hga
and HGA. However, GBrowse's searches are case insensitive. To
avoid name collisions, you can give each type of feature a distinctive
naming prefix, for example "Gene:hga" and "Protein:HGA".

<p>

To illustrate how this works, have a look at <a
href="data_files/volvox_geneproducts.gff3">volvox_geneproducts.gff3</a>:

<blockquote class="example"><pre>
ctgA example remark                             1000 2000 . . . Name=hga
ctgA example protein_coding_primary_transcript  1100 2000 . + . Name=Gene:hga
ctgA example polypeptide                        1200 1900 . + . Name=Protein:HGA
ctgA example protein_coding_primary_transcript  1600 3000 . - . Name=Gene:hgb
ctgA example polypeptide                        1800 2900 . - . Name=Protein:HGB
</pre></blockquote>

<p>

Copy <a href="data_files/volvox_geneproducts.gff3">volvox_geneproducts.gff3</a> into the
databases/volvox folder. Now add the following configuration stanza to
volvox.conf to create a track that displays both
protein_coding_primary_transcript and polypeptide features:

<blockquote class="example"><pre>
[NameTest]
feature      = protein_coding_primary_transcript polypeptide
glyph        = generic
stranded     = 1
bgcolor      = green
height       = 10
key          = Name test track
</pre></blockquote>

<p>

This stanza creates a new track labeled "Name test track" that displays
features of type "protein_coding_primary_transcript" and "polypeptide"
using green generic glyphs that are 10 pixels high. When you look at
the data file, you'll see that there are three things potentially
named "HGA": a remark which uses the unqualified name "hga", a gene which
uses the qualified name "Gene:hga", and a polypeptide which
uses the qualified name "Protein:HGA." There is also a
protein_coding_primary_transcript named "Gene:hgb" and a polypeptide named
"Protein:HGB." (<i>Note: in this track we are using slightly awkward
sequence ontology terms, like "protein_coding_primary_transcript,"
rather than more natural terms like "gene" in order to keep these
example features from appearing in the real "gene" track that we
create later on in this tutorial.</i>)

<p>

To see how GBrowse searches for names, type "HGA" (either
upper or lowercase) in the search textbox and press "Search." Because
the search term matches the remark whose unqualified name is hga,
GBrowse will bring up the region between 1000..2000 and highlight the
hga remark.

<p>

Now search for "Protein:HGA." Because you searched with the qualified
name, GBrowse will find and highlight the protein feature.

<p>

Now try to search for "HGB." This search fails because HGB only exists
in qualified form in the database. You can still, however, search for
"Gene:HGB" or "Protein:Hgb" (capitalization doesn't matter). This may
or may not be the behavior that you desire. If you would like GBrowse
to search through qualified names when the user types the unqualified
version, you can configure this easily by adding the following line to
volvox.conf under the [GENERAL] section:

<blockquote class="example"><pre>
automatic classes = Gene Protein
</pre></blockquote>

<p>

This option directs GBrowse to search for the unqualified name first,
followed by names prefixed with "Gene:" and then names prefixed with
"Protein:". Whichever is found first will be displayed. Now searching
for "HGB" will find "Gene:hgb". Swapping the order of Gene and Protein
on this line will cause the "Protein:HGB" to be found.
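
<p>

As a sketch, the relevant part of the [GENERAL] section would then look
like this.  The comment line and the Protein-first ordering are
illustrative choices; the other options in the section are unchanged and
omitted here.

<blockquote class="example"><pre>
[GENERAL]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox'

# try the bare name first, then "Protein:NAME", then "Gene:NAME"
automatic classes = Protein Gene
</pre></blockquote>

<p>

With this ordering, a search for "HGB" should land on "Protein:HGB"
rather than "Gene:hgb".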

<p>

Another way to approach this is to make liberal use of the Alias
attribute. For example:

<blockquote class="example"><pre>
ctgA example remark                             1000 2000 . . . Name=Remark:HGA;Alias=hga
ctgA example protein_coding_primary_transcript  1100 2000 . + . Name=Gene:hga;Alias=hga
ctgA example polypeptide                        1200 1900 . + . Name=Protein:HGA;Alias=hga
ctgA example protein_coding_primary_transcript  1600 3000 . - . Name=Gene:hgb;Alias=hgb
ctgA example polypeptide                        1800 2900 . - . Name=Protein:HGB;Alias=hgb
</pre></blockquote>

<p>

This assigns the alias of "hga" to each of the three HGA features, and
an alias of "hgb" to each of the two HGB features. This keeps the
identities of these features distinct so that you can find particular
ones by typing in the fully qualified name ("Gene:hga"), but find all
candidates when you type in the unqualified name. For instance, when
you search with "hga", GBrowse will now offer you three matches:

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/aliases.gif"><br>
<i>Figure 5: Searching for aliases</i>
</blockquote>

<p>

To try this out, simply open the installed volvox_geneproducts.gff3 file with a
text editor and edit it to match the example above.

<p>

<b>Note:</b> GBrowse caches track images for performance reasons. If
you make some changes to the data or config files and don't get the
expected behavior, try turning off caching by going to the Preferences
link, and turning off the checkbox marked "Cache tracks."

<h3><a name="linking">1.5. Linking</a></h3>

<p>

The next topic we'll cover in this tutorial is configuring GBrowse's
outgoing links.  When the user clicks on a glyph in the details image,
he will be taken to another page by following a URL.  The URL to
follow is generated from the <code>link</code> option.  The default
link option is located in the [TRACK DEFAULTS] section of the config
file; you can specify track-specific links by placing a
<code>link</code> option in one or more of the individual track
stanzas.

<p>

The track defaults section of volvox.conf looks like this:

<blockquote class="example"><pre>
[TRACK DEFAULTS]
glyph         = generic
height        = 10
bgcolor       = lightgrey
fgcolor       = black
font2color    = blue
label density = 25
bump density  = 100
# where to link to when user clicks in detailed view
link          = AUTO
</pre></blockquote>

<p>

In this case, we've been using a special link URL of "AUTO."  This
generates an automatic link to a helper script named
"gbrowse_details."  If you click on some of the features in the
current volvox page you'll get an idea of what this script displays.

<p>

We're going to override the default link rule for the motif track.
There's nothing sensible to link to, so we'll link to Google using
first the motif's name, and then the motif's description.

<p>

Go to the [Motifs] stanza in the volvox.conf config file and modify it
so that it looks like this:

<blockquote class="example"><pre>
[Motifs]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
link         = http://www.google.com/search?q=$name
key          = Example motifs
</pre></blockquote>

<p>

The only change we've made is to add a "link" option to the stanza,
where the value is a Google search URL.  "$name" is a Perl variable.
GBrowse will fill in this variable with the name of the motif.  Reload
the page and click on a motif to see that this works as advertised
("m01," "m02" and the other example motifs are similar to the names
for galactic clusters, so be prepared for some astronomy hits).

<p>

It would be more sensible to link to the description of the motif, for
example "helix loop helix."  Fortunately we can do that too.  Just
change the link option to:

<blockquote class="example"><pre>
link         = http://www.google.com/search?q=$description
</pre></blockquote>

<p>

There are a large number of possible variables that you can use inside
link rules.  See the CONFIGURE_HOWTO document in the GBrowse
distribution for the full list.  You can also construct links using
Perl callbacks as described in the section on <a
href="#multiple_alignments">displaying ESTs</a>.  This gives you the
ability to generate any arbitrary URL.
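
<p>

The substitution step can be pictured with a short Python sketch.  This
is only an illustration, not GBrowse's own code (GBrowse is written in
Perl): the function name <code>expand_link</code>, the sample feature
values, and the choice to URL-encode the substituted text are all
assumptions made for this example.

```python
import re
from urllib.parse import quote_plus


def expand_link(template, feature):
    """Replace $name-style placeholders with URL-encoded feature values."""

    def substitute(match):
        key = match.group(1)
        # Unknown variables are left untouched so mistakes stay visible.
        if key not in feature:
            return match.group(0)
        return quote_plus(str(feature[key]))

    return re.sub(r"\$(\w+)", substitute, template)


# Illustrative feature values; the pairing of name and description is made up.
motif = {"name": "m01", "description": "helix loop helix"}

print(expand_link("http://www.google.com/search?q=$name", motif))
print(expand_link("http://www.google.com/search?q=$description", motif))
```

The same template therefore yields a different URL for every feature in
the track, which is what makes a single <code>link</code> line
sufficient for the whole stanza.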

<p>

If you want nothing to happen when the user clicks on a feature, just
set link to empty ("link = ").

<p>

The last thing we'll do is to change the behavior of the [Motifs] track
so that:

<ol>
  <li>a new window pops up with the Google search rather than
      replacing the contents of the current window
  <li>when the user mouses over a motif, a hints box will appear
      telling them that clicking there will initiate a Google search
</ol>

<p>

These changes are easy:

<blockquote class="example"><pre>
[Motifs]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
link         = http://www.google.com/search?q=$description
link_target  = _blank
title        = Search Google for $description.
key          = Example motifs
</pre></blockquote>

<p>

There's now a <code>link_target</code> option.  This contains the name
of a browser window in which to load the content when the user clicks
on the feature.  If there's no window of that name, the browser will
create a new window and give it the desired name.  Choose an ordinary
name like "Google" if you want the Google content to be loaded into
the same window each time, or choose "_blank" as we've done here in
order to pop up a new fresh window each time the user clicks.

<p>

The <code>title</code> option contains a bit of text that will be
displayed whenever the user hovers the mouse over the feature for a
second or two.  The same variable substitution rules apply, so when
the user mouses over feature "m06", a balloon will pop up that says
"Search Google for SUSHI repeat."  Give it a try!  </p>

<h3><a name="balloon">1.6. Adding Popup Balloons to Tracks</a></h3>

<p>The title option is a simple way to add popup balloons to
tracks. In addition to this feature, GBrowse can display popup
balloons when the user hovers over or clicks on a feature. The
balloons can display arbitrary HTML, either provided in the config
file, or fetched remotely via a URL. You can use this feature to
create multiple choice menus when the user clicks on the feature, to
pop up images on mouse hovers, or even to create little embedded query
forms. See <a href="http://mckay.cshl.edu/balloons.html">
http://mckay.cshl.edu/balloons.html</a> for examples.</p>

<p>To activate custom balloons, add <code>balloon hover</code> and/or
<code>balloon click</code> options to the track stanzas that you wish to
add balloons to. You can also place these options in [TRACK DEFAULTS] to
create a default balloon.</p>

<p><code>balloon hover</code> specifies HTML or a URL that will be displayed when
the user hovers over a feature. <code>balloon click</code> specifies HTML or a
URL that will appear when the user clicks on a feature. The HTML can
contain images, formatted text, and even controls. Examples:</p>

<blockquote class="example"><pre>
 balloon hover = &lt;h2&gt;Gene $name&lt;/h2&gt;
 balloon click = &lt;h2&gt;Gene $name&lt;/h2&gt;
       &lt;a href='http://www.google.com/search?q=$name'&gt;Search Google&lt;/a&gt;&lt;br&gt;
       &lt;a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&amp;term=$name'&gt;Search NCBI&lt;/a&gt;&lt;br&gt;
</pre></blockquote>

<p>For example, to add a balloon to the motifs track of the Volvox browser,
add "balloon tips = 1" near the top of the volvox.conf file, and then
add balloon hover and balloon click options like this: </p>

<blockquote class="example"><pre>
[Motifs]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
balloon hover = &lt;h2&gt;Gene $name&lt;/h2&gt;
balloon click = &lt;h2&gt;Gene $name&lt;/h2&gt;
       &lt;a href='http://www.google.com/search?q=$name'&gt;Search Google&lt;/a&gt;&lt;br&gt;
       &lt;a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&amp;term=$name'&gt;Search NCBI&lt;/a&gt;&lt;br&gt;
key          = Example motifs
</pre></blockquote>

<p>You can also populate the balloon contents dynamically using data
from a local or remote web server. This facility, as well as options
for customizing balloon appearance, is described in the <a
href="http://gmod.org/wiki/GBrowse_2.0_HOWTO#Configuring_Balloon_Tooltips">GBrowse2 HOWTO</a>.

<hr>

<h2><a name="feature_types">2. Displaying Common Types of Features</a></h2>

<p>

Now that you've seen the basics, we'll discuss techniques to display
multi-part features, genes, alignments, quantitative data and other
special feature types.


<h3><a name="segmented_features">2.1. Multi-segmented features</a></h3>

<p>

Many features are discontinuous.  Examples include spliced
transcripts, and gapped sequence similarity alignments, such as the
alignment of cDNAs to the genome.  GBrowse can deal with such features
easily provided that you take a little care in setting them up.

<p>

The data file <a href="data_files/volvox_matches.gff3">volvox_matches.gff3</a>
contains a simulated data set of a series of gapped nucleotide
alignments.  An excerpt from the file is here:

<blockquote class="example"><pre>
ctgA example match 32329 32359 . + . ID=match-seg01;Name=seg01
ctgA example match 26122 26126 . + . ID=match-seg02;Name=seg02
ctgA example match 26497 26869 . + . ID=match-seg02;Name=seg02
ctgA example match 27201 27325 . + . ID=match-seg02;Name=seg02
ctgA example match 27372 27433 . + . ID=match-seg02;Name=seg02
ctgA example match 27565 27565 . + . ID=match-seg02;Name=seg02
ctgA example match 27813 28091 . + . ID=match-seg02;Name=seg02
ctgA example match 28093 28201 . + . ID=match-seg02;Name=seg02
ctgA example match 28329 28377 . + . ID=match-seg02;Name=seg02
ctgA example match 28829 29194 . + . ID=match-seg02;Name=seg02
ctgA example match  6885  7241 . - . ID=match-seg03;Name=seg03
ctgA example match  7410  7737 . - . ID=match-seg03;Name=seg03
ctgA example match  8055  8080 . - . ID=match-seg03;Name=seg03
ctgA example match  8306  8999 . - . ID=match-seg03;Name=seg03
</pre></blockquote>

<p>

This file uses a new GFF3 attribute, "ID". The ID attribute is used to
group features together and to indicate when a single feature occupies
multiple discontinuous locations. In the case of a gapped alignment,
each ungapped segment is represented by a single GFF3 line. The
segments of a single alignment are then grouped together by using the
same ID.  For example "match-seg03" starts at position 6885 and ends
at 8999.  It has four subsegments, one from 6885..7241, another from
7410..7737, and so forth.

<p>

The ID attribute is not the same as the Name attribute. If you give
three lines the same ID, they will be grouped together into a single
displayed feature. If you give three lines the same Name you will end
up with three distinct features that all happen to share the same
name. Also note that except for the coordinates and the score (which
we'll discuss later) all columns for each of the parts of a
multisegmented feature should be the same. For example, you can't have
one part of a feature on the (+) strand and another part on the (-)
strand.
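
<p>

The grouping rule can be illustrated with a minimal Python sketch.  This
is not GBrowse's own loader (which is written in Perl); the function
names are hypothetical, and the fields are split on whitespace to match
the excerpt above, whereas real GFF3 files are tab-delimited.

```python
from collections import defaultdict

# A few lines taken from the volvox_matches.gff3 excerpt.
GFF3_LINES = """\
ctgA example match 26122 26126 . + . ID=match-seg02;Name=seg02
ctgA example match 26497 26869 . + . ID=match-seg02;Name=seg02
ctgA example match  6885  7241 . - . ID=match-seg03;Name=seg03
ctgA example match  7410  7737 . - . ID=match-seg03;Name=seg03
ctgA example match  8055  8080 . - . ID=match-seg03;Name=seg03
ctgA example match  8306  8999 . - . ID=match-seg03;Name=seg03
"""


def parse_attributes(column9):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def group_segments(text):
    """Collect (start, end) segments under their shared ID."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split into the nine GFF3 columns (whitespace here, tabs in real files).
        fields = line.split(None, 8)
        attrs = parse_attributes(fields[8])
        groups[attrs["ID"]].append((int(fields[3]), int(fields[4])))
    return groups


groups = group_segments(GFF3_LINES)
for feature_id, segments in groups.items():
    start = min(s for s, _ in segments)
    end = max(e for _, e in segments)
    print(feature_id, len(segments), "segments,", start, "..", end)
```

Grouping on the Name attribute instead would still find the lines, but
nothing would mark them as parts of one feature, which is the
distinction drawn above.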

<p>

Copy <a href="data_files/volvox_matches.gff3">volvox_matches.gff3</a> into the volvox
database directory.  Then edit volvox.conf to add the following track
definition:

<blockquote class="example"><pre>
[Alignments]
feature      = match
glyph        = segments
key          = Example alignments
</pre></blockquote>

<p>

This declares a new track named "Alignments" which displays
features of type "match" using a glyph named "segments".  The segments
glyph is specialized for displaying objects that have multiple similar
subparts. Reload the page and activate the "Example alignments"
track. You should see a track similar to Figure 6.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/segmented_features2.gif">
<br><i>Figure 6: Use the "segments" glyph to display discontinuous
multipart features.</i>
</blockquote>


<h3><a name="canonical_gene">2.2. Protein-Coding Genes</a></h3>

<p>

GBrowse can display protein-coding genes in various shapes and styles.
The easiest way to set this up is to use the
<a href="http://www.sequenceontology.org/gff3.shtml#gene">Sequence Ontology's
canonical description of a gene</a> along with the "gene" glyph. Take a
look at the file <a href="data_files/volvox_genes.gff3">volvox_genes.gff3</a>,
which defines a gene named EDEN, and its three spliced forms named
EDEN.1, EDEN.2 and EDEN.3.  Here are the contents of the file:

<blockquote class="example"><pre>
ctgA example gene            1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase

ctgA example mRNA            1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1
ctgA example five_prime_UTR  1050 1200 . + . Parent=EDEN.1
ctgA example CDS             1201 1500 . + 0 Parent=EDEN.1
ctgA example CDS             3000 3902 . + 0 Parent=EDEN.1
ctgA example CDS             5000 5500 . + 0 Parent=EDEN.1
ctgA example CDS             7000 7608 . + 0 Parent=EDEN.1
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1

ctgA example mRNA            1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1
ctgA example five_prime_UTR  1050 1200 . + . Parent=EDEN.2
ctgA example CDS             1201 1500 . + 0 Parent=EDEN.2
ctgA example CDS             5000 5500 . + 0 Parent=EDEN.2
ctgA example CDS             7000 7608 . + 0 Parent=EDEN.2
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.2

ctgA example mRNA            1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1
ctgA example five_prime_UTR  1300 1500 . + . Parent=EDEN.3
ctgA example five_prime_UTR  3000 3300 . + . Parent=EDEN.3
ctgA example CDS             3301 3902 . + 0 Parent=EDEN.3
ctgA example CDS             5000 5500 . + 1 Parent=EDEN.3
ctgA example CDS             7000 7600 . + 1 Parent=EDEN.3
ctgA example three_prime_UTR 7601 9000 . + . Parent=EDEN.3
</pre></blockquote>

<p>

GFF3 uses a three-tiered structure to represent the gene, descending
from gene to mRNA to CDS and UTR features. A gene has potentially many
mRNAs, and each mRNA has potentially several CDS and UTR features. To
describe how the parts fit together, we use ID and Parent attributes.

<p>

We start with a feature of type "gene" with the ID "EDEN". This has
three alternative splice forms named EDEN.1, EDEN.2 and EDEN.3. To
tell GBrowse that each of these splice forms is part of the same
gene, we give each one a Parent attribute of "EDEN" corresponding to
the ID of the parent gene. Now consider mRNA EDEN.1. It has a
five_prime_UTR feature, a three_prime_UTR feature, and four CDS
features. To indicate that the CDS and UTR features belong to the
mRNA, we give the mRNA a unique ID of "EDEN.1" and give each of the
subfeatures a Parent attribute of "EDEN.1". This pattern repeats for each of
the other two splice forms. Note how the five_prime_UTR of EDEN.3 is
split in two parts. 
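
<p>

To make the ID/Parent bookkeeping concrete, here is a small Python
sketch that rebuilds the parent-to-children map from a subset of the
lines above.  It is an illustration only: the function name
<code>children_by_parent</code> is hypothetical, fields are split on
whitespace as in the listing (real GFF3 is tab-delimited), and children
without an ID of their own are labelled by their feature type.

```python
from collections import defaultdict

# The gene line, the three mRNA lines, and the subfeatures of EDEN.1.
GENE_LINES = """\
ctgA example gene            1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase
ctgA example mRNA            1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1
ctgA example five_prime_UTR  1050 1200 . + . Parent=EDEN.1
ctgA example CDS             1201 1500 . + 0 Parent=EDEN.1
ctgA example CDS             3000 3902 . + 0 Parent=EDEN.1
ctgA example CDS             5000 5500 . + 0 Parent=EDEN.1
ctgA example CDS             7000 7608 . + 0 Parent=EDEN.1
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1
ctgA example mRNA            1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1
ctgA example mRNA            1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1
"""


def children_by_parent(text):
    """Map each Parent value to the list of features that point at it."""
    children = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(None, 8)
        attrs = dict(pair.split("=", 1) for pair in fields[8].split(";"))
        if "Parent" in attrs:
            # Use the child's ID when it has one, otherwise its type (column 3).
            label = attrs.get("ID", fields[2])
            children[attrs["Parent"]].append(label)
    return children


tree = children_by_parent(GENE_LINES)
print(tree["EDEN"])    # the three splice forms
print(tree["EDEN.1"])  # the UTR and CDS parts of the first splice form
```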

<p>

As before, we use "Name" to give the gene and its alternative splice
forms a human-readable name, and use Note to provide a description for
the gene as a whole (you can add notes to the individual mRNAs but
they won't display by default). The Index=1 attribute is a hint to the
database to make the mRNAs searchable by name. This lets users find
the gene by searching for the mRNA names ("EDEN.1") as well as by the
gene name ("EDEN"). However, it is usually unnecessary to do this. Also
notice that we are using the Phase column for the CDS features to
describe how the CDS is translated into protein. See the description
of phase in the <a href="#data_file">data file</a> section.

<p>

This is the full way to describe genes. Simpler ways are described
later in this section.

<p>

<blockquote class="example"><i>HINT: If you prefer not to distinguish
between 5' and 3' UTRs, you can simply use "UTR" as the type.  If you
don't know where the UTRs are, just leave them out.  If you'd rather
think in terms of exons and introns, then check out the so_transcript
glyph.</i> </blockquote>

<p>

Go ahead and add volvox_genes.gff3 to the database.  Then add the following
new stanza to the bottom of volvox.conf:

<blockquote class="example"><pre>
[Genes]
feature      	   = gene
glyph              = gene
bgcolor            = peachpuff
label_transcripts  = 1
draw_translation   = 1
category           = Genes
key                = Protein-coding genes
</pre></blockquote>

<p>

The new Genes track associates "gene" features with the "gene" glyph,
sets its background color to peachpuff (yes, there really is a color
by this name!), turns on transcript labels and the draw_translation
option, and sets the human readable track name to "Protein-coding
genes."  Also, since our track
table is starting to get a little crowded, this stanza uses the
"category" option to start a separate section in the track table for
tracks having to do with genes.

<p>

Upon reloading the page, turning on the new "Protein-coding genes"
track, and viewing the region around 1..10K, you'll see this:

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/canonical_gene1.gif"><br>
<i>Figure 7: The canonical gene</i>
</blockquote>

<p>

The gene glyph has a number of options that you can use to customize
its appearance:

<p>
<table border="1">
  <tr>
    <th>Option Name</th>
    <th>Possible values</th>
    <th>Description</th>
  </tr>
  <tr>
    <th>thin_utr</th>
    <td>0 (false), 1 (true)</td>
    <td>If true, makes UTRs half-height.</td>
  </tr>
  <tr>
    <th>utr_color</th>
    <td>a color name ("gray" by default)</td>
    <td>Changes the UTR color.</td>
  </tr>
    <tr>
    <th>decorate_introns</th>
    <td>0 (false), 1 (true)</td>
    <td>If true, puts little arrowheads on the introns to indicate
	direction of transcription.</td>
  </tr>
</table>

<p>

Using these options, we can make the track look like the UCSC Genome
Browser (Figure 8).

<blockquote class="example"><pre>
[Genes]
feature          = gene
glyph            = gene
height           = 9
bgcolor          = black
utr_color        = black
thin_utr         = 1
decorate_introns = 1
description      = 1
label_transcripts= 1
category         = Genes
key              = Protein-coding genes
</pre></blockquote>

<p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/canonical_gene3.gif"><br>
<i>Figure 8: A UCSC Genome Browser lookalike</i>
</blockquote>

<h4><a name="simple_gene">2.2.1. Simpler Genes</a></h4>

<p>

If the full three-tiered representation of a gene bugs you, there are
simpler alternatives. To represent a typical predicted gene that only
has a translated region, you can represent the translation as a single
CDS line for a single-exon gene, or a series of linked lines for a
spliced gene. <a
href="data_files/volvox_genes_simple.gff3">data_files/volvox_genes_simple.gff3</a> shows how
to do this:

<blockquote class="example"><pre>
ctgA predicted CDS 10000 11500 . + 0 Name=Apple1

ctgA predicted CDS 13000 13800 . + 0 ID=cds-Apple2;Name=Apple2
ctgA predicted CDS 15000 15500 . + 1 ID=cds-Apple2;Name=Apple2
ctgA predicted CDS 17000 17200 . + 2 ID=cds-Apple2;Name=Apple2
</pre></blockquote>

<p>

This creates two linked CDS sets: a single-exon predicted gene called
Apple1 and a three-exon gene called Apple2. Note that we use a common
ID to tie the three Apple2 exons together.

<p>

The corresponding stanza will look like this:

<blockquote class="example"><pre>
[CDS]
feature      	   = CDS:predicted
glyph              = gene
bgcolor            = white
category           = Genes
key                = Predicted genes
</pre></blockquote>

<p>

Notice that the feature is now qualified as
"CDS:predicted". This corresponds to a GFF3 type (column 3) of "CDS",
and a GFF3 source (column 2) of "predicted." In all previous examples,
we used an unqualified feature name, but in this case we don't want
the CDS subfeatures from the three-tier EDEN gene examples to be
displayed in the predicted gene track. Therefore we limit the features
that are displayed in this track by qualifying the feature type with
its source using the syntax shown here.
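
<p>

The effect of qualifying a feature type can be sketched in a few lines
of Python.  This is an illustration of the rule described above, not
GBrowse's implementation; the function name <code>matches</code> is
hypothetical, and details such as case sensitivity are assumptions.

```python
def matches(selector, gff_type, gff_source):
    """Return True if a 'type' or 'type:source' selector accepts the feature."""
    wanted_type, _, wanted_source = selector.partition(":")
    if wanted_type != gff_type:
        return False
    # An unqualified selector (no ':source' part) accepts any source.
    return wanted_source == "" or wanted_source == gff_source


# CDS lines from the simple gene file have source "predicted";
# CDS lines from the EDEN gene have source "example".
print(matches("CDS:predicted", "CDS", "predicted"))
print(matches("CDS:predicted", "CDS", "example"))
print(matches("CDS", "CDS", "example"))
```

The second call is the case that matters here: the EDEN CDS lines are
rejected by the qualified selector, so they stay out of the predicted
gene track.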

<p>

The result is shown in Figure 9:

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/predicted_genes.gif"><br>
<i>Figure 9: Simpler genes using linked CDSs and the gene glyph</i>
</blockquote>

<p>

The bottom six lines of volvox_genes_simple.gff3 show how to display a single
transcript that has both coding and non-coding regions.

<blockquote class="example"><pre>
ctgA exonerate mRNA 17400 23000 . + . ID=rna-Apple3;Name=Apple3;Note=Predicted
ctgA exonerate UTR  17400 17999 . + . Parent=rna-Apple3
ctgA exonerate CDS  18000 18800 . + 0 Parent=rna-Apple3
ctgA exonerate CDS  19000 19500 . + 1 Parent=rna-Apple3
ctgA exonerate CDS  21000 21200 . + 2 Parent=rna-Apple3
ctgA exonerate UTR  21201 23000 . + . Parent=rna-Apple3
</pre></blockquote>

<p>

To represent this transcript, we need to create a feature of type mRNA
with a unique ID, followed by several UTR and CDS subfeatures all
linked to the mRNA via their Parent attribute. In this example we use
"UTR" for the UTR features, although the more explicit
"five_prime_UTR" and "three_prime_UTR" types will also work. The
"so_transcript" (Sequence Ontology transcript) glyph knows how to
display these correctly:

<blockquote class="example"><pre>
[Transcript]
feature      	   = mRNA:exonerate
glyph              = so_transcript
description        = 1
bgcolor            = beige
category           = Genes
key                = Exonerate predictions
</pre></blockquote>

<p>

After making this addition to the configuration file, reload the page
and turn on "Exonerate predictions." You will see a display that is
similar to the gene track, but treats each transcript as a separate
feature.


<h3><a name="cds">2.3. Reading Frames</a></h3>

<p>

Continuing with the example from section 2.2, the third exon of EDEN.1
is shared with EDEN.3.  But is the reading frame preserved?  The "cds"
glyph will create a display that will visualize each CDS's reading
frame.

<p>

To see this work, add the following stanza to the bottom of the
configuration file:

<blockquote class="example"><pre>
[ReadingFrame]
feature            = mRNA
glyph              = cds
ignore_empty_phase = 1
category           = Genes
key                = Frame usage
</pre></blockquote>

<p>

When you reload the page and turn this track on, you'll see a "musical
staff" representation of the frame usage (Figure 10). From this we can
see that the alternative splicing in fact changes the reading frame of
the second exon.

<p>

The "feature" option tells the glyph to take its data from the mRNA
subfeatures of the main gene features. Note that depending on which
data adaptor you use, you may need to specify the attribute "Index=1"
for each of the mRNA subfeatures in order for the glyph to find them
inside the gene object. However, this is usually unnecessary.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/cds1.gif"><br>
<i>Figure 10:  The "cds" glyph shows the reading frame using a musical
staff notation</i>
</blockquote>

<h3><a name="grouping">2.4. Grouped Features</a></h3>

<p>

In some circumstances you may wish to group features together to
create a multipart feature. The gene object is actually just a special
case of this. To show you the general case, we'll create a feature
of type "BAC", whose subparts are of type "clone_start" and
"clone_end" (possibly corresponding to a BAC clone mapping
experiment).  Here is the GFF3 representation of this:

<blockquote class="example"><pre>
ctgA example BAC         1000  20000 . . . ID=b101.2;Name=b101.2;Note=Fingerprinted BAC with end reads
ctgA example clone_start 1000   1500 . + . Parent=b101.2
ctgA example clone_end   19500 20000 . - . Parent=b101.2
</pre></blockquote>

<p>

As you can see, we've created a top-level feature of type "BAC" with
two children of type "clone_start" and "clone_end" respectively. The
start and end have opposite strands, indicating that they were
sequenced off different strands of the BAC. The three features are
tied together using the ID and Parent attributes that should be
familiar to you from the gene examples.

<p>

This data lives in <a href="data_files/volvox_bacs.gff3">volvox_bacs.gff3</a>.
Go ahead and add this into the database now.  To visualize this add
the appropriate stanza to the bottom of volvox.conf:

<blockquote class="example"><pre>
[Clones]
feature      = BAC
glyph        = segments
bgcolor      = yellow
connector    = dashed
strand_arrow = 1
description  = 1
category     = Clones
key          = Fingerprinted BACs
</pre></blockquote>

<p>

With this new track turned on, look at ctgA:1..24200.  It will show
that GBrowse has correctly picked up and rendered the relationship
between the whole BAC and its two end reads (Figure 11). We have seen
all these display options before with the exception of the "connector"
option. This controls the appearance of the connecting line between
subparts of a feature and can be one of "none", "solid", "dashed",
"hat" or "quill". Try them and see what happens! (Note, you will have
to change the strandedness of the BAC parent feature from "." to "+"
in order to see anything special happen with the quill connector.)

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/custom_aggregators1.gif"><br>
<i>Figure 11: Displaying a simple multipart feature</i>
</blockquote>

<h3><a name="graph">2.5. Showing Quantitative Data (basic)</a></h3>

<p>

GBrowse can plot quantitative data such as alignment scores,
confidence scores from gene prediction programs, and microarray
intensity data. The data can be displayed either with glyphs that
change color to indicate score levels (see the
"heterogeneous_segments", "graded_segments" and "redgreen_box"
glyphs), or using a general-purpose XY-plot glyph.

<p>

Congratulations, Affymetrix has built a tiling array for the volvox
genome!  There's now a transcriptional profile for volvox, with an
intensity reading every 100 bp across all of ctgA. The simulated data
for this is in the file <a
href="data_files/volvox_array.gff3">volvox_array.gff3</a>, an excerpt of which is
shown here:

<blockquote class="example"><pre>
ctgA affy microarray_oligo   1 100 281 . . Name=Expt1
ctgA affy microarray_oligo 101 200 183 . . Name=Expt1
ctgA affy microarray_oligo 201 300 213 . . Name=Expt1
ctgA affy microarray_oligo 301 400 191 . . Name=Expt1
ctgA affy microarray_oligo 401 500 288 . . Name=Expt1
ctgA affy microarray_oligo 501 600 184 . . Name=Expt1
...
</pre></blockquote>

<p>

The file contains 500 features, each of which is exactly 100 bp long.
The features are of type "microarray_oligo" and of
source "affy."  Each one has a score (column 6) between 0 and 1000,
where higher scores mean more transcriptional activity.  This is the
first time we've used the score column.

<p>

All of the 500 features share the same Name (column 9) of
"Expt1". Sharing the same name will allow us to group them together
into a single transcriptional profiling experiment. However, we do not
give them the same ID for reasons that are explained later. If we had
multiple experiments to show, they would be named Expt1, Expt2 and so
on.

<p>

We would like to generate a line graph that shows the transcriptional
profile level across the current region.  To do this, we need to group
all members of the same experiment together into a single graph, and
then to assign the "xyplot" glyph to the data. The following
configuration stanza will do this:

<blockquote class="example"><pre>
[TransChip]
feature        = microarray_oligo
glyph          = xyplot
graph_type     = line
fgcolor        = black
bgcolor        = black
height         = 50
min_score      = 0
max_score      = 1000
scale          = right
group_on       = display_name
category       = Quantitative Data
key            = Transcriptional Profile
</pre></blockquote>

<p>

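The options shown here create a track named TransChip to display the
microarray_oligo features with the xyplot glyph.  The "graph_type", "height",
"scale", "min_score", and "max_score" options all configure various
aspects of the xyplot glyph's appearance, while "group_on" tells GBrowse
to combine all features that share the same display name (here,
"Expt1") into a single graph.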

<blockquote class="example"><i>You can read all about xyplot's options using
<b>perldoc Bio::Graphics::Glyph::xyplot</b></i></blockquote>

<p>

When you reload the page and turn on the Transcriptional Profile
track, you should see something like that shown in Figure 12.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/graph1.gif"><br>
<i>Figure 12: A transcriptional profile rendered with the xyplot glyph</i>
</blockquote>

<p>

<blockquote class="example"> <i>Using the info that perldoc provides, play around with
the xyplot options a bit.  For example, see what happens when you
change graph_type to "boxes."</i> </blockquote>

<h3><a name="wiggle">2.6. Quantitative Data (advanced)</a></h3>

The recipe in <a href="#graph">2.5</a> works well for several thousand
data points, but if you have very dense data, such as that produced by
genomic tiling arrays, then you will want to use a specialized binary
representation known as "wiggle" format. A wiggle track consists of a
file that contains all the quantitative data, and a single feature in the
database proper that points at that data file. Loading this data is a
three-step process:

<ol>
  <li>Create a WIG file.
  <li>Convert the WIG file into the wiggle binary file and a gff3
      line.
  <li>Load the gff3 line.
</ol>

<p>

<a href="http://genome.ucsc.edu/goldenPath/help/wiggle.html">WIG
format</a> is a specialized format for describing quantitative
data. It was created by Jim Kent for use in the UCSC genome
browser. Details on creating WIG files are described at <a
href="http://genome.ucsc.edu/goldenPath/help/wiggle.html">http://genome.ucsc.edu/goldenPath/help/wiggle.html</a>.

<p>

WIG files are plain text files. They always begin with a "track"
header, which, at a minimum, looks like this:

<blockquote class="example"><pre>
track type=wiggle_0 name="ArrayExpt1" description="20 degrees, 2 hr"
</pre></blockquote>

<p>The "type" attribute is required, and must have a value of
"wiggle_0". "name" and "description" are optional, but suggested, and
indicate the name and description of the data series -- these will
become the "Name" and "Note" fields of the generated GFF3 feature.

<p>

Following the track line comes the data for one or more chromosomal
regions. As described in the UCSC documentation, there are three ways
of formatting the data: (1) "BED format", (2) "variableStep", and (3)
"fixedStep" format. The first format is essentially the same as GFF3
and does not give you any performance advantages over using straight
GFF3. variableStep format describes intervals of the genome that have
a fixed width, but begin at arbitrary locations, while fixedStep
format describes features of the genome that are evenly spaced and
have a fixed width (e.g. tiling array features).

<p>

For <i>variableStep</i> data, the format is:

<blockquote class="example"><pre>
 variableStep chrom=chr19 span=150
 59304701 10.0
 59304901 12.5
 59305401 15.0
 59305601 17.5
 59305901 20.0
 59306081 17.5
 59306301 15.0
 59306691 12.5
 59307871 10.0
</pre></blockquote>

<p> The data is introduced by a line beginning with the keyword
"variableStep", and the arguments "chrom" and "span", which indicate
the chromosome on which the features are located, and the width of
each feature, in base pairs. This is followed by a series of
two-element lines indicating the start position of each feature, and
its quantitative value. Values can be any sort of numeric data,
including integers, negative numbers and floating point.
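
<p>

As a rough illustration of how a variableStep block maps onto genomic
intervals, the following Python sketch expands the first few lines of
the example.  It is not part of GBrowse or of wiggle2gff3.pl; the
function name is hypothetical, and it relies on the UCSC conventions
that positions are 1-based and that "span" defaults to 1 when omitted.

```python
def variable_step_intervals(lines):
    """Yield (chrom, start, end, value) for one variableStep block."""
    header = lines[0].split()
    if header[0] != "variableStep":
        raise ValueError("not a variableStep block")
    params = dict(item.split("=", 1) for item in header[1:])
    span = int(params.get("span", 1))  # width of every feature in the block
    for line in lines[1:]:
        position, value = line.split()
        start = int(position)
        # 1-based inclusive coordinates: a span of 150 covers start..start+149.
        yield params["chrom"], start, start + span - 1, float(value)


block = [
    "variableStep chrom=chr19 span=150",
    "59304701 10.0",
    "59304901 12.5",
    "59305401 15.0",
]
intervals = list(variable_step_intervals(block))
for chrom, start, end, value in intervals:
    print(chrom, start, end, value)
```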

<p>

For <i>fixedStep</i> data, the format is:

<blockquote class="example"><pre>
fixedStep chrom=chr19 start=59307401 step=300 span=200
1000
 900
 800
 700
 600
 500
 400
 300
 200
 100
</pre></blockquote>

<p> The data is introduced by a line beginning with the keyword
"fixedStep", and the arguments "chrom", "span", "start" and "step".
The first two arguments are the same as before, while "start" and
"step" indicate the starting position of the first feature, and the
spacing between each feature. This is followed by a numeric value for
each step. In this case, we have described 10 features beginning at
position 59307401. Each feature starts 300 bp after the start of the
previous one and is 200 bp wide. In practice, this means that the first
200 bp of each 300 bp interval is filled with known data, while
information on the last 100 bp is "missing."
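
<p>

The arithmetic can be checked with a short Python sketch that expands a
fixedStep header and its values into explicit intervals.  This is an
illustration only, with a hypothetical function name; it assumes the
UCSC conventions of 1-based positions and a default "span" of 1.

```python
def fixed_step_intervals(header, values):
    """Return (start, end, value) tuples for one fixedStep block."""
    fields = header.split()
    if fields[0] != "fixedStep":
        raise ValueError("not a fixedStep block")
    params = dict(item.split("=", 1) for item in fields[1:])
    start = int(params["start"])
    step = int(params["step"])
    span = int(params.get("span", 1))
    intervals = []
    for index, value in enumerate(values):
        begin = start + index * step  # each feature starts one step later
        intervals.append((begin, begin + span - 1, value))
    return intervals


header = "fixedStep chrom=chr19 start=59307401 step=300 span=200"
steps = fixed_step_intervals(header, [1000, 900, 800])
for begin, end, value in steps:
    print(begin, end, value)
```

The first feature covers 59307401..59307600 and the second begins at
59307701, leaving the 100 bp in between without data.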

<p>

To see how this works in practice, let us reformat our example
microarray data using the fixedStep version of WIG format. The
complete data for this is in the file <a
href="data_files/volvox_microarray.wig">volvox_microarray.wig</a>. It
begins like this:

<blockquote class="example"><pre>
track type=wiggle_0 name="example" description="20 degrees, 2 hr"
fixedStep chrom=ctgA start=1 step=100 span=100
281
183
213
191
288
...
</pre></blockquote>

<p>

Compare this to the microarray data in <a href="#graph">Showing
Quantitative Data (basic)</a>, and you will see that the five entries
in the WIG file correspond to the first five features in the GFF3
file.

<p>

We'll now create the binary file for the data using the wiggle2gff3.pl
script.  First, copy the <a
href="data_files/volvox_microarray.wig">volvox_microarray.wig</a> into
the volvox database directory and then change directories (cd) into that
directory.  <b>Also, delete the volvox_array.gff3 file so we don't see the 
same set of data twice.</b>

<p>

We want the binary wiggle file to live in the volvox database directory,
so we have to specify this path when creating it:

<blockquote class="example"><pre>
% wiggle2gff3.pl --path=/var/lib/gbrowse2/databases/volvox volvox_microarray.wig \
                 > volvox_microarray.gff3
</pre></blockquote>


<p>

After this script runs, it will write out a line of GFF3 data, which
we save to volvox_microarray.gff3. This file will look like this:

<blockquote class="example"><pre>
##gff-version 3

ctgA . microarray_oligo 1 50000 . . . Name=example;Note=20%20degrees%2C%202%20hr;wigfile=/var/lib/gbrowse2/databases/volvox/track001.ctgA.1200440492.wig
</pre></blockquote>

<p>

This file contains a single feature that spans the region indicated by
the WIG file. The feature has the indicated name and description, and
has a new attribute "wigfile" that points to the place where the
quantitative data within the region can be found. You are free to edit
this file to change the source or type. You can also set the source
and type in wiggle2gff3.pl by passing it --source and --type options
on the command line. If you move the binary wiggle file, please change
the value of the "wigfile" attribute to indicate its new location.
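
<p>

Note that the description appears URL-escaped in the Note attribute
("20%20degrees%2C%202%20hr").  The Python sketch below, which is only an
illustration with a hypothetical function name, splits column 9 of the
generated line and decodes each value so you can see what the feature
carries.

```python
from urllib.parse import unquote

# The single feature line written by wiggle2gff3.pl in the example above.
line = (
    "ctgA . microarray_oligo 1 50000 . . . "
    "Name=example;Note=20%20degrees%2C%202%20hr;"
    "wigfile=/var/lib/gbrowse2/databases/volvox/track001.ctgA.1200440492.wig"
)


def decode_attributes(column9):
    """Split 'key=value;...' pairs and undo the percent-escaping of values."""
    attrs = {}
    for pair in column9.split(";"):
        key, _, value = pair.partition("=")
        attrs[key] = unquote(value)
    return attrs


attrs = decode_attributes(line.split(None, 8)[8])
print(attrs["Name"])
print(attrs["Note"])
print(attrs["wigfile"])
```

The Note decodes back to "20 degrees, 2 hr", the description given in
the WIG track line, and the wigfile value is the path you would edit if
the binary file were moved.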

<p>

One last step is needed to make the data display properly,
however. You must set the glyph type to either "wiggle_xyplot" or
"wiggle_density." These are the only glyphs that recognize and
properly format wiggle-style data. You can also remove the min_score and
max_score options, since the wiggle binary files store this information
internally and it is no longer needed.

<p> In the config file, change the [TransChip] stanza to look like
this:

<blockquote class="example"><pre>
[TransChip]
feature        = microarray_oligo
glyph          = wiggle_xyplot
graph_type     = boxes
height         = 50
scale          = right
description    = 1
category       = Quantitative Data
key            = Transcriptional Profile
</pre></blockquote>

When you reload the page, the quantitative data should display
correctly. You might notice a speed improvement; this becomes much
more noticeable on large data sets.

<p>

Now, for some fun, change the [TransChip] section to use the
"wiggle_density" glyph. Also set the bgcolor to "blue" and delete the
unneeded graph_type and scale options.

<blockquote class="example"><pre>
[TransChip]
feature        = microarray_oligo
glyph          = wiggle_density
height         = 30
bgcolor        = blue
description    = 1
category       = Quantitative Data
key            = Transcriptional Profile
</pre></blockquote>

<p>

This is what the modified track will look like:

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/wiggle_density.png"><br>
<i>Figure 13: A transcriptional profile rendered with the wiggle_density glyph</i>
</blockquote>

<h3><a name="dna">2.7. DNA and 3-frame translations</a></h3>

<p>

GBrowse can take advantage of DNA sequence data in several ways:

<ol>
  <li>It can display a GC content graph of the reference sequence at
      low magnifications and the DNA sequence itself at higher
      magnifications.
  <li>It can display three and six-frame translations of the reference
      sequence DNA.
  <li>It can display the protein translation of coding regions.
  <li>It can display aligned nucleotide sequences, creating a poor
      man's multiple alignment.
  <li>It can directly display next generation sequencing data that has
      been stored in <a href="http://samtools.sourceforge.net">SAM or BAM</a> format (see the <a href="http://gmod.org/wiki/GBrowse_NGS_Tutorial">GBrowse NGS Tutorial</a>).  </ol>
<p>

So far we've been working with feature coordinates, but no actual DNA
sequence has been loaded into the volvox database.  We will again
rebuild the database, this time loading in a simulated DNA file in
fasta format.  Download the file <a
href="data_files/volvox.fa">volvox.fa</a>, and copy it into the volvox
database directory.  At this point in the tutorial, when you do a
directory listing of the volvox database directory (with "ls" on
unix systems, or "dir/w" on Windows systems) it should look like
this:

<blockquote class="example"><pre>
% <b>ls /var/lib/gbrowse2/databases/volvox/</b>
track001.ctgA.1202327456.wig  volvox_domains.gff3   volvox_genes.gff3  volvox.fa
volvox_remarks.gff3		      volvox_matches.gff3   volvox_bacs.gff3  
volvox2b.gff3		      volvox_genes_simple.gff3  volvox_est.gff3
volvox_microarray.gff3
</pre></blockquote>

<p>

If you haven't done so already, please be sure that you have made the
database directory writeable by the web server user, either by making
it world writeable (as described at the beginning of this tutorial),
or by changing the directory's group ownership to match the Apache web
server's group account (it varies from system to system, but "nobody",
"www", "apache" and "www-data" are the most common possibilities).

<p>

This is all you need to do to load the DNA.  To see that the DNA is
indeed being loaded, add two new stanzas to the volvox.conf
configuration file:

<blockquote class="example"><pre>
[DNA]
glyph          = dna
global feature = 1
height         = 40
do_gc          = 1
gc_window      = auto
fgcolor        = red
axis_color     = blue
strand         = both
category       = DNA
key            = DNA/GC Content

[Translation]
glyph          = translation
global feature = 1
height         = 40
fgcolor        = purple
start_codons   = 0
stop_codons    = 1
translation    = 6frame
category       = DNA
key            = 6-frame translation
</pre></blockquote>

<p>

The "DNA" track uses a specialized glyph called "dna".  At low
magnifications (zoomed way out), this glyph draws a GC content plot.
At high magnifications (zoomed way in), this glyph draws the DNA sequence.  Of
the various options given in the example stanza, the most important
one is "global feature", which is set to a true value (1).  This tells
GBrowse that the stanza doesn't correspond to a specific feature type,
but should be displayed globally.  Other options control whether to
draw one or both strands, whether to draw the GC content histogram,
the window size to use when smoothing the histogram, and what colors
to use.
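<p>

To make the role of the window size concrete, here is a small Python
sketch of a windowed GC calculation. It is an illustration only: it
uses non-overlapping windows of a fixed size, which is an assumption
and not necessarily how the dna glyph or its "auto" setting behaves.
The function name gc_content is hypothetical.

```python
def gc_content(sequence, window):
    """Return the GC fraction of each consecutive, non-overlapping window."""
    sequence = sequence.upper()
    fractions = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]  # the last chunk may be shorter
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / len(chunk))
    return fractions

# A made-up sequence alternating GC-rich and AT-rich stretches.
print(gc_content("ggccaattggccaatt", 4))
```

<p>

A larger window gives a smoother but less detailed plot, which is the
trade-off the gc_window option controls.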

<p>

Similarly, the "Translation" track uses a glyph called "translation",
which draws three or six-frame conceptual translations.  At low
magnifications (zoomed way out), this glyph draws little symbols
indicating where start and stop codons are.  At high magnifications,
the actual amino acid sequence comes into view. Again, the most
important option is "global feature", which is set to a true value to
tell GBrowse that the track isn't attached to a particular feature
type, but is to be generated automatically.  Other options control the
height of the glyph, whether to draw start and/or stop codon symbols,
and whether to generate a 3frame or 6frame translation.
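<p>

As a rough illustration of what the stop codon symbols represent (this
is not the translation glyph's own code), the following Python sketch
finds the stop codons TAA, TAG and TGA in all six reading frames of a
short made-up sequence. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def stop_codons_by_frame(dna):
    """Map each of the six reading frames to the 0-based offsets of its stop codons.

    Frames +1..+3 read the given strand; frames -1..-3 read its reverse
    complement, with offsets counted along that reversed strand.
    """
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            hits = [i for i in range(offset, len(strand) - 2, 3)
                    if strand[i:i + 3] in STOP_CODONS]
            frames[f"{sign}{offset + 1}"] = hits
    return frames

for frame, offsets in stop_codons_by_frame("ATGAAATAGCCTTAA").items():
    print(frame, offsets)
```

<p>

A 3frame translation corresponds to only the three "+" entries; the
6frame setting adds the three frames of the opposite strand.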

<p>

Figures 14a and 14b show the browser at low and high magnification,
with both tracks activated.  Notice that the coding track (the "cds"
glyph) detects that the DNA is available and generates the
transcripts' protein translations automatically.

<blockquote class="center">
<b>(14A)</b><br>
<img  style="border:solid;border-width:1px" src="figures/dna1.png">
<br><br><b>(14B)</b><br>
<img  style="border:solid;border-width:1px" src="figures/dna2.png"><br>
<i>Figure 14: Viewing DNA/GC content and 6-frame translation. (a) low
magnification; (b) high magnification</i>
</blockquote>

<p>

<blockquote  class="example" ><i>
If you happen to do a listing of the volvox database directory after
adding the DNA file, you might notice that a new file named
"directory.index" has appeared.  This index file is created
automatically by GBrowse in order to speed up access to the .fa file
and to reduce memory requirements.  If the database directory is
<b>not</b> writable by the web server user, GBrowse will not be able to
create this file, and the display will be somewhat slower whenever a
DNA track is turned on.
</i></blockquote>

<h3><a name="multiple_alignments">2.8 ESTs and Other
Alignments</a></h3>

<p>

This section will lead you through creating a plausible EST track, and
show you how grouping of 5' and 3' EST reads works.

<p>

We'll start with a simple data set containing information on three
pairs of EST reads.  You'll find this data set in <a
href="data_files/volvox_est.gff3">volvox_est.gff3</a>.  Here are the first
three alignments described in the data file:

<blockquote class="example"><pre>
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5
ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5

ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3
ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3

ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5
ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5
ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5
...
</pre></blockquote>

<p>

What's going on here is the same as the alignments shown in <a
href="data_files/volvox_matches.gff3">volvox_matches.gff3</a>.  There are two EST
reads named agt830.5 (the 5' read) and agt830.3 (the 3' read).  Each
of them matches the ctgA genome in two discontinuous regions because,
presumably, they cross a splice site. As in the earlier example, we
represent each EST as a single "EST_match" feature that spans several
lines. The lines are linked together by sharing the same ID attribute.
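<p>

The way shared IDs tie lines together can be sketched in a few lines of
Python. This is only an illustration of the idea, not how GBrowse or
BioPerl load the file; the function name group_segments is
hypothetical, and the sketch assumes whitespace-separated columns as
printed in this tutorial.

```python
from collections import defaultdict

def group_segments(gff3_text):
    """Collect the (start, end) segments of each feature, keyed by its ID."""
    features = defaultdict(list)
    for line in gff3_text.splitlines():
        columns = line.split(None, 8)
        if len(columns) != 9:
            continue  # skip blank lines, directives and the "..." placeholder
        attrs = dict(pair.split("=", 1)
                     for pair in columns[8].split(";") if "=" in pair)
        if "ID" in attrs:
            features[attrs["ID"]].append((int(columns[3]), int(columns[4])))
    return dict(features)

EST_LINES = """\
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5
ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5

ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3
ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3
"""

for feature_id, segments in group_segments(EST_LINES).items():
    print(feature_id, segments)
```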

<p>

There are two other things to notice.  One is that the source field
(column 2) is "est" and the type (column 3) is "EST_match." Either of
these fields can be used to distinguish the EST matches in this file
from the generic "match" matches used in the earlier example. The
second item of interest is that the strand field (column 7) is + for
the 5' EST and - for the 3' EST, indicating that the 3' EST aligned to
the reverse complement of ctgA.

<p>

Add this file to the volvox database directory, and add the following
to the configuration file:

<blockquote class="example"><pre>
[EST]
feature      = EST_match:est
height       = 6
glyph        = segments
bgcolor      = orange
category     = Genes
key          = ESTs
</pre></blockquote>

<p>

This will give a display similar to that shown in Figure 15.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/multiple_alignments1.gif"><br>
<i>Figure 15:  A simple representation of EST matches.</i>
</blockquote>

<p>

For reasons described earlier, the feature option reads
"EST_match:est" rather than simply "match" in order to distinguish the
EST matches from the example matches that we loaded previously.

<p>

This display is OK, but it could be better. One problem is that the
relationship between the 5' and 3' EST read pairs is not shown.  We'd
like to place the two members of the pair together on the same line,
and connect them with a dotted line to show that they are the two ends
of the same cDNA clone. An easy way to do this is to add a
"group_pattern" option to the [EST] stanza:

<p>

<blockquote class="example"><pre>
[EST]
feature       = EST_match:est
glyph         = segments
height        = 6
bgcolor       = orange
group_pattern = /\.[53]$/
category      = Genes
key           = ESTs
</pre></blockquote>

<p>

The new group_pattern option tells GBrowse to use a Perl regular
expression pattern matching operation to find and group related EST
matches based on their names.  It helps to understand how Perl regular
expressions work, but basically the pattern match breaks down this
way:

<blockquote class="example"><pre>
  /            begin the pattern match
  \.           match a dot
  [53]         match either the digit 5 or the digit 3
  $            match the end of the string
  /            end the pattern match
</pre></blockquote>

<p>

What this is saying is to look for pairs of EST names that are similar
except for the terminal .5 or .3, and pair them.  When we reload the
page, we get Figure 16.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/multiple_alignments2.gif"><br>
<i>Figure 16: The group_pattern option allows EST pairs to be grouped</i>
</blockquote>

<p>

Here are regular expressions that will work for other common EST
pairing schemes:

<table border="1">
  <tr>
    <th>5' EST</th>
    <th>3' EST</th>
    <th>group_pattern</th>
  </tr>
  <tr>
    <td>agt123f</td>
    <td>agt123r</td>
    <td>/[fr]$/</td>
  </tr>
  <tr>
    <td>agt123p</td>
    <td>agt123q</td>
    <td>/[pq]$/</td>
  </tr>
  <tr>
    <td>f.agt123</td>
    <td>r.agt123</td>
    <td>/^[fr]\./</td>
  </tr>
  <tr>
    <td>5.agt123</td>
    <td>3.agt123</td>
    <td>/^[53]\./</td>
  </tr>
  <tr>
    <td>agt123.for</td>
    <td>agt123.rev</td>
    <td>/\.(for|rev)$/</td>
  </tr>
</table>
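<p>

If you want to try out a pattern before putting it into the
configuration file, a quick experiment outside GBrowse can help. The
Python sketch below assumes that grouping works by deleting the text
matched by the pattern and pairing names that become identical; that is
a simplification for illustration, and the function name
group_by_pattern is hypothetical. The patterns in the table behave the
same way in Python's re module as in Perl.

```python
import re
from collections import defaultdict

def group_by_pattern(names, pattern):
    """Group names that become identical once the matched text is removed."""
    groups = defaultdict(list)
    for name in names:
        base = re.sub(pattern, "", name, count=1)
        groups[base].append(name)
    return dict(groups)

print(group_by_pattern(["agt830.5", "agt830.3", "agt221.5"], r"\.[53]$"))
print(group_by_pattern(["f.agt123", "r.agt123"], r"^[fr]\."))
print(group_by_pattern(["agt123.for", "agt123.rev"], r"\.(for|rev)$"))
```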

<p>

Another nice enhancement would be to give the 5' and 3' ESTs different
colors so as to distinguish one from another.  This can be
accomplished using a Perl <i>callback</i>.  Open up volvox.conf once
more, and find the bgcolor option in the [EST] track.  Replace it with
this (you may want to cut and paste from here in order to avoid
introducing any typos):

<blockquote class="example"><pre>
bgcolor      = sub {
		my $feature = shift;
		my $name    = $feature->display_name;
		if ($name =~ /\.5$/) {
		   return 'red';
		} else {
		   return 'orange';
		}
	}
</pre></blockquote>

<p>

You'll need to know the basics of the Perl programming language in
order to do this type of thing yourself.  Suffice it to say that instead
of hard-coding the color "orange" into the bgcolor option, we are
asking GBrowse to run a Perl subroutine each time it needs to render
an EST.  The subroutine is passed the feature that is about to be
drawn.  It asks the feature for its human-readable name (display_name)
and assigns that name to a variable named $name.  It then performs a
pattern match on the name to see if it ends in ".5".  If the name
matches, the subroutine returns the color "red" to GBrowse.  Otherwise
it returns the color "orange."

<p>

The effect is shown in Figure 17.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/multiple_alignments3.gif"><br>
<i>Figure 17: Using a callback to distinguish 5' and 3' ESTs</i>
</blockquote>

<p>

For your convenience, a configuration file with all the stanzas
defined up to this point can be found in <a href="conf_files/volvox_quarter.conf">volvox_quarter.conf</a>.

<h4><a name="adding_dna_to_alignments">2.8.1. Adding DNA to Alignments</a></h4>

<p>

The last thing we'll do with the EST data set is to add DNA to the
ESTs so that at high magnification GBrowse will show the multiple
alignment.  This information is also used by the "dump alignments"
plugin to generate a text-based multiple alignment.

<blockquote class="example"> <i> NOTE: Currently only nucleotide to nucleotide
alignments can be displayed at the level of individual nucleotides
(e.g. BLASTN, BLAT, Exonerate).  Protein to nucleotide alignments,
such as those produced by Genewise or BLASTX, are not supported at the
residue level.</i>
</blockquote>

<p>

To make this work, we need to add two additional pieces of information
to the EST alignment data:

<ol>
  <li>The DNA sequences of the volvox ESTs.
  <li>The alignment positions in EST coordinates.
</ol>

In case the need for item (2) isn't immediately clear, consider this
blow-up of an alignment:

<blockquote class="example"><pre>
ctgA      1050 gattgccattgaccttggccattggccaagctgaa 1084
               |||||||||| ||||||| ||||||||||||||||
agt830.5     1 gattgccattcaccttgggcattggccaagctgaa 35
</pre></blockquote>

<p>

What we currently have in the GFF file are the <b>source</b> genomic
positions of the alignments (in ctgA-relative coordinates).  We need
to add the <b>target</b> positions in agt830.5-relative coordinates in
order for GBrowse to fetch and display the appropriate segments of the
EST DNA.

<p>

The fasta file <a href="data_files/ests.fa">ests.fa</a> provides the DNA
sequences for the six EST reads.  The GFF load file
<a href="data_files/volvox_est_targets.gff3">volvox_est_targets.gff3</a> contains the revised
coordinates.  If you look at this file you'll see that it differs
slightly from previous load files:

<blockquote class="example"><pre>
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5;Target=agt830.5 1 451
ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5;Target=agt830.5 452 654
ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3;Target=agt830.3 505 595
ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3;Target=agt830.3 1 504 
ctgA est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 1 451
ctgA est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 452 952
ctgA est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5;Target=agt221.5 953 1253
...
</pre></blockquote>

<p>

The first eight columns are identical to what we've been using before,
but the ninth column follows a new convention used for nucleotide to
nucleotide and protein to nucleotide alignments.  There is now a
special attribute, "Target", that specifies the name of the EST
sequence (found in a FASTA file), the start position of the alignment
in EST coordinates, and the end position of the alignment in EST
coordinates.  For example, the first segment of the first alignment,
agt830.5, spans positions 1050 to 1500 in genome coordinates, and
positions 1-451 in EST sequence coordinates.

<p>

There is a subtlety here. Notice that for minus strand ESTs, the
target coordinates are not reversed; the start position is always less
than the end position.  For example, for the first agt830.3 HSP, we
are told that genomic region 5410..5500 aligns to EST region 505..595.
The strand field is used to determine the direction of the alignment.
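<p>

A simple sanity check on a load file of this kind is that each genomic
segment and its Target segment have the same length. The Python sketch
below performs that check on a few of the lines shown above. It assumes
ungapped segments, as in this example file; real alignments containing
insertions or deletions would not necessarily pass. The function name
target_lengths is hypothetical.

```python
def target_lengths(gff3_text):
    """Return (name, genomic_length, target_length) for every line with a Target."""
    results = []
    for line in gff3_text.splitlines():
        columns = line.split(None, 8)
        if len(columns) != 9 or "Target=" not in columns[8]:
            continue
        start, end = int(columns[3]), int(columns[4])
        # The Target value is "name start end", up to the next ";" if any.
        target = columns[8].split("Target=", 1)[1].split(";", 1)[0]
        name, t_start, t_end = target.split()[:3]
        results.append((name, end - start + 1, int(t_end) - int(t_start) + 1))
    return results

TARGET_LINES = """\
ctgA est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5;Target=agt830.5 1 451
ctgA est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5;Target=agt830.5 452 654
ctgA est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3;Target=agt830.3 505 595
ctgA est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3;Target=agt830.3 1 504
"""

for name, genomic, target in target_lengths(TARGET_LINES):
    status = "ok" if genomic == target else "MISMATCH"
    print(name, genomic, target, status)
```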

<p>

Since this data file is a revised version of volvox_est.gff3,
<b>remove volvox_est.gff3 from the database directory and replace it with
<a href="data_files/volvox_est_targets.gff3">volvox_est_targets.gff3</a></b>.  Also copy <a
href="data_files/ests.fa">ests.fa</a> into the database directory.  If
you perform a directory listing, it should look like this:

<blockquote><pre>
directory.index           volvox_remarks.gff3	volvox_domains.gff3
volvox_genes_simple.gff3  volvox_bacs.gff3      volvox_est_targets.gff3
ests.fa		          volvox_matches.gff3   volvox_genes.gff3
volvox_array.gff3         volvox.fa
</pre></blockquote>

<p>

<blockquote> <i>NOTE: If you see doubled EST features after this
point, make sure that you have removed volvox_est.gff3.  Another thing
to watch out for is a bug in the BioPerl layer (up through at least
version 1.4) that causes the EST DNA display to get messed up at this
point on Windows systems.  To fix the latter problem, go to the volvox
database directory and remove the files directory.dir and
directory.pag.  These are automatically generated DNA file indexes
that GBrowse creates, and they will be regenerated for you the next
time you access a page.</i> </blockquote>


<p>

We're not done with making configuration file changes, but <a
href="conf_files/volvox_halfway.conf">volvox_halfway.conf</a> contains all
configuration file enhancements up to this point.  If you like, you
can copy it over the live volvox.conf.  It contains the following
version of the [EST] track:

<blockquote class="example"><pre>
[EST]
feature          = EST_match:est
glyph            = segments
height           = 6
draw_target      = 1
show_mismatch    = 1
canonical_strand = 1
label_position   = left
bgcolor      = sub {
		my $feature = shift;
		my $name    = $feature->display_name;
		if ($name =~ /\.5$/) {
		   return 'red';
		} else {
		   return 'orange';
		}
	}
group_pattern    = /\.[53]$/
key              = ESTs
</pre></blockquote>

<p>

The key additions to this track configuration are the "draw_target",
"show_mismatch" and "canonical_strand" options.  All three options are
true/false flags, where 0 means false and 1 means true.  draw_target
tells the segments glyph to draw the DNA sequence of the target ESTs
when the magnification allows.  show_mismatch instructs the glyph to
highlight mismatches between the genome and the EST in pink.
canonical_strand instructs the glyph to display the plus strand
sequence even when the EST matches the minus strand. We've also added
a "label_position" option that tells GBrowse to draw each EST's label
to its left. This allows the multiple alignments to pile up nicely.
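<p>

What counts as a mismatch can be illustrated with the alignment blow-up
shown earlier in this section. The Python sketch below compares the two
aligned strings column by column. It is only meant to show the idea
behind show_mismatch, not the segments glyph's actual code, and the
function name mismatch_positions is hypothetical.

```python
def mismatch_positions(genome, est):
    """Return the 0-based columns at which two equal-length aligned strings differ."""
    if len(genome) != len(est):
        raise ValueError("aligned sequences must have the same length")
    pairs = zip(genome.lower(), est.lower())
    return [i for i, (g, e) in enumerate(pairs) if g != e]

# The ctgA and agt830.5 rows from the alignment blow-up.
genome = "gattgccattgaccttggccattggccaagctgaa"
est    = "gattgccattcaccttgggcattggccaagctgaa"

print(mismatch_positions(genome, est))
```

<p>

The columns reported are the ones left blank in the row of "|"
characters in the blow-up.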

<p>

To see this work, reload the page, turn on the EST track and search
for region "ctgA:1065..1165".  This will show the aligned 5' ends of
agt221.5, agt830.5 and agt767.5 (Figure 18).  Notice that one of the
T's towards the beginning of agt830.5 is highlighted to show that it
doesn't match the corresponding genomic base.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/adding_dna_to_alignments1.gif"><br>
<i>Figure 18: Multiple alignments at the DNA level</i>
</blockquote>

<p>

If you don't see the EST sequence appearing, make sure that ests.fa is
in the volvox database directory and is world readable.  If it still
isn't working, you may need to "touch" the file in order to update its
modification date.  This tells GBrowse that it is new and needs to be
reindexed.  In Unix:

<blockquote class="example"><pre>
% <b>touch /var/lib/gbrowse2/databases/volvox/ests.fa</b>
</pre></blockquote>

<p>If you are still having problems, remove the directory.index file
completely in order to force reindexing.

<h3><a name="trace">2.9. Trace Data</a></h3>

<p>

If you have sequence trace information (in SCF format) associated with
the reference sequence, this can be displayed in GBrowse using the
trace glyph. To use this glyph, you must have installed:

<dl>
  <dt>The Staden io-lib package
  <dd><a href="http://staden.sourceforge.net">staden.sourceforge.net</a>
  <dt>zlib
  <dd><a href="http://www.zlib.net">www.zlib.net</a>
  <dt>The Bio::SCF perl module
  <dd>Available from CPAN
</dl>

Note that at this time, it is not possible to use the trace glyph
with Windows servers, since we do not know of a version of the Staden
io-lib package that has been compiled for Windows.
<p>

The data file <a href="data_files/volvox_trace.gff3">volvox_trace.gff3</a>
contains an example trace entry.

<p>

<blockquote class="example"><pre>
ctgA	example	read	44401	45925	.	+	.	Name=trace;trace=volvox_trace.scf
</pre></blockquote>

<p>

This aligns the full trace sequence to the reference sequence.  The
trace file in this case is named "volvox_trace.scf", and it is located
in <a href="data_files/volvox_trace.scf">/var/www/gbrowse2/tutorial/data_files/volvox_trace.scf</a>.

<p>

Due to sequence quality, the first few bases of a trace file usually
don't align.  Even so, these bases need to be included in the gff
file.  For instance, if the bases 10-700 of the trace file align to
the bases 100-800 of the reference sequence, the feature would be
90-800 to account for the first 10 bases (starting at base 0).

<p>

<blockquote class="example"> <i> NOTE: The trace glyph currently
doesn't deal with insertions or deletions.  If an indel occurs, the
alignment after the indel will be off.</i> </blockquote>

<p>

Copy volvox_trace.gff3 into the volvox database directory. Then, to display
the trace, copy the following into the volvox.conf (or copy <a
href="conf_files/volvox5.conf">volvox5.conf</a> over the current
volvox.conf file).

<p>

<blockquote class="example"><pre>
[Traces]
feature      = read
glyph        = trace
fgcolor      = black
bgcolor      = orange
strand_arrow = 1
height       = 6
description  = 1
a_color      = green
c_color      = blue
g_color      = black
t_color      = red
show_border  = 1
trace_height = 80
trace_prefix = http://localhost/gbrowse2/tutorial/data_files/
key          = Traces
</pre></blockquote>

<p>

The fgcolor, bgcolor, strand_arrow and height control the bar that
shows the location and directionality of the trace.

<p>

The trace_prefix option is important because it gives the path to the
trace files.  This is prepended to the trace file name defined in the
gff file.  It can be a direct path to the directory (e.g.
"/usr/local/trace_files/") or a web address (as above).

<p>

The a/c/g/t_color options allow configuration of the base colors.  The
trace_height refers to the height of the trace itself.  Play around
with it to find a height that you like.

<p>

If show_border is set to 1, a black box will be drawn around the
trace.

<p>

After configuring the trace glyph, reload the browser page and enable
traces.  Zoomed out you will see:

<p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/trace1.png"><br>
<i>Figure 19a: The trace glyph zoomed out.</i>
</blockquote>

<p>

Zooming in will show you the trace diagram:

<p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/trace2.png"><br>
<i>Figure 19b: The trace glyph zoomed in.</i>
</blockquote>

<p>

<h3><a name="trace">2.10. Next Generation Sequencing Data</a></h3>

<p>

GBrowse allows you to load next generation sequencing data, such as
that produced by the Illumina and SOLiD platforms, that has been
stored in <a href="http://samtools.sourceforge.net">SAM or BAM</a>
format.  See the <a href="http://gmod.org/wiki/GBrowse_NGS_Tutorial">GBrowse NGS
Tutorial</a> for instructions on how to do this.

<hr>

<h2><a name="enhancements">3. GBrowse Enhancements</a></h2>

<p>In this section of the tutorial we'll discuss customizing the look
and feel of GBrowse by adding a region view section, adding feature
tracks to the region view and/or overview sections, configuring
semantic zooming, and adding functionality with plugins.

<h3><a name="region">3.1. Adjusting the "Region" Panel</a></h3>

<p>

The region panel is the intermediate-sized view that appears between
the overview and the detailed views. You can adjust its behavior by
changing the following options in either the main GBrowse.conf config
file, or one of the data source configs:

<blockquote class="example"><pre>
default region         = 5000
region sizes           = 1000 5000 10000 20000
region segment         = 200000
</pre></blockquote>

<p>

The "default region" is the length (in bp) of the region shown when
the user first loads GBrowse. "region sizes" is a space-delimited list
of lengths that the user can select in his or her preferences. "region
segment" is the largest size that the region can be set to.

<p>

To disable the region display entirely, put the following into the
[GENERAL] section of the main config file or the data source config
file:

<blockquote class="example"><pre>
region section = hide
</pre></blockquote>

<p>

You can do the same thing with the overview by setting "overview
section = hide", but it is unclear why you'd want to.

<h3><a name="overview">3.2. Putting Features into the Overview &amp; Regionview</a></h3>

<p>

In many cases it is handy to add tracks directly to the overview
and/or region panel. These tracks can be turned on and off just like
normal tracks, and can serve as reference points for well-known genes,
cytogenetic bands, or genetic markers.

<p>

We will illustrate how to do this by placing a copy of the Motifs
track into the overview.  Add the following to the bottom of the
volvox.conf configuration file:

<blockquote class="example"><pre>
[Motifs:overview]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 0
label        = 1
key          = Motifs
</pre></blockquote>

<p>

This stanza is identical to the [Motifs] track that we created
earlier, except that its name is qualified with ":overview".  This
tells GBrowse that this is not an ordinary track to be placed in the
detail image, but one that should be placed in the overview.

<p>

We also want the overview motifs track to be displayed by default, so
go to the top of the configuration file, and modify the "default
features" option to look like this:

<blockquote class="example"><pre>
# list of tracks to turn on by default
default features = ExampleFeatures  Motifs:overview
</pre></blockquote>

<p>

Reload the page.  Voil&agrave;!  See Figure 20.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/overview1.gif"><br>
<i>Figure 20:</i> Any number of tracks can be placed in the overview or region
</blockquote>

<p>

You can add as many tracks to the overview as you like.  The main
warning is that if you add lots of features to the overview it can get
pretty crowded in there.  Performance can also suffer, since each
feature must be fetched and rendered each time the overview is
displayed.</p>

<p>

To add a track to the region panel, simply replace ":overview" with
":region" in the track stanza:

<blockquote class="example"><pre>
[Motifs:region]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 0
label        = 1
key          = Motifs
</pre></blockquote>


<h3><a name="semantic_zooming">3.3. Semantic Zooming</a></h3>

<p>

One of the cooler features of GBrowse is its ability to support
semantic zooming.  Semantic zooming is a feature in which objects show
different levels of detail depending on the level of magnification.
We've already seen this behavior in the "dna" and "segments" glyphs,
which show the DNA sequence only when there's sufficient room to
display it.

<p>

GBrowse has several types of semantic zooming:

<dl>
  <dt>glyph-based, automatic
  <dd>The dna and segments glyphs, and others that support semantic
      zooming out of the box.  This happens automatically and can't
      be modified.
  <dt>semantic labeling
  <dd>When there's sufficient room, GBrowse will print the label and
      descriptions next to the glyphs.  The threshold at which this
      happens is under your control.
  <dt>semantic bumping
  <dd>When there's sufficient room, GBrowse will "bump" features to
      prevent them from colliding on the screen.  When this would
      cause the display to become too high, bumping is suppressed.
      This threshold is also under your control.
  <dt>semantic options
  <dd>You can set track configuration sections up so that when a
      preset size threshold is exceeded, one configuration replaces
      another.
</dl>

<p>

The thresholds for labeling and bumping are set by configuration
options named "label density" and "bump density" respectively.  The
standard values can be found in the defaults track named [TRACK
DEFAULTS].  They are originally set so that labels are suppressed when
there are more than 25 features per track, and bumping is suppressed
when there are more than 100 features per track.  You can change these values
globally by editing their values in [TRACK DEFAULTS], or you can add
"label density" and/or "bump density" options to individual track
configuration sections in order to override the settings for specific
tracks.
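<p>

The effect of the two thresholds can be summarized in a few lines of
Python. This is a sketch of the rule as described in this section, with
the default values of 25 and 100 quoted above; the function name
track_behavior is hypothetical and this is not GBrowse's own code.

```python
def track_behavior(feature_count, label_density=25, bump_density=100):
    """Decide whether a track is labeled and bumped for a given feature count."""
    return {
        # Labels are drawn until the track holds more features than label density.
        "label": label_density >= feature_count,
        # Bumping stays on until the track holds more features than bump density.
        "bump": bump_density >= feature_count,
    }

for count in (10, 26, 101):
    print(count, track_behavior(count))

# A per-track override, as if "label density = 50" were set in one stanza.
print(30, track_behavior(30, label_density=50))
```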

<p>

The process of setting up semantic options is a bit more interesting.
To illustrate, we will create semantic zooming for the [Alignments]
track ("Example Alignments").  We would like the track to shift from
showing the individual segments to showing solid rectangles when the
user is zoomed out to 30K and beyond, and turn bumping off when the
user is zoomed out to 45K and beyond.  The process is simple.  Beneath
the [Alignments] stanza, we add a stanza qualified for zoomlevels of
>= 30,000 and another stanza qualified for zoomlevels of >= 45,000:

<blockquote class="example"><pre>
[Alignments]
feature      = match
glyph        = segments
key          = Example alignments

[Alignments:30000]
glyph        = box
label        = 0

[Alignments:45000]
glyph        = box
bump         = 0
label        = 0
</pre></blockquote>

<p>

The format for semantic options is [<i>Trackname:distance</i>], where
<i>Trackname</i> must be the same as the non-qualified track, and
<i>distance</i> is the length of the region at which the semantic
options will kick in.  Only options that are different from the
non-qualified track need to be listed.  According to the configuration
given above, when the user is looking at a region 30,000 bp or longer,
the glyph option will change to "box," which is a solid rectangle that
doesn't show any internal details.  All other options, such as feature
and key, will be inherited from the [Alignments] track.

<p>

At 45,000 bp, the glyph is again set to box, and in addition the
"bump" option is set to zero, turning off collision control.  Notice
that options are inherited from the unqualified track stanza, and not
from the previous semantic zoom level.  If we had neglected to specify
the glyph option in [Alignments:45000], the glyph would have reverted
to "segments."
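<p>

The inheritance rule is easy to get wrong, so here is a Python sketch
that models it. It follows the description in this section: the
qualified stanza with the largest threshold that does not exceed the
region length is applied on top of the unqualified stanza, and nothing
is carried over from other zoom levels. The names effective_options and
STANZAS are hypothetical; this is not GBrowse's implementation.

```python
def effective_options(stanzas, track, region_length):
    """Merge the base stanza with the best-matching semantic zoom stanza.

    stanzas maps names such as "Alignments" or "Alignments:30000" to
    dicts of options.
    """
    options = dict(stanzas[track])  # start from the unqualified stanza
    thresholds = []
    for name in stanzas:
        prefix, _, qualifier = name.partition(":")
        # Ignore qualifiers such as "overview" or "region".
        if prefix == track and qualifier.isdigit():
            thresholds.append(int(qualifier))
    eligible = [t for t in thresholds if region_length >= t]
    if eligible:
        options.update(stanzas[f"{track}:{max(eligible)}"])
    return options

STANZAS = {
    "Alignments": {"feature": "match", "glyph": "segments",
                   "key": "Example alignments"},
    "Alignments:30000": {"glyph": "box", "label": 0},
    "Alignments:45000": {"glyph": "box", "bump": 0, "label": 0},
}

for length in (20000, 40000, 50000):
    print(length, effective_options(STANZAS, "Alignments", length))
```

<p>

Removing the glyph option from the 45000 stanza in this model makes
the 50K view fall back to "segments", as described above.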

<p>

Make these changes to volvox.conf, turn on the "Example Alignments"
track, and view the contig at 20K, 40K and 50K.  At 40K, you'll see
the alignments lose their internal structure and be replaced by solid
boxes (Figure 21).  At 50K they'll begin to overlap and the feature
labels will be suppressed.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/semantic_zooming1.gif"><br>
<i>Figure 21: Semantically zoomed alignments at 40K</i>
</blockquote>

<h3><a name="grouping_tracks">3.4. Grouping Tracks</a></h3>

<p>The bottom of the GBrowse window contains an expandable set of
checkboxes that allows the users to turn tracks on and off. By
default, the tracks are grouped into sections corresponding to tracks
belonging to the overview panel, those belonging to the region panel,
tracks created by external (third-party) annotations, and tracks
created by plugins. By default, all other tracks are grouped together
in a catch-all section named "General."</p>

<p>As we have seen, you can easily define new track groups to make
navigation easier by adding a "category" option to each of the track
stanzas. This option defines the name of the category. You can extend
this into subcategories and sub-subcategories by separating category
names with a ":" character. For example:</p>

<blockquote class="example"><pre>
[Motifs]
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
category     = Genes:Translation
key          = Example motifs

[Translation]
glyph          = translation
global feature = 1
height         = 40
fgcolor        = purple
start_codons   = 0
stop_codons    = 1
category       = Genes:Translation
translation    = 6frame
key            = 6-frame translation

[Genes]
feature      	   = gene
glyph              = gene
bgcolor            = peachpuff
label_transcripts  = 1
draw_translation   = 1
category           = Genes:Structure
key                = Protein-coding genes

[CDS]
feature      	   = CDS:predicted
glyph              = gene
bgcolor            = white
category           = Genes:Structure
key                = Predicted genes
</pre></blockquote>

<p>In this way we can create sections named "Alignments," "Examples,"
"Genes" and "Proteins" and assign the appropriate tracks to them. The
Tracks control section will look something like Figure 22:</p>


<blockquote class="center"> <img  style="border:solid;border-width:1px" src="figures/enhancements3.gif"><br> <i>Figure
22:</i> You can group tracks into categories, subcategories and
sub-subcategories to an arbitrary depth.  </blockquote>
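
<p>A three-level category could be sketched as follows. The
sub-subcategory name "Six-frame" is a made-up label used only for
illustration; each name between the ":" separators becomes one level
of nesting:</p>

<blockquote class="example"><pre>
[Translation]
glyph          = translation
global feature = 1
height         = 40
fgcolor        = purple
start_codons   = 0
stop_codons    = 1
# "Six-frame" is a hypothetical third level under Genes:Translation
category       = Genes:Translation:Six-frame
translation    = 6frame
key            = 6-frame translation
</pre></blockquote>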


<h3><a name="group_tables">3.5. Grouping Tracks into a Table</a></h3>

<p>A further refinement to display track information within the
 category is a table display with headings for the rows and columns
 (see Figure 23 for an example).  This layout is useful for displaying
 data that highlights the experimental design as in microarray or
 ChIP-on-Chip experiments.</p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/categorytable.png"><br>
<i>Figure 23:</i> An example of a category table containing nine tracks,
organized as 3 rows x 3 columns, each with a heading.
</blockquote>

<p>This was constructed by adding an option named "category tables" to
the [GENERAL] section. The first argument in this option refers to the
category you wish to add the table to, the second is a space-separated
list of column headings, and the third is a space-separated list of row
headings.</p>

<blockquote class="example"><pre>
# category table configuration
category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'
</pre></blockquote>

The stanzas within the category must then appear in column-major
order: all the rows of column 1, then all the rows of column 2, and so
on (see the example below and compare with Figure 23). Stanza 1 is
column 1/row 1, stanza 2 is column 1/row 2, stanza 3 is column 1/row 3,
stanza 4 is column 2/row 1, stanza 5 is column 2/row 2, etc. This means
that each cell in the table must have a stanza. Any surplus tracks
within the category are ignored; for example, a tenth stanza would not
be shown. Cells that have no data still need a stanza, which can be
disabled using the "disabled = 1" option. To display the category
table in Figure 23, you would use the following configuration.

<blockquote class="example"><pre>
[temp_strainA]
category       = ArrayExpts
feature        = temp_strainA_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Temp strain A (1 expt)

[anaerobic_strainA]
category       = ArrayExpts
feature        = anaerobic_strainA_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Anaerobic Strain A (0 expt)
disabled       = 1

[aerobic_strainA]
category       = ArrayExpts
feature        = aerobic_strainA_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Aerobic Strain A (0 expt)
disabled       = 1


[temp_strainB]
category       = ArrayExpts
feature        = temp_strainB_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Temp strain B (2 expts)

[anaerobic_strainB]
category       = ArrayExpts
feature        = anaerobic_strainB_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Anaerobic Strain B (0 expt)
disabled       = 1

[aerobic_strainB]
category       = ArrayExpts
feature        = aerobic_strainB_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
title          = blah
key            = Aerobic strain B (3 expts)

[temp_strainC]
category       = ArrayExpts
feature        = temp_strainC_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Temp strain C (1 expt)

[anaerobic_strainC]
category       = ArrayExpts
feature        = anaerobic_strainC_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Anaerobic strain C (3 expts)

[aerobic_strainC]
category       = ArrayExpts
feature        = aerobic_strainC_agg
glyph          = xyplot
bgcolor        = red
neg_color      = green
fgcolor        = black
graph_type     = boxes
height         = 80
min_score      = -2.0
max_score      = 2.0
scale          = both
key            = Aerobic strain C (2 expts)
</pre></blockquote>
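
<p>

To make the ordering concrete, the following comment block lists where
each of the nine stanzas above lands in the table. It is only a sketch
of the column-then-row rule described earlier; it could be kept in the
configuration file as documentation, since lines beginning with "#"
are ignored:

<blockquote class="example"><pre>
# Column headings: strain-A strain-B strain-C
# Row headings:    temperature anaerobic aerobic
#
# stanza  name                 column    row          state
#   1     temp_strainA         strain-A  temperature  shown
#   2     anaerobic_strainA    strain-A  anaerobic    disabled
#   3     aerobic_strainA      strain-A  aerobic      disabled
#   4     temp_strainB         strain-B  temperature  shown
#   5     anaerobic_strainB    strain-B  anaerobic    disabled
#   6     aerobic_strainB      strain-B  aerobic      shown
#   7     temp_strainC         strain-C  temperature  shown
#   8     anaerobic_strainC    strain-C  anaerobic    shown
#   9     aerobic_strainC      strain-C  aerobic      shown
</pre></blockquote>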

<p>

If you need to have multiple category tables, simply use continuation
lines for the "category tables" option:

<blockquote class="example"><pre>
# category table configuration
category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'
                  'CHiP-Chip'  'TFX1 ONE-CUT PHA4' '16-cell-stage 320-cell-stage adult'
</pre></blockquote>

<h3><a name="plugins">3.6. Using Plugins</a></h3>

<p> Another cool GBrowse feature is its ability to take advantage of
plugins, which are small modules of Perl code that extend GBrowse in
various ways. In this section, we will show how to activate two
popular plugins, <i>RestrictionAnnotator</i> and <i>Aligner</i>.  The
first generates a track of restriction sites.  The second dumps a
text-based multiple alignment of the current region on view.

<p>

To see these plugins at work, first make sure that the database files
are up to date with this position in the tutorial.  If you are in any
doubt, remove the current contents of the volvox database directory
and replace them with the files <a
href="data_files/volvox_all.gff3">volvox_all.gff3</a> and <a
href="data_files/volvox_all.fa">volvox_all.fa</a>.  <p>

Now find the option "plugins=" at the top of volvox.conf, and modify
it to activate the Aligner and RestrictionAnnotator plugins (the
TrackDumper plugin is already turned on):

<blockquote class="example"><pre>
plugins = TrackDumper Aligner RestrictionAnnotator
</pre></blockquote>

<p>

When you reload the page, you will see a new popup menu appear under
the image labeled "Dumps, searches and other operations."  You will
also see an automatic track labeled "plugin:Restriction Sites" appear
in the track list.  When you turn on this track, you will be presented
with a restriction map (Figure 26).  You can then adjust which
restriction sites are shown by selecting "Annotate Restriction Sites"
from the popup menu and pressing the "Configure" button.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/plugins1.gif"><br>
<i>Figure 26: The RestrictionAnnotator Plugin</i>
</blockquote>

<p>

To see the Aligner at work, center your view on a region that contains
the EST alignments (for example, ctgA:1000..5000), select "Dump
Alignments" from the plugin popup menu, and press "Go".  This will
return a text-based multiple alignment of the genome and the EST
tracks.

<p>

The Aligner plugin has some additional configuration that you can
perform.  We'll look at this now as an example of how to configure
plugins. Open up volvox.conf and add the following configuration
section:

<p>

<blockquote class="example"><pre>
########################
# Plugin configuration
########################

[Aligner:plugin]
alignable_tracks   = EST
upcase_tracks      = CDS Motifs
upcase_default     = CDS
</pre></blockquote>

<p>

It doesn't matter where the section goes, but it is probably a good
idea to place this towards the middle of the file after the [GENERAL]
section (at the top) and before the [TRACK DEFAULTS] section.
Otherwise it is easy for you or someone else maintaining the
configuration file to mistake this for some sort of track
configuration.

<p>

Plugin configuration sections are distinguished from track
configuration by having names of the format
<b><i>PluginName</i>:plugin</b>.  In this case, the three
configuration options are applied to the Aligner plugin.  For the
Aligner plugin, the configuration options are:

<table border="1">
  <tr>
    <th>Option</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>alignable_tracks</td>
    <td>Space-delimited list of tracks to include in the multiple
	alignment.  The genome is always included.  If this option is
	not present, then GBrowse will automatically include any track
	that has the "draw_target" option set.
    </td>
  </tr>
  <tr>
    <td>upcase_tracks</td>
    <td>Space-delimited list of tracks that will be used to UPCASE the
	genomic DNA.  This is very useful if you want to embed the
	positions of coding regions or other features inside the
	multiple alignment.  Uppercasing will not be turned on by
	default.  The user must press the "Configure" button, and
	select which of the uppercase tracks are to be activated  from
	a list of checkboxes.
    </td>
  </tr>
  <tr>
    <td>upcase_default</td>
    <td>A space-delimited list of tracks that will be uppercased by
	default unless the user turns them off during configuration.
    </td>
  </tr>
  <tr>
    <td>ragged_default</td>
    <td>A small integer indicating that the aligner should include
	some unaligned bases from the end of each sequence.  This is
	useful for seeing the sequencing primer or cloning site in ESTs.
    </td>
  </tr>
</table>
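
<p>

The sample stanza shown earlier does not set ragged_default. A variant
that does might look like the following sketch; the value 10 is only
an assumed example of "a small integer" and not a recommended setting:

<blockquote class="example"><pre>
[Aligner:plugin]
alignable_tracks   = EST
upcase_tracks      = CDS Motifs
upcase_default     = CDS
# assumed value: include up to 10 unaligned bases at each end
ragged_default     = 10
</pre></blockquote>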

<p>

With the changes in place, select the aligner from the popup menu and
press Configure.  Turn on uppercasing of the coding region track and
see how it affects the display (Figure 27).

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/plugins2.gif"><br>
<i>Figure 27: The Aligner plugin produces multiple alignments.</i>
</blockquote>

<p>

Plugin files live in /etc/gbrowse2/plugins. To view a plugin's
documentation, change to that directory and run the perldoc command
with the -F ("file") option on the plugin file:

<blockquote class="example"><pre>
% <b>perldoc -F Aligner.pm</b>
</pre></blockquote>

<p>

Here's the list of plugins that come with the standard distribution:

<table border="1">
  <tr>
    <th>Plugin</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>Aligner</td>
    <td>Dump multiple alignments</td>
  </tr>
  <tr>
    <td>AlignTwoSequences</td>
    <td>Execute NCBI's bl2seq on the current view (requires the bl2seq
	executable).</td>
  </tr>
  <tr>
    <td>AttributeHiliter</td>
    <td>Highlight (by colorizing) features whose attributes match some
	user-specified values.</td>
  </tr>
  <tr>
    <td>BatchDumper</td>
    <td>Allows the user to cut and paste a series of landmarks on the
	genome and dumps out all overlapping features using a variety
	of formats (e.g. GenBank format).
    </td>
  </tr>
  <tr>
    <td>Blat</td>
    <td>Plugin to align sequences against the genome using the BLAT
	algorithm (requires BLAT executable).
    </td>
  </tr>
  <tr>
    <td>CMapDumper</td>
    <td>Produces files that can be read by the <a
  href="http://www.gmod.org/wiki/index.php/Cmap">CMap comparative map browser.</a>
    </td>
  </tr>
  <tr>
    <td>CreateBlastDB</td>
    <td>Creates a Blast-formatted database from a GBrowse database.
    </td>
  </tr>
  <tr>
    <td>FastaDumper</td>
    <td>Produce pretty-printed FASTA dumps of the current region, with
	selected features highlighted with colors or font styles.
    </td>
  </tr>
  <tr>
    <td>FilterTest</td>
    <td>Small demonstration of how to write a plugin that filters
	features (makes them visible or invisible) based on arbitrary criteria.
    </td>
  </tr>
  <tr>
    <td>GeneFinder</td>
    <td>Runs Phil Green's genefinder gene prediction program within
	GBrowse (requires genefinder executable).
    </td>
  </tr>
  <tr>
    <td>GFFDumper</td>
    <td>Dump out the current region in GFF format (redundant with BatchDumper).</td>
  </tr>
  <tr>
    <td>OligoFinder</td>
    <td>Lets the user search for landmarks on the basis of unique
	11-mers or greater.</td>
  </tr>
  <tr>
    <td>PrimerDesigner</td>
    <td>Interactively design PCR primers (requires <a
	href="http://primer3.sourceforge.net/">primer3</a>
	executable).</td>
  </tr>
  <tr>
    <td>ProteinDumper</td>
    <td>Dump translated protein sequences of the current region in various formats.</td>
  </tr>
  <tr>
    <td>RandomGene</td>
    <td>Small demonstration of how to connect a plugin to a gene
	prediction program. Doesn't actually predict genes, but
	generates simulated ones.</td>
  </tr>
  <tr>
    <td>RestrictionAnnotator</td>
    <td>Creates restriction maps.</td>
  </tr>
  <tr>
    <td>Spectrogram</td>
    <td>Generate DNA spectrograms to highlight low complexity regions,
	repetitive regions, coding regions and other regions with
	periodicity. Requires Math::FFT Perl module.</td>
  </tr>
  <tr>
    <td>Submitter</td>
    <td>Helper plugin for the rubber-band select menu. See <a
	href="http://www.gmod.org/wiki/index.php/GBrowse_Rubber_Band_Selection.pm">GBrowse
	Rubber-band selection</a>.</td>
  </tr>
  <tr>
    <td>test</td>
    <td>This dumps the current view in FASTA format, and is used
	for regression testing the plugin architecture.</td>
  </tr>
</table>

<hr>

<h2><a name="external">4. Adding Features from External Sources</a></h2>

<p>

It is often useful to have independent annotation data sets that can
be visualized together but updated separately.  For example, you may
be working on a genome that has a core set of stable annotations that
everyone shares, such as the set of protein-coding genes, and
independent sets of annotations that change frequently, such as
promoter predictions and experimental data.

<p>

GBrowse provides several mechanisms for making this type of modular
annotation possible.  You can:

<ol>
  <li>Upload one or more files of annotations temporarily, and view them in the
      context of the core annotations.  These annotations will be private to the
      user who uploads the annotations; others cannot see the data.
  <li>Put one or more GFF files in a web-accessible location, such as
      an FTP or Web site, and point GBrowse at it.  These annotations
      will be accessible to anyone who knows the correct URLs.
  <li>Point one GBrowse at another GBrowse.  All the tracks in the second
      instance of GBrowse will be available to the first GBrowse.
      This method uses the Distributed Annotation System (DAS) and can
      handle very large data sets.
</ol>

<p>

This section will lead you through the various ways to view third
party annotations on top of GBrowse. The examples are somewhat
contrived since we only have one computer to work with, and by
necessity both the main data and the third-party feature data will
have to reside on the same computer.  Don't be confused by this, and
keep in mind that in the real world, GBrowse will be running on one
computer, and the third-party annotation data will be loaded from
another network-accessible computer.

<h3><a name="upload">4.1. Uploading an Annotation File</a></h3>

<p>
First, we'll look at how to upload private tracks to the browser. This
method is intended for users who wish to view their own data in the
context of the genome.

<p>

Instead of using the artificial volvox data, we will now use some real
genome annotations from the <i>C. elegans</i> genome project.  This is
a region around <i>C. elegans</i> cosmid C01F4.  The core data that
we'll be using is contained in the files <a
href="data_files/elegans_core.gff3">elegans_core.gff3</a>, and <a
href="data_files/elegans.fa">elegans.fa</a>.

<p>

Refer back to the <a href="#basics">beginning of the tutorial</a> now
and create a GBrowse database directory named "elegans_core".  Copy <a
href="data_files/elegans_core.gff3">elegans_core.gff3</a>, and <a
href="data_files/elegans.fa">elegans.fa</a> into it. Remember to make
this directory writable by the web server user!

<p>

Copy the data source configuration file <a
href="conf_files/elegans_core.conf">elegans_core.conf</a> into <a
href="file:////etc/gbrowse2">/etc/gbrowse2/</a>, and create a suitable stanza in
GBrowse.conf to describe this new data source:

<blockquote><pre>
[elegans_core]
description = Core C. elegans annotations
path        = elegans_core.conf
</pre></blockquote>

<p>

Confirm that you can browse the database.  Figure 28 is a picture of
the entire data set with all core tracks turned on.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/third_party1.gif"><br>
<i>Figure 28:</i> The core <i>C. elegans</i> dataset.
</blockquote>


<p>

We will now add some third-party annotations to the display.  These
are contained in the files "elegans_acceptor.gff3",
"elegans_expression.gff3", "elegans_sts.gff3", "elegans_deletion.gff3", and
"elegans_repeats.gff3":

<p>

<table border="1" width="100%">
  <TR>
    <td>
	<a href="data_files/elegans_acceptor.gff3">elegans_acceptor.gff3</a>
    </td>
    <td>
	Annotations of <i>C. elegans</i> spliced leader acceptor sites.
    </td>
  </TR>
    
  <TR>
    <td>
	<a href="data_files/elegans_expression.gff3">elegans_expression.gff3</a>
    </td>
    <td>
	Positions assayed for gene expression level in <i>C. elegans</i>
	microarrays.
    </td>
  </TR>
    
  <TR>
    <td>
	<a href="data_files/elegans_sts.gff3">elegans_sts.gff3</a>
    </td>
    <td>
	Primer pairs available for the region produced by the
	<i>C. elegans</i> ORFeome project.
    </td>
  </TR>

  <TR>
    <td>
	<a href="data_files/elegans_deletion.gff3">elegans_deletion.gff3</a>
    </td>
    <td>
	Deletion endpoints from a targeted gene knockout project.
    </td>
  </TR>
    
  <TR>
    <td>
	<a href="data_files/elegans_repeats.gff3">elegans_repeats.gff3</a>
    </td>
    <td>
	Complex repetitive elements found using the RepeatMasker program.
    </td>
  </TR>
    
</table>

<p>

We can load each of these files to private storage located on the
server using the file upload feature.  Copy these five files to your
home directory where you can find them easily.  Go to the tab named
<b>Upload and Share Tracks</b> and choose the link marked "Add custom
track(s) [From a file]". When the upload button appears, press it,
select one of the annotation files, and then press the "Upload" button
to upload the file to the server.  The annotations contained in the
file should now appear on the display.  If you now do this for all
five of the annotation files, you will eventually get a display like
that shown in Figure 29.

<p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/third_party2.gif"><br>
<i>Figure 29:</i> After uploading five annotation files.
</blockquote>

<p>

<blockquote> NOTE: <i>This upload function works even if the gbrowse
you are uploading to is located on a remote server.  The uploaded
files are stored in a private directory on the server away from the
main data set.  Other users cannot see your data, unless you
explicitly share the track as described later.</i> </blockquote>

<p>

Although this display is functional, the tracks are fairly uniform in
appearance.  Fortunately, we can customize the uploaded files quite
easily. One way is to use the graphical configuration editor. Find the
"deletion:Allele" project track and click on the "?" next to its
name. This will bring up a graphical settings block as shown in Figure
30. Change the glyph to "triangle" and the fill and line colors to
"red". When you press the "Change" button the track will be updated
with the desired settings.

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/setting_popup.png"><br>
<i>Figure 30:</i> You can change the appearance of a custom track by
clicking on the "?" next to the track.
</blockquote>

This customization applies only to the current user. If you share
this track with someone else, they will see only the original, default
appearance of the track.

<p>

Let us change the "elegans_sts.gff3" file so that the primer pairs use
the "primers" glyph. Go back to the "Upload and Share Tracks" tab,
scroll down to the file named "elegans_sts.gff3" and click the
"[edit]" link next to the line labeled "Configuration" (Figure
31). You will see a now-familiar GBrowse configuration stanza.

<p>

<blockquote class="center">
<img  style="border:solid;border-width:1px" src="figures/config_editing.png"><br>
<i>Figure 31:</i> You can configure a track in-place using the built-in configuration editor.
</blockquote>

Now edit the configuration stanza to change the glyph to "primers" and
the key to "PCR primers generated by the Orfeome Project":

<blockquote><pre>
[reagent_1]
feature   = reagent:Orfeome_project
glyph     = primers
bgcolor   = yellow
fgcolor   = black
label     = 1
connector = solid
balloon hover = $description
category    = My Tracks:Uploaded Tracks:elegans_sts.gff3
key         = PCR primers generated by the Orfeome Project
</pre></blockquote>

Now click "Submit."  After a few moments, the track will reconfigure
and the changes will be permanent. If you mess up the config file, you
can either reload the file from scratch, or click the "[edit]" link
next to the uploaded file name itself. This will bring up a similar
in-place editor with the original file's contents. When you click
"Submit" the automatic configuration file will be regenerated.

<p>

<i><b>Hint:</b> You can also place configuration stanza(s) at the top
of the uploaded file. GBrowse will use them instead of generating its
own.</i>
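
<p>

As a sketch of this hint, a copy of elegans_deletion.gff3 prepared for
upload could begin with the [Deletions] stanza from the suggested
customizations in this section, followed by the unchanged GFF3 data.
The feature lines are elided here rather than invented:

<blockquote class="example"><pre>
[Deletions]
feature = deletion
glyph   = span
key     = Gene knockouts

##gff-version 3
# ... the original feature lines of elegans_deletion.gff3 follow here ...
</pre></blockquote>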

<p>

When you are done, press "Submit Changes..." and the display will be
updated to show the track with a more readable track name and the
primers glyph. If you like, you can customize each of the files.  Here
is a suggested set of customizations:

<blockquote class="example"><pre>
# for the file elegans_repeats.gff3
[Repeats]
feature = repeat
bgcolor = white
key     = Complex repeats

# for the file elegans_acceptor.gff3
[Acceptors]
feature = trans-splice_acceptor_site
glyph   = diamond
bgcolor = red
key     = Trans-splice Acceptors

# for the file elegans_deletion.gff3
[Deletions]
feature = deletion
glyph   = span
key     = Gene knockouts

# for the file elegans_expression.gff3
[Expression]
feature = microarray_oligo
bgcolor = orange
height  = 4
key     = Microarray expression probe
</pre></blockquote>

<blockquote><b>Note: </b><i>For security reasons Perl subroutines are
not allowed in the configuration sections of uploaded files.  However
links and link patterns are allowed.</i></blockquote>
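
<p>

For instance, a link pattern can be added to an uploaded stanza
without any Perl code. In this sketch the URL is a placeholder
(www.example.org is not a real annotation service), and $name is the
link-pattern variable that GBrowse replaces with the name of the
clicked feature:

<blockquote class="example"><pre>
[Acceptors]
feature = trans-splice_acceptor_site
glyph   = diamond
bgcolor = red
# placeholder URL; substitute the address of a real lookup page
link    = http://www.example.org/lookup?id=$name
key     = Trans-splice Acceptors
</pre></blockquote>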

<p>

There is no particular reason that the annotation sets had to be
split into separate files.  We could just as easily combine them into a
single GFF file, as we did for the core annotations.

<h3><a name="sharing">4.2. Sharing an Annotation File</a></h3>

<p>

Once you have an uploaded annotation file set up the way you like, you
might want to share it with others. This is easily done by going to
the main GBrowse tab, finding the track you want to share, and
clicking on the icon that looks like an RSS feed (<img
src="../images/buttons/share.png">). This will pop up a balloon
containing a URL. Copy and paste this URL into an email message. When
the recipient clicks on it, they will be taken to GBrowse with your
track displayed. You can do this for several tracks.

<p>

Another way to load a file into GBrowse is to put a copy of it on a
local web or FTP server and then <b>import</b> it by URL. This
requires that you have access to a web or FTP server, but if you are
working with GBrowse, then we can assume you do!

<p>

To watch this in action, we will place one of the annotation files
onto the local web server and then load it from within the local
GBrowse.  This contrived example doesn't make much sense until you
realize that the same trick works when the GBrowse server and the
web-accessible annotation file are on separate machines halfway
across the world.

<p>

We will demonstrate using a new version of the elegans_sts.gff3 file.
Create a file named "test_annotations.gff3" in the directory "/var/www/gbrowse2".
This places it inside the web server document tree, but
<b>outside</b> the location of GBrowse databases.  This file should
contain these lines:

<blockquote><pre>
[Orfeome Primers]
feature = reagent
glyph   = primers
height  = 6
key     = ORFeome project primer pairs

##gff-version 3
C01F4	Orfeome_project	reagent	3319	17668	.	+	.	Name=mv_ZK783.1;amplified=0
C01F4	Orfeome_project	reagent	18584	20445	.	-	.	Name=mv_G_YK5686;amplified=1
C01F4	Orfeome_project	reagent	24509	25425	.	-	.	Name=mv_ZK783.3;amplified=1
C01F4	Orfeome_project	reagent	26525	33359	.	-	.	Name=mv_ZK783.4;amplified=0
C01F4	Orfeome_project	reagent	38660	49506	.	+	.	Name=mv_C18H2.1;amplified=1
</pre></blockquote>

<i><b>NOTE:</b> The fields must be separated by tabs, not spaces.</i>

<p>

Confirm that the file is correctly installed by fetching the URL
"http://<i>your.web.server</i>/gbrowse2/test_annotations.gff3".

<p>

Now go back to your browser, open the "Upload and Share Tracks" tab,
and click on "[Import a track]" at the bottom of the page. This will
open up a text field. Type in the URL you used above and click
"Upload." The contents of the file will now be visible as a new
"ORFeome project primer pairs" track on the browser. The neat thing
about this is that whenever you change the file on the web server, the
track changes as well.

<p>

You can use this feature to share custom tracks without uploading your
data to the browser. Simply send the URL of the file to your
colleagues, and instruct them to import it into GBrowse in the way
you just did yourself.

<!--

January 2010: DAS SUPPORT IS CURRENTLY PENDING PORTING OF THE DAS
SCRIPT TO THE 2.0 MULTI-DATABASE ARCHITECTURE, SO THIS PART OF THE
TUTORIAL IS COMMENTED OUT.

<h3><a name="DAS">4.3 Using GBrowse as a DAS Server or Client</a></h3>

<p>

The Distributed Annotation System protocol (DAS; <a
href="http://www.biodas.org">http://www.biodas.org</a>) is a system
for exchanging genomic annotations across the Internet.  It works
similarly to the idea of sharing the URLs of web-accessible GFF files,
except that it is designed to support large data sets.  When a client
application needs to fetch just a subset of the data, such as a small
piece of a chromosomal arm, the DAS protocol allows only the relevant
annotations to be retrieved, rather than the whole data set.

<p>

To take advantage of DAS functionality, you will have to install the
Perl Bio::Das module.  This is available from CPAN (the Comprehensive
Perl Archive Network (<a href="http://www.cpan.org"
target="_new">http://www.cpan.org</a>) or from the GMOD PPM
repository.  Unix users can install Bio::Das with this command:

<blockquote><pre>
% <b>perl -MCPAN -e 'install Bio::Das'</b>
</pre></blockquote>

<p>

Windows users can use the PPM tool:

<blockquote><pre>
C:\Windows&gt; <b>ppm</b>
ppm&gt; <b>install Bio::Das</b>
ppm&gt; <b>quit</b>
</pre></blockquote>

You may need to issue the command "rep add gmod
http://www.gmod.org/ggb/ppm" if PPM complains that it cannot find
Bio::Das.

<p>

When you installed GBrowse, you also installed a CGI script that
enables your web server to act as a DAS server. The CGI script is
named "/usr/lib/cgi-bin/gb2/das", and it runs off the same configuration files as
GBrowse itself.  Only a very small bit of extra configuration is
required to enable full DAS server functionality.  In this part of the
tutorial we will first turn on the DAS server, and then use it to
serve out annotations on the <i>C. elegans</i> database.

<p>

To start, open the elegans_core.conf configuration file and add the
following line to the configuration file.  It can go anywhere before
the start of the track definition stanzas, but it is probably a good
idea to place it towards the top between "plugins" and "default
features."

<blockquote>
<pre>
# DAS reference server
das mapmaster      = SELF
</pre>
</blockquote>

<p>

What this line is doing is to declare to the DAS system that our
server is authoritative for the coordinates on the current
<i>C. elegans</i> genome example.  This is appropriate if you are
starting out a genome for the first time.  If, however, you want to
annotate against an existing set of genome coordinates, you should
replace SELF with the URL of the DAS reference server that serves that
genome.  For example release <i>hg16</i> of the human genome at UCSC
corresponds to DAS URL http://genome.cse.ucsc.edu/cgi-bin/das.  A list
of reference servers for various model organisms can be found at <a
href="http://www.biodas.org">http://www.biodas.org</a>.

<p>

The next step is to go through the configured tracks and add a "das
category" to each of them.  DAS uses the idea of the "category" of a
feature in order to filter sets of features by their purpose.
Categories include:

<table border="1">
  <tr>
    <th>transcription</th>
    <td>features that have to do with
	RNA transcription</td>
  </tr>
  <tr>
    <th>translation</th>
    <td>features that have to do with
	protein translation and function</td>
  </tr>
  <tr>
    <th>variation</th>
    <td>mutations, deletions, polymorphisms</td>
  </tr>

  <tr>
    <th>structural</th>
    <td>contigs, clones, reads, PCR primers</td>
  </tr>

  <tr>
    <th>repeat</th>
    <td>repetitive elements</td>
  </tr>

  <tr>
    <th>experimental</th>
    <td>a catch-all for experimental data</td>
  </tr>

  <tr>
    <th>miscellaneous</th>
    <td>anything that doesn't fit in one of the other categories</td>
  </tr>


</table>

<p>

Find the [Genes] stanza and modify it to have a das category
of "transcription" as shown here:

<blockquote><pre>
[Genes]
feature      = gene
glyph        = gene
height       = 8
bgcolor      = blue
description  = 1
das category = transcription
key          = Protein-coding genes
</pre>
</blockquote>

Similarly, modify the [Alignments] track to have a das category of
"similarity."  You do not need to add a category to the DNA track, as
it is treated specially by das.  You're all done!  Be sure to save the
configuration file before you try the next step.

<p>

Using a web browser fetch the URL <a
href="/cgi-bin/das/dsn"
target="_new">http://localhost/cgi-bin/das/dsn</a>.  This will return
an XML document giving information about each of the data sources that
you have configured. 

<blockquote><pre>
&lt;?xml version="1.0" standalone="yes"?&gt;
&lt;!DOCTYPE DASDSN SYSTEM "http://www.biodas.org/dtd/dasdsn.dtd"&gt;
&lt;DASDSN&gt;
   &lt;DSN&gt;
      &lt;SOURCE id="elegans_core"&gt;elegans_core&lt;/SOURCE&gt;
      &lt;MAPMASTER&gt;http://localhost/cgi-bin/das/elegans_core&lt;/MAPMASTER&gt;
      &lt;DESCRIPTION&gt;C. elegans Core Annotations&lt;/DESCRIPTION&gt;
   &lt;/DSN&gt;
&lt;/DASDSN&gt;
</pre></blockquote>

This is showing that there is one configured DAS source, the
"elegans_core" data set.

<p>

Next test that the DAS "types" request is working.  This request
returns all the feature types that the database knows about.  Using a
web browser fetch the URL <a href="/cgi-bin/das/elegans_core/types"
target="_new">http://localhost/cgi-bin/das/elegans_core/types</a>.
This should return another short document confirming that the "gene"
and "EST_match:BLAT_EST_BEST" feature types are available.

<p>

The final test that the DAS server is performing correctly is to
browse to the <a href="/cgi-bin/gbrowse/elegans_core"
target="_new">elegans_core</a> database and to <b>turn off</b> all the
tracks except for DNA/GC content.  This should give you an empty
details panel.  Now scroll down to the first empty URL entry field and
type in <b>http://localhost/cgi-bin/das/elegans_core</b> and press
"Update URLs."  The page should now reload and display the gene models
and the EST alignments.  However, the data is now not coming directly
from the local database, but from the database via the DAS protocol.

<h4><a name="das_combining">4.3.1. Combining Databases with DAS</a></h4>

We can now use DAS to integrate the core gene model and EST alignment
annotations with the STSs, expression data, trans-splice acceptors and
other third party annotations.  To do this, we will create a GBrowse
database that contains the third party annotations, but not the core
data.  This new database will be used as a DAS source.

<p>

Create a new database directory called <i>elegans_extra</i> in the
"/var/lib/gbrowse2/databases" directory, and add to it a copy of the
file <a href="data_files/elegans_extra.gff3">elegans_extra.gff3</a>.
This GFF file is simply the result of concatenating together the
individual annotation files we looked at earlier (elegans_sts.gff3,
etc), and removing the redundant comment lines from the top of the
file.  Now copy the configuration file <a
href="conf_files/elegans_extra.conf">elegans_extra.conf</a> into the
/etc/gbrowse2/ directory.  Have a look at this config file, and
note that it contains the appropriate "das mapmaster" and "das
category" configuration options.

<p>

Once the config file is installed, confirm that you can browse the
extra annotations by fetching <a
href="/cgi-bin/gbrowse/elegans_extra"
target="_new">http://localhost/cgi-bin/gbrowse/elegans_extra</a>.

<p>

Now we're ready to layer the extra annotations onto the core
annotations using DAS.  Open up a browser window on the <a
href="/cgi-bin/gbrowse/elegans_core"
target="_new">http://localhost/cgi-bin/gbrowse/elegans_core</a>
database.  Delete any URLs that are already listed in the "Add remote
annotations" area, and add the URL
<b>http://localhost/cgi-bin/das/elegans_extra</b>.  When you reload, the
core annotations will be shown on top, and the annotations from the
elegans_extra database will be shown in four tracks at the bottom of
the display.

<p>

The power of this feature is that we can use it across the Internet to
integrate databases that are independently maintained.  For example,
try adding the DAS URL
http://dev.wormbase.org/db/seq/das/elegans_even_more, and see what
appears.

<p>

By default, when you enter a DAS URL, the system will load all the
feature types that the DAS server makes available.  If this is not
desirable, you can limit the tracks by type and/or category.  To find
out what feature types a DAS server supports, retrieve a URL like the
following: <a
href="/cgi-bin/das/elegans_extra/types">http://localhost/cgi-bin/das/elegans_extra/types</a>.
This will provide a list of feature type names and their functional
categories.  From this we can see that the elegans_extra database
exports types of "repeat", "trans-splice_acceptor," "Deletion_allele,"
and "Expression."  Of course, we already knew this since we set the
database up ourselves!

<p>

Using this information, you can now limit the tracks
retrieved from the DAS server to just those that are of interest to
you. In the "Add remote annotations" text field, replace the current
DAS URL with this one:
http://localhost/cgi-bin/das/elegans_extra?type=repeat.  When you
reload, you will see only the repeat track and not the other three.

<p>

What if we want to see two of the four tracks?  We just add additional
type= sections, separated by semicolons.  To see both the "repeat" and
"Expression" tracks, we could request
http://localhost/cgi-bin/das/elegans_extra?type=repeat;type=Expression
(Figure 33).

<blockquote>
<img  style="border:solid;border-width:1px" src="figures/DAS1.gif"><br> <i>Figure 33:</i>
The <i>C. elegans</i> core annotations database with the "repeat" and
"Expression" tracks  superimposed on it using DAS.
</blockquote>

<p>

To fetch features that match a particular category, we can add the
category= option to the URL.  For example, to fetch only features that
have to do with RNA transcription, you can request
http://localhost/cgi-bin/das/elegans_extra?category=transcription.
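<p>

The filtered URLs described above can also be assembled mechanically.
The following is a minimal shell sketch, not part of GBrowse:
<i>das_filter_url</i> is a hypothetical helper name, and it assumes
only the conventions shown in this section (repeated type= or
category= parameters joined with semicolons).

```shell
# Hypothetical helper (not shipped with GBrowse).
# Usage: das_filter_url BASE_URL KEY VALUE [VALUE...]
# KEY is "type" or "category", as described in the text above.
das_filter_url() {
    base=$1
    key=$2
    shift 2
    query=""
    for value in "$@"; do
        # join successive key=value pairs with semicolons
        if [ -n "$query" ]; then
            query="$query;"
        fi
        query="$query$key=$value"
    done
    printf '%s?%s\n' "$base" "$query"
}

das_filter_url http://localhost/cgi-bin/das/elegans_extra type repeat Expression
```

<p>

The call at the bottom prints the same two-track URL used for Figure
33.  Passing "category" instead of "type" as the second argument
builds a category filter in the same way.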

<p>

We can take advantage of this feature to add a menu of external DAS
annotations to the browser.  Open
"/etc/gbrowse2/elegans_core.conf" and insert the following
section right after the "plugins =" line:

<blockquote><pre>
# remote DAS data to make available for optional loading
remote sources =
   "DAS mRNA features"      http://localhost/cgi-bin/das/elegans_extra?category=transcription
   "DAS protein features"   http://localhost/cgi-bin/das/elegans_extra?category=translation
   "DAS repeat features"    http://localhost/cgi-bin/das/elegans_extra?category=repeat
   "DAS variation features" http://localhost/cgi-bin/das/elegans_extra?category=variation
   "DAS experimental features" http://localhost/cgi-bin/das/elegans_extra?category=experimental
</pre></blockquote>

When you reload, the page will now show a popup menu of pre-defined
DAS sources that users can choose.  The DAS sources can be local, as
shown here, or located on one or more remote web sites.

<h4><a name="das_exporting">4.3.2. Exporting DAS Tracks to Ensembl and other Genome Browsers</a></h4>

<p>

GBrowse DAS tracks can be layered onto <a
href="http://www.ensembl.org">Ensembl</a> and other DAS-aware genome
browsers. There are a couple of things to bear in mind:

</p>

<ol>
  <li>Only the tracks explicitly labeled with "das category" will be
      exported.
  <li>The range of glyphs supported by Ensembl is more limited than that of GBrowse.
</ol>

<p>The second point is a gotcha. The official list of DAS-recognized glyphs
can be found <a
href="http://www.biodas.org/documents/spec.html#glyphid">here</a>, but
GBrowse supports a larger number of glyphs. Because of this, DAS-exported
features may not look the same on Ensembl as they do on GBrowse. There
are three workarounds for this:

<dl>
  <dt>The <i>das flatten</i> option
  <dd>Set this option to flatten a multi-part feature, such as a gene,
  into a simpler "flat" structure that will display correctly on the
      Ensembl contig viewer. Also be sure to specify "grouping true"
      when you configure Ensembl for this DAS source.
  <dt>The <i>das glyph</i> option
  <dd>Set this option in an exported track stanza in order to force
  the glyph
      to a standard DAS glyph, such as "box". For example:
      <blockquote><pre>
      das glyph = box
      </pre></blockquote>
  <dt>The <i>das type</i> option
  <dd>Ensembl and possibly other browsers treat certain feature types
      specially. In particular, if a feature has a type of "gene" then
      Ensembl will display it with angled introns. Set <i>das type</i>
      in a track stanza to force the reported type to one of these
      special values. Example:
      <blockquote><pre>
      das type = gene
      </pre></blockquote>
</dl>

<h4><a name="das_entire">4.3.3. Running GBrowse off DAS Entirely</a></h4>

<p>

If you wish, you can even run GBrowse off a remote DAS server entirely
and keep no data locally (or just maintain private annotation tracks).
This works by replacing the Bio::DB::SeqFeature::Store database adaptor that we have
been using up to now with an adaptor named "Bio::Das". However,
because of a poorly characterized interaction between the Bio::Das
module and Perl 5.6, it is recommended that you use Perl 5.8.1 or
higher for this. Otherwise you may experience out of memory errors.

<p>To watch this in action, we will run GBrowse off the UCSC genome
browser, which exports its data in DAS format.

<p>

We will need a configuration file to do this.  DAS-based configuration
files are almost identical to the ones we have been using up to now
for local databases.  The main change is to replace the "db_adaptor" and
"db_args" options with ones appropriate for the DAS data source.  For
example, for the "hg16" human genome database maintained at UCSC, the
appropriate options will be:

<blockquote><pre>
[GENERAL]
description   = Human July 2003 Genome at UCSC
db_adaptor    = Bio::Das
db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
	        -dsn    hg16
</pre></blockquote>

Conveniently enough, recent versions of the GBrowse distribution
include a utility called "make_das_conf.pl" that will build a basic
DAS browser configuration file for you.  This utility was installed
for you when you installed GBrowse. To run it, you will need to know
the base URL of the DAS server you're going to display.  For our
example, we'll use the UCSC DAS server at
http://genome.cse.ucsc.edu/cgi-bin/das.

<p>

This is a command-line utility.  To find out the databases served by
UCSC, type in the following command at the Unix or Windows command
line:

<blockquote><pre>
% <b>make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das</b>
The following DAS URLs are available at this server.  Please call the script again
using one of the following URLs:

http://genome.cse.ucsc.edu/cgi-bin/das/dm1
	Fruitfly Jan. 2003 Genome at UCSC

http://genome.cse.ucsc.edu/cgi-bin/das/hg13
	Human Nov. 2002 Genome at UCSC

http://genome.cse.ucsc.edu/cgi-bin/das/hg15
	Human April 2003 Genome at UCSC

http://genome.cse.ucsc.edu/cgi-bin/das/hg16
	Human July 2003 Genome at UCSC

http://genome.cse.ucsc.edu/cgi-bin/das/rn3
	Rat Jun 2003 Genome at UCSC
[... many many more ...]
</pre></blockquote>

We're looking for the hg16 release, so we run make_das_conf.pl
again, this time using the UCSC DAS server's URL with the hg16 data
source name appended to the end:

<blockquote><pre>
% <b>make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das/hg16</b>
[GENERAL]
description   = Human July 2003 Genome at UCSC
db_adaptor    = Bio::Das
db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
	        -dsn    hg16

# examples to show in the introduction
examples = 10 10_random 11 12 13 13_random 14 15 15_random
      16 17 17_random 18 18_random 19 19_random 1 1_random
      20 21 22 2 2_random 3 3_random 4 4_random 5 5_random
      6 6_random 7 7_random 8 8_random 9 9_random M
      Un_random X X_random Y

das mapmaster = http://genome.cse.ucsc.edu:80/cgi-bin/das/hg16

aggregators = ECgene{ECgene}
       affy10K{affy10K}
       affyGeno{affyGeno}
       affyRatio{affyRatio}
       affyTranscriptome{affyTranscriptome}
       affyU133{affyU133}
       affyU95{affyU95}
[...much much more...]
</pre>
</blockquote>

If you tried this at the command line, you saw a lot of text scroll up
your screen and disappear forever.  Run the command again, and this
time <b>redirect</b> its output into a new configuration file named
"ucsc_hg16.conf":

<blockquote><pre>
% <b>make_das_conf.pl http://genome.cse.ucsc.edu/cgi-bin/das/hg16 &gt;/etc/gbrowse2/ucsc_hg16.conf</b>
</pre></blockquote>

That should be all you need to do, unless you are behind a firewall
that uses an HTTP proxy. In this case, you will need to edit the
"db_args" option in the generated configuration file to include a
-proxy option. This tells GBrowse to fetch the remote data using the
indicated proxy. For example:

<blockquote><pre>
[GENERAL]
description   = Human July 2003 Genome at UCSC
db_adaptor    = Bio::Das
db_args       = -source http://genome.cse.ucsc.edu/cgi-bin/das
	        -dsn    hg16
                -proxy  http://my.proxy.address
</pre></blockquote>

Try browsing the new data source by requesting <a
href="/cgi-bin/gbrowse/ucsc_hg16"
target="_new">http://localhost/cgi-bin/gbrowse/ucsc_hg16</a>, and you
should be able to browse through a rudimentary version of the Human
genome display.

<p>

Once you have a basic configuration file for a remote DAS source, you
can pretty it up by changing track styles, key names, and so
forth. Bear in mind that make_das_conf.pl only does its best to guess
about the right landmarks to use in the list of examples in the
instructions, which feature types should be made the defaults for
searching, and how to aggregate multi-part features together.  You
will almost certainly need to customize these options to meet your
needs.

<h4><a name="das_aggregators">4.3.4. Fixing DAS Displays with Aggregators</a></h4>

<p>

If you go back to the elegans_core.conf file used at the beginning of this
section, you may have noticed an option called "aggregators".
This option overcomes a limitation of DAS,
which is that it is best for sharing simple one-part features that do
not have subparts. When DAS exports a multipart feature, it breaks the
feature up into its component bits and sends them
individually. Aggregators tell GBrowse how to reassemble the pieces
correctly. Without aggregators declared in GBrowse, you wouldn't see a
whole gene, but instead a bunch of pieces labeled "mRNA", "CDS", etc.

<p>

The elegans_core.conf aggregators line looks like this:

<blockquote class="example"><pre>
aggregators = EST_match{EST_match} gene{CDS,five_prime_UTR,three_prime_UTR/mRNA}
</pre></blockquote>

<p>

This is defining two aggregators, one named "EST_match" and the other
named "gene". The first one is easier. The syntax is:
"<i>type_of_reassembled_feature</i>{<i>type_of_subparts</i>}". In
other words, this aggregator is telling GBrowse to look for a bunch of
features of type "EST_match" and replace them with a single feature
also named "EST_match".

<p>

Genes are more complex because they have several internal parts and
two levels of nesting with mRNAs at the top and CDS and UTR features
underneath them (refer back to the GFF3 file for details). In this
case, the aggregator uses the full syntax:
"<i>type_of_reassembled_feature</i>{<i>subpart1,subpart2,subpart3...</i>/<i>type_of_main_part</i>}". In
other words, look for features of type "mRNA" and aggregate them
together with features of type "CDS," "five_prime_UTR," and
"three_prime_UTR." The reassembled feature will be of type "gene."

<p>

An unfortunate limitation on GBrowse DAS support is that the
aggregators must be defined in the instance of GBrowse that is
<b>displaying</b> the DAS annotations rather than the instance that is
producing them. This means that you have to customize GBrowse
in advance according to the DAS sources you wish to display. If you
find this limiting, you may wish to check out the experimental <a
href="http://gmod.org/wiki/index.php/Gbgff">Gbgff annotation-sharing system.</a>

<p>

More information about using aggregators can be found in the GFF2
tutorial located at <a href="dbgff/tutorial.html">Using GBrowse
with Bio::DB::GFF.</a>
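<p>

To make the aggregator syntax described in this section concrete, here
is a small shell sketch that splits one aggregator definition into its
three parts.  It is only an illustration of the notation:
<i>parse_aggregator</i> is a hypothetical helper, not a GBrowse or
BioPerl tool, and GBrowse itself does not need it.

```shell
# Hypothetical helper: split "name{subpart1,subpart2/main}" into its parts.
parse_aggregator() {
    spec=$1
    name=${spec%%\{*}      # text before the first "{": the reassembled type
    body=${spec#*\{}       # text after the first "{"
    body=${body%\}}        # drop the closing "}"
    case $body in
        */*)
            subparts=${body%%/*}   # comma-separated subpart types
            main=${body#*/}        # type of the main (top-level) part
            ;;
        *)
            subparts=$body         # short form: subparts only
            main=""
            ;;
    esac
    printf 'feature=%s subparts=%s main=%s\n' "$name" "$subparts" "$main"
}

parse_aggregator 'gene{CDS,five_prime_UTR,three_prime_UTR/mRNA}'
parse_aggregator 'EST_match{EST_match}'
```

<p>

For the "gene" aggregator this reports mRNA as the main part and the
CDS and UTR types as subparts; for "EST_match" the main part is empty
because the short form of the syntax is used.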

-->

<hr>


<h2><a name="other_backends">5. Using Other Backends</a></h2>

<p>Until now, we've been using the Bio::DB::SeqFeature::Store in-memory
adaptor. This adaptor is suitable for small databases, but does not
scale well to realistically-sized genomes. This section will show you
how to create large genome annotation databases using the Berkeleydb,
SQLite and MySQL adaptors. For a full-featured genome database that includes
annotations of gene structure and function, as well as genetic maps,
diversity information and phenotypic information, be sure to check out
the <a href="http://gmod.org/Chado">Chado
database</a> which is significantly more feature-rich than those
described here.</p>

<h3><a name="berkeleydb">5.1. The Berkeleydb Backend</a></h3>

<p>

The in-memory database is great for smaller data sets, and can handle
GFF files of up to about 20,000 features (more if you have lots of
memory).  For larger data sets, however, you'll want to use a database
management system.  GBrowse handles a number of DBMS through its
"database adaptor" system.  This section shows how to use the
Bio::DB::SeqFeature::Store berkeleydb adaptor that comes for free when you install
BioPerl; this will enable you to create databases of 10 million or
more features. The following sections show you how to set up SQLite
and MySQL relational databases that will support even larger data sets.  You may
skip these sections and move on to working with third-party
annotations if you do not wish to install a database-backed server at
this time.

<p>

The Berkeleydb database adaptor comes with BioPerl 1.51 or higher
(still under development at the time this tutorial was written). If
you have an older version of BioPerl, GBrowse will install the adaptor
for you. As its name implies, this adaptor uses the Berkeleydb
database system (http://www.sleepycat.com) to create indexed database
files from GFF feature files. The adaptor also requires the Perl
DB_File interface to Berkeleydb. If you are using a Linux or Mac OSX
system, you almost certainly have both Berkeleydb and DB_File already
installed. For Windows users of ActiveState Perl, you should confirm
that DB_File is installed by running the following command: </p>

<blockquote class="example">
<pre>
C:\&gt; perl -MDB_File -e "print $DB_File::VERSION"
</pre>
</blockquote>

<p>
If this prints out a number, then you are golden. If you get an error,
you should reinstall DB_File by running the PPM tool:</p>

<blockquote class="example"><pre>
C:\&gt; ppm
PPM interactive shell (2.1) - type 'help' for available commands.
PPM&gt; install DB_File
</pre></blockquote>

<p>It is an extremely simple task to convert an existing in-memory
database to use the Berkeleydb database. We will now convert the
Volvox example database to Berkeleydb.</p>

<p>Open the most recent version of the volvox.conf configuration file,
and edit the top few lines so that they look like
this:</p>

<blockquote class="example"><pre>
[GENERAL]
description   = Volvox Berkeleydb Database
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor berkeleydb
	        -dir     '/var/lib/gbrowse2/databases/volvox'
</pre></blockquote>

<p> We made just two changes. First, we changed the description of the
database to "Volvox Berkeleydb Database" to distinguish it from the
in-memory database. Second, we changed the value of the
<b>-adaptor</b> option from "memory" to "berkeleydb".
</p>

<p>Now reload the volvox page in your browser. There will be a slight
delay as the Berkeleydb adaptor constructs its indexes, and then the
page will reappear. You should now be able to browse and search the
database exactly as before. Depending on how fast the memory adaptor
was to begin with, you may not notice a speed improvement; however,
with large GFF files, the performance improvement will be very
marked.</p>

<p>If you look in the volvox database directory, you will see a new
subdirectory named "index". This contains a set of index files that
allow GBrowse to find features quickly. They are automatically created
and updated as needed when the underlying GFF or FASTA files are
changed.</p>

<p>If you get an "Internal Server Error" or similar message, check the
server error log file for messages that explain what went wrong. The
most common problem is that the volvox database directory is not
writeable by the web server user. As described earlier, this directory
must be "world writeable" in order to allow the web server to create
and maintain the databases.</p>
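
<p>A quick way to rule out the permissions problem is to test the
directory from the shell. The sketch below is an illustration only:
<i>check_db_dir</i> is a hypothetical helper, and the result is only
meaningful when it is run as the same user that the web server runs as
(which user that is depends on your installation).</p>

```shell
# Hypothetical helper: report whether the current user can write to DIR.
# Usage: check_db_dir DIR
check_db_dir() {
    if [ -d "$1" ]; then
        if [ -w "$1" ]; then
            echo "writable: $1"
        else
            echo "not writable: $1"
        fi
    else
        echo "missing: $1"
    fi
}

check_db_dir /var/lib/gbrowse2/databases/volvox
```

<p>If this reports "not writable" for the web server user, the
Berkeleydb adaptor will not be able to create its "index"
subdirectory.</p>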

<h4><a name="bp_seqfeature_load">5.1.1. The bp_seqfeature_load.pl script</a></h4>

<p>Although it is convenient to maintain the Berkeleydb indexes
automatically, this mechanism has a number of disadvantages. One
disadvantage is that this mechanism requires the database directory to
be world writeable (or at least writeable by the web user), which may
not be acceptable in some installations. Another disadvantage is that
the indexing may take a long time, up to 10 minutes for a GFF
database containing a million lines. Some web servers will time out
during this process. For large databases, it is better to explicitly
create the database index files using the <i>bp_seqfeature_load.pl</i>
program.</p>

<p><i>bp_seqfeature_load.pl</i> is a BioPerl utility that is described
in more detail in <a href="#mysql">The MySQL Backend</a>. It
takes as its input a series of GFF and FASTA files and creates the
appropriate database files. To see how to use it, we will create a
fresh database directory. Go to the GBrowse database directory located at
<b>/var/lib/gbrowse2/databases</b> and create a new subdirectory called
"volvox_berkeley":</p>

<blockquote class="example"><pre>
 % cd /var/lib/gbrowse2/databases
 % mkdir volvox_berkeley
</pre></blockquote>

<p>

On Windows systems you can use the file manager to create this new
folder.

<p>

You do <b>not</b> have to make this directory world writeable, but it
should be readable and executable by the user that the web server runs
as. Now enter the tutorial data files directory
(/var/www/gbrowse2/tutorial/data_files) and load the GFF and sequence
files using the following command:</p>

<blockquote class="example"><pre>
<b>% bp_seqfeature_load.pl -c -a berkeleydb -f -d /var/lib/gbrowse2/databases/volvox_berkeley volvox_all.fa volvox_all.gff3</b>
loading volvox_all.fa...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.01s
load time:  0.02s
loading volvox_all.gff3...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.00s
load time:  0.08s
</pre></blockquote>

The arguments to <i>bp_seqfeature_load.pl</i> are:

<table>
  <tr>
    <td><b>-a berkeleydb</b></td>
    <td>Use the berkeleydb database <b>a</b>daptor.
  </tr>
  <tr>
    <td><b>-c</b></td>
    <td><b>c</b>lear (initialize) the database
  </tr>
  <tr>
    <td><b>-f</b></td>
    <td>use the <b>f</b>ast loading option
  </tr>
  <tr>
    <td><b>-d /var/lib/gbrowse2/databases/volvox_berkeley</b></td>
    <td>Load the data into the indicated <b>d</b>atabase directory.
  </tr>
  <tr>
    <td><b>volvox_all.fa volvox_all.gff3</b></td>
    <td>The data files to load.
  </tr>
</table>

<p> If all goes well, this will create the index files in
<tt>/var/lib/gbrowse2/databases/volvox_berkeley</tt>. If you look in that
directory now, you'll see a series of index files.</p>

<p>
The last step is to modify the volvox.conf to point to this
directory. Open it in a text editor and modify the top part so that it
looks like this:
</p>

<blockquote class="example"><pre>
[GENERAL]
description   = Volvox Berkeleydb Database
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor berkeleydb
	        -dsn    '/var/lib/gbrowse2/databases/volvox_berkeley'
</pre></blockquote>

<p>The change here is to replace the <b>-dir</b> argument with
<b>-dsn</b> ("data source name").  This tells the Berkeleydb adaptor
that pre-made index files can be found in the indicated directory. It
will not attempt to update the index files automatically.</p>

<p>If you wish to update the indexes with new GFF or sequence data,
run the bp_seqfeature_load.pl script again. Using the <b>-c</b> flag
will reinitialize the indexes from
scratch, erasing whatever was there before. Without this flag, the
provided GFF and/or sequence data will be incrementally added to the
indexes.</p>
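
<p>The difference between rebuilding and adding to the indexes comes
down to the <b>-c</b> flag. The following sketch prints, but does not
run, the corresponding command lines. <i>berkeleydb_load_cmd</i> and
the file name new_features.gff3 are hypothetical; the flags are the
ones described in the table above.</p>

```shell
# Hypothetical dry-run helper: print (do not execute) the loader command.
# Usage: berkeleydb_load_cmd rebuild|append DIR FILE...
berkeleydb_load_cmd() {
    mode=$1
    dir=$2
    shift 2
    # -a selects the adaptor, -f the fast loader, -d the database directory
    flags="-a berkeleydb -f"
    if [ "$mode" = "rebuild" ]; then
        # -c clears the existing indexes before loading
        flags="-c $flags"
    fi
    echo "bp_seqfeature_load.pl $flags -d $dir $*"
}

berkeleydb_load_cmd rebuild /var/lib/gbrowse2/databases/volvox_berkeley volvox_all.fa volvox_all.gff3
berkeleydb_load_cmd append /var/lib/gbrowse2/databases/volvox_berkeley new_features.gff3
```

<p>Printing the command first is a cheap safeguard, since running with
<b>-c</b> by mistake erases the existing indexes.</p>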


<h3><a name="SQLite">5.2. The SQLite Backend</a></h3>

<p>

The Bio::DB::SeqFeature::Store SQLite adaptor is an interface to the open
source SQLite database management system. Its performance is
significantly better than that of the Berkeleydb adaptor, and it is
highly recommended for production environments. This section describes
how to set up GBrowse to use this adaptor.

<p>

First you'll need to confirm that SQLite is installed. It is installed
by default on most Linux systems and Macintosh OSX, but will not be present
on Windows. Go to <a
href="http://www.sqlite.org/download.html">The SQLite Home Page</a>
and download and install the source code or binary package that is
most suitable to you.

<p>

Next, you'll need to install the Perl interface to SQLite. Again, this
is preinstalled on many systems, but if you need to install it, you
can get it via the CPAN installer (Linux, OSX), or PPM (Windows).

<p>

Via CPAN:

<blockquote><pre>
% <b>perl -MCPAN -e shell</b>
cpan&gt; <b>install DBD::SQLite</b>
cpan&gt; <b>quit</b>
</pre></blockquote>

Via PPM:

<blockquote><pre>
C:\Windows&gt; <b>ppm</b>
ppm&gt; <b>install DBD::SQLite</b>
ppm&gt; <b>quit</b>
</pre></blockquote>

Users of Debian systems can simply install the package
libdbd-sqlite3-perl.
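
<p>

On Linux or OSX you can confirm that the module is visible to Perl in
the same way we checked DB_File earlier. The sketch below wraps that
check in a small shell function; <i>perl_module_version</i> is a
hypothetical helper name, not something installed by GBrowse or
BioPerl.

```shell
# Hypothetical helper: print the version of an installed Perl module.
# If the module is missing, Perl reports "Can't locate ..." and the
# function returns a non-zero status.
perl_module_version() {
    perl -M"$1" -e 'no strict "refs"; print ${ $ARGV[0] . "::VERSION" }, "\n"' "$1"
}

if perl_module_version DBD::SQLite; then
    echo "DBD::SQLite is installed"
else
    echo "DBD::SQLite is missing"
fi
```

<p>

The same function can be used later with DBD::mysql when setting up
the MySQL backend.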

<p>

You'll now load the .gff3 and .fa files into a new SQLite database.
There are actually two steps needed.  The first is to "initialize" the
database with all the data definitions needed to hold genomic feature
data, and the second is to actually load the data.  Fortunately, both
these steps are handled by the same command-line tool,
<i>bp_seqfeature_load.pl</i>, which is part of the BioPerl suite.

<p>

Copy the files <a
href="data_files/volvox_all.gff3">volvox_all.gff3</a> and <a
href="data_files/volvox_all.fa">volvox_all.fa</a> to some convenient
place. Now choose a location for the database on the local
filesystem. In this example we will use
/var/lib/gbrowse2/databases/Volvox.sqlite. Then run the following command from
the command line:

<blockquote class="example"><pre>
% <b>bp_seqfeature_load.pl -a DBI::SQLite -c -f -d /var/lib/gbrowse2/databases/Volvox.sqlite volvox_all.fa volvox_all.gff3</b>
loading volvox_all.fa...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.00s
load time:  0.02s
loading volvox_all.gff3...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.02s
load time:  0.23s
</pre></blockquote>

<p>

The arguments to <i>bp_seqfeature_load.pl</i> are:

<table width="50%">
  <tr>
    <td><b>-a DBI::SQLite</b></td>
    <td>Use the SQLite database <b>a</b>daptor.
  </tr>
  <tr>
    <td><b>-c</b></td>
    <td><b>c</b>lear (initialize) the database
  </tr>
  <tr>
    <td><b>-d /var/lib/gbrowse2/databases/Volvox.sqlite</b></td>
    <td>Load into the <b>d</b>atabase at the given path
  </tr>
  <tr>
    <td><b>-f</b></td>
    <td>Use the <b>f</b>ast loading algorithm.
  </tr>
  <tr>
    <td><b>volvox_all.fa volvox_all.gff3</b></td>
    <td>The data files to load.
  </tr>
  
</table>

The SQLite database is all ready to go.  Now, in order to tell GBrowse
to start using the SQLite database rather than the in-memory database,
you need to make a small change to the volvox.conf configuration
file.  Find the first few lines of the file and change them to look like
this:

<blockquote class="example"><pre>
[GENERAL]
description   = Volvox Example Database
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor DBI::SQLite
	        -dsn     /var/lib/gbrowse2/databases/Volvox.sqlite
</pre></blockquote>

<p>

The <b>-adaptor</b> argument tells GBrowse to use the "DBI::SQLite"
database adaptor, which is the BioPerl interface to SQLite databases.
The <b>-dsn</b> argument tells GBrowse to use the <b>d</b>ata
<b>s</b>ource <b>n</b>ame "/var/lib/gbrowse2/databases/Volvox.sqlite",
which for SQLite is simply the path to the database file.

<p>

When you reload the web page, GBrowse will now be using SQLite.
Depending on the speed of your CPU and disk, you might notice that it
seems a bit snappier than the in-memory version.  See <a
href="../CONFIGURE_HOWTO.txt">CONFIGURE_HOWTO.txt</a> for more
information on configuring GBrowse to use relational databases.

<p>

To add more data to an existing SQLite database, simply run the
bp_seqfeature_load.pl command without the <b>-c</b> switch. This will
load additional GFF3 and FASTA files into the database.

<p>

To delete data from the database, use the
<b>bp_seqfeature_delete.pl</b> script. To dump out the contents of the
database, run <b>bp_seqfeature_gff3.pl</b>.

<h3><a name="mysql">5.3. The MySQL Backend</a></h3>

<p>

The Bio::DB::SeqFeature::Store MySQL adaptor is an interface to the open
source MySQL database management system. Its performance is equal to
that of the SQLite adaptor, but it has better provisions for error
recovery and is safe to use in environments where multiple users read
and write to the database simultaneously. Because it runs as a server,
it is particularly useful in high-performance GBrowse <a
href="http://gmod.org/wiki/Running_a_GBrowse2_render_farm">render farm
environments.</a>

<p>

You'll have to install MySQL if it is not installed already. MySQL's
home page for source code and binaries is <a
href="http://www.mysql.com">www.mysql.com</a>.

<p>

Next, you'll need to install the Perl interface to MySQL.  The module
is called "DBD::mysql". Install it using the PPM or CPAN shells as
described in the SQLite section. On Debian systems, you may install
the package "libdbd-mysql-perl".

<p>

Setting up a new MySQL database is just a bit more complicated than
setting up a SQLite database. First you'll set up a new empty
database named "volvox."  Using the <b>mysql</b> command-line tool,
create the database, grant yourself read/write privileges, and grant
the "nobody" user read privileges:

<blockquote class="example"><pre>
% <b>mysql -uroot -p</b>
Enter password: *********

mysql&gt; <b>create database volvox;</b>
Query OK, 1 row affected (0.04 sec)

mysql&gt; <b>grant all privileges on volvox.* to lstein@localhost;</b>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <b>grant select on volvox.* to nobody@localhost;</b>
Query OK, 0 rows affected (0.00 sec)

mysql&gt; <b>quit</b>
Bye
</pre></blockquote>

<p>

Depending on how mysql was installed, you may not need to provide a
password, in which case just type "mysql -uroot" without the "-p"
argument.  When granting privileges to yourself, replace "lstein" with
your own login name.  If you are on a Windows system, you may be able
to skip this step entirely.

<p>

You'll now load the .gff3 and .fa files into this newly created
database.  Again, you'll use the <i>bp_seqfeature_load.pl</i> script:

<blockquote class="example"><pre>
% <b>bp_seqfeature_load.pl -c -f -a DBI::mysql -d volvox volvox_all.fa volvox_all.gff3</b>
loading volvox_all.fa...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.00s
load time:  0.02s
loading volvox_all.gff3...
                                                                                
Building object tree... 0.00s
Loading bulk data into database... 0.02s
load time:  0.23s
</pre></blockquote>

<p> The invocation of the script is identical to the SQLite case, except that we specify
the DBI::mysql adaptor, and use the mysql server's database name of
"volvox" rather than a specific path.  <p>

Change the [GENERAL] section of the volvox configuration file to use
the DBI::mysql adaptor and to specify the user that has access to the
database (in this case the "nobody" user that you granted permission
to earlier).

<blockquote class="example"><pre>
[GENERAL]
description   = Volvox Example Database
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor DBI::mysql
	        -dsn     volvox
                -user    nobody
</pre></blockquote>
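
<p>

The multi-line <b>db_args</b> value is read as a series of option
names, each followed by its value, which are passed on to the database
adaptor. The Python sketch below mimics that reading with the standard
shlex module. It is a rough model rather than GBrowse's actual parser,
and the function name <b>parse_db_args</b> is hypothetical.

```python
import shlex


def parse_db_args(value):
    """Split a db_args value into a mapping of option name to value."""
    # shlex.split treats spaces, tabs and newlines alike and honours quotes,
    # so continuation lines and quoted paths are handled uniformly.
    tokens = shlex.split(value)
    if len(tokens) % 2 != 0:
        raise ValueError("db_args must consist of name/value pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


db_args = """-adaptor DBI::mysql
             -dsn     volvox
             -user    nobody"""

print(parse_db_args(db_args))
```

<p>

Seen this way, the indentation of the continuation lines is purely
cosmetic: only the order of names and values matters.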

<p>

Reload the browser to confirm that it is working as expected (check
the server error log if there is a problem). As with the SQLite
adaptor, you may use <b>bp_seqfeature_delete.pl</b> to delete
features, <b>bp_seqfeature_gff3.pl</b> to list the contents of the
database, and <b>bp_seqfeature_load.pl</b> to load in additional GFF3
files.

<h3><a name="other_backends">5.3. Other backends</a></h3>

<p>

The Bio::DB::SeqFeature::Store database backend supports the three
adaptors we have already used in this tutorial. For more information,
see the following perldoc manual pages:

<dl>
  <dt><b>perldoc Bio::DB::SeqFeature::Store::DBI::mysql</b>
  <dd>The MySQL adaptor.
      <p>
  <dt><b>perldoc Bio::DB::SeqFeature::Store::memory</b>
  <dd>The in-memory adaptor.
      <p>
  <dt><b>perldoc Bio::DB::SeqFeature::Store::berkeleydb</b>
  <dd>The Berkeleydb adaptor.
</dl>
      
<p>

Another set of adaptors, in the Bio::DB::Das set, let GBrowse run on
top of the rich biological database schemas <a
href="http://gmod.org/Chado">Chado</a> and <a
href="http://www.biosql.org/wiki/Main_Page">BioSQL</a>:

<dl>
  <dt><b>perldoc Bio::DB::Das::Chado</b>
  <dd>An adaptor for PostgreSQL databases using the <i>Chado</i>
      schema (see the <a href="http://gmod.org/Chado">Chado
      home page</a>.)
      <p>
  <dt><b>perldoc Bio::DB::Das::BioSQL</b>
  <dd>An adaptor for PostgreSQL and MySQL databases using the
  <i>BioSQL</i> schema (see <a
  href="http://www.biosql.org">www.biosql.org</a>).
      <p>
  <dt><b>perldoc Bio::Das</b>
  <dd>An adaptor for Distributed Annotation System genome annotation
  (version 1). We discuss this in more detail under <a
  href="#DAS">Using GBrowse as a DAS Server or Client</a>.</dd>
</dl>

<p>

Lastly, there is an older family of adaptors that use the Bio::DB::GFF
database system. These are best suited for loading data stored in GFF
version 2 files, but will work, with limitations, with GFF3 files as
well. These adaptors work with a wider range of relational database
backends.

<dl>
  <dt><b>perldoc Bio::DB::GFF::Adaptor::dbi::mysql</b>
  <dd>The MySQL adaptor.
      <p>
  <dt><b>perldoc Bio::DB::GFF::Adaptor::dbi::oracle</b>
  <dd>The Oracle adaptor.
      <p>
  <dt><b>perldoc Bio::DB::GFF::Adaptor::dbi::pg</b>
  <dd>The PostgreSQL adaptor.
      <p>
  <dt><b>perldoc Bio::DB::GFF::Adaptor::dbi::biofetch</b>
  <dd>An adaptor that will fetch data automatically from
      GenBank/EMBL and load it into a local MySQL database.
      <p>
  <dt><b>perldoc Bio::DB::GFF::Adaptor::memory</b>
  <dd>An adaptor for in-memory databases running off files.
      <p>
</dl>

<hr>

<h2><a name="multiple_databases">6. Multiple Database Backends</a></h2>

<p>

As your GBrowse configuration grows, you may find that it makes sense
to divide the datasets among multiple databases for ease of updating
and maintenance. Fortunately this is easy to do with a small bit of
extra syntax in the configuration file.

<p>

We will illustrate using the in-memory adaptor. (In real life, you
will most likely want to do this with database backends such as
SQLite.) In /var/lib/gbrowse2/databases, create four new directories named
"volvox_basic", "volvox_genes", "volvox_alignments" and
"volvox_expression". Make them writable by the web user, as usual.
Distribute the various volvox database files among them as follows:

<blockquote class="example"><pre>
volvox_basic/
      volvox_remarks.gff3
      volvox_bacs.gff3
      volvox.fa

volvox_genes/
      volvox_genes.gff3
      volvox_genes_simple.gff3
      volvox_geneproducts.gff3
      volvox_domains.gff3

volvox_alignments/
      volvox_matches.gff3
      volvox_est_targets.gff3
      volvox_trace.gff3
      ests.fa

volvox_expression/
      volvox_microarray.gff3
      volvox_microarray.wig
      track001.ctgA.1202327456.wig
</pre></blockquote>
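
<p>

The layout above can also be created with a short script. The Python
sketch below is one way to do it, under the assumption that all of the
volvox data files sit together in a single source directory. The
function name <b>distribute</b> and the scratch directories used in the
demonstration are hypothetical. Making the directories writable by the
web user is left to you.

```python
import shutil
import tempfile
from pathlib import Path

# Directory name -> files it should hold, copied from the listing above.
LAYOUT = {
    "volvox_basic": ["volvox_remarks.gff3", "volvox_bacs.gff3", "volvox.fa"],
    "volvox_genes": ["volvox_genes.gff3", "volvox_genes_simple.gff3",
                     "volvox_geneproducts.gff3", "volvox_domains.gff3"],
    "volvox_alignments": ["volvox_matches.gff3", "volvox_est_targets.gff3",
                          "volvox_trace.gff3", "ests.fa"],
    "volvox_expression": ["volvox_microarray.gff3", "volvox_microarray.wig",
                          "track001.ctgA.1202327456.wig"],
}


def distribute(source_dir, db_root, layout=LAYOUT):
    """Create one directory per database and copy its files into it.

    Returns the names of files that were expected but not found.
    """
    missing = []
    for dirname, filenames in layout.items():
        target = Path(db_root) / dirname
        target.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(source_dir) / name
            if src.exists():
                shutil.copy2(src, target / name)
            else:
                missing.append(name)
    return missing


# Dry run in a scratch directory, so nothing under /var/lib is touched.
with tempfile.TemporaryDirectory() as scratch:
    source = Path(scratch, "data_files")
    source.mkdir()
    (source / "volvox.fa").touch()  # stand-in for a real data file
    missing = distribute(source, Path(scratch, "databases"))
    print(sorted(p.name for p in Path(scratch, "databases").iterdir()))
    print(len(missing), "files were not found in", source.name)
```

<p>

For real use, call <b>distribute</b> with the directory holding your
data files and /var/lib/gbrowse2/databases as the destination.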

<p>

Copy the configuration file <a
href="conf_files/volvox_refactored.conf">volvox_refactored.conf</a>
into <b>/etc/gbrowse2</b>, and edit GBrowse.conf to point to it:

<blockquote class="example"><pre>
[volvox]
description = Refactored Volvox Tutorial
path        = volvox_refactored.conf
</pre></blockquote>

<p>

When you reload the tutorial datasource, it should look and act just
like the original version. Now let's take a peek at the
volvox_refactored.conf configuration file and see what has changed:

<blockquote class="example"><pre>
[GENERAL]
database      = basic

plugins     = Aligner RestrictionAnnotator BatchDumper TrackDumper
default features = ExampleFeatures
...
</pre></blockquote>

The first thing you may notice is that the <b>db_adaptor</b> and <b>db_args</b>
options, which specify the type and location of the backend database,
have been replaced by a single line "database = basic". This is
because the refactored version of this example uses multiple
databases, each with a different symbolic name. The <b>database</b> option
in the [GENERAL] section simply sets up the default backend to use if
others are not specified.

<p>

Moving down in the configuration file, you will see a series of new
database stanzas. There are four of them:

<blockquote class="example"><pre>
[basic:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox_basic'

[genes:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox_genes'

[alignments:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox_alignments'

[expression:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor memory
		-dir '/var/lib/gbrowse2/databases/volvox_expression'
</pre></blockquote>

Each section begins with a stanza header of the form
[<i>database_name</i>:database]. Each one has the <b>db_adaptor</b>
and <b>db_args</b> options. In this example, we are using the
in-memory adaptor for each of the databases, but the nice thing about
this system is that you are free to mix and match any of the backends
that GBrowse supports, for example using in-memory adaptors for small,
rapidly-changing data sets, and the MySQL or Oracle adaptors for
larger datasets.

<p>

Now look at a few track stanzas:

<blockquote class="example"><pre>
[Motifs:overview]
database     = genes
feature      = polypeptide_domain
glyph        = span
height       = 5
description  = 1
label        = 1
key          = Motifs

[ExampleFeatures]
feature      = remark
glyph        = generic
stranded     = 1
bgcolor      = orange
height       = 10
key          = Example Features

[Alignments]
database     = alignments
feature      = match
glyph        = segments
category     = Alignments
key          = Example alignments
</pre></blockquote>

Notice the <b>database</b> option in most of them. This option is
telling GBrowse to fetch the data from the indicated database when it
goes to display each track. The name given in the database option must
match the name in the corresponding [...:database] stanza, or you will
get an error. If no database option appears, then GBrowse will use the
<b>database</b> option given in either the [TRACK DEFAULTS] or
[GENERAL] section, looking in TRACK DEFAULTS first and then GENERAL.
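
<p>

The lookup order just described can be modelled in a few lines of
Python. This is an illustrative sketch, not GBrowse's own code (which is
written in Perl). The names <b>parse_stanzas</b> and
<b>database_for</b> are hypothetical, and the parser handles only the
simple "option = value" lines used in this tutorial.

```python
import re

HEADER = re.compile(r"^\[(.+)\]\s*$")
OPTION = re.compile(r"^(\w[\w ]*?)\s*=\s*(.*)$")


def parse_stanzas(conf_text):
    """Minimal parser: returns {stanza name: {option: value}}."""
    stanzas, current = {}, None
    for line in conf_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = stanzas.setdefault(header.group(1), {})
            continue
        option = OPTION.match(line)
        if option and current is not None:
            current[option.group(1)] = option.group(2).strip()
    return stanzas


def database_for(track, stanzas):
    """Return the symbolic database name that a track would use."""
    # Track stanza first, then TRACK DEFAULTS, then GENERAL.
    for source in (track, "TRACK DEFAULTS", "GENERAL"):
        name = stanzas.get(source, {}).get("database")
        if name:
            break
    else:
        raise LookupError("no database option found for " + track)
    # The name must match a [name:database] stanza.
    if name + ":database" not in stanzas:
        raise LookupError("%s names undefined database %s" % (track, name))
    return name


CONF = """
[GENERAL]
database = basic

[basic:database]
db_adaptor = Bio::DB::SeqFeature::Store

[genes:database]
db_adaptor = Bio::DB::SeqFeature::Store

[alignments:database]
db_adaptor = Bio::DB::SeqFeature::Store

[Motifs:overview]
database = genes
feature  = polypeptide_domain

[ExampleFeatures]
feature  = remark

[Alignments]
database = alignments
feature  = match
"""

stanzas = parse_stanzas(CONF)
for track in ("Motifs:overview", "ExampleFeatures", "Alignments"):
    print(track, "uses", database_for(track, stanzas))
```

<p>

In this model the ExampleFeatures track, which has no <b>database</b>
option of its own, falls back to the "basic" database named in
[GENERAL], while a track naming a database with no matching stanza
raises an error.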

<hr>

<h2><a name="conclusion">7. Conclusion</a></h2>

<p>

This is just a short introduction to the many things that you can do
with GBrowse.  Major features not discussed were:

<ul>
  <li>multi-language support
  <li>third-party feature loading
  <li>the ability to view GenBank, Chado, and BioSQL feature databases
  <li>advanced callbacks
</ul>

All this information, and more, can be found in the <a
href="http://gmod.cshl.edu/wiki/index.php/GBrowse_Configuration_HOWTO">GBrowse
Configuration HOWTO</a> and in the documentation for <a
href="http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/DB/SeqFeature/Store.pm">Bio::DB::SeqFeature::Store</a>
and <a href="http://search.cpan.org/~sendu/bioperl-1.5.2_102/Bio/Graphics/Panel.pm">Bio::Graphics</a>.

<p>

For questions, bug reports and support requests, please use the <a
href="mailto:gmod-gbrowse@lists.sourceforge.net">GMOD GBrowse Mailing
List</a>.

<p>

Have fun!

<hr>
<address>Lincoln D. Stein, lstein@cshl.org<br>
<a href="/">Cold Spring Harbor Laboratory</a></address>
<!-- hhmts start -->
Last modified: Wed Jan 27 21:39:29 EST 2010
<!-- hhmts end -->
</body> </html>