Bio::Das::Segment - Genomic segments from the Distributed Annotation System
  use Bio::Das;

  # contact a DAS server using the "elegans" data source
  my $das = Bio::Das->new('http://www.wormbase.org/db/das' => 'elegans');

  # fetch a segment
  my $segment = $das->segment(-ref=>'CHROMOSOME_I',-start=>10_000,-stop=>20_000);

  # get features and DNA from segment
  my @features     = $segment->features;
  my $dna          = $segment->dna;
  my @entry_points = $segment->entry_points;
  my @types        = $segment->types;
Bio::Das provides access to genome sequencing and annotation databases that export their data in Distributed Annotation System (DAS) format. This system is described at http://biodas.org.
The Bio::Das::Segment class is used to retrieve information about a genomic segment from a DAS server. You may retrieve a list of (optionally filtered) annotations on the segment, a summary of the feature types available across the segment, or the segment's DNA sequence.
Bio::Das::Segment objects are usually created by calling the segment() method of a Bio::Das object created earlier. See Bio::Das for details. Under some circumstances, you might wish to create an object directly using Bio::Das::Segment->new():
Create a segment using the indicated reference sequence ID, between the indicated start and stop positions. $source contains a reference to the Bio::Das object to be used to access the data. The $start and $stop arguments are optional; if they are not provided, they take the defaults described in Bio::Das.
Once created, a number of methods allow you to query the segment for its features and/or DNA.
The features() method returns annotations across the length of the segment. It recognizes two forms of arguments. In the first form, the @filter argument contains a series of category names to retrieve. Each category may be further qualified by a regular expression that filters features by their type ID. Filters have the format "category:typeID", where the category and type are separated by a colon. Both the typeID and the category name are treated as unanchored regular expressions (but see the note below). As a special case, you may use a type of "transcript" to fetch composite transcript model objects (the union of exon, intron and CDS features).
Example 1: retrieve all the features in the "similarity" and "experimental" categories:
@features = $segment->features('similarity','experimental');
Example 2: retrieve all the similarity features of type EST_elegans and EST_GENOME:
@features = $segment->features('similarity:^EST_elegans$','similarity:^EST_GENOME$');
Example 3: retrieve all similarity features that have anything to do with ESTs:
@features = $segment->features('similarity:EST');
Example 4: retrieve all the transcripts and experimental data:
@genes = $segment->features('transcript','experimental');
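To make the filter semantics concrete, here is a minimal core-Perl sketch of how a list of "category:typeID" filters could be split on the first colon and applied as unanchored regular expressions. It runs against hand-made hash references rather than a DAS server; parse_filter() and filter_features() are hypothetical names, not part of Bio::Das, and the real module may apply the filters on the server side instead.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "category:typeID" filter into its two parts.
# A filter with no colon names a category only; its type pattern is undef.
sub parse_filter {
    my ($filter) = @_;
    my ($category, $type) = split /:/, $filter, 2;
    return ($category, $type);
}

# Hypothetical helper: keep only features that match at least one filter.
# Each feature is assumed to be a hash ref with 'category' and 'type' keys.
# Patterns are applied unanchored, so 'EST' matches 'EST_elegans' and 'EST_GENOME'.
sub filter_features {
    my ($features, @filters) = @_;
    my @parsed = map { [ parse_filter($_) ] } @filters;
    return grep {
        my $f = $_;
        grep {
            my ($cat, $type) = @$_;
            $f->{category} =~ /$cat/
              && (!defined $type || $f->{type} =~ /$type/);
        } @parsed;
    } @$features;
}

# Made-up sample data for illustration only.
my @features = (
    { category => 'similarity',   type => 'EST_elegans' },
    { category => 'similarity',   type => 'EST_GENOME'  },
    { category => 'similarity',   type => 'BLASTX'      },
    { category => 'experimental', type => 'RNAi'        },
);

my @est = filter_features(\@features, 'similarity:EST');
printf "%d EST features\n", scalar @est;    # prints "2 EST features"
```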
In the second form, the type and category are given as the named arguments -type and -category. You may use regular expressions for either the typeID or the category. You may also pass an array reference for either argument, in which case the DAS server returns the union of the features.
Example 5: retrieve all the features in the "similarity" and "experimental" categories:
@features = $segment->features(-category=>['similarity','experimental']);
Example 6: retrieve all the similarity features of type EST_elegans and EST_GENOME:
@features = $segment->features(-category=>'similarity', -type=>qr/^EST_(elegans|GENOME)$/);
Example 7: retrieve all features that have anything to do with ESTs:
@features = $segment->features(-type=>qr/EST/);
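The named-argument form can be sketched the same way. The hypothetical select_features() helper below (not a Bio::Das method) accepts a string, a qr// object or an array reference for -category and -type. It assumes a feature is kept when its category matches any category pattern and its type matches any type pattern, with an omitted argument matching everything; that combination rule is an assumption, shown here against made-up data.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a scalar, a qr// object, or an array ref of
# either, and return a flat list of patterns.
sub as_list {
    my ($arg) = @_;
    return ()    unless defined $arg;
    return @$arg if ref $arg eq 'ARRAY';
    return ($arg);
}

# Hypothetical helper: mimic features(-category => ..., -type => ...).
# A feature is kept if its category matches any category pattern and its
# type matches any type pattern; an omitted argument matches everything.
sub select_features {
    my ($features, %args) = @_;
    my @cats  = as_list($args{-category});
    my @types = as_list($args{-type});
    return grep {
        my $f = $_;
        (!@cats || grep { $f->{category} =~ $_ } @cats)
          && (!@types || grep { $f->{type} =~ $_ } @types);
    } @$features;
}

# Made-up sample data for illustration only.
my @features = (
    { category => 'similarity',   type => 'EST_elegans' },
    { category => 'similarity',   type => 'EST_GENOME'  },
    { category => 'similarity',   type => 'BLASTX'      },
    { category => 'experimental', type => 'RNAi'        },
);

# Mirrors Example 6: similarity features of type EST_elegans or EST_GENOME.
my @hits = select_features(\@features,
    -category => 'similarity',
    -type     => qr/^EST_(elegans|GENOME)$/);
print scalar(@hits), "\n";    # prints 2
```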
The return value from features() is a list of Bio::Das::Segment::Feature objects. See Bio::Das::Segment::Feature for details. Also see the section below on automatic feature merging.
NOTE: Currently (March 2001) the WormBase DAS server does not allow you to use regular expressions in categories.
Return the DNA corresponding to the segment. The return value is a simple string, and not a Bio::Sequence object. This method may return undef when used with a DAS annotation server that does not maintain a copy of the DNA.
This method summarizes the feature types available across this segment. The items in this list can be used as arguments to features().
Called with no arguments, this method returns an array of Das::Segment::Type objects; see that class's manual page for details. Called with a TypeID, it returns the number of instances of the named type on the segment, or undef if the type is invalid. Because the list and count of types are cached, there is no penalty for invoking this method several times.
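The caching behaviour can be illustrated with a small hypothetical class. CachedTypes is not part of Bio::Das: it returns plain type-ID strings instead of Das::Segment::Type objects, and a code reference stands in for the server request. It only shows the idea of fetching once and answering both the list form and the count form of the call from the cache.

```perl
use strict;
use warnings;

package CachedTypes;    # hypothetical class, not part of Bio::Das

# $fetch is a code ref standing in for the (expensive) server request;
# it must return a list of type IDs, one per feature on the segment.
sub new {
    my ($class, $fetch) = @_;
    return bless { fetch => $fetch, counts => undef, calls => 0 }, $class;
}

sub types {
    my ($self, $type_id) = @_;
    # Query the "server" only on the first call; later calls reuse the cache.
    unless ($self->{counts}) {
        $self->{calls}++;
        my %counts;
        $counts{$_}++ for $self->{fetch}->();
        $self->{counts} = \%counts;
    }
    # With a type ID: its count, or undef if that type does not occur.
    return $self->{counts}{$type_id} if defined $type_id;
    # Without arguments (list context): the sorted type IDs on the segment.
    return sort keys %{ $self->{counts} };
}

package main;

# Made-up feature types for illustration only.
my $seg   = CachedTypes->new(sub { qw(exon exon intron EST_elegans) });
my @types = $seg->types;            # (EST_elegans, exon, intron)
my $exons = $seg->types('exon');    # 2
```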
The entry_points() method returns a list of landmarks across the segment. These landmarks can in turn be used as reference sequences for further calls into the genome.
The return value is an array of Bio::Das::Segment objects, or an empty list if this segment contains no entry points.
NOTE: This is not the recommended way to fetch the assembly. It is better to filter the segment for annotations in the "structural" category that are marked by the server as belonging to the assembly (the particular typeID to use is server-dependent).
The following accessors can be used to examine and change Bio::Das::Segment settings. Called with no arguments, the accessors return the current value of the setting. Called with a single argument, the accessors change the setting and return its previous value.
  Accessor   Description
  --------   -----------
  refseq()   Get/set the reference sequence
  start()    Get/set the start of the segment relative to the reference sequence
  stop()     Get/set the end of the segment relative to the reference sequence
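A minimal sketch of the accessor convention, using a hypothetical SegmentSketch class (not Bio::Das::Segment itself) that generates refseq(), start() and stop() so that a setter call hands back the value it replaced:

```perl
use strict;
use warnings;

package SegmentSketch;    # hypothetical stand-in, not Bio::Das::Segment

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Generate refseq(), start() and stop() accessors that follow the documented
# convention: no argument returns the current value; one argument stores
# the new value and returns the previous one.
for my $field (qw(refseq start stop)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self     = shift;
        my $previous = $self->{$field};
        $self->{$field} = shift if @_;
        return $previous;
    };
}

package main;

my $seg = SegmentSketch->new(refseq => 'CHROMOSOME_I', start => 10_000, stop => 20_000);
my $old = $seg->start(15_000);    # returns 10000, the previous start
my $now = $seg->start;            # returns 15000
```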
Bio::Das::Segment detects and merges two common types of annotation: gene models and gapped alignments.
Features of type "intron", "exon" and "CDS" that share the same DAS group ID are combined into Bio::Das::Segment::Transcript objects. These are similar to Bio::Das::Segment::Feature, except for having methods for retrieving their component introns, exons and CDSs. Merged transcript objects have type "transcript" and category "transcription". See Bio::Das::Segment::Transcript for more information.
Features of category "similarity" or "homology" are combined into single Bio::Das::Segment::GappedAlignment objects if they share the same group ID. These objects are similar to Bio::Das::Segment::Feature, except that they have methods for retrieving the individual aligned segments. Gapped alignment objects have the type and category of the first aligned component. See Bio::Das::Segment::GappedAlignment for more information.
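The two merging rules can be sketched as a grouping pass over plain hash references. This is an illustration under assumptions, not the module's code: the 'type', 'category' and 'group' keys are an assumed layout, merge_features() is a hypothetical name, and the merged results are hashes rather than Transcript or GappedAlignment objects.

```perl
use strict;
use warnings;

# Hypothetical sketch of the merging rules. Each feature is assumed to be a
# hash ref with 'type', 'category' and 'group' keys.
sub merge_features {
    my @features = @_;
    my (%transcripts, %alignments, @singles);
    for my $f (@features) {
        # Gene-model parts, matched case-insensitively (assumption: the
        # text spells the coding-sequence type both "CDS" and "cds").
        if ($f->{type} =~ /^(?:intron|exon|CDS)$/i) {
            push @{ $transcripts{ $f->{group} } }, $f;
        }
        elsif ($f->{category} =~ /^(?:similarity|homology)$/) {
            push @{ $alignments{ $f->{group} } }, $f;
        }
        else {
            push @singles, $f;    # left unmerged
        }
    }
    my @merged;
    for my $group (sort keys %transcripts) {
        push @merged, { type => 'transcript', category => 'transcription',
                        group => $group, parts => $transcripts{$group} };
    }
    for my $group (sort keys %alignments) {
        # The merged alignment takes the type and category of its first part.
        my $first = $alignments{$group}[0];
        push @merged, { type => $first->{type}, category => $first->{category},
                        group => $group, parts => $alignments{$group} };
    }
    return (@merged, @singles);
}

# Made-up sample data for illustration only.
my @raw = (
    { type => 'exon',        category => 'transcription', group => 'gene1' },
    { type => 'intron',      category => 'transcription', group => 'gene1' },
    { type => 'exon',        category => 'transcription', group => 'gene1' },
    { type => 'EST_elegans', category => 'similarity',    group => 'yk1'   },
    { type => 'EST_elegans', category => 'similarity',    group => 'yk1'   },
    { type => 'RNAi',        category => 'experimental',  group => 'r1'    },
);

# Yields one transcript (3 parts), one gapped alignment (2 parts) and the
# unmerged RNAi feature.
my @merged = merge_features(@raw);
```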
Bio::Das::Segment provides a convenience method for retrieving transcripts:
Retrieves all transcript models by fetching features of type 'exon', 'intron', and 'cds'. If $curated is a true value, then only curated transcripts are returned. Otherwise the list includes both curated and uncurated transcripts (which may contain both curated and uncurated parts). This may not work with every DAS server, as it relies on hard-coded type IDs.
In addition to the methods listed above, Bio::Das::Segment implements all the methods required by the Bio::RangeI interface.
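For readers unfamiliar with Bio::RangeI, this hypothetical RangeSketch class shows the kind of coordinate arithmetic such range methods perform. It assumes 1-based, inclusive coordinates and implements only start(), end(), length(), overlaps() and contains(); it is not the Bio::Das::Segment implementation.

```perl
use strict;
use warnings;

package RangeSketch;    # hypothetical, for illustration only

sub new {
    my ($class, $start, $end) = @_;
    return bless { start => $start, end => $end }, $class;
}

sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# 1-based, inclusive coordinates (assumed): 10_000..20_000 spans 10_001 bases.
sub length { my $s = shift; return $s->end - $s->start + 1 }

# True if the two ranges share at least one position.
sub overlaps {
    my ($s, $o) = @_;
    return $s->start <= $o->end && $o->start <= $s->end;
}

# True if $o lies entirely within $s.
sub contains {
    my ($s, $o) = @_;
    return $s->start <= $o->start && $o->end <= $s->end;
}

package main;

my $seg  = RangeSketch->new(10_000, 20_000);
my $gene = RangeSketch->new(15_000, 16_000);
my $len  = $seg->length;             # 10001
my $hit  = $seg->overlaps($gene);    # true
```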
The Bio::Das::Segment class is overloaded to produce a human-readable string when used in a string context. The string format is:
The start and end positions may be omitted if they are unspecified. The overloaded stringify method is toString().
Lincoln Stein <email@example.com>.
Copyright (c) 2001 Cold Spring Harbor Laboratory
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.