CracTools::Annotator - Generic annotation based on CracTools::GFF::Query::File
version 1.251
  # Construct the annotator object that will index the GFF file in
  # a genomic interval-tree based structure
  my $annotator = CracTools::Annotator->new("annotation.gff");

  # Query the annotator object for overlapping annotations
  # (the best candidate is the first element of the returned list)
  my ($annot) = $annotator->getBestAnnotationCandidate("chr1",12345,12380);
  if(defined $annot && defined $annot->{exon}) {
    print STDERR "Found overlapping exon\n";
  } else {
    # If no overlapping exons have been found, we check for the closest gene
    # in the downstream direction
    my $closest_annot = $annotator->getAnnotationNearestDownCandidates("chr1",12345)->[0];
    if(defined $closest_annot && defined $closest_annot->{gene}) {
      # Parentheses are required: "." and "-" share the same precedence
      print STDERR "Closest gene annotation is ".(12345 - $closest_annot->{gene}->end)."bp away\n";
    }
  }
This module is based on CracTools::Interval::Query::File and provides powerful methods to query annotation files and prioritize hits to fit specific application needs.
CracTools::Annotator works with a 0-based coordinate system and closed [a,b] intervals.
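GFF files store 1-based coordinates with both ends included, so mapping them onto the annotator's convention means shifting both bounds down by one. The minimal sketch below illustrates that shift and the overlap test for closed intervals; the helper names gff_to_zero_based and overlaps are hypothetical and not part of CracTools.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based, closed GFF interval [start,end]
# to a 0-based, closed interval by shifting both bounds down by one.
sub gff_to_zero_based {
    my ($start, $end) = @_;
    return ($start - 1, $end - 1);
}

# Hypothetical helper: two closed intervals overlap when each one
# starts no later than the other one ends (shared bounds count).
sub overlaps {
    my ($a_start, $a_end, $b_start, $b_end) = @_;
    return $a_start <= $b_end && $b_start <= $a_end;
}

my ($s, $e) = gff_to_zero_based(12346, 12381);
print "0-based closed interval: [$s,$e]\n";    # [12345,12380]
print "touching bounds overlap\n" if overlaps(0, 10, 10, 20);
```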
The principle behind CracTools::Annotator is to build a genomic interval tree that holds the annotations. The user can then query this data structure to retrieve annotations. To organize the retrieved annotations, we build candidate hashes, each of which is a branch of the annotation tree. For a classic GFF annotation file, if the queried interval overlaps an exon, the branch goes from the exon leaf up to the gene root, passing through an mRNA internal node.
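To make the interval-tree idea concrete, here is a minimal centered interval tree in plain Perl. It is an independent sketch, not the implementation used by CracTools::Interval::Query::File; build_tree and query_tree are hypothetical names, and the toy records only carry an id and 0-based closed coordinates.

```perl
use strict;
use warnings;

# Build a centered interval tree over closed intervals [start,end].
# Each node stores a center point, the intervals containing it, and
# subtrees for intervals lying entirely left or right of the center.
sub build_tree {
    my @intervals = sort { $a->{start} <=> $b->{start} } @_;
    # Return a single undef (not an empty list) so the hash
    # constructor below always receives one value per key.
    return undef unless @intervals;
    # The median interval contains its own start, so this node keeps
    # at least one interval and the recursion always shrinks.
    my $center = $intervals[int(@intervals / 2)]{start};
    my (@left, @right, @here);
    for my $iv (@intervals) {
        if    ($iv->{end}   < $center) { push @left,  $iv }
        elsif ($iv->{start} > $center) { push @right, $iv }
        else                           { push @here,  $iv }
    }
    return {
        center => $center,
        here   => \@here,
        left   => build_tree(@left),
        right  => build_tree(@right),
    };
}

# Collect every stored interval overlapping the closed query [qs,qe].
sub query_tree {
    my ($node, $qs, $qe) = @_;
    return () unless $node;
    my @hits = grep { $_->{start} <= $qe && $qs <= $_->{end} } @{ $node->{here} };
    # Left intervals end before the center: only reachable if qs < center.
    push @hits, query_tree($node->{left},  $qs, $qe) if $qs < $node->{center};
    # Right intervals start after the center: only reachable if qe > center.
    push @hits, query_tree($node->{right}, $qs, $qe) if $qe > $node->{center};
    return @hits;
}

my $tree = build_tree(
    { id => 'exon1', start => 100, end => 200 },
    { id => 'exon2', start => 300, end => 400 },
    { id => 'gene1', start => 100, end => 400 },
);
my @ids = sort map { $_->{id} } query_tree($tree, 150, 160);
print "@ids\n";    # exon1 gene1
```

A real annotator would then turn each overlapping leaf into a candidate branch, as described next.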
An annotation candidate is a hash data structure where keys are GFF features (exon, gene, mRNA) and values are CracTools::GFF::Annotation objects (parsed GFF lines).
It also contains two special entries:

  parent_feature  holds the parenting links between features;
  leaf_feature    holds the feature name of the leaf ("exon" for example).
  my $candidate = {
    "exon"    => CracTools::GFF::Annotation,
    "gene"    => CracTools::GFF::Annotation,
    "feature" => CracTools::GFF::Annotation,
    ...,
    parent_feature => {exon => mRNA, featureA => featureB, ...},
    leaf_feature   => "exon",
  };
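Because parent_feature maps each feature to its parent feature, the branch of a candidate can be recovered by starting at leaf_feature and following the links until no parent remains. The sketch below assumes that layout; candidate_branch is a hypothetical helper, and plain strings stand in for CracTools::GFF::Annotation objects.

```perl
use strict;
use warnings;

# Follow the parent_feature links of a candidate hash from its leaf
# feature up to the root, returning the feature names along the branch.
sub candidate_branch {
    my ($candidate) = @_;
    my @branch;
    my $feature = $candidate->{leaf_feature};
    while (defined $feature) {
        push @branch, $feature;
        $feature = $candidate->{parent_feature}{$feature};
    }
    return @branch;
}

# Toy candidate: plain strings replace CracTools::GFF::Annotation objects.
my $candidate = {
    exon           => 'exon annotation',
    mRNA           => 'mRNA annotation',
    gene           => 'gene annotation',
    parent_feature => { exon => 'mRNA', mRNA => 'gene' },
    leaf_feature   => 'exon',
};
print join(' -> ', candidate_branch($candidate)), "\n";    # exon -> mRNA -> gene
```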
Each annotation query can be parametrized with prioritization methods that choose the set of "best" annotation(s) returned to the user. This module provides default prioritization methods, but you can write your own to fit your application needs.
There are two kinds of prioritization methods: prioritySub and comparSub.
prioritySub: the priority subroutine (by default "getCandidatePriorityDefault") receives as input the queried interval (start and end positions) and an annotation candidate. It must return a priority level (the lower, the more important) and a string giving a literal name for that priority level.
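A custom priority subroutine only has to honour that input/output contract. The sketch below is a hypothetical example (exonFirstPriority is not provided by the module) that favours candidates with an exon, then candidates with a gene, and flags anything else with -1, the "avoid" value documented for getCandidatePriorityDefault.

```perl
use strict;
use warnings;

# Hypothetical custom priority subroutine following the documented
# contract: it receives ($pos_start, $pos_end, $candidate) and returns
# ($priority, $type); lower is better, -1 means "avoid this candidate".
sub exonFirstPriority {
    my ($pos_start, $pos_end, $candidate) = @_;
    return (0, 'EXON') if defined $candidate->{exon};
    return (1, 'GENE') if defined $candidate->{gene};
    return (-1, '');
}

my ($priority, $type) = exonFirstPriority(12345, 12380,
    { exon => 'exon annotation', leaf_feature => 'exon' });
print "$type ($priority)\n";    # EXON (0)
```

Assuming the optional subroutine arguments are passed as code references, it could then be given as the fifth argument of a query, e.g. $annotator->getBestAnnotationCandidate($chr, $start, $end, $strand, \&exonFirstPriority).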
comparSub: the compare subroutine (by default "compareTwoCandidatesDefault") receives as input two annotation candidates and the queried interval. It must return the better of the two candidates, or undef if it cannot decide.
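Similarly, a compare subroutine can be sketched as follows. This hypothetical largestOverlapCompare prefers the candidate whose leaf annotation covers more bases of the queried interval and returns undef on a tie. ToyAnnotation is a stand-in for CracTools::GFF::Annotation that assumes only start and end accessors; the argument order (two candidates, then the queried start and end) follows the description of compareTwoCandidatesDefault.

```perl
use strict;
use warnings;

# Minimal stand-in for CracTools::GFF::Annotation, exposing only the
# start/end accessors used below (an assumption for this sketch).
package ToyAnnotation;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package main;

# Number of bases shared by the closed intervals [s1,e1] and [s2,e2].
sub shared_bases {
    my ($s1, $e1, $s2, $e2) = @_;
    my $lo = $s1 > $s2 ? $s1 : $s2;
    my $hi = $e1 < $e2 ? $e1 : $e2;
    return $hi >= $lo ? $hi - $lo + 1 : 0;
}

# Hypothetical compare subroutine: prefer the candidate whose leaf
# annotation covers more of the queried interval; undef on a tie.
sub largestOverlapCompare {
    my ($cand1, $cand2, $pos_start, $pos_end) = @_;
    my @cov = map {
        my $leaf = $_->{ $_->{leaf_feature} };
        shared_bases($leaf->start, $leaf->end, $pos_start, $pos_end);
    } ($cand1, $cand2);
    return $cand1 if $cov[0] > $cov[1];
    return $cand2 if $cov[1] > $cov[0];
    return undef;
}

my $c1 = { exon => ToyAnnotation->new(start => 100, end => 150), leaf_feature => 'exon' };
my $c2 = { exon => ToyAnnotation->new(start => 140, end => 300), leaf_feature => 'exon' };
my $best = largestOverlapCompare($c1, $c2, 120, 200);    # $c2: 61 bases vs 31
```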
  Arg [1]     : String - $gff_file
                GFF file used to perform annotation
  Arg [2]     : String - $mode
                Execution mode: "fast" or "light" ("light" by default)
  Example     : my $annotator = CracTools::Annotator->new($gff_file);
  Description : Create a new CracTools::Annotator object based on the provided
                GFF file. In "light" mode, CracTools::Annotator consumes less
                memory at the cost of a longer execution time.
  ReturnType  : CracTools::Annotator
  Description : Return the mode used to create the annotator
  ReturnType  : string ("light" or "fast")
  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - pos_end
  Arg [4]     : String - strand
  Description : Return true if any overlapping annotation has been found
  ReturnType  : Boolean

  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - pos_end
  Arg [4]     : String - strand
  Description : Return true if an overlapping gene annotation has been found
  ReturnType  : Boolean
  Arg [1]     : String - chr
  Arg [2]     : String - pos_start1
  Arg [3]     : String - pos_end1
  Arg [4]     : String - pos_start2
  Arg [5]     : String - pos_end2
  Arg [6]     : String - strand
  Description : Return true if the same gene overlaps both intervals.
  ReturnType  : Boolean
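The idea behind this check can be sketched without the module: a single gene must overlap both closed intervals. In this hypothetical same_gene_overlaps helper, the genes of one chromosome are given as a plain list of hashes and strand is ignored for brevity.

```perl
use strict;
use warnings;

# Hypothetical sketch: return 1 if one gene (closed interval) overlaps
# both query intervals [s1,e1] and [s2,e2], 0 otherwise.
sub same_gene_overlaps {
    my ($genes, $s1, $e1, $s2, $e2) = @_;
    for my $g (@$genes) {
        my $hits1 = $g->{start} <= $e1 && $s1 <= $g->{end};
        my $hits2 = $g->{start} <= $e2 && $s2 <= $g->{end};
        return 1 if $hits1 && $hits2;
    }
    return 0;
}

my @genes = (
    { id => 'geneA', start => 1000, end => 5000 },
    { id => 'geneB', start => 8000, end => 9000 },
);
print same_gene_overlaps(\@genes, 1200, 1300, 4800, 4900)
    ? "same gene\n" : "different genes\n";    # same gene
```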
  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - pos_end
  Arg [4]     : String - strand
  Arg [5]     : (Optional) Subroutine - see getCandidatePriorityDefault for more details
  Arg [6]     : (Optional) Subroutine - see compareTwoCandidatesDefault for more details
  Description : Return the best annotation candidate according to the priorities
                given by the subroutine(s) passed as arguments.
  ReturnType  : AnnotationCandidate, Int(priority), String(type)
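The documentation does not spell out how the two subroutines are combined, so the following is only a plausible sketch, not the module's code: keep the candidate with the lowest non-negative priority and, on equal priorities, let the compare subroutine decide, keeping the earlier candidate when it returns undef. select_best and the toy len field are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical selection loop combining a priority sub and a compare sub.
sub select_best {
    my ($candidates, $pos_start, $pos_end, $priority_sub, $compare_sub) = @_;
    my ($best, $best_priority, $best_type);
    for my $cand (@$candidates) {
        my ($priority, $type) = $priority_sub->($pos_start, $pos_end, $cand);
        next if $priority < 0;    # -1: this candidate should be avoided
        if (!defined $best_priority || $priority < $best_priority) {
            ($best, $best_priority, $best_type) = ($cand, $priority, $type);
        }
        elsif ($priority == $best_priority && defined $compare_sub) {
            # Equal priorities: the compare sub decides; on undef keep
            # the candidate that was found first.
            my $winner = $compare_sub->($best, $cand, $pos_start, $pos_end);
            $best = $winner if defined $winner;
        }
    }
    return ($best, $best_priority, $best_type);
}

# Toy subs: exons beat genes; among equals, a larger toy 'len' wins.
my $by_leaf = sub {
    my ($s, $e, $c) = @_;
    return $c->{leaf_feature} eq 'exon' ? (0, 'EXON') : (1, 'GENE');
};
my $prefer_longer = sub {
    my ($c1, $c2) = @_;
    return $c1->{len} > $c2->{len} ? $c1 : $c2->{len} > $c1->{len} ? $c2 : undef;
};
my @cands = (
    { name => 'gene only',  leaf_feature => 'gene', len => 900 },
    { name => 'short exon', leaf_feature => 'exon', len => 50 },
    { name => 'long exon',  leaf_feature => 'exon', len => 120 },
);
my ($best, $priority, $type) = select_best(\@cands, 100, 200, $by_leaf, $prefer_longer);
print "$best->{name} ($type, priority $priority)\n";    # long exon (EXON, priority 0)
```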
  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - pos_end
  Arg [4]     : String - strand
  Arg [5]     : (Optional) Subroutine - see getCandidatePriorityDefault for more details
  Arg [6]     : (Optional) Subroutine - see compareTwoCandidatesDefault for more details
  Description : Return the best annotation candidates according to the priorities
                given by the subroutine(s) passed as arguments.
  ReturnType  : ArrayRef of AnnotationCandidates, Int(priority), String(type)
  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - pos_end
  Arg [4]     : String - strand
  Description : Return an array reference with all annotation candidates
                overlapping the chromosomal region.
  ReturnType  : ArrayRef of AnnotationCandidate

  Arg [1]     : String - chr
  Arg [2]     : String - pos_start
  Arg [3]     : String - strand
  Description : Return an array reference with all annotation candidates nearest
                down the query region (without overlap).
  ReturnType  : ArrayRef of AnnotationCandidate

  Arg [1]     : String - chr
  Arg [2]     : String - pos_end
  Arg [3]     : String - strand
  Description : Return an array reference with all annotation candidates nearest
                up the query region (without overlap).
  ReturnType  : ArrayRef of AnnotationCandidate
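As an illustration, assuming "down" refers to lower coordinates and "up" to higher ones, the nearest non-overlapping annotations on each side can be collected with a simple linear scan that keeps ties. nearest_down and nearest_up are hypothetical helpers operating on plain hashes, not the module's interval-tree queries.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Annotations ending before pos_start, keeping all those with the
# largest end (ties are returned together).
sub nearest_down {
    my ($annots, $pos_start) = @_;
    my @before = grep { $_->{end} < $pos_start } @$annots;
    return () unless @before;
    my $max_end = max map { $_->{end} } @before;
    return grep { $_->{end} == $max_end } @before;
}

# Annotations starting after pos_end, keeping all those with the
# smallest start.
sub nearest_up {
    my ($annots, $pos_end) = @_;
    my @after = grep { $_->{start} > $pos_end } @$annots;
    return () unless @after;
    my $min_start = min map { $_->{start} } @after;
    return grep { $_->{start} == $min_start } @after;
}

my @annots = (
    { id => 'g1', start => 100,  end => 900 },
    { id => 'g2', start => 500,  end => 900 },
    { id => 'g3', start => 2000, end => 2500 },
);
my @down = map { $_->{id} } nearest_down(\@annots, 1000);
my @up   = map { $_->{id} } nearest_up(\@annots, 1200);
print "down: @down; up: @up\n";    # down: g1 g2; up: g3
```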
  Arg [1]     : String - pos_start
  Arg [2]     : String - pos_end
  Arg [3]     : hash - candidate
  Description : Default method used to assign a priority to a candidate. You can
                write your own priority method to fit your specific needs when
                selecting the best annotation. The best priority is 0. A priority
                of -1 means that the candidate should be avoided.
  ReturnType  : Array($priority,$type) where $priority is an integer and $type a string
  Arg [1]     : hash - candidate1
  Arg [2]     : hash - candidate2
  Arg [3]     : pos_start (start position that has been queried)
  Arg [4]     : pos_end (end position that has been queried)
  Description : Default method used to choose the best candidate when priorities
                are equal. You can write your own compare method to fit your
                specific needs when selecting the best candidate.
  ReturnType  : AnnotationCandidate - best candidate, or undef if we cannot decide
                which candidate is the best
  Description : init method, loads the GFF annotations into a CracTools::GFF::Query object.
  Arg [1]     : String - annot_id
  Arg [2]     : Hash ref - candidate
                Since this method is recursive, this is the object that we are
                constructing.
  Arg [3]     : Hash ref - annot_hash
                annot_hash is a hash reference where keys are annotation IDs and
                values are CracTools::GFF::Annotation objects.
  Description : _constructCandidate is a recursive method that builds a candidate
                hash. A candidate is defined as a path in the (multi-rooted)
                annotation tree from a leaf (e.g. an exon) to a root (e.g. a gene).
  ReturnType  : Candidate hash ref where keys are GFF features and values are
                CracTools::GFF::Annotation objects:
                {
                  "exon"  => CracTools::GFF::Annotation,
                  "gene"  => CracTools::GFF::Annotation,
                  feature => CracTools::GFF::Annotation,
                  ...,
                  parent_feature => {featureA => featureB},
                  leaf_feature   => "exon",
                }
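The recursion can be sketched on toy data as follows. construct_candidate is a hypothetical re-implementation, assuming each annotation has a single parent recorded under a parent key; real GFF records may list several parents, and CracTools::GFF::Annotation objects do not necessarily expose their fields this way.

```perl
use strict;
use warnings;

# Hypothetical recursive builder: add the annotation to the candidate,
# record the link to its parent feature, then recurse on the parent.
sub construct_candidate {
    my ($annot_id, $candidate, $annot_hash) = @_;
    my $annot = $annot_hash->{$annot_id} or return $candidate;
    my $feature = $annot->{feature};
    $candidate->{$feature} = $annot;
    # The first annotation visited is the leaf of the branch.
    $candidate->{leaf_feature}   //= $feature;
    $candidate->{parent_feature} //= {};
    if (defined(my $parent_id = $annot->{parent})) {
        if (my $parent = $annot_hash->{$parent_id}) {
            $candidate->{parent_feature}{$feature} = $parent->{feature};
            construct_candidate($parent_id, $candidate, $annot_hash);
        }
    }
    return $candidate;
}

# Toy annotation index: IDs mapped to plain hashes.
my %annots = (
    gene1 => { feature => 'gene' },
    tx1   => { feature => 'mRNA', parent => 'gene1' },
    exon1 => { feature => 'exon', parent => 'tx1' },
);
my $cand = construct_candidate('exon1', {}, \%annots);
print "$cand->{leaf_feature}: ",
    join(', ', map { "$_ => $cand->{parent_feature}{$_}" }
                   sort keys %{ $cand->{parent_feature} }), "\n";
# exon: exon => mRNA, mRNA => gene
```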
  Arg [1]     : Hash ref - annotations
                annotations is a hash reference where keys are coordinates given
                by CracTools::Interval::Query::File objects.
  Description : Build the candidate hashes for all the given annotations, using
                the recursive _constructCandidate method.
  ReturnType  : Array ref of all candidates built by _constructCandidate
Nicolas PHILIPPE <nphilippe.research@gmail.com>
Jérôme AUDOUX <jaudoux@cpan.org>
Sacha BEAUMEUNIER <sacha.beaumeunier@gmail.com>
This software is Copyright (c) 2017 by IRMB/INSERM (Institute for Regenerative Medicine and Biotherapy / Institut National de la Santé et de la Recherche Médicale) and AxLR/SATT (Languedoc-Roussillon / Société d'Accélération de Transfert de Technologie).
This is free software, licensed under:
The GNU Affero General Public License, Version 3, November 2007