
NAME

get_feature_info.pl

A script to collect feature information from a BioPerl SeqFeature::Store db.

SYNOPSIS

get_feature_info.pl <filename>

  Options:
  --in <filename> 
  --db <name>
  --attrib <attribute1,attribute2,...>
  --type <primary_tag>
  --out <filename>
  --gz
  --version
  --help
  
  Attributes include:
   Chromosome
   Start
   Stop
   Strand
   Score
   Name
   Alias
   Note
   Type
   Primary_tag
   Source
   Length
   Midpoint
   Phase
   RNA_count (number of transcript subfeatures)
   Exon_count (number of exon subfeatures)
   Gene_length (sum of all merged, collapsed, transcript exon lengths)
   Transcript_length (sum of exon lengths)
   Parent (name)
   Primary_ID
   <tag>

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should contain column headers. Text files generated by other BioToolBox scripts are acceptable. Files may be gzip-compressed.
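The input format described above can be read with a short sketch like the following. It is written in Python purely for illustration and is not the BioToolBox parser. It assumes that compression can be detected from a '.gz' extension, and the function name read_feature_table is hypothetical.

```python
import csv
import gzip


def read_feature_table(path):
    """Read a tab-delimited feature table whose first row holds column headers.

    Assumption: gzip compression is detected from a '.gz' extension; the
    real script may detect compression differently.
    Returns (headers, rows), where each row is a dict keyed by header name.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        headers = next(reader)  # first row: column headers
        # Skip blank lines; pair each remaining field with its header.
        rows = [dict(zip(headers, fields)) for fields in reader if fields]
    return headers, rows
```

Each row then maps column names such as Name or Type to that feature's values, which is the form needed before looking features up in the database.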

--db <name>

Specify the name of a Bio::DB::SeqFeature::Store annotation database from which gene or feature annotation may be derived. A database is required for generating new data files with features. For more information about using annotation databases, see https://code.google.com/p/biotoolbox/wiki/WorkingWithDatabases.

--attrib <attribute>

Specify the attribute(s) to collect for each feature. Multiple attributes may be given as a comma-delimited list. Standard GFF attributes may be collected, as well as the values of specific group tags, which are found in the group (ninth) column of the source GFF file. Standard attributes include the following:

   - Chromosome
   - Start
   - Stop
   - Strand
   - Score
   - Name
   - Alias
   - Note
   - Type
   - Primary_tag
   - Source
   - Length
   - Midpoint
   - Phase
   - RNA_count (number of transcript subfeatures)
   - Exon_count (number of exon subfeatures)
   - Gene_length (sum of all merged, collapsed, transcript exon lengths)
   - Transcript_length (sum of exon lengths)
   - Parent (name)
   - Primary_ID
   - <tag>

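To make the length-related attributes concrete, here is a minimal Python sketch, assuming 1-based, fully closed GFF coordinates and exons given as (start, stop) tuples. The helper names are hypothetical, and rounding the midpoint down is an assumption rather than documented behavior. Gene_length merges overlapping exons from all transcripts before summing, while Transcript_length simply sums one transcript's exons.

```python
def feature_length(start, stop):
    """Length of a feature in 1-based, fully closed GFF coordinates."""
    return stop - start + 1


def midpoint(start, stop):
    """Integer midpoint; rounding down is an assumption for this sketch."""
    return (start + stop) // 2


def transcript_length(exons):
    """Sum of exon lengths for a single transcript; exons are (start, stop)."""
    return sum(feature_length(s, e) for s, e in exons)


def gene_length(transcripts):
    """Sum of exon lengths after merging overlapping exons from all transcripts.

    transcripts: list of exon lists, each exon a (start, stop) tuple.
    """
    exons = sorted(exon for tx in transcripts for exon in tx)
    merged = []
    for start, stop in exons:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the current merged block.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return sum(feature_length(s, e) for s, e in merged)
```

For two transcripts with exons (100, 200), (300, 400) and (150, 250), (300, 350), the transcript lengths are 202 and 152, while the collapsed gene length is 252 because shared exonic bases are counted only once.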
If --attrib is not specified on the command line, an interactive list is presented to the user for selection. This is especially useful when you can't remember the feature's tag keys in the database.

--type <primary_tag>

When the input file does not have a type column, a type or primary_tag may be provided. This is especially useful to restrict the database search when there are multiple features with the same name.
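The effect of --type can be illustrated with a hypothetical in-memory lookup. This sketch assumes features are simple dicts with 'name' and 'primary_tag' keys; the real script queries a Bio::DB::SeqFeature::Store database instead, and find_features is not part of BioToolBox.

```python
def find_features(features, name, primary_tag=None):
    """Return features matching name, optionally restricted to one primary_tag.

    features: iterable of dicts with 'name' and 'primary_tag' keys
    (a hypothetical stand-in for the annotation database).
    """
    hits = [f for f in features if f["name"] == name]
    if primary_tag is not None:
        # Narrow ambiguous name matches to the requested feature type.
        hits = [f for f in hits if f["primary_tag"] == primary_tag]
    return hits
```

Without a type, a name shared by a gene and its mRNA returns both features; supplying the type leaves only one.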

--out <filename>

Optionally specify an alternate output file name. The default is to overwrite the input file.

--gz

Indicate whether the output file should (or should not) be compressed by gzip. If compressed, the extension '.gz' is appended to the filename. If a compressed input file is opened, its compression status is preserved unless specified otherwise.
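The output naming rules for --out and --gz might be expressed as follows. This is an illustrative sketch, not the script's own logic: output_path is a hypothetical name, compression of the input is assumed to be signaled by a '.gz' extension, and removing '.gz' when compression is turned off is an assumption not stated above.

```python
def output_path(input_path, out=None, gz=None):
    """Decide the output filename and whether to gzip it.

    out: alternate output name; by default the input file is overwritten.
    gz: True/False when chosen explicitly, or None to preserve the
    compression status of the input (assumed to be signaled by '.gz').
    """
    path = out if out else input_path
    if gz is None:
        gz = input_path.endswith(".gz")
    if gz and not path.endswith(".gz"):
        path += ".gz"
    elif not gz and path.endswith(".gz"):
        # Assumption: drop a stale '.gz' when writing uncompressed output.
        path = path[:-3]
    return path, gz
```

For example, a compressed input written to a new name without an explicit choice stays compressed, so new.txt becomes new.txt.gz.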

--version

Print the version number.

--help

Display this help.

DESCRIPTION

This program collects attributes for a list of features from the database. The attributes may be general attributes, such as chromosome, start, stop, and strand, or feature-specific attributes stored in the group field of the original source GFF file.
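Feature-specific tag values originate in the ninth column of the source GFF. Assuming GFF3 formatting (GFF2 group columns use a different syntax), that column could be parsed along the lines below. The function name is hypothetical, and this simplified parser is an illustration, not the one used by BioPerl or BioToolBox.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 ninth (attributes) column into a dict of tag -> list of values.

    Per the GFF3 specification, tag=value pairs are separated by ';',
    multiple values by ',', and reserved characters are percent-encoded.
    """
    tags = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        tags[unquote(key)] = [unquote(v) for v in value.split(",")]
    return tags
```

A tag requested by name, such as Alias or Note, then corresponds to one key of the returned dict.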

AUTHOR

 Timothy J. Parnell, PhD
 Howard Hughes Medical Institute
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.