BioX::Seq::Utils - miscellaneous sequence-related functions
    if ( is_nucleic($seq) ) {
        $seq = rev_com( $seq );
    }

    my @orfs = all_orfs(
        $seq,
        3,   # ORF mode
        200, # min length
    );

    my $re = build_ORF_regex(
        0,   # ORF mode
        300, # min length
    );
BioX::Seq::Utils contains a number of sequence-related functions. They are general functions that are used often enough to warrant inclusion in a library but not often enough to warrant addition to the core BioX::Seq class. They may also include commonly used functions that do not make sense as BioX::Seq methods, as well as functions that mirror BioX::Seq methods but can be used on raw strings. They act on simple scalars and arrays rather than objects.
NOTE: Use of this module is considered deprecated. It is retained within the BioX::Seq package because a number of existing software tools rely on it, but at some point in the future these functions will likely find a new home elsewhere.
my $re = rev_com($seq);
Takes a single scalar argument and returns a scalar containing the reverse complement. Throws an exception if the input value doesn't look like a nucleic acid sequence.
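The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the module's implementation: sketch_rev_com is a hypothetical name, and the set of accepted characters (IUPAC codes in either case, no gap characters) is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only, not the module's code.
# Reverse-complements a raw string, mapping IUPAC ambiguity codes to
# their complements. S, W and N are their own complements.
sub sketch_rev_com {
    my ($seq) = @_;

    # Assumption: anything outside the IUPAC nucleotide alphabet is an error.
    die "input does not look like a nucleic acid sequence\n"
        if $seq =~ /[^ACGTURYSWKMBDHVN]/i;

    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTURYKMBDHVacgturykmbdhv/TGCAAYRMKVHDBtgcaayrmkvhdb/;
    return $rc;
}

print sketch_rev_com('ATGCCN'), "\n";
```

Note that in this sketch U is complemented to A but A is always complemented to T, so RNA input comes back as DNA; whether the module behaves the same way is not stated in this documentation.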
    if ( is_nucleic($seq) ) {
        # do something
    }
Takes a single scalar argument and returns a boolean value indicating whether the scalar "looks like" a nucleic acid string (i.e. contains no characters but valid IUPAC nucleic acid codes).
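As a rough sketch of what such a check might look like, the following tests a string against the IUPAC nucleotide alphabet. The name sketch_is_nucleic is hypothetical, and the exact character set the module accepts (case, gap characters, empty strings) is an assumption here, not something this documentation specifies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only, not the module's code.
# Returns 1 if the string consists solely of IUPAC nucleotide codes
# (either case), 0 otherwise. Assumption: empty strings are rejected.
sub sketch_is_nucleic {
    my ($seq) = @_;
    return $seq =~ /\A[ACGTURYSWKMBDHVN]+\z/i ? 1 : 0;
}

for my $candidate (qw/ACGTN acgu MKLV/) {
    printf "%s\t%s\n", $candidate,
        sketch_is_nucleic($candidate) ? 'nucleic' : 'not nucleic';
}
```

A check like this is only a heuristic: a short protein string made up entirely of letters that are also IUPAC nucleotide codes (for example "MKV") would pass it.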
    my @orfs = all_orfs(
        $seq,
        2,   # ORF mode
        100, # min length
    );

    for my $orf (@orfs) {
        my ($seq, $start, $end) = @{$orf};
    }
Takes one required argument (a sequence string) and two optional arguments (ORF mode and minimum length) and returns an array of array references representing all ORFs in all reading frames of the sequence. Each reference contains three values: the sequence, the start position, and the stop position. The strand can be determined by comparing the start and stop positions (ORFs on the reverse strand will have start > stop). See build_ORF_regex() for an explanation of the possible values for ORF mode.
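The strand rule can be sketched as follows. The records below are made up by hand in the documented shape (sequence, start, stop) rather than produced by all_orfs(), and orf_strand is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hand-written example records in the documented shape:
# [ sequence, start, stop ]. The coordinates are invented.
my @orfs = (
    [ 'ATGAAATTT', 10, 18 ],   # start < stop
    [ 'ATGCCCGGG', 42, 34 ],   # start > stop
);

# Hypothetical helper: derive the strand from the coordinate order,
# following the rule that reverse-strand ORFs have start > stop.
sub orf_strand {
    my ($orf) = @_;
    my (undef, $start, $end) = @{$orf};
    return $start > $end ? '-' : '+';
}

for my $orf (@orfs) {
    my ($seq, $start, $end) = @{$orf};
    printf "%s\t%d\t%d\t%s\n", $seq, $start, $end, orf_strand($orf);
}
```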
my $re = build_ORF_regex( 3, 300, );
Builds a regular expression for matching open reading frames in a nucleic acid sequence string. Takes two required arguments that are used for building the regular expression:
mode - an integer from 0-3 defining the type of open reading frame detected.
0 - any set of codons not containing a stop codon
1 - must end with stop codon
2 - must begin with start codon
3 - must begin with start codon and end with stop codon
min_len - an integer representing the minimum number of nucleotides an open reading frame must contain to be returned (not including the stop codon)
The return value is a compiled expression that can be used to search a sequence string. The pos() function should be used on the string to set the frame to be searched (0-2) prior to applying the regex.
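To illustrate how pos() can select a reading frame, the sketch below uses a hand-written pattern anchored with \G, which matches only at the current pos() of the string. The pattern is an assumption made for this example (roughly "start codon through first in-frame stop codon", with no minimum length); it is not the expression that build_ORF_regex() returns, and first_orf_in_frame is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical pattern for illustration only. \G anchors the match at
# pos(); whole codons are skipped lazily until an in-frame ATG, then
# codons are consumed lazily up to the first in-frame stop codon.
my $orf_re = qr/\G(?:[ACGT]{3})*?(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/;

# Hypothetical helper: return (orf, zero-based offset) for the first
# ORF found in the given frame, or an empty list if there is none.
sub first_orf_in_frame {
    my ($seq, $frame, $re) = @_;
    pos($seq) = $frame;              # set the frame (0-2) before matching
    if ( $seq =~ /$re/g ) {
        return ( $1, $-[1] );        # $-[1] is where capture group 1 starts
    }
    return;
}

my $seq = 'CCATGAAATAGGG';
for my $frame (0 .. 2) {
    my ($orf, $offset) = first_orf_in_frame($seq, $frame, $orf_re);
    if ( defined $orf ) {
        print "frame $frame: $orf at offset $offset\n";
    }
    else {
        print "frame $frame: no ORF\n";
    }
}
```

Because the pattern only ever advances in steps of three from the position set by pos(), each call stays within a single reading frame.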
Please report bugs to the author.
Jeremy Volkening <jeremy *at* base2bio.com>
Copyright 2014 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.