BioX::Seq::Fetch - Fetch records from indexed FASTA non-sequentially
use BioX::Seq::Fetch;
my $parser = BioX::Seq::Fetch->new($filename);
my $seq = $parser->fetch_seq('seq_ABC');
my $sub = $parser->fetch_seq('seq_XYZ', 8 => 15);
BioX::Seq::Fetch provides non-sequential access to records from indexed sequence files. Currently only FASTA files indexed using samtools faidx or another compatible method are supported. If the index file is missing, the module creates a samtools-compatible one automatically.
my $parser = BioX::Seq::Fetch->new( $filename, with_descriptions => 1, );
Creates a new BioX::Seq::Fetch parser. Requires an input filename (STDIN and open filehandles are not supported, as a filename is needed to find the corresponding index file and to ensure that seek()-ing is supported). Takes one optional boolean argument ('with_descriptions') indicating whether to enable backtracking to find and include any sequence description present (normally the description is absent, as the FASTA index stores the offset of the sequence itself and not of the defline). This option is currently experimental and may slow down sequence fetches, so it is turned off by default.
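The backtracking mentioned above can be pictured with a short sketch. This is an illustration only, not the module's implementation: the helper name desc_before_offset and the 64-byte chunk size are hypothetical, and the offset is assumed to be the byte position of the record's first base, as stored in a samtools-style index.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Seq::Fetch): given a seekable
# filehandle and the byte offset of a record's first base, read backwards
# to the preceding defline and return its description, or undef if the
# defline carries none.
sub desc_before_offset {
    my ($fh, $seq_offset) = @_;
    my ($pos, $buf) = ($seq_offset, '');

    # Read backwards in small chunks until the buffer holds the whole
    # defline (i.e. it also contains the newline that precedes it) or
    # the start of the file is reached.
    while ($pos > 0) {
        my $want = $pos < 64 ? $pos : 64;
        $pos -= $want;
        seek($fh, $pos, 0) or die "seek failed: $!";
        my $got = read($fh, my $piece, $want);
        die "short read\n" unless defined $got && $got == $want;
        $buf = $piece . $buf;
        last if $buf =~ /\n[^\n]*\n\z/;
    }

    # The last line in the buffer is the defline: ">id description"
    my ($defline) = $buf =~ /([^\r\n]*)\r?\n\z/;
    return unless defined $defline;
    my ($desc) = $defline =~ /\A>\S*\s+(\S.*)\z/;
    return $desc;
}

# Demo on an in-memory FASTA; index() stands in for the offsets that
# would normally come from the index file.
my $fasta = ">seq1 first test record\nACGT\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "open failed: $!";
for my $first_bases ('ACGT', 'GGCC') {
    my $desc  = desc_before_offset($fh, index($fasta, $first_bases));
    my $shown = defined $desc ? $desc : '(no description)';
    print "$shown\n";
}
```

The extra seek and read per record are the kind of overhead that could explain why such an option might slow down fetches.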
my $seq = $parser->fetch_seq( $name, $start, $end, );
Returns the requested sequence as a BioX::Seq object, or undef if no matching sequence is found. Requires a valid sequence identifier and optionally 1-based start and end coordinates to retrieve a substring (the entire sequence is returned by default). A fatal error is thrown if the provided coordinates fall outside the range [1, length(sequence)].
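To show how 1-based coordinates can map onto byte positions in an indexed FASTA file, here is a minimal sketch based on the five columns of a samtools-style index entry (name, length, offset, bases per line, bytes per line). The helper fetch_range and the entry hash are hypothetical and are not the module's code; the end coordinate is assumed to be inclusive, and the sketch additionally rejects a start greater than the end.

```perl
use strict;
use warnings;

# Hypothetical helper: return residues $start..$end (1-based, assumed
# inclusive) of one record, using the fields of its index entry.
sub fetch_range {
    my ($fh, $entry, $start, $end) = @_;
    my ($len, $offset, $linebases, $linewidth)
        = @{$entry}{qw(length offset linebases linewidth)};

    $start //= 1;
    $end   //= $len;
    die "coordinates out of range\n"
        if $start < 1 || $end > $len || $start > $end;

    # Byte position of a 1-based residue: skip the whole lines before it
    # (including their terminators), then the remainder within its line.
    my $byte = sub {
        my $i = $_[0] - 1;
        return $offset + int($i / $linebases) * $linewidth + $i % $linebases;
    };

    my $first = $byte->($start);
    my $n     = $byte->($end) - $first + 1;
    seek($fh, $first, 0) or die "seek failed: $!";
    my $got = read($fh, my $raw, $n);
    die "short read\n" unless defined $got && $got == $n;
    $raw =~ tr/\r\n//d;    # drop line terminators inside the span
    return $raw;
}

# Demo: a 22-residue record wrapped at 10 bases per line. The defline
# ">seq_XYZ demo\n" occupies 14 bytes, so the sequence starts at offset 14.
my $fasta = ">seq_XYZ demo\nACGTACGTAC\nGGGTTTCCCA\nAT\n";
open my $fh, '<', \$fasta or die "open failed: $!";
my %entry = (length => 22, offset => 14, linebases => 10, linewidth => 11);
print fetch_range($fh, \%entry, 8, 15), "\n";    # prints TACGGGTT
```

Only the bytes spanning the requested range are read, which is what makes substring retrieval cheap on large records.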
$parser->write_index();
$parser->write_index( 'path/to/file.fa.fai' );
Writes a samtools-compatible index file for the underlying sequence file. Accepts one optional argument specifying the path of the file to create (the default, which should usually not be changed, is the same as the underlying sequence file with a '.fai' extension added).
This method is now called automatically if a FASTA file is opened with no index file present.
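For readers unfamiliar with the index layout, the following sketch derives samtools-style index lines from FASTA input. It is an illustration under stated assumptions rather than the module's implementation: build_fai is a hypothetical helper, line geometry is taken from the first sequence line of each record, and no check is made that the remaining lines are wrapped consistently.

```perl
use strict;
use warnings;

# Hypothetical helper: scan FASTA from a filehandle and return one
# tab-separated index line per record:
#   name, length, offset of first base, bases per line, bytes per line
sub build_fai {
    my ($fh) = @_;
    my (@lines, $cur);
    my $pos = 0;                        # byte offset of the current line
    while (my $line = <$fh>) {
        my $width = length $line;       # includes the line terminator
        (my $bases = $line) =~ s/[\r\n]+\z//;
        if ($bases =~ /\A>(\S+)/) {
            push @lines, join("\t", @$cur) if $cur;
            $cur = [$1, 0, $pos + $width, 0, 0];
        }
        elsif ($cur && length $bases) {
            $cur->[1] += length $bases;
            # record geometry once, from the first sequence line
            @{$cur}[3, 4] = (length($bases), $width) if !$cur->[3];
        }
        $pos += $width;
    }
    push @lines, join("\t", @$cur) if $cur;
    return @lines;
}

# Demo on an in-memory FASTA with two records
my $fasta = ">seq_ABC\nACGTACGT\nACGT\n>seq_XYZ demo\nGGCC\n";
open my $fh, '<', \$fasta or die "open failed: $!";
print "$_\n" for build_fai($fh);
# prints (tab-separated):
#   seq_ABC  12  9   8  9
#   seq_XYZ  4   37  4  5
```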
my @seq_ids = $parser->ids;
Returns an array of sequence IDs, ordered by their occurrence in the underlying file.
my $len = $parser->length( $seq_id );
Returns the length of the sequence given by $seq_id. May be marginally faster than fetching the sequence object and then finding its length.
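One plausible reason a length lookup can be cheap is that a samtools-style index already stores each sequence's length in its second column, so no sequence data needs to be read. The sketch below parses index lines into an ordered ID list and a length lookup; parse_fai is a hypothetical helper, and whether the module works this way internally is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: parse index lines into an ordered list of IDs
# and an ID => length lookup table.
sub parse_fai {
    my (@fai_lines) = @_;
    my (@ids, %len);
    for my $line (@fai_lines) {
        chomp(my $copy = $line);
        my ($id, $length) = split /\t/, $copy;
        push @ids, $id;          # preserves file order
        $len{$id} = $length;
    }
    return (\@ids, \%len);
}

# Demo with two index lines (name, length, offset, linebases, linewidth)
my ($ids, $len) = parse_fai(
    "seq_ABC\t12\t9\t8\t9\n",
    "seq_XYZ\t4\t37\t4\t5\n",
);
printf "%s\t%d\n", $_, $len->{$_} for @$ids;
```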
BioX::Seq::Fetch supports files compressed with blocked gzip (BGZF), typically produced with the bgzip utility. This allows for pseudo-random access without the need for full file decompression. The Compress::BGZF module is required for this functionality.
Please report any bugs or feature requests to the issue tracker at https://github.com/jvolkening/p5-BioX-Seq.
Jeremy Volkening <jeremy *at* base2bio.com>
Copyright 2014-2017 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.