PerlIO::via::SeqIO - PerlIO layer for biological sequence formats
    use PerlIO::via::SeqIO;

    # open a FASTA file for reading:
    open( my $f, "<:via(SeqIO)", 'my.fas');

    # open an EMBL file for writing
    open( my $e, ">:via(SeqIO::embl)", 'my.embl');

    # convert
    print $e $_ while (<$f>);

    # add comments (this really works)
    while (<$f>) {
      # get the real sequence object
      my $seq = O($_);
      if ($seq->desc =~ /Pongo/) {
        print $e "# this one is almost human...";
      }
      print $e $_;
    }

    # a one-liner, sort of
    $ alias scvt="perl -Ilib \"-MPerlIO::via::SeqIO qw(open)\" -e \"open(STDIN, '<:via(SeqIO)'); open(STDOUT, '>:via(SeqIO::'.shift().')'); while (<STDIN>) { print }\""
    $ cat my.fas | scvt gcg > my.gcg
PerlIO::via::SeqIO attempts to provide an easy option for harnessing the magic sequence format I/O of the BioPerl (http://bioperl.org) toolkit. Opening a biological sequence file under via(SeqIO) yields a filehandle that can be used to read and write Bio::Seq objects sequentially with an absolute minimum of setup code.
via(SeqIO) also allows the user to mix plain text and sequence formats on a single filehandle transparently. Different sequence formats can be written to a single file by a simple filehandle tweak.
Here's the basic idea, in code converting FASTA to EMBL format:
    open($in, '<:via(SeqIO)', 'my.fas');
    open($out, '>:via(SeqIO::embl)', 'my.embl');
    while (<$in>) {
      print $out $_;
    }

On reading, you can rely on Bio::SeqIO's format guesser by opening with an unqualified via(SeqIO):
open($in, '<:via(SeqIO)', 'mystery.txt');
or you can specify the format, like so:
open($in, '<:via(SeqIO::embl)', 'mystery.txt');
On writing, a qualified invocation is required:

    open($out, '>:via(SeqIO)', 'my.fas');         # throws
    open($out, '>:via(SeqIO::fasta)', 'my.fas');  # that's better
The FASTA-to-EMBL conversion loop shown earlier does what you mean. However, $_ in that loop is not the sequence object itself. To get that, use the all-purpose object getter O():
    while (<$in>) {
      print join("\t", O($_)->id, O($_)->desc), "\n";
    }
If you
use subs qw(O);
then this DWYM:
while (<$in>) { print O->id; }
Use the T() mapper to convert a Bio::Seq object into a thing that can be formatted by via(SeqIO):
    open($seqfh, ">:via(SeqIO::embl)", "my.embl");
    my $result = Bio::SearchIO->new( -file=>'my.blast' )->next_result;
    while(my $hit = $result->next_hit()){
      while(my $hsp = $hit->next_hsp()){
        my $aln = $hsp->get_aln;
        print $seqfh T($_) for ($aln->each_seq);
      }
    }
Interspersing plain text among your sequences is easy; just print the desired text to the handle. See the "SYNOPSIS".
Even the following works:
    open($in, "<:via(SeqIO)", 'my.fas');
    open($out, ">:via(SeqIO::embl)", 'annotated.txt');
    $seq = <$in>;
    print $out "In EMBL format, the sequence would be rendered:", $seq;

You can use the PerlIO layer PerlIO::via::gzip to decompress and compress via(SeqIO) input and output.
Compressed output:
    open(my $tfh,"<:via(SeqIO)", "test.fas");
    open(my $zfh,'>:via(SeqIO::embl):via(gzip)', 'test.embl.gz');
    while (<$tfh>) {
      print $zfh $_;
    }
    close($zfh);

GOTCHA: the close is required.
Decompressed input:
    open($tfh,"<:via(gzip):via(SeqIO::fasta)", "test.fas.gz");
    open(my $zfh,'>:via(SeqIO::embl)', 'test.embl');
    while (<$tfh>) {
      print $zfh $_;
    }
When reading via gzip, the sequence format must be explicitly specified in the via(SeqIO) mode spec.
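The rule above can be captured in a small helper that assembles the read-mode layer string. This is only a sketch: read_layer_spec is a hypothetical name, not part of PerlIO::via::SeqIO, and it merely builds the layer strings used in the examples in this document; it does not open any file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PerlIO::via::SeqIO): build the mode
# string for reading, using the layer order shown in the examples above.
sub read_layer_spec {
    my (%opt) = @_;
    my $format = $opt{format};   # e.g. 'fasta'; undef leaves guessing to Bio::SeqIO
    my $gzip   = $opt{gzip};     # true if the input is gzip-compressed

    # Per the documentation, the format must be explicit when reading via gzip.
    die "a sequence format is required when reading via gzip\n"
        if $gzip && !defined $format;

    my $seqio = defined $format ? "via(SeqIO::$format)" : "via(SeqIO)";
    return $gzip ? "<:via(gzip):$seqio" : "<:$seqio";
}

print read_layer_spec(), "\n";                              # <:via(SeqIO)
print read_layer_spec(format => 'embl'), "\n";              # <:via(SeqIO::embl)
print read_layer_spec(format => 'fasta', gzip => 1), "\n";  # <:via(gzip):via(SeqIO::fasta)
```

The returned string can be passed as the second argument of a three-argument open, for example open(my $fh, read_layer_spec(format => 'fasta', gzip => 1), 'test.fas.gz').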
Conversion, gzip to gzip:
    open(my $tfh, "<:via(gzip):via(SeqIO::fasta)", "test.fas.gz");
    open(my $zfh, ">:via(gzip):via(SeqIO::embl)", "test.embl.gz");
    local $/;
    print $zfh <$tfh>;
    close($zfh);

Import the open() function provided by the module, like so:
use PerlIO::via::SeqIO qw(open);
This will provide the following kind of two-argument open functionality:

    open(STDIN, '<:via(SeqIO)');
    open(STDOUT, '>:via(SeqIO::gcg)');
    while (<STDIN>) {
      print;
    }
which will allow
cat my.gcg | perl your.pl > out
your.pl can read STDIN and acquire the sequence objects by using the object getter O():

    use PerlIO::via::SeqIO qw(open O);
    open (STDIN, '<:via(SeqIO)');
    while (<STDIN>) {
      $seqobj = O($_);
      ...
    }

The format of the input in this case will be guessed by the Bio::SeqIO machinery.
The imported open() should pass through other uses of open unharmed. This is tested in 001_passthru.t. Please ping the "AUTHOR" if there are issues.
You can also easily switch write formats. (Why? Because...who knows?) Use set_write_format right off the handle:
    open($in, "<:via(SeqIO)", 'my.fas');
    open($out, ">:via(SeqIO::embl)", 'multi.txt');
    $seq1 = <$in>;
    print $out "This is sequence 1 in embl format:\n";
    print $out $seq1;
    $out->set_write_format('gcg');
    print $out "while this is sequence 1 in GCG format:\n";
    print $out $seq1;

The supported formats are contained in @PerlIO::via::SeqIO::SUPPORTED_FORMATS. Currently they are:
fasta, embl, gcg, genbank, pir
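A format name can be checked against this list before a write layer is built. The following is a minimal sketch, assuming @PerlIO::via::SeqIO::SUPPORTED_FORMATS holds the names exactly as listed above; write_layer_spec is a hypothetical helper, and the hard-coded fallback list is simply the list above copied by hand for the case where the module is not installed.

```perl
use strict;
use warnings;

# Use the module's own list when it is installed; otherwise fall back to
# the formats listed above (copied by hand, so this copy may go stale).
my @formats = do {
    no warnings 'once';
    eval { require PerlIO::via::SeqIO; @PerlIO::via::SeqIO::SUPPORTED_FORMATS };
};
@formats = qw(fasta embl gcg genbank pir) unless @formats;

# Hypothetical helper (not part of PerlIO::via::SeqIO): return a qualified
# write-mode string, or die if the format is missing or not in the list.
sub write_layer_spec {
    my ($format) = @_;
    die "a write format is required\n" unless defined $format;
    die "unsupported format '$format'\n"
        unless grep { $_ eq $format } @formats;
    return ">:via(SeqIO::$format)";
}

print write_layer_spec('embl'), "\n";   # >:via(SeqIO::embl)
```

Failing early like this keeps a mistyped format name from surfacing only when the handle is opened.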
The O() and T() functions are exported by default.
The open hook needs to be available for the two-argument open redirections (see "DETAILS") to work; import it explicitly, as in the use PerlIO::via::SeqIO qw(open) line shown there.
    Title   : O
    Usage   : $o = O($sym) # not an object method
    Function: get the object "represented" by the argument
    Returns : the right object
    Args    : PerlIO::via::SeqIO GLOB, or
              *PerlIO::via::SeqIO::TFH (tied fh) or
              scalar string (sprintf-rendered Bio::SeqI object)
    Example : $seqobj = O($s = <$seqfh>);
    Title   : T
    Usage   : T($seqobj) # not an object method
    Function: Transform a real Bio::Seq object to a
              via(SeqIO)-writeable thing
    Returns : A thing writeable as a formatted sequence
              by a via(SeqIO) filehandle
    Args    : a[n array of] Bio::Seq or related object[s]
    Example : print $seqfh T($seqobj);
    Title   : set_write_format
    Usage   : $fh->set_write_format($format)
    Function: Set a write handle to write a specified sequence format
    Returns : true on success
    Args    : scalar string; a supported format
              (see @PerlIO::via::SeqIO::SUPPORTED_FORMATS)
    Note    : call off filehandle directly
PerlIO, PerlIO::via, Bio::SeqIO, Bio::Seq, http://bioperl.org
Email maj -at- fortinbras -dot- us http://fortinbras.us http://bioperl.org/wiki/Mark_Jensen