
Bio::FASTASequence - Perl extension for bioinformatics that parses sequence information.

use Bio::FASTASequence;
my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = Bio::FASTASequence->new($fasta);

Bio::FASTASequence is a Perl module that parses information out of a FASTA sequence.

This Perl module is a simple utility that simplifies routine bioinformatics work. It extracts several pieces of information from a given FASTA sequence, such as:
my $accession = $seq->getAccessionNr();
Returns the accession number of the FASTA sequence.
my $description = $seq->getDescription();
Returns the description from the first line of the FASTA record (without the accession number).
my $sequence = $seq->getSequence();
Returns the sequence.
my $crc64_checksum = $seq->getCrc64();
Returns the CRC64 checksum of the sequence. This checksum corresponds to the CRC64 checksum used by SWISS-PROT.
$seq->addDBRef(DB, REFERENCE_AC);
DB is the name of the referenced database.
REFERENCE_AC is the accession number in the referenced database.
$seq->seq2file(FILENAME);
FILENAME is the path of the file in which the sequence is to be stored.
my $indexes = $seq->allIndexesOf(EXPR);
Returns a reference to an array that contains all indexes of EXPR in the sequence.
my $length = $seq->getSequenceLength();
Returns the length of the sequence.
my $hashref = $seq->getDBRefs();
Returns a hash reference. The hash contains all database references, for example {'SWISS-PROT' => 'P01815'}.
my $fasta_sequence = $seq->getFASTA();
Returns the sequence in FASTA format.
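To make the accessor list more concrete, here is a minimal sketch in plain Perl of how a record with a SWISS-PROT style header could be split into accession number, description, sequence and length. The function name parse_fasta_record and the header pattern are assumptions made for illustration; this is not the module's own implementation, and the module may accept other header layouts.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::FASTASequence: split one FASTA record
# whose header looks like ">db|accession|entry_name description".
sub parse_fasta_record {
    my ($fasta) = @_;

    # First line is the header, the remaining lines hold the residues.
    my ($header, @lines) = split /\r?\n/, $fasta;

    my ($accession, $description) =
        $header =~ /^>\w+\|([^|\s]+)\|\S+\s+(.*)$/
        or die "Unrecognised FASTA header: $header\n";

    # Join the sequence lines and drop any stray whitespace.
    my $sequence = join '', @lines;
    $sequence =~ s/\s+//g;

    return {
        accession_nr => $accession,
        description  => $description,
        text         => $sequence,
        seq_length   => length $sequence,
    };
}

my $record = parse_fasta_record(
    ">sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).\n"
  . "QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY\n"
  . "YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS\n"
);
print "$record->{accession_nr}: $record->{seq_length} residues\n";
```

The keys of the returned hash are chosen to mirror the names used in this documentation; whether the module stores its data the same way internally is not claimed here.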
use Bio::FASTASequence;
my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = Bio::FASTASequence->new($fasta);
print 'The sequence of ', $seq->getAccessionNr(), ' is ', $seq->getSequence(), "\n";
my $cysteines = $seq->allIndexesOf('C');
print 'This sequence contains ', scalar(@$cysteines), " cysteine residues at the following positions:\n";
# add 1 to each index to print positions that start at 1
print join(', ', map { $_ + 1 } @$cysteines), "\n";
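The example above adds 1 to each index, which suggests that allIndexesOf reports 0-based offsets. The following sketch shows one way such a search could work, using only Perl's core index() function. The name all_indexes_of is hypothetical, and the sketch assumes that EXPR is a literal substring; the module itself may treat EXPR differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for allIndexesOf: collect every 0-based offset at
# which the literal string $needle occurs in $sequence.
sub all_indexes_of {
    my ($sequence, $needle) = @_;
    my @indexes;
    return \@indexes if $needle eq '';   # avoid an endless loop on empty input

    my $pos = index $sequence, $needle;
    while ($pos != -1) {
        push @indexes, $pos;
        # Continue one character further so overlapping hits are found too.
        $pos = index $sequence, $needle, $pos + 1;
    }
    return \@indexes;
}

my $hits = all_indexes_of('ACDCC', 'C');
print join(', ', map { $_ + 1 } @$hits), "\n";   # prints "2, 4, 5"
```

Returning an array reference keeps the sketch in line with the documented return type, so the caller dereferences it with @$hits just as in the example.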

This module can parse the following formats:
The structure of the hash for the example is:
$VAR1 = {
  'seq_length' => 120,
  'accession_nr' => 'P01815',
  'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
  'crc64' => '158A8B29AE7EEB98',
  'dbrefs' => {},
  'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
};
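As an illustration of working with a hash of this shape, the sketch below turns such a structure back into FASTA text. The function name record_to_fasta, the header layout (">accession description") and the 60-column line width are all assumptions; the output of the module's own getFASTA method may look different.

```perl
use strict;
use warnings;

# Hypothetical helper: format a record hash with the keys shown above as
# FASTA text. Header layout and line width are assumptions of this sketch.
sub record_to_fasta {
    my ($record, $width) = @_;
    $width ||= 60;

    # Cut the sequence into chunks of at most $width characters.
    my @chunks = unpack "(A$width)*", $record->{text};

    # The trailing empty string gives the output a final newline.
    return join "\n",
        ">$record->{accession_nr} $record->{description}",
        @chunks,
        '';
}

print record_to_fasta(
    { accession_nr => 'X1', description => 'demo', text => 'ABCDEFG' },
    3,
);
```

Only the accession_nr, description and text keys are read; the remaining keys of the structure are left untouched.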
If you miss something, please contact me.

There are no known bugs. If you experience any problems, please contact me.

http://modules.renee-baecker.de # not available yet - this site is under construction
The CRC64 routine is based on the SWISS::CRC64 module.

More FASTA-Description lines are accepted.

Renee Baecker, <module@renee-baecker.de>
Feel free to contact me.

Copyright 2004 by Renee Baecker
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.