
genomics - Perl extension for various DNA sequence analysis tools

use genomics::FilterSeq;

This module condenses a FASTA-formatted file to a 'unique' list of sequences. This is done recursively by Hash{key} lookups. A unique key is sampled from each sequence and listed in a %HASH, thereby making all sequences with identical keys equivalent. The sequences are then scanned +/- the scanning window for other keys. Duplicates are squashed based on key prevalence or 5'->3' directionality.

Usage: Call the subroutine by sending, in order:

1. \%SEQUENCE - a reference to a hash with $SEQUENCE{$name} = $sequence structure
2. $filter_start - the starting position in the sequence to grab a key
3. $filter_length - the length of the key (shorter keys produce more 'pruned' sets)
4. $filter_window - window +/- to scan for keys
5. $filter_type - "M" = leave ambiguous sequences, "T" = force ambiguous to most 3' position, "F" = force ambiguous to most 5' position

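The inputs listed above might be prepared along the following lines. This is a minimal sketch, not code from the module: `read_fasta` is a hypothetical helper, the FASTA data is held in memory so the example runs on its own, and the filter values are illustrative only (the documentation does not say whether positions are 0- or 1-based).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of genomics::FilterSeq): read FASTA text from
# a filehandle into a hash of name => sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip trailing newline and whitespace
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {        # header line: name is the first word
            $name = $1;
            $seq{$name} = '';
        }
        elsif (defined $name) {
            $seq{$name} .= $line;        # a sequence may span several lines
        }
    }
    return %seq;
}

# In-memory FASTA data so the sketch runs without an input file.
my $fasta = ">est1 example\nACGTACGT\nTTGA\n>est2\nACGTACGTCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my %SEQUENCE = read_fasta($fh);
close $fh;

# Illustrative argument values only; suitable settings depend on the data.
my $filter_start  = 0;
my $filter_length = 8;
my $filter_window = 2;
my $filter_type   = 'M';

print "$_\t$SEQUENCE{$_}\n" for sort keys %SEQUENCE;
```

The resulting `%SEQUENCE` hash has the `$SEQUENCE{$name} = $sequence` shape described in item 1, so a reference to it can be passed as the first argument.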
my ($RefKeyHash_R, $RefKeyHashSeq_R, $EST_PER_SITE_R, $SITES_CHOSEN_R, $STATS_R) =
    genomics::FilterSeq(\%SEQUENCE, $filter_start, $filter_length, $filter_window, $filter_type);

The subroutine returns the following:

1. $RefKeyHash_R - a reference to a hash of array references holding the sequence names for each key [ $hash{$key} = \@names ]
2. $RefKeyHashSeq_R - similar, but holds the condensed sequence for each key
3. $EST_PER_SITE_R - a reference to a hash containing the key count value (number of keys represented)
4. $SITES_CHOSEN_R - a reference to a hash containing the key count value (number of sites represented)
5. $STATS_R - a reference to a hash of various counts

my $seq_count               = $$STATS_R{"seq_count"};
my $Refseq_ID_count         = $$STATS_R{"Refseq_ID_count"};
my $position_squashed_count = $$STATS_R{"position_squashed_count"};
my $key_count               = $$STATS_R{"key_count"};
my $my_length_ave           = $$STATS_R{"length_ave"};

print "Out of $seq_count sequences ($my_length_ave), $Refseq_ID_count IDs were placed into $position_squashed_count sites (exact key), further reduced to $key_count sites by positional iteration<BR>\n";

# For each key, print the key, the sequence names grouped under it,
# and the condensed sequence stored for that key.
foreach (keys(%$RefKeyHash_R)) {
    print "$_ ";
    my $my_name_arr = $$RefKeyHash_R{$_};
    print @$my_name_arr;
    print "\n";
    print ${$$RefKeyHashSeq_R{$_}};
    print "\n";
}
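
To illustrate the exact-key step mentioned in the description, the sketch below groups sequence names by the substring found at a fixed start and length. It is an assumption-laden illustration rather than the module's implementation: `group_by_key` is a hypothetical helper, offsets are taken to be 0-based, and the window scan and the "M"/"T"/"F" handling of ambiguous sequences are not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper: group sequence names by the key (substring) found at a
# fixed start/length. Only the exact-key lookup is shown here.
sub group_by_key {
    my ($seq_ref, $start, $length) = @_;
    my %names_by_key;
    for my $name (sort keys %$seq_ref) {
        my $seq = $seq_ref->{$name};
        next if length($seq) < $start + $length;   # too short to hold a key
        my $key = substr $seq, $start, $length;    # 0-based offset (assumed)
        push @{ $names_by_key{$key} }, $name;      # same key => same group
    }
    return \%names_by_key;
}

# Made-up example sequences.
my %SEQUENCE = (
    est1 => 'ACGTACGTTTGA',
    est2 => 'ACGTACGTCCAA',
    est3 => 'GGGTACGTCCAA',
);

my $groups = group_by_key(\%SEQUENCE, 0, 8);
for my $key (sort keys %$groups) {
    print "$key: @{ $groups->{$key} }\n";
}
```

With a start of 0 and a length of 8, est1 and est2 share the key ACGTACGT while est3 stands alone. Calling `group_by_key(\%SEQUENCE, 4, 4)` instead puts all three names under the key ACGT, which matches the note that shorter keys produce more 'pruned' sets.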


ltboots, <jesse.salisbury@cpan.org>

Copyright (C) 2005 by root
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.1 or, at your option, any later version of Perl 5 you may have available.