NAME

Bio::Align::Subset - A BioPerl module to generate new alignments as subsets of larger alignments

VERSION

Version 1.27

SYNOPSIS

    use strict;
    use warnings;
    use Data::Dumper;
    
    use Bio::Align::Subset;
    
    # The alignment in a file
    my $filename = "alignmentfile.fas";
    # The format
    my $format = "fasta";
    
    # The subset of codons
    my $subset = [1,12,25,34,65,100,153,156,157,158,159,160,200,201,202,285];
    
    # Create the object
    my $obj = Bio::Align::Subset->new(
                                      file => $filename,
                                      format => $format
                                    );
    
    # View the result
    # This function returns a Bio::SimpleAlign object
    print Dumper($obj->build_subset($subset));

DESCRIPTION

Given an alignment and an array of codon positions, the build_subset method returns a new alignment that contains only the codons at those positions in the original alignment.
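
The idea can be sketched in plain Perl without BioPerl. The sketch below assumes that codon positions are 1-based and that each codon spans three consecutive alignment columns; the module's actual indexing may differ. The helper codon_subset and the sample data are hypothetical and are not part of Bio::Align::Subset.

```perl
use strict;
use warnings;

# Assumption: codon positions are 1-based and each codon covers three
# consecutive alignment columns. This is a sketch, not the module's code.
sub codon_subset {
    my ($sequence, $codons) = @_;
    my $subset = '';
    for my $codon (@$codons) {
        my $offset = 3 * ($codon - 1);             # 0-based start column
        next if $offset + 3 > length $sequence;    # skip codons past the end
        $subset .= substr($sequence, $offset, 3);
    }
    return $subset;
}

# Hypothetical two-sequence alignment (identifiers and data are made up).
my %alignment = (
    seq1 => 'ATGGCCAAATTTGGG',
    seq2 => 'ATGGCTAAA---GGG',
);

# Apply the same codon positions to every sequence so that the result
# is still an alignment.
for my $id (sort keys %alignment) {
    printf "%s\t%s\n", $id, codon_subset($alignment{$id}, [1, 2, 4]);
}
```

Because the same positions are applied to every sequence, gap codons such as "---" are carried over unchanged and the subset stays aligned.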

CONSTRUCTOR

Bio::Align::Subset->new()

    my $obj = Bio::Align::Subset->new(file => 'filename', format => 'format');

The new class method constructs a new Bio::Align::Subset object. The returned object can be used to retrieve, print and generate subsets from alignment objects. new accepts the following parameters:

file

A file path to be opened for reading or writing. The usual Perl conventions apply:

   'file'       # open file for reading
   '>file'      # open file for writing
   '>>file'     # open file for appending
   '+<file'     # open file read/write
   'command |'  # open a pipe from the command
   '| command'  # open a pipe to the command

format

Specify the format of the file. Supported formats include fasta, genbank, embl, swiss (SwissProt), Entrez Gene and tracefile formats such as abi (ABI) and scf. There are many more, for a complete listing see the SeqIO HOWTO (http://bioperl.open-bio.org/wiki/HOWTO:SeqIO).

If no format is specified and a filename is given, the module attempts to deduce the format from the filename suffix. If there is no suffix that BioPerl understands, it attempts to guess the format from the file content. If that also fails, SeqIO throws a fatal error.

The format name is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid.
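
The resolution order described above (explicit format first, then the filename suffix) can be sketched as follows. The suffix table is illustrative only and is not BioPerl's actual lookup table; resolve_format is a hypothetical helper, and the content-based guessing step is left out.

```perl
use strict;
use warnings;

# Illustrative suffix table only; this is NOT BioPerl's actual table.
my %format_for_suffix = (
    fas   => 'fasta',
    fa    => 'fasta',
    fasta => 'fasta',
    gb    => 'genbank',
    embl  => 'embl',
);

# Hypothetical helper: prefer an explicit format, otherwise guess from
# the filename suffix. Returns undef when nothing matches.
sub resolve_format {
    my ($filename, $format) = @_;

    # Format names are case-insensitive, so normalise to lower case.
    return lc $format if defined $format && length $format;

    my ($suffix) = $filename =~ /\.([^.\/]+)\z/;
    return unless defined $suffix;
    return $format_for_suffix{ lc $suffix };
}

printf "%s\n", resolve_format('alignmentfile.fas') // 'unknown';
printf "%s\n", resolve_format('alignment.txt', 'FASTA') // 'unknown';
printf "%s\n", resolve_format('alignment.xyz') // 'unknown';
```

Passing the format explicitly, as in the SYNOPSIS, avoids relying on any guessing at all.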

Currently, the tracefile formats (except for SCF) require installation of the external Staden "io_lib" package, as well as the Bio::SeqIO::staden::read package available from the bioperl-ext repository.

OBJECT METHODS

build_subset($index_list)

    my $subset = $obj->build_subset([1,12,25,34,65,100,153,156,157,158,159]);

Build a new alignment with the specified codons in $index_list. It returns a Bio::SimpleAlign object.

ACCESSOR METHODS

get_count

    Title   : get_count
    Usage   : $instance_no = $obj->get_count
    Function: 
    Returns : Number of instances of this class
    Args    :

get_file

    Title   : get_file
    Usage   : $file_path = $obj->get_file
    Function:
    Returns : The file name of the alignment
    Args    :

get_format

    Title   : get_format
    Usage   : $format = $obj->get_format
    Function:
    Returns : The alignment format (fasta, phylip, etc.)
    Args    :

get_identifiers

    Title   : get_identifiers
    Usage   : $identifiers = $obj->get_identifiers
    Function:
    Returns : An array reference with all the identifiers in an alignment
    Args    :

get_seq_length

    Title   : get_seq_length
    Usage   : $long = $obj->get_seq_length
    Function:
    Returns : The length of the sequences in an alignment
    Args    :

get_sequences

    Title   : get_sequences
    Usage   : $sequences = $obj->get_sequences
    Function:
    Returns : An array reference with all the sequences in an alignment
    Args    :
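
Since get_identifiers and get_sequences both return array references, one way to work with them is to pair them into a hash. The sketch below uses invented stand-in data instead of a real object, and it assumes that both arrays list the sequences in the same order, which this documentation does not state. pair_alignment is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-ins for the array references that get_identifiers and
# get_sequences would return; the data is invented for illustration.
my $identifiers = [ 'seq1', 'seq2' ];
my $sequences   = [ 'ATGGCCAAA', 'ATGGCTAAA' ];

# Assumption: both arrays list the sequences in the same order.
sub pair_alignment {
    my ($ids, $seqs) = @_;
    die "identifier and sequence counts differ\n" unless @$ids == @$seqs;
    my %by_id;
    @by_id{@$ids} = @$seqs;    # hash slice: identifier => sequence
    return \%by_id;
}

my $by_id = pair_alignment($identifiers, $sequences);
print "$_ => $by_id->{$_}\n" for sort keys %$by_id;
```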

MUTATOR METHODS

set_file

    Title   : set_file
    Usage   : $obj->set_file('filename')
    Function: Set the file path for an alignment
    Returns : 
    Args    : String

set_format

    Title   : set_format
    Usage   : $obj->set_format('fasta')
    Function: Set the file format for an alignment
    Returns :
    Args    : String

set_identifiers

    Title   : set_identifiers
    Usage   : $obj->set_identifiers(\@array_ids)
    Function: Change the identifiers for all the sequences in the alignment
    Returns :
    Args    : Array reference

set_sequences

    Title   : set_sequences
    Usage   : $obj->set_sequences(\@array_seqs)
    Function: Change the sequences in the alignment
    Returns :
    Args    : Array reference

AUTHOR - Hector Valverde

Hector Valverde, <hvalverde@uma.es>

CONTRIBUTORS

Juan Carlos Aledo, <caledo@uma.es>

BUGS

Please report any bugs or feature requests to bug-bio-align-subset at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-Align-Subset. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Bio::Align::Subset

LICENSE AND COPYRIGHT

Copyright 2012 Hector Valverde and Juan Carlos Aledo.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.