NAME

Bio::ViennaNGS::Peak - An object oriented interface for characterizing peaks in RNA-seq data

SYNOPSIS

  use Bio::ViennaNGS::Peak;

  # get an instance of Bio::ViennaNGS::Peak
  my $peaks = Bio::ViennaNGS::Peak->new();

  # parse coverage for [+] and [-] strand from Bio::ViennaNGS::FeatureIO objects
  $peaks->populate_data($filep,$filen);

  # identify regions covered by RNA-seq signal ('raw peaks')
  $peaks->raw_peaks($dest,$prefix,$log);

  # characterize final peaks
  $peaks->final_peaks($dest,$prefix,$log);

DESCRIPTION

This module provides a Moose interface for characterization of peaks in RNA-seq coverage data.

METHODS

populate_data

Title : populate_data

Usage : $obj->populate_data($filep,$filen);

Function : Parses RNA-seq coverage for the positive and negative strand into %{$self->data}, a Hash of Arrays data structure.

Args : $filep and $filen are instances of Bio::ViennaNGS::FeatureIO.

Returns : None.

Notes : This method has a high memory footprint. From the Bio::ViennaNGS::FeatureIO input objects it builds a Hash of Arrays roughly the size of the underlying genome: chromosomes are the hash keys, and each hash value references an array holding coverage information for every genomic position.
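
The Hash of Arrays layout described in the note can be pictured with a minimal sketch. The record format, the chromosome names and the %coverage variable are assumptions made for illustration; the module's internal layout (for example how strands are separated) may differ.

```perl
use strict;
use warnings;

# Hypothetical coverage records: [chromosome, 1-based position, coverage].
my @records = (
  [ 'chr1', 1, 0 ],
  [ 'chr1', 2, 5 ],
  [ 'chr1', 3, 7 ],
  [ 'chr2', 1, 3 ],
);

# Hash of Arrays: one key per chromosome, one array slot per position.
my %coverage;
for my $rec (@records) {
  my ($chr, $pos, $cov) = @$rec;
  $coverage{$chr}[$pos - 1] = $cov;
}

# Report how many positions are held per chromosome.
for my $chr (sort keys %coverage) {
  printf "%s: %d positions\n", $chr, scalar @{ $coverage{$chr} };
}
```

Because every genomic position occupies one array slot, memory use grows with chromosome length rather than with the number of covered positions.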

raw_peaks

Title : raw_peaks

Usage : $obj->raw_peaks($dest,$prefix,$log);

Function : This method identifies genomic regions ('raw peaks') covered by RNA-seq signal by means of a sliding window approach. RNA-seq coverage is read from %{$self->data} (which is populated by e.g. the populate_data method). The [+] and [-] strand of every chromosome are processed in 5' -> 3' direction, with the mean coverage of each window serving as the representative value for that window. In this way the start and end coordinates of each covered region, as well as the position of its maximum elevation, are identified. The end position of a covered region is defined as the coordinate of the window whose mean drops below a certain value (i.e. $self->threshold * peak maximum).

Raw peaks are stored in %{$self->data}->{peaks}.
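
The sliding window idea can be sketched for a single coverage array as follows. This is an illustration under stated assumptions, not the module's implementation: the helper find_raw_peaks, its parameters ($winsize, $step, $mincov) and the rule for opening a region are hypothetical; only the closing rule (window mean below threshold * peak maximum) is taken from the description above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: scan one coverage array with a sliding window and
# return regions as hashes with start, end and max_pos (0-based window starts).
sub find_raw_peaks {
  my ($cov, $winsize, $step, $mincov, $threshold) = @_;
  my (@peaks, $cur);
  for (my $i = 0; $i + $winsize <= @$cov; $i += $step) {
    my $mean = sum(@{$cov}[$i .. $i + $winsize - 1]) / $winsize;
    if (!$cur) {
      # assumption: open a region once the window mean reaches $mincov
      $cur = { start => $i, max => $mean, max_pos => $i } if $mean >= $mincov;
      next;
    }
    if ($mean > $cur->{max}) {
      # new maximum elevation inside the current region
      @{$cur}{qw(max max_pos)} = ($mean, $i);
    }
    elsif ($mean < $threshold * $cur->{max}) {
      # close the region at the window below threshold * maximum
      $cur->{end} = $i;
      push @peaks, $cur;
      undef $cur;
    }
  }
  # close a region that is still open at the end of the array
  if ($cur) { $cur->{end} = $#{$cov}; push @peaks, $cur; }
  return \@peaks;
}

my @cov = (0, 0, 1, 4, 9, 12, 10, 6, 2, 0, 0, 0);
my $peaks = find_raw_peaks(\@cov, 2, 1, 2, 0.1);
printf "start=%d end=%d max_pos=%d\n", @{$_}{qw(start end max_pos)} for @$peaks;
```

With the toy data above, a window size of 2 and a threshold of 0.1, the sketch reports one region starting at index 2, ending at index 8, with its maximum window at index 5.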

Args : $dest contains the output path for results, $prefix the prefix used for all output file names. $log is the name of a log file, or undef if no logging is required.

Returns : None. The output is a position-sorted BED6 file containing all raw peaks.

Notes : It is highly recommended to use normalized input data in order to allow for multiple calls of this method with the same set of parameters on different samples.
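
For orientation, a position-sorted BED6 file has six tab-separated columns per line: chrom, chromStart, chromEnd, name, score and strand. The sketch below sorts a few made-up peak records and formats them accordingly; the record keys, peak names and scores are hypothetical and do not reflect what the module writes into the name and score columns.

```perl
use strict;
use warnings;

# Hypothetical peaks; start is 0-based and end is exclusive, as in BED.
my @peaks = (
  { chr => 'chr2', start => 10,  end => 60,  name => 'peak3', score => 7,  strand => '+' },
  { chr => 'chr1', start => 500, end => 580, name => 'peak2', score => 12, strand => '-' },
  { chr => 'chr1', start => 100, end => 180, name => 'peak1', score => 9,  strand => '+' },
);

# Sort by chromosome name, then start, then end.
my @sorted = sort {
  $a->{chr} cmp $b->{chr}
    || $a->{start} <=> $b->{start}
    || $a->{end} <=> $b->{end}
} @peaks;

# BED6 columns: chrom, chromStart, chromEnd, name, score, strand
my @lines = map { join("\t", @{$_}{qw(chr start end name score strand)}) } @sorted;
print "$_\n" for @lines;
```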

final_peaks

Title : final_peaks

Usage : $obj->final_peaks($dest,$prefix,$log);

Function : This method characterizes final peaks from RNA-seq coverage found in %{$self->data}->{peaks}. The latter is expected to have been populated beforehand by $self->raw_peaks.

The procedure for finding final peaks is as follows: For each raw peak found in %{$self->data}->{peaks} the window of maximum coverage is retrieved and a (second) sliding window approach is then applied to regions both upstream and downstream of the maximum. Peak boundaries are set at the position where the mean coverage of the respective window is lower than $self->threshold * peak maximum.

Peaks are reported if their total length (as determined by this routine) is not longer than $self->length.
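
The refinement step can be sketched for one raw peak as shown below. The helpers window_mean and refine_peak, their parameters and the exact placement of the boundaries (here: the last window whose mean is still at or above the cutoff) are assumptions chosen for illustration; the module may place boundaries differently.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean coverage of the window starting at index $i.
sub window_mean {
  my ($cov, $i, $winsize) = @_;
  return sum(@{$cov}[$i .. $i + $winsize - 1]) / $winsize;
}

# Hypothetical helper: refine one raw peak around its maximum window.
# Returns [start, end] (0-based, inclusive) or undef if the peak is too long.
sub refine_peak {
  my ($cov, $max_pos, $winsize, $threshold, $maxlen) = @_;
  my $cutoff = $threshold * window_mean($cov, $max_pos, $winsize);

  # walk upstream while the neighbouring window stays at or above the cutoff
  my $start = $max_pos;
  $start-- while $start > 0
    && window_mean($cov, $start - 1, $winsize) >= $cutoff;

  # walk downstream in the same way
  my $last = @$cov - $winsize;    # last valid window start
  my $end  = $max_pos;
  $end++ while $end < $last
    && window_mean($cov, $end + 1, $winsize) >= $cutoff;
  $end += $winsize - 1;           # include the whole final window

  # report only peaks that are not longer than the maximum length
  return $end - $start + 1 <= $maxlen ? [ $start, $end ] : undef;
}

my @cov  = (0, 0, 1, 4, 9, 12, 10, 6, 2, 0, 0, 0);
my $peak = refine_peak(\@cov, 5, 2, 0.5, 10);
print defined $peak ? "peak: @$peak\n" : "peak too long\n";
```

For the toy data, the maximum window mean is 11, so the cutoff at a threshold of 0.5 is 5.5; the sketch yields the interval 3 to 7, which has length 5 and is therefore reported with a maximum length of 10.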

Args : $dest contains the output path for results, $prefix the prefix used for all output file names. $log is the name of a log file, or undef if no logging is required.

Returns : None. The output is a position-sorted BED6 file containing all candidate peaks.

Notes :

DEPENDENCIES

Moose
Carp
Path::Class
List::Util
namespace::autoclean

SEE ALSO

Bio::ViennaNGS
Bio::ViennaNGS::Util

AUTHOR

Michael T. Wolfinger, <michael@wolfinger.eu>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2018 by Michael T. Wolfinger

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.