NAME

Bio::ViennaNGS::Bam - High-level access to BAM files

SYNOPSIS

  use Bio::ViennaNGS::Bam;

  # split a single-end or paired-end BAM file by strand
  @result = split_bam($bam_in,$rev,$want_uniq,$want_bed,$destdir,$logfile);

  # extract unique and multi mappers from a BAM file
  @result = uniquify_bam($bam_in,$outdir,$logfile);

DESCRIPTION

Bio::ViennaNGS::Bam provides high-level access to BAM files. Building on Bio::DB::Sam, it provides code to extract specific portions from BAM files. It comes with routines for splitting BAM files by strand (which is often required for visualization of NGS data) and for extracting uniquely and multiply aligned reads from BAM files.

ROUTINES

split_bam($bam,$reverse,$want_uniq,$want_bed,$dest_dir,$log)

Splits BAM file $bam according to [+] and [-] strand. $reverse, $want_uniq and $want_bed are switches with values of 0 or 1. They trigger, respectively, forced reversal of the strand assignment (due to RNA-seq protocol constraints), filtering of unique mappers (identified via the NH:i:1 SAM attribute), and output of a BED file corresponding to the strand-specific mapping. $dest_dir is the output directory and $log holds the name and path of the log file.

Strand-splitting is done in such a way that in paired-end alignments, FIRST and SECOND mates (reads) are treated as _one_ fragment, i.e. FIRST_MATE reads determine the strand, while SECOND_MATE reads are assigned the opposite strand by definition. This also holds if the reads are not mapped in proper pairs and even if there is no mapping partner at all.

Sometimes the library preparation protocol causes inversion of the read assignment (with respect to the underlying annotation). In those cases, the natural mapping of the reads can be obtained by setting the $reverse flag.
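The strand rule described above can be sketched in plain Perl. This is a minimal illustration, assuming the standard SAM FLAG bits (0x1 paired, 0x10 reverse strand, 0x80 second in pair); the helper fragment_strand is hypothetical and not part of this module, whose actual implementation may differ.

```perl
use strict;
use warnings;

# SAM FLAG bits as defined in the SAM specification
use constant {
    FLAG_PAIRED  => 0x1,
    FLAG_REVERSE => 0x10,
    FLAG_SECOND  => 0x80,
};

# Hypothetical helper: returns '+' or '-' for the fragment a read belongs to.
sub fragment_strand {
    my ($flag, $reverse) = @_;

    # start from the strand the read itself aligns to
    my $strand = ($flag & FLAG_REVERSE) ? '-' : '+';

    # second mates are assigned the strand opposite to their own alignment,
    # so that both mates of a fragment end up on the first mate's strand
    if (($flag & FLAG_PAIRED) && ($flag & FLAG_SECOND)) {
        $strand = $strand eq '+' ? '-' : '+';
    }

    # the $reverse switch flips the final assignment
    if ($reverse) {
        $strand = $strand eq '+' ? '-' : '+';
    }
    return $strand;
}
```

With this rule, both mates of a properly paired fragment (FLAG 99 and FLAG 147) are assigned to the same strand, and an unpaired read simply keeps its own alignment strand.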

This routine returns an array whose first two elements are the file names of the newly generated BAM files with reads mapped to the positive and negative strand, respectively. Elements three and four are the numbers of fragments mapped to the positive and negative strand. If the $want_bed option was given, elements five and six are the file names of the output BED files for the positive and negative strand, respectively.
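Because the return value is positional, it can help to give the elements names. The following sketch uses a hypothetical helper, label_split_result, which is not part of Bio::ViennaNGS::Bam; it only relies on the element order documented above.

```perl
use strict;
use warnings;

# Hypothetical helper: names the positional return values of split_bam.
sub label_split_result {
    my @r = @_;
    my %out = (
        bam_pos   => $r[0],   # BAM file, positive strand
        bam_neg   => $r[1],   # BAM file, negative strand
        count_pos => $r[2],   # fragments on positive strand
        count_neg => $r[3],   # fragments on negative strand
    );
    # BED file names are only returned when $want_bed was set
    if (@r >= 6) {
        @out{qw(bed_pos bed_neg)} = @r[4, 5];
    }
    return \%out;
}

# Intended use, assuming split_bam is callable as shown in the SYNOPSIS:
#   my $res = label_split_result(
#       split_bam($bam_in, $rev, $want_uniq, $want_bed, $destdir, $logfile));
#   print "plus:  $res->{bam_pos} ($res->{count_pos} fragments)\n";
#   print "minus: $res->{bam_neg} ($res->{count_neg} fragments)\n";
```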

NOTE: Filtering of unique mappers is only safe for single-end experiments. In paired-end experiments, read and mate are treated separately, thus allowing for scenarios where e.g. one read is a multi-mapper, whereas its associated mate is a unique mapper, resulting in an ambiguous alignment of the entire fragment.

As mentioned above, the NH:i: SAM attribute is used for discriminating unique and multi mappers, thus requiring this attribute to be present in every SAM record. If this attribute is not found in all SAM entries, a warning will be issued and the log file will contain a note indicating that there were issues with the NH attribute.
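To illustrate how the NH attribute separates unique from multi mappers, here is a minimal sketch that works on a single tab-separated SAM text record. The helper classify_by_nh is hypothetical; the module itself reads BAM files through Bio::DB::Sam and may handle the attribute differently.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one SAM record by its NH tag.
# Returns 'uniq' for NH:i:1, 'mult' for any other NH value,
# and undef (in scalar context) when the tag is absent.
sub classify_by_nh {
    my ($sam_line) = @_;
    chomp $sam_line;
    my @fields = split /\t/, $sam_line;

    # optional TAG:TYPE:VALUE fields follow the 11 mandatory SAM columns
    for my $opt (@fields[11 .. $#fields]) {
        return $1 == 1 ? 'uniq' : 'mult' if $opt =~ /^NH:i:(\d+)$/;
    }
    return;   # no NH attribute found; a caller would warn about this
}
```

A record without the NH attribute cannot be classified, which is why the routine above issues a warning in that case.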

uniquify_bam($bam,$dest,$log)

Extracts uniquely and multiply aligned reads from BAM file $bam by means of the NH:i: SAM attribute. New BAM files for unique and multi mappers are created in the output folder $dest and named basename.uniq.bam and basename.mult.bam, respectively. If defined, a log file named $log is created in the output folder.

This routine returns an array holding file names of the newly created BAM files for unique and multi mappers, respectively.

NOTE: Not all short read mappers use the NH:i: SAM attribute to decorate unique and multi mappers. As such, this routine will not work unless your BAM file has these attributes.

uniquify_bam2($bam,$dest,$log)

Extracts uniquely and multiply aligned reads from BAM file $bam by means of the NH:i: SAM attribute, like the original uniquify_bam routine. Unlike that routine, this one expects a name-sorted BAM file and processes it in bands of (supposedly paired-end) reads sharing the same id/query name. If all reads in a band are unique mappers, they go to the basename.uniq.band.bam file; otherwise all reads of the band go to the basename.mult.band.bam file.
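The band logic can be sketched as follows. This is an illustration only: the helper partition_bands is hypothetical, and records are simplified to [query_name, NH] pairs that are assumed to be sorted by query name.

```perl
use strict;
use warnings;

# Hypothetical helper: takes name-sorted records as [query_name, NH] pairs
# and returns (\@uniq, \@mult). A band is a run of consecutive records with
# the same query name; it is unique only if every record has NH == 1.
sub partition_bands {
    my @records = @_;
    my (@uniq, @mult, @band);

    # assign the current band to one of the two output lists
    my $flush = sub {
        return unless @band;
        my $n_multi = grep { $_->[1] != 1 } @band;
        if ($n_multi == 0) { push @uniq, @band }
        else               { push @mult, @band }
        @band = ();
    };

    for my $rec (@records) {
        # a change of query name closes the current band
        $flush->() if @band && $band[0][0] ne $rec->[0];
        push @band, $rec;
    }
    $flush->();   # do not forget the last band

    return (\@uniq, \@mult);
}
```

A single multi-mapping read therefore moves its whole band, including any uniquely mapped mate, to the multi-mapper output.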

This routine returns an array holding file names of the newly created BAM files for unique and multi mappers, respectively.

NOTE: Not all short read mappers use the NH:i: SAM attribute to decorate unique and multi mappers. As such, this routine will not work unless your BAM file has these attributes.

DEPENDENCIES

Bio::DB::Sam >= 1.37
File::Basename
File::Temp
Path::Class
Carp

AUTHORS

Michael T. Wolfinger <michael@wolfinger.eu>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2017 Michael T. Wolfinger <michael@wolfinger.eu>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.