NAME

Bio::ToolBox::big_helper

DESCRIPTION

This module helps in the conversion of wig and bed files to bigWig and bigBed files, respectively. It uses external applications to accomplish this, taking care of generating a chromosome file from a database if necessary.

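Subroutines are provided for wig and bed conversions, for streaming wig lines directly to the wigToBigWig utility, and for generating the required chromosome sizes file.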

USAGE

Load the module at the beginning of your program and include the name or names of the subroutines to export. None are automatically exported.

        use Bio::ToolBox::big_helper qw(wig_to_bigwig_conversion);

wig_to_bigwig_conversion

This subroutine will convert a wig file to a bigWig file. See the UCSC documentation regarding wig and bigWig file formats. It uses the UCSC wigToBigWig utility to perform the conversion. The utility must be available on the system for the conversion to succeed.

For bedGraph format wig files, the bedGraphToBigWig utility may be substituted if desired, although wigToBigWig can handle all wig formats. If no utility is available but Bio::DB::BigFile is installed, that module may be used to generate the bigWig file instead.
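
Both UCSC utilities take the input file, the chromosome sizes file, and the output file as positional arguments (for example `wigToBigWig in.wig chrom.sizes out.bw`). The sketch below shows one way such a command might be assembled; `build_bigwig_command` is a hypothetical helper, not part of Bio::ToolBox, and it assumes bedGraph input can be recognized by a `.bdg` or `.bedgraph` extension.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a UCSC converter.
# Both wigToBigWig and bedGraphToBigWig take <input> <chrom.sizes> <output>.
sub build_bigwig_command {
    my (%args) = @_;
    my $input  = $args{wig}    or die "no input wig file\n";
    my $chromo = $args{chromo} or die "no chromosome sizes file\n";

    # derive the output name by swapping the extension for .bw
    (my $output = $input) =~ s/\.(?:wig|bdg|bedgraph)$//i;
    $output .= '.bw';

    # Assumption: bedGraph input is identified by its file extension
    my $app = $input =~ /\.(?:bdg|bedgraph)$/i
        ? 'bedGraphToBigWig'
        : 'wigToBigWig';
    return ($app, $input, $chromo, $output);
}

my @cmd = build_bigwig_command(wig => 'sample.bdg', chromo => 'chrom.sizes');
print "@cmd\n";    # bedGraphToBigWig sample.bdg chrom.sizes sample.bw
# To actually run it: system(@cmd) == 0 or warn "conversion failed: $?\n";
```

Using the list form of `system` avoids passing file names through a shell.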

The conversion requires a list of chromosome names and sizes in a simple text file, where each line consists of two columns, "chromosome_name <size_in_bases>". This file may be specified directly, or generated automatically if given a Bio::DB database name (preferred, to ensure genome version compatibility).
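
A chromosome sizes file can be sanity-checked before conversion. This is a minimal sketch with a hypothetical `read_chrom_sizes` helper (not part of Bio::ToolBox); it accepts any whitespace between the two columns, although UCSC-style chrom.sizes files normally use a tab, and the sizes written in the usage part are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a chromosome sizes file into a hashref of
# name => length. Each line must hold a name and a positive integer.
sub read_chrom_sizes {
    my $file = shift;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my %sizes;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($chrom, $length) = split /\s+/, $line;
        die "malformed line $.: '$line'\n"
            unless defined $length && $length =~ /^\d+$/ && $length > 0;
        $sizes{$chrom} = $length;
    }
    close $fh;
    return \%sizes;
}

# Usage: write a small file with made-up sizes and read it back.
my ($out, $sizes_file) = tempfile(UNLINK => 1);
print {$out} "chr1\t1000\nchr2\t500\n";
close $out;
my $sizes = read_chrom_sizes($sizes_file);
print "$_ => $sizes->{$_}\n" for sort keys %$sizes;
```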

After running the utility, the module checks that a non-zero-byte bigWig file exists. If it does, the name of the file is returned; if not, an error is printed and nothing is returned.
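
The post-conversion check described above maps naturally onto Perl's `-s` file test, which returns the file size (false for a missing or empty file). This is a sketch only; `check_big_file` is a hypothetical name and the module's own check may differ in detail.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch: return the file name when the output exists with a non-zero
# size; otherwise warn and return an empty list (undef for a scalar caller).
sub check_big_file {
    my $file = shift;
    return $file if defined $file && -s $file;
    warn "conversion failed: ", ($file // 'no file'), " is missing or empty\n";
    return;
}

# Usage: a non-empty file passes, an empty one does not.
my ($full_fh, $full) = tempfile(UNLINK => 1);
print {$full_fh} "not really a bigWig\n";
close $full_fh;
my ($empty_fh, $empty) = tempfile(UNLINK => 1);
close $empty_fh;

my $ok  = check_big_file($full);     # returns $full
my $bad = check_big_file($empty);    # warns and returns undef
```

Returning an empty list rather than `0` lets callers test the result with a simple `if ($file)`.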

Pass the function an array of key => value arguments, including the following:

  Required:
  wig         => The name of the wig source file.
  db          => Provide an opened database object from which to generate
                 the chromosome sizes information. Not needed if chromo
                 is provided.
  Optional:
  chromo      => The name of the chromosome sizes text file, described
                 above, as an alternative to providing the database name.
  bwapppath   => Provide the full path to Jim Kent's wigToBigWig
                 utility. This parameter may instead be defined in the
                 configuration file biotoolbox.cfg.
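
The key => value interface above implies a simple validation rule: a wig file is always needed, plus either a database or a chromosome sizes file. The sketch below expresses that rule with a hypothetical `check_wig_args` helper; it illustrates the documented requirements and is not the module's actual argument handling.

```perl
use strict;
use warnings;

# Hypothetical sketch of key => value argument checking: 'wig' is
# mandatory, and either 'db' or 'chromo' must supply chromosome sizes.
sub check_wig_args {
    my %args = @_;
    die "no wig file provided\n" unless $args{wig};
    die "provide either a db or a chromo file\n"
        unless $args{db} || $args{chromo};
    return \%args;
}

my $args = check_wig_args(wig => 'example.wig', chromo => 'chrom.sizes');
print "converting $args->{wig} with sizes from $args->{chromo}\n";
```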

Example

        use Bio::ToolBox::big_helper qw(wig_to_bigwig_conversion);

        my $wig_file = 'example.wig';
        my $bw_file = wig_to_bigwig_conversion(
                'wig'   => $wig_file,
                'db'    => $database,   # an opened database object
        );
        if ($bw_file) {
                # a name is returned only if a non-empty bigWig was written
                print " success! wrote bigwig file $bw_file\n";
                unlink $wig_file; # no longer necessary
        }
        else {
                print " failure! see STDERR for errors\n";
        }

open_wig_to_bigwig_fh

This subroutine will open a forked process to the UCSC wigToBigWig utility as a file handle, allowing wig lines to be "printed" to the utility for conversion. This is useful for writing directly to a bigWig file without having to write a temporary wig file first. This is also useful when you have multiple wig files, for example individual wig files from separate forked processes, that need to be combined into a bigWig file.
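
The forked-filehandle mechanism can be illustrated with Perl's list-form pipe open. To keep the sketch runnable without the UCSC tools, a child perl process that copies its input to a file stands in for wigToBigWig; how Bio::ToolBox actually launches the utility is not shown here, and list-form pipe opens assume a POSIX system.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Stand-in for the converter: a child perl process that copies STDIN
# to the file named by its first argument.
my $child_code = 'open my $o, ">", $ARGV[0] or die $!; '
               . 'print {$o} $_ while <STDIN>; close $o';

my $dir = tempdir(CLEANUP => 1);
my $out = "$dir/combined.txt";

# List-form pipe open forks the child and returns a writable handle
# (no shell is involved).
open(my $fh, '|-', $^X, '-e', $child_code, $out)
    or die "cannot fork child process: $!\n";

print {$fh} "variableStep chrom=chr1\n";
print {$fh} "$_\t1.0\n" for 101, 201, 301;

# close() flushes the pipe, waits for the child to exit, and stores its
# exit status in $?, so the output file is complete only after this.
close($fh) or die "child process failed, status $?\n";
printf "%d bytes written\n", -s $out;
```

Because `close` blocks until the child exits, the caller cannot accidentally read a half-written output file.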

Note that the wigToBigWig utility does not handle errors gracefully and will immediately fail upon encountering errors, usually also bringing the main Perl process with it. Make sure the chromosome file is accurate and the wig lines are properly formatted and in order!
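
Since the utility fails hard on bad input, it can be worth checking lines before they are printed to the handle. The sketch below is a hypothetical pre-flight check for bedGraph-style lines (chromosome, start, end, value); it flags unknown chromosomes, ends past the chromosome length, starts out of order, and chromosomes split across separate blocks. It is an illustration, not a complete validator of every wig variant.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check for bedGraph-style wig lines
# ("chrom<TAB>start<TAB>end<TAB>value"). Returns a list of problems;
# an empty list means the lines look safe to send to the converter.
sub check_bedgraph_lines {
    my ($sizes, @lines) = @_;    # $sizes is a hashref of chrom => length
    my (@problems, %finished);
    my ($prev_chrom, $prev_start) = ('', -1);
    for my $i (0 .. $#lines) {
        my $n = $i + 1;
        my ($chrom, $start, $end) = split /\t/, $lines[$i];
        if (!defined $end) {
            push @problems, "line $n: malformed";
            next;
        }
        if (!exists $sizes->{$chrom}) {
            push @problems, "line $n: unknown chromosome $chrom";
            next;
        }
        push @problems, "line $n: end $end exceeds length of $chrom"
            if $end > $sizes->{$chrom};
        if ($chrom ne $prev_chrom) {
            # each chromosome must appear as one contiguous block
            push @problems, "line $n: $chrom appears in more than one block"
                if $finished{$chrom};
            $finished{$prev_chrom} = 1;
            ($prev_chrom, $prev_start) = ($chrom, -1);
        }
        push @problems, "line $n: start $start is out of order"
            if $start < $prev_start;
        $prev_start = $start;
    }
    return @problems;
}

my %sizes = (chr1 => 1000, chr2 => 500);
my @problems = check_bedgraph_lines(\%sizes,
    "chr1\t0\t100\t1.5", "chr1\t100\t200\t2.0", "chr2\t0\t600\t1.0");
print "$_\n" for @problems;    # line 3: end 600 exceeds length of chr2
```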

Pass the function an array of key => value arguments. An IO::File object will be returned. Upon closing the file handle, the wigToBigWig utility will generate the bigWig file.

  Required:
  bw          => The output file name for the bigWig file.
                 Also accepts the keys file, wig, and out. 
  chromo      => The name of the chromosome sizes text file, described 
                 in wig_to_bigwig_conversion()
  Optional: 
  db          => Alternatively, provide an opened database object from which 
                 to generate a temporary chromosome sizes file. It is up to the 
                 user to delete this file.
  bwapppath   => Provide the full path to the UCSC wigToBigWig utility.
                 The path may instead be obtained from the configuration
                 file .biotoolbox.cfg.

Example

        use Bio::ToolBox::big_helper qw(open_wig_to_bigwig_fh generate_chromosome_file);

        my $bw_file = 'example.bw';
        my $chromo_file = generate_chromosome_file($db);
        my $bwfh = open_wig_to_bigwig_fh(
                file    => $bw_file,
                chromo  => $chromo_file,
        );
        # @wig_lines holds properly formatted wig lines, in order
        foreach (@wig_lines) {
                $bwfh->print("$_\n");
        }
        # closing the handle signals the forked wigToBigWig process to
        # write the bigWig file, which may take a few seconds to minutes
        $bwfh->close;
        unlink $chromo_file;

bed_to_bigbed_conversion

This subroutine will convert a bed file to a bigBed file. See the UCSC documentation regarding bed and bigBed file formats. It uses the UCSC bedToBigBed utility to perform the conversion; the utility must be present on the system for the conversion to succeed.
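
bedToBigBed expects its input sorted by chromosome and then by start position, the order produced by `LC_ALL=C sort -k1,1 -k2,2n`. The sketch below reproduces that order in Perl with a Schwartzian transform; `sort_bed_lines` is a hypothetical helper, and dropping comment, track, and browser lines is an assumption of this sketch rather than documented module behavior.

```perl
use strict;
use warnings;

# Sketch: sort BED lines by chromosome (plain string order) and then by
# numeric start. Comment, track, and browser lines are dropped.
sub sort_bed_lines {
    my @records = map  { [ (split /\t/, $_)[0, 1], $_ ] }
                  grep { !/^(?:#|track|browser)/ } @_;
    return map  { $_->[2] }
           sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @records;
}

my @sorted = sort_bed_lines(
    "chr2\t5\t10\tb", "chr1\t100\t200\ta", "#comment", "chr1\t20\t30\tc");
print "$_\n" for @sorted;
```

Without `use locale`, `cmp` compares by code point, which matches the byte order of `LC_ALL=C sort` for ASCII chromosome names (so chr10 sorts before chr2).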

The conversion requires a list of chromosome names and sizes in a simple text file, where each line consists of two columns, "chromosome_name size_in_bases". This file may be specified directly, or generated automatically if given a Bio::DB database name (preferred, to ensure genome version compatibility).

After running the utility, the module checks that a non-zero-byte bigBed file exists. If it does, the name of the file is returned; if not, an error is printed and nothing is returned.

Pass the function an array of key => value arguments, including the following:

  Required:
  bed         => The name of the bed source file.
  db          => Provide an opened database object from which to generate
                 the chromosome sizes information. Not needed if chromo
                 is provided.
  Optional:
  chromo      => The name of the chromosome sizes text file, described
                 above, as an alternative to providing the database name.
  bbapppath   => Provide the full path to the UCSC bedToBigBed
                 utility. This parameter may instead be defined in the
                 configuration file biotoolbox.cfg.

Example

        my $bed_file = 'example.bed';
        my $bb_file = bed_to_bigbed_conversion(
                'bed'   => $bed_file,
                'db'    => $database,
        );
        if ($bb_file) {
                print " success! wrote bigBed file $bb_file\n";
        }
        else {
                print " failure! see STDERR for errors\n";
        }

generate_chromosome_file

This subroutine will generate a chromosome sizes file appropriate for the big file conversion utilities from an available database. It is a two-column text file: the first column is the chromosome name and the second is its length in bp. The file is written in the current directory with a name of chr_sizesXXXXX, where the X's are random characters as defined by File::Temp.
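
Writing such a file with File::Temp might look like the sketch below. Here the chromosome list is passed in directly as [name, length] pairs, standing in for the database lookup; `write_chromosome_file` is a hypothetical name, not the module's implementation. `UNLINK => 0` keeps the file on disk after the File::Temp object goes out of scope, so the caller is responsible for deleting it.

```perl
use strict;
use warnings;
use File::Temp;

# Sketch: write [name, length] pairs to a chr_sizesXXXXX file in $dir
# (the current directory by default) and return the file name.
sub write_chromosome_file {
    my ($chromosomes, $dir) = @_;
    my $tmp = File::Temp->new(
        TEMPLATE => 'chr_sizesXXXXX',
        DIR      => $dir // '.',
        UNLINK   => 0,    # keep the file after $tmp goes out of scope
    );
    print {$tmp} join("\t", @$_), "\n" for @$chromosomes;
    my $filename = $tmp->filename;
    $tmp->close or die "cannot close $filename: $!\n";
    return $filename;
}

# Usage (writes into the current directory):
# my $file = write_chromosome_file([ [chr1 => 1000], [chr2 => 500] ]);
```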

The chromosome names and lengths are obtained from a Bio::DB database using the get_chromosome_list subroutine in Bio::ToolBox::db_helper.

Pass the subroutine a database name, path to a supported database file, or opened Bio::DB object.
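
Accepting three kinds of argument implies some dispatch on the argument's type. The sketch below shows one plausible way to tell them apart using `Scalar::Util::blessed` and a file-existence test; `classify_db_argument` is hypothetical and only illustrates the idea, not how Bio::ToolBox opens databases.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical dispatch on the argument kinds described above:
# an opened object, a path to an existing file, or a database name.
sub classify_db_argument {
    my $db = shift;
    return 'object' if blessed($db);
    return 'file'   if defined $db && -e $db;
    return 'name'   if defined $db && length $db;
    die "no database provided\n";
}

print classify_db_argument(bless {}, 'Fake::DB'), "\n";    # object
```

Checking for an object first matters, because an object stringifies to something like `Fake::DB=HASH(0x...)`, which would otherwise be mistaken for a name.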

The file will be written, closed, and the filename returned.

AUTHOR

 Timothy J. Parnell, PhD
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.