The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Bio::DB::Big - Interface to BigWig and BigBed files via libBigWig

SYNOPSIS

    use Bio::DB::Big;
    use Bio::DB::Big::AutoSQL;
    
    # Setup CURL buffers
    Bio::DB::Big->init();

    my $bw = Bio::DB::Big->open('path/to/file.bw');
    # Generic: get the type
    if($bw->is_big_wig()) {
      print "We have a bigwig file\n";
    }

    # Generic: Get headers
    my $header = $bw->header();
    printf("Working with %d zoom levels", $header->{nLevels});
    # Generic: Get chromosomes (comes back as a hash {chrom => length})
    my $chroms = $bw->chroms();
    
    #Get stats, values and intervals
    if($bw->has_chrom('chr1')) {
      my $bins = 10;
      
      # uses the zoom levels and returns an array of 10 bins over chromsome positions 1-100
      my $stats = $bw->get_stats('chr1', 0, 100, $bins, 'mean');
      foreach my $s (@{$stats}) {
        printf("%f\n", $s);
      }

      # Go directly to the raw level and calc on that but ask for maximum value per bin this time
      my $full_stats = $bw->get_stats('chr1', 0, 100, $bins, 'max', 1);

      # Get a value for each base over chromsome positions 1 - 100. Values can be undef if not set
      my $values = $bw->get_values('chr1', 0, 100);

      # Get the real intervals where a value was assigned
      my $intervals = $bw->get_intervals('chr1', 0, 100);
      foreach my $i (@{$intervals}) {
        printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value})
      }
      
      # Or iterate which allows you to move through a file without loading everything into memory
      my $blocks_per_iter = 10;
      my $iter = $bw->get_intervals_iterator('chr1', 0, 100, $blocks_per_iter);
      while(my $intervals = $iter->next()) {
        foreach my $i (@{$intervals}) {
          printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value})
        }
      }
    }

    my $bb = Bio::DB::Big->open('http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb');
    if($bb->is_big_bed) {
      my $with_string = 1;
      # Optionally you do not retrieve the "string" if you don't want to potenitally saving memory
      my $entries = $bb->get_entries('chr21', 9000000, 10000000, $with_string);
      foreach my $e (@{$entries}) {
        printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
      }

      # Or you can use an iterator
      my $blocks_per_iter = 10;
      my $iter = $bb->get_entries_iterator('chr21', 0, $bb->chrom_length('chr21'), $with_string, $blocks_per_iter);
      while(my $entries = $iter->next()) {
        foreach my $e (@{$entries}) {
          printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
        }
      }
      
      # Finally you can request AutoSQL and parse if available
      if($bb->get_autosql()) {
        my $autosql = $bb->get_autosql();
        my $as = Bio::DB::Big::AutoSQL->new($autosql);
        if($as->has_field('name')) {
          printf("%s: The field 'name' is in position %d\n", $as->name(), $as->get_field('name')->position());
        }
        # Or just get all fields as an arrayref
        my $fields = $as->fields();
      }
    }

DESCRIPTION

This library provides access to the BigWig and BigBed file formats designed by UCSC. However rather than use kent libraries this uses libBigWig from https://github.com/dpryan79/libBigWig as it provides an implementation that avoids exiting when errors happen. libBigWig provides access to BigWig summaries, values and intervals alongside providing access to BigBed entries.

This implementation is read-only. Patches to give it write ability are welcomed however at the time of writing libBigWig only supports writing to BigWigs.

In addition there are a number of AutoSQL parsing objects implemented in Perl to provide some rough parsing capability when handling AutoSQL attached to a BigBed file. These are experimental but seem to work on a wide range of example AutoSQL fields.

Should you wish to use the kent library please consult Bio::DB::BigFile, which is a very complete set of bindings into kent.

INSTALLATION

Installation requires the following libraries to be made available

libBigWig - https://github.com/dpryan79/libBigWig

We assume that libcurl is installed to a central location and is a requirement for libBigWig (especially if you want to access remote files). libBigWig can be located via the following mechanisms:

By providing --libbigwig=/path/to/libbigwig to Build.PL
Setting an environment variable LIBBIGWIG_DIR to the correct path
Setting the --prefix argument
Installing from Alien::LibBigWig
Using pkg-config to find the location
Installing libBigWig to a central location. We attempt /usr, /usr/local, /usr/share, /opt/local

Build.PL looks to see if we can find BigWig.h and libBigWig.a in one of the above locations resolved in the above order. If we cannot find the library then compilation will fail.

ACCESSING REMOTE FILES

If you have compiled libBigWig against libcurl then you can access big files over http, https and ftp. Make sure you call Bio:DB::Big-init()> before running any remote calls.

PROXIES

The underlying library listens to the environment variable http_proxy to set proxies. If you need to go via a proxy please make sure you run something like export http_proxy=http://example.proxy:3128 or http_proxy=http://example.proxy:3128 perl script.pl and set the proxy before the Perl command is run.

INFLUENCING CURL OPTIONS

libBigWig uses libcurl to do its communication. Alongside the above proxy influencing you can alter three variables from this library; the timeout, if you want to follow 301 and 302 headers and if you want to ignore problematic/incorrect/wrong secure certificates. More information is given in the class methods below.

COORDINATE SYSTEMS USED IN THIS LIBRARY

This code is based on UCSC formats. Therefore all coordinates reported are expressed in 0-based, half-open. This means that a genomic coordinate displayed on UCSC or Ensembl e.g. chr1:1-100 is represented as chr1 0 100. To convert from 0-based, half-open to 1-base, fully-closed add 1 to the start.

CLASS METHODS

Bio::DB::Big->init();

Initalises libBigWig. Essential to call if you are going to load remote files. Consider doing this once in a BEGIN block in your code.

Bio::DB::Big->timeout(0);

Sets the libcurl timeout in milliseconds. Setting this to 0 means there is no timeout. See libcurl's CURLOPT_TIMEOUT_MS value for more information.

This is a global variable for the entire library.

Bio::DB::Big->follow_redirects(1);

By default libcurl will not follow 301 or 302 error codes. Switching this on will force it to follow them. See libcurl's CURLOPT_FOLLOWLOCATION value for more information.

This is a global variable for the entire library.

Bio::DB::Big->verify_ssl(1);

Forces libcurl to verify the remote SSL/TLS certificates. By default this is true. Setting it to false will allow any HTTPS communication to occur irrelevant of the attached certificate. See libcurl's CURLOPT_SSL_VERIFYPEER value for more information.

This is a global variable for the entire library.

my $bf = Bio::DB::Big->open('/path/to/big.file');

Perl method that wraps two methods from Bio::DB::Big::File. File type is sniffed using test_big_wig(). If true we open the file using open_big_wig(). If not we open using open_big_bed(). The caller can then use is_big_wig() or is_big_bed() to assert the type of file now available.

WORKING WITH BIG FILES

See Bio::DB::Big::File for more information on the routines available.

WORKING WITH AUTOSQL

See Bio::DB::Big::AutoSQL for more information on routines available. Also see Bio::DB::Big::File for the method get_autosql().

EXCEPTIONS

This library will raise exceptions as and when errors occur. You can trap them using eval or equivalent methods. The following are the class of exceptions raised (identified by the exception's prefix)

Invalid operation

Tried to use a bigwig method on a bigbed file or vice-versa

Open error

An error occured whilst trying to open a file

Invalid type

Unknown summary type given for statistics generation

Invalid chromosome

The chromosome was not found in this file

Invalid range

The specified range was incorrect. Normally caused when start is greater than end or end is greater than the chromosome length

Fetch error

Could not retrieve the requested region

Parse error

Could not parse a record. Normally happens with AutoSQL work.

Config error

Incorrect value or configuration given to a module.

SEE ALSO

Bio::DB::BigFile

LICENSE

Copyright [2015-2017] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.