Bio::DB::Big - Interface to BigWig and BigBed files via libBigWig
  use Bio::DB::Big;
  use Bio::DB::Big::AutoSQL;

  # Setup CURL buffers
  Bio::DB::Big->init();

  my $bw = Bio::DB::Big->open('path/to/file.bw');

  # Generic: get the type
  if($bw->is_big_wig()) {
    print "We have a bigwig file\n";
  }

  # Generic: Get headers
  my $header = $bw->header();
  printf("Working with %d zoom levels", $header->{nLevels});

  # Generic: Get chromosomes (comes back as a hash {chrom => length})
  my $chroms = $bw->chroms();

  # Get stats, values and intervals
  if($bw->has_chrom('chr1')) {
    my $bins = 10;
    # Uses the zoom levels and returns an array of 10 bins over chromosome positions 1-100
    my $stats = $bw->get_stats('chr1', 0, 100, $bins, 'mean');
    foreach my $s (@{$stats}) {
      printf("%f\n", $s);
    }

    # Go directly to the raw level and calc on that but ask for maximum value per bin this time
    my $full_stats = $bw->get_stats('chr1', 0, 100, $bins, 'max', 1);

    # Get a value for each base over chromosome positions 1 - 100. Values can be undef if not set
    my $values = $bw->get_values('chr1', 0, 100);

    # Get the real intervals where a value was assigned
    my $intervals = $bw->get_intervals('chr1', 0, 100);
    foreach my $i (@{$intervals}) {
      printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
    }

    # Or iterate, which allows you to move through a file without loading everything into memory
    my $blocks_per_iter = 10;
    my $iter = $bw->get_intervals_iterator('chr1', 0, 100, $blocks_per_iter);
    while(my $intervals = $iter->next()) {
      foreach my $i (@{$intervals}) {
        printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
      }
    }
  }

  my $bb = Bio::DB::Big->open('http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb');
  if($bb->is_big_bed) {
    # Optionally skip retrieving the "string" if you do not want it, potentially saving memory
    my $with_string = 1;
    my $entries = $bb->get_entries('chr21', 9000000, 10000000, $with_string);
    foreach my $e (@{$entries}) {
      printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
    }

    # Or you can use an iterator
    my $blocks_per_iter = 10;
    my $iter = $bb->get_entries_iterator('chr21', 0, $bb->chrom_length('chr21'), $with_string, $blocks_per_iter);
    while(my $entries = $iter->next()) {
      foreach my $e (@{$entries}) {
        printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
      }
    }

    # Finally you can request AutoSQL and parse if available
    if($bb->get_autosql()) {
      my $autosql = $bb->get_autosql();
      my $as = Bio::DB::Big::AutoSQL->new($autosql);
      if($as->has_field('name')) {
        printf("%s: The field 'name' is in position %d\n", $as->name(), $as->get_field('name')->position());
      }
      # Or just get all fields as an arrayref
      my $fields = $as->fields();
    }
  }
This library provides access to the BigWig and BigBed file formats designed by UCSC. Rather than using the kent libraries, it uses libBigWig from https://github.com/dpryan79/libBigWig, which provides an implementation that avoids exiting when errors happen. libBigWig provides access to BigWig summaries, values and intervals, as well as to BigBed entries.
This implementation is read-only. Patches to give it write ability are welcome; however, at the time of writing libBigWig only supports writing to BigWigs.
In addition there are a number of AutoSQL parsing objects implemented in Perl to provide some rough parsing capability when handling AutoSQL attached to a BigBed file. These are experimental but seem to work on a wide range of example AutoSQL fields.
Should you wish to use the kent library please consult Bio::DB::BigFile, which is a very complete set of bindings into kent.
Installation requires the following libraries to be made available
We assume that libcurl, which libBigWig requires (especially if you want to access remote files), is installed in a central location. libBigWig can be located via the following mechanisms:
  1. --libbigwig=/path/to/libbigwig (given to Build.PL)
  2. LIBBIGWIG_DIR
  3. --prefix
  4. pkg-config
  5. /usr, /usr/local, /usr/share, /opt/local
Build.PL searches the locations above, in the order listed, for BigWig.h and libBigWig.a. If the library cannot be found then compilation will fail.
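The lookup order described above can be sketched in plain Perl. This is an illustration only and not the code used by Build.PL: the helper name find_libbigwig is hypothetical, and checking both the directory itself and its include/ and lib/ subdirectories is an assumption about the layout. Only the LIBBIGWIG_DIR variable and the default directories are shown.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first candidate directory that holds
# both BigWig.h and libBigWig.a, or undef if none does.
sub find_libbigwig {
  my @candidates = @_;
  foreach my $dir (grep { defined($_) && length($_) } @candidates) {
    # Assumed layouts: files directly in $dir, or under include/ and lib/
    my @headers = map { File::Spec->catfile(@{$_}, 'BigWig.h') } ([$dir], [$dir, 'include']);
    my @libs    = map { File::Spec->catfile(@{$_}, 'libBigWig.a') } ([$dir], [$dir, 'lib']);
    my $has_header = grep { -f $_ } @headers;
    my $has_lib    = grep { -f $_ } @libs;
    return $dir if $has_header && $has_lib;
  }
  return;
}

# Candidates are tried in order; undefined entries are skipped
my @search = ($ENV{LIBBIGWIG_DIR}, '/usr', '/usr/local', '/usr/share', '/opt/local');
my $found = find_libbigwig(@search);
print defined $found ? "Found libBigWig in $found\n" : "libBigWig not found\n";
```

Returning the first match keeps the earlier, more specific locations ahead of the system defaults.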
If you have compiled libBigWig against libcurl then you can access big files over http, https and ftp. Make sure you call Bio::DB::Big->init() before making any remote calls.
The underlying library reads the environment variable http_proxy to set proxies. If you need to go via a proxy, set the variable before the Perl command is run, for example with export http_proxy=http://example.proxy:3128 or http_proxy=http://example.proxy:3128 perl script.pl.
libBigWig uses libcurl for its communication. In addition to the proxy setting above, you can alter three variables from this library: the timeout, whether to follow 301 and 302 redirects, and whether to ignore invalid SSL/TLS certificates. More information is given in the class methods below.
This code is based on UCSC formats, so all coordinates are reported as 0-based, half-open. A genomic coordinate displayed on UCSC or Ensembl as chr1:1-100 is therefore represented as chr1 0 100. To convert from 0-based, half-open to 1-based, fully-closed, add 1 to the start.
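The conversion between the two coordinate systems can be written as a pair of small helpers. This is a minimal sketch; the function names to_one_based and to_zero_based are hypothetical and are not part of Bio::DB::Big.

```perl
use strict;
use warnings;

# 0-based, half-open -> 1-based, fully-closed: add 1 to the start
sub to_one_based {
  my ($start, $end) = @_;
  return ($start + 1, $end);
}

# 1-based, fully-closed -> 0-based, half-open: subtract 1 from the start
sub to_zero_based {
  my ($start, $end) = @_;
  return ($start - 1, $end);
}

my ($s, $e) = to_one_based(0, 100);
printf("chr1:%d-%d\n", $s, $e); # prints chr1:1-100
```

The end coordinate is unchanged in both directions, because a half-open end and a fully-closed end refer to the same final base.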
Initialises libBigWig. Essential to call if you are going to load remote files. Consider doing this once in a BEGIN block in your code.
Sets the libcurl timeout in milliseconds. Setting this to 0 means there is no timeout. See libcurl's CURLOPT_TIMEOUT_MS value for more information.
This is a global variable for the entire library.
By default libcurl will not follow 301 or 302 redirect status codes. Switching this on will force it to follow them. See libcurl's CURLOPT_FOLLOWLOCATION value for more information.
Forces libcurl to verify the remote SSL/TLS certificates. By default this is true. Setting it to false will allow any HTTPS communication to occur irrespective of the attached certificate. See libcurl's CURLOPT_SSL_VERIFYPEER value for more information.
Perl method that wraps two methods from Bio::DB::Big::File. File type is sniffed using test_big_wig(). If true we open the file using open_big_wig(). If not we open using open_big_bed(). The caller can then use is_big_wig() or is_big_bed() to assert the type of file now available.
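The type check itself happens inside the library. As a rough illustration of what sniffing a file type can look like, the sketch below reads the first four bytes of a local file and compares them with the magic numbers of the two formats (0x888FFC26 for BigWig, 0x8789F2EB for BigBed). The helper name sniff_big_type is hypothetical, and this is not presented as the way test_big_wig() is implemented.

```perl
use strict;
use warnings;

use constant {
  BIGWIG_MAGIC => 0x888FFC26,
  BIGBED_MAGIC => 0x8789F2EB,
};

# Hypothetical helper: return 'bigwig', 'bigbed' or undef for a local file
sub sniff_big_type {
  my ($path) = @_;
  open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
  my $buf;
  my $read = read($fh, $buf, 4);
  close($fh);
  return unless defined $read && $read == 4;
  # The formats allow either byte order, so test both interpretations
  foreach my $magic (unpack('V', $buf), unpack('N', $buf)) {
    return 'bigwig' if $magic == BIGWIG_MAGIC;
    return 'bigbed' if $magic == BIGBED_MAGIC;
  }
  return;
}

if (@ARGV) {
  my $type = sniff_big_type($ARGV[0]);
  print defined $type ? "$type\n" : "unknown\n";
}
```

In normal use Bio::DB::Big->open() followed by is_big_wig() or is_big_bed() is all that is needed; the sketch only shows the idea behind the dispatch.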
See Bio::DB::Big::File for more information on the routines available.
See Bio::DB::Big::AutoSQL for more information on routines available. Also see Bio::DB::Big::File for the method get_autosql().
This library raises exceptions as and when errors occur. You can trap them using eval or equivalent methods. The following are the classes of exceptions raised (identified by the exception's prefix):
Tried to use a bigwig method on a bigbed file or vice-versa
An error occurred whilst trying to open a file
Unknown summary type given for statistics generation
The chromosome was not found in this file
The specified range was incorrect. Normally caused when start is greater than end or end is greater than the chromosome length
Could not retrieve the requested region
Could not parse a record. Normally happens with AutoSQL work.
Incorrect value or configuration given to a module.
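Trapping such exceptions with eval might look like the sketch below. The failing sub is a stand-in for a library call, and both its message and the "OpenError" prefix are hypothetical; the helper name try_call and the assumption that the prefix is the text before the first colon are likewise illustrative and should be checked against the messages the library actually raises.

```perl
use strict;
use warnings;

# Stand-in for a library call that fails; the message and its
# "OpenError" prefix are hypothetical, not taken from Bio::DB::Big
sub risky_call {
  die "OpenError: could not open file\n";
}

# Run a code reference and return (result, error); error is undef on success
sub try_call {
  my ($code) = @_;
  my $result;
  my $ok = eval { $result = $code->(); 1 };
  return ($result, undef) if $ok;
  my $err = $@ || 'Unknown error';
  return (undef, $err);
}

my ($result, $err) = try_call(\&risky_call);
if (defined $err) {
  # Assumes the prefix is the word before the first colon
  my ($prefix) = $err =~ /^(\w+):/;
  warn sprintf("Caught %s exception: %s", defined $prefix ? $prefix : 'unknown', $err);
}
```

Ending the eval block with a true value and testing that, rather than testing $@ directly, avoids missing an error when $@ has been cleared before it is inspected.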
Bio::DB::BigFile
Copyright [2015-2017] EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.