The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

VCF.pm. Module for validation, parsing and creating VCF files. Supported versions: 3.2, 3.3, 4.0, 4.1, 4.2

SYNOPSIS

From the command line: perl -MVCF -e validate example.vcf perl -I/path/to/the/module/ -MVCF -e validate_v32 example.vcf

From a script: use VCF;

    my $vcf = VCF->new(file=>'example.vcf.gz',region=>'1:1000-2000');
    $vcf->parse_header();

    # Do some simple parsing. Most thorough but slowest way how to get the data.
    while (my $x=$vcf->next_data_hash())
    {
        for my $gt (keys %{$$x{gtypes}})
        {
            my ($al1,$sep,$al2) = $vcf->parse_alleles($x,$gt);
            print "\t$gt: $al1$sep$al2\n";
        }
        print "\n";
    }

    # This will split the fields and print a list of CHR:POS
    while (my $x=$vcf->next_data_array())
    {
        print "$$x[0]:$$x[1]\n";
    }

    # This will return the lines as they were read, including the newline at the end
    while (my $x=$vcf->next_line())
    {
        print $x;
    }

    # Only the columns NA00001, NA00002 and NA00003 will be printed.
    my @columns = qw(NA00001 NA00002 NA00003);
    print $vcf->format_header(\@columns);
    while (my $x=$vcf->next_data_array())
    {
        # this will recalculate AC and AN counts, unless $vcf->recalc_ac_an was set to 0
        print $vcf->format_line($x,\@columns);
    }

    $vcf->close();

validate

    About   : Validates the VCF file.
    Usage   : perl -MVCF -e validate example.vcf.gz     # (from the command line)
              validate('example.vcf.gz');               # (from a script)
              validate(\*STDIN);
    Args    : File name or file handle. When no argument given, the first command line
              argument is interpreted as the file name.

validate_v32

    About   : Same as validate, but assumes v3.2 VCF version.
    Usage   : perl -MVCF -e validate_v32 example.vcf.gz     # (from the command line)
    Args    : File name or file handle. When no argument given, the first command line
              argument is interpreted as the file name.

new

    About   : Creates new VCF reader/writer.
    Usage   : my $vcf = VCF->new(file=>'my.vcf', version=>'3.2');
    Args    :
                fh      .. Open file handle. If neither file nor fh is given, open in write mode.
                file    .. The file name. If neither file nor fh is given, open in write mode.
                region  .. Optional region to parse (requires tabix indexed VCF file)
                silent  .. Unless set to 0, warning messages may be printed.
                strict  .. Unless set to 0, the reader will die when the file violates the specification.
                version .. If not given, '4.0' is assumed. The header information overrides this setting.

open

    About   : (Re)Open file. No need to call this explicitly unless reading from a different
              region is requested.
    Usage   : $vcf->open(); # Read from the start
              $vcf->open(region=>'1:12345-92345');
    Args    : region       .. Supported only for tabix indexed files

close

    About   : Close the filehandle
    Usage   : $vcf->close();
    Args    : none
        Returns : close exit status

next_line

    About   : Reads next VCF line.
    Usage   : my $vcf = VCF->new();
              my $x   = $vcf->next_line();
    Args    : none

next_data_array

    About   : Reads next VCF line and splits it into an array. The last element is chomped.
    Usage   : my $vcf = VCF->new();
              $vcf->parse_header();
              my $x = $vcf->next_data_array();
    Args    : Optional line to parse

set_samples

    About   : Parsing big VCF files with many sample columns is slow, not parsing unwanted samples may speed things a bit.
    Usage   : my $vcf = VCF->new();
              $vcf->set_samples(include=>['NA0001']);   # Exclude all but this sample. When the array is empty, all samples will be excluded.
              $vcf->set_samples(exclude=>['NA0003']);   # Include only this sample. When the array is empty, all samples will be included.
              my $x = $vcf->next_data_hash();
    Args    : Optional line to parse