NAME

Gzip::Faster - simple and fast gzip and gunzip

SYNOPSIS

    # Make a random input string
    my $input = join '', map {int (rand (10))} 0..0x1000;
    use Gzip::Faster;
    # Compress the random string.
    my $gzipped = gzip ($input);
    # Uncompress it again.
    my $roundtrip = gunzip ($gzipped);
    # Put it into a file.
    gzip_to_file ($input, 'file.gz');
    # Retrieve it again from the file.
    $roundtrip = gunzip_file ('file.gz');

VERSION

This documents version 0.21 of Gzip::Faster corresponding to git commit 0b018b1a4df5f509a16c4262936b601382b6c3d6 made on Fri Dec 29 10:54:29 2017 +0900.

DESCRIPTION

This module compresses to and decompresses from the gzip and related formats. See "About gzip" if you aren't familiar with the gzip format.

The basic functions of the module are "gzip" and "gunzip", which convert scalars to and from gzip format. There are also three convenience functions built on these two: "gzip_file" reads a file then compresses it; "gunzip_file" reads a file then uncompresses it; and "gzip_to_file" compresses a scalar and writes it to a file.

Further to this, "deflate" and "inflate" work with the "deflate format", which is the same as the gzip format except it has no header information. "deflate_raw" and "inflate_raw" work with the bare-bones version of this format without checksums.

If you need to control the compression beyond what is offered by "gzip" and "gunzip", create a Gzip::Faster object using "new", and compress and uncompress using the "zip" and "unzip" methods. The type of compression can be toggled with "gzip_format" and "raw". A file name can be set and retrieved from the gzip header with "file_name", and the modification time of the file can be set and retrieved with "mod_time". The level of compression can be altered with "level". Perl flags can be copied into the gzip header using "copy_perl_flags".

FUNCTIONS

gzip

    my $zipped = gzip ($plain);

This compresses $plain into the gzip format. The return value is the compressed version of $plain.

gunzip

    my $plain = gunzip ($zipped);

This uncompresses $zipped and returns the result of the uncompression. It prints a warning and returns the undefined value if $zipped is the undefined value or an empty string. It throws a fatal error if $zipped is not in the gzip format.
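
Since "gunzip" throws a fatal error on input which is not in the gzip format, code which handles untrusted data may want to trap that error with eval. The following is a minimal sketch, not part of the module. The name safe_gunzip is a hypothetical helper, and it checks for undefined or empty input first so that the warning is not printed.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Hypothetical helper: return the uncompressed data, or the undefined
# value if the input is empty or is not in the gzip format.
sub safe_gunzip
{
    my ($maybe_gzipped) = @_;
    # Check for undefined or empty input before calling gunzip.
    if (! defined $maybe_gzipped || length ($maybe_gzipped) == 0) {
        return;
    }
    # gunzip dies on input which is not in the gzip format, so trap it.
    my $plain = eval { gunzip ($maybe_gzipped) };
    if ($@) {
        warn "gunzip failed: $@";
        return;
    }
    return $plain;
}

my $ok = safe_gunzip (gzip ('hello world'));
my $bad = safe_gunzip ('this is not gzip data');
print defined $ok ? "Uncompressed: $ok\n" : "Failed\n";
print defined $bad ? "Unexpected success\n" : "Rejected bad input\n";
```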

gzip_file

    my $zipped = gzip_file ('file');

This reads the contents of file into memory and then runs "gzip" on the file's contents. The return value and the possible errors are the same as "gzip", plus this may also throw an error if open fails. To write a file name, use

    my $zipped = gzip_file ('file', file_name => 'file');

The modification time can also be written:

    my $zipped = gzip_file ('file', file_name => 'file', mod_time => time ());

gzip_file was added in version 0.04. File name writing was added in version 0.18. Modification time writing was added in version 0.19.

gunzip_file

    my $plain = gunzip_file ('file.gz');

This reads the contents of file.gz into memory and then runs "gunzip" on the file's contents. The return value and the possible errors are the same as "gunzip", plus this may also throw an error if open fails. To retrieve a file name, use

    my $plain = gunzip_file ('file.gz', file_name => \my $file_name);

Note that you must provide a scalar reference to the file_name argument. This reference is filled in with the file name information from the header of file.gz. If file.gz does not contain any file name information, $file_name will contain the undefined value.

The modification time can also be read from the header:

    my $plain = gunzip_file ('file.gz', file_name => \my $file_name,
                             mod_time => \my $mod_time);

gunzip_file was added in version 0.04. File name reading was added in version 0.18. Modification time reading was added in version 0.19.

gzip_to_file

    gzip_to_file ($plain, 'file.gz');

This compresses $plain in memory using "gzip" and writes the compressed content to 'file.gz'. There is no return value. The errors are the same as "gzip", plus this may also throw an error if open fails. To write a file name, use

    gzip_to_file ($plain, 'file.gz', file_name => 'file');

gzip_to_file was added in version 0.08. File name writing was added in version 0.18. Modification time writing was added in version 0.19.
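
As a sketch of how the file functions fit together, the following writes a file with a name and a modification time in its header, then reads the contents and both header fields back with "gunzip_file". It assumes that "gzip_to_file" accepts the mod_time option in the same way as "gzip_file" does. The temporary directory, the file names, and the timestamp are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';
use Gzip::Faster;

# Work in a temporary directory which is removed at exit.
my $dir = tempdir (CLEANUP => 1);
my $gzfile = "$dir/notes.txt.gz";
my $input = "some text\n" x 10;
# An arbitrary timestamp, in seconds since the Unix epoch.
my $when = 1500000000;

# Assumption: mod_time is accepted here as it is by gzip_file.
gzip_to_file ($input, $gzfile, file_name => 'notes.txt',
              mod_time => $when);

# Both options take scalar references, which are filled in from the header.
my $plain = gunzip_file ($gzfile, file_name => \my $name,
                         mod_time => \my $mtime);

print "Contents match\n" if $plain eq $input;
printf "File name: %s\n", defined $name ? $name : 'undefined';
printf "Modification time: %s\n", defined $mtime ? $mtime : 'undefined';
```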

deflate

    my $deflated = deflate ($plain);

This compresses $plain into the deflate format. The deflate format is similar to the "gzip" format, except that it doesn't contain a gzip header. The output of deflate can be inflated either with "inflate" or with "gunzip".

To see an example of using "deflate" to write a PNG image file, see t/png.t in the module's tests.

deflate was added in version 0.16.

inflate

    my $inflated = inflate ($deflated);

This inflates the output of "deflate". For all practical purposes this is identical to "gunzip", and it's included only for completeness. In other words, you can use inflate and gunzip interchangeably.

inflate was added in version 0.16.
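
The following short sketch illustrates the round trip described above: the output of "deflate" is uncompressed once with "inflate" and once with "gunzip". The input string is arbitrary.

```perl
use strict;
use warnings;
# deflate and inflate are exported on demand only.
use Gzip::Faster qw(deflate inflate gunzip);

my $plain = 'abc' x 1000;
my $deflated = deflate ($plain);
# Either function can be used to uncompress the deflate format.
my $via_inflate = inflate ($deflated);
my $via_gunzip = gunzip ($deflated);

printf "%d bytes deflated to %d bytes\n", length ($plain), length ($deflated);
print "inflate round trip OK\n" if $via_inflate eq $plain;
print "gunzip round trip OK\n" if $via_gunzip eq $plain;
```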

deflate_raw

This is similar to "deflate", except that it doesn't write a checksum value into the data. The output is incompatible with "inflate" and "gunzip", and must be inflated with "inflate_raw".

deflate_raw was added in version 0.16.

inflate_raw

This inflates data output by "deflate_raw". It won't work on the output of "gzip" and "deflate". It prints a warning and returns the undefined value if its input is the undefined value or an empty string. It throws a fatal error if its input is not in the deflate format.

inflate_raw was added in version 0.16.
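
A minimal sketch of a round trip through the raw functions follows. It also compresses the same input with "deflate" so that the two output sizes can be compared. The input string is arbitrary.

```perl
use strict;
use warnings;
# These functions are exported on demand only.
use Gzip::Faster qw(deflate deflate_raw inflate_raw);

my $plain = 'raw deflate example ' x 50;
my $raw = deflate_raw ($plain);
# Output of deflate_raw must be uncompressed with inflate_raw.
my $back = inflate_raw ($raw);
# Compress the same input in the non-raw deflate format for comparison.
my $deflated = deflate ($plain);

print "Round trip OK\n" if $back eq $plain;
printf "deflate: %d bytes, deflate_raw: %d bytes\n",
    length ($deflated), length ($raw);
```

The raw output should be the smaller of the two, since the format described by RFC 1950 wraps the compressed data in a two-byte header and a four-byte Adler-32 checksum.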

METHODS

This section describes the object-oriented interface of Gzip::Faster.

new

    my $gf = Gzip::Faster->new ();

Create a Gzip::Faster object. The returned object defaults to gzip compression. This can be altered with "gzip_format" and "raw".

new was added in version 0.16.

zip

    my $zipped = $gf->zip ($plain);

Compress $plain. The type of compression can be set with "gzip_format" and "raw".

zip was added in version 0.16.

unzip

    my $plain = $gf->unzip ($zipped);

Uncompress $zipped. The type of uncompression can be set with "gzip_format" and "raw".

unzip was added in version 0.16.
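
The following is a minimal sketch of a round trip using "new", "zip", and "unzip". Since a new object defaults to the gzip format, the output can also be read by the "gunzip" function. The input string is arbitrary.

```perl
use strict;
use warnings;
use Gzip::Faster;

my $gf = Gzip::Faster->new ();
my $plain = "The quick brown fox\n" x 100;
# A new object compresses to the gzip format by default.
my $zipped = $gf->zip ($plain);
my $roundtrip = $gf->unzip ($zipped);
# The functional interface reads the same format.
my $also = gunzip ($zipped);

printf "%d bytes compressed to %d bytes\n", length ($plain), length ($zipped);
print "unzip round trip OK\n" if $roundtrip eq $plain;
print "gunzip round trip OK\n" if $also eq $plain;
```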

copy_perl_flags

    $gf->copy_perl_flags (1);

Copy some of the Perl flags (currently the utf8 flag) into the header of the gzipped data.

Please see "Browser bugs and Gzip::Faster" for reasons why you might not want to use this feature.

This feature of the module was restored in version 0.16.

file_name

    my $filename = $gf->file_name ();
    $gf->file_name ('this.gz');
    my $zipped = $gf->zip ($something);

Get or set the file name in the compressed output. The file name is a feature of the gzip format which is used, for example, when you use the command gzip -d file.gz. It tells gzip what to call the file after it's uncompressed.

The file_name method is only useful for the gzip format, since the deflate format does not have a header to store a name into. To prevent accidental re-use of a file name, when you set a file name with "file_name", then use "zip", the file name is deleted from the object, so it needs to be set each time "zip" is called. If you set a file name with "file_name" then call "unzip", that file name may be deleted.

The following example demonstrates storing and then retrieving the name:

    use Gzip::Faster;
    my $gf = Gzip::Faster->new ();
    $gf->file_name ("blash.gz");
    my $something = $gf->zip ("stuff");
    my $no = $gf->file_name ();
    if ($no) {
        print "WHAT?\n";
    }
    else {
        print "The file name has been deleted by the call to zip.\n";
    }
    my $gf2 = Gzip::Faster->new ();
    $gf2->unzip ($something);
    my $file_name = $gf2->file_name ();
    print "Got back file name $file_name\n";

produces output

    The file name has been deleted by the call to zip.
    Got back file name blash.gz

(This example is included as file-name.pl in the distribution.)

The module currently has a hard-coded limit of 1024 bytes as the maximum length of file name it can read back.

file_name was added in version 0.16.

gzip_format

    $gf->gzip_format (1);

Switch the compression between the "gzip format" and the "deflate format". A true value turns on the gzip format, and a false value turns on the deflate format. The default is gzip format. Switching on gzip format on an object automatically switches off "raw" format on the object.

gzip_format was added in version 0.16.

raw

    $gf->raw (1);

Switch between the raw deflate and deflate formats. A true value turns on the "raw deflate format", and a false value turns off the raw deflate format. Switching this on has the side effect of automatically switching off "gzip_format". Thus the sequence

    $gf->gzip_format (1);
    $gf->raw (1);
    $gf->raw (0);

puts $gf in the non-raw deflate format.

raw was added in version 0.16.
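
As a sketch of how "gzip_format" and "raw" interact, the following compresses the same input in each of the three formats using one object, then uncompresses each result with the matching function. The input string is arbitrary.

```perl
use strict;
use warnings;
use Gzip::Faster qw(gunzip inflate inflate_raw);

my $plain = 'format example ' x 100;
my $gf = Gzip::Faster->new ();
# The default is the gzip format.
my $gz = $gf->zip ($plain);
# A false value switches to the deflate format.
$gf->gzip_format (0);
my $deflated = $gf->zip ($plain);
# A true value switches to the raw deflate format.
$gf->raw (1);
my $raw = $gf->zip ($plain);

printf "gzip: %d bytes, deflate: %d bytes, raw deflate: %d bytes\n",
    length ($gz), length ($deflated), length ($raw);
print "gunzip OK\n" if gunzip ($gz) eq $plain;
print "inflate OK\n" if inflate ($deflated) eq $plain;
print "inflate_raw OK\n" if inflate_raw ($raw) eq $plain;
```

The sizes should decrease from gzip to deflate to raw deflate, since each format carries less header and checksum information than the one before.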

level

    $gf->level (9);

Set the level of compression, from 0 (no compression) to 9 (best compression). Values outside this range cause a warning, and the level is set to the nearest valid value. For example, a value of 100 causes the level to be set to 9. The higher the level of compression, the more time it takes to compute. The default value is a compromise between speed and quality of compression.

level was added in version 0.16.
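
To see the effect of "level" on your own data, the compressed sizes at different levels can be compared. The following is a minimal sketch. The input text and the choice of levels are arbitrary, and the sizes printed depend on the input and on the version of zlib.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Some repetitive text which compresses well.
my $plain = join '', map { "line $_ of some repetitive text\n" } 1..2000;
my %size;
for my $level (1, 6, 9) {
    my $gf = Gzip::Faster->new ();
    $gf->level ($level);
    my $zipped = $gf->zip ($plain);
    # The level affects only the compression, not the format.
    if ($gf->unzip ($zipped) ne $plain) {
        die "Round trip failed at level $level";
    }
    $size{$level} = length ($zipped);
    printf "level %d: %d bytes -> %d bytes\n",
        $level, length ($plain), $size{$level};
}
```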

mod_time

    $gf->mod_time (time ());
    my $mod_time = $gf->mod_time ();

Set or get the file modification time in the gzip header. The modification time is an unsigned integer which represents the number of seconds since the Unix epoch. This only applies to "gzip_format" compression.

mod_time was added in version 0.19.
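
The following sketch sets a modification time and then checks it in two ways. The first reads the four-byte little-endian MTIME field, which RFC 1952 places at offset 4 of the gzip header. The second uses a new object and "unzip", assuming that "unzip" fills in the modification time in the same way as it does the file name. The timestamp and the data are arbitrary.

```perl
use strict;
use warnings;
use Gzip::Faster;

my $gf = Gzip::Faster->new ();
# An arbitrary time, in seconds since the Unix epoch.
my $when = 1500000000;
$gf->mod_time ($when);
my $zipped = $gf->zip ('some data to compress');

# RFC 1952: bytes 4 to 7 of the header hold MTIME, least significant
# byte first.
my $mtime_in_header = unpack ('V', substr ($zipped, 4, 4));

# Assumption: unzip stores the header's modification time in the object.
my $reader = Gzip::Faster->new ();
$reader->unzip ($zipped);
my $read_back = $reader->mod_time ();

printf "mod_time in header:   %d\n", $mtime_in_header;
printf "mod_time after unzip: %s\n",
    defined $read_back ? $read_back : 'undefined';
```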

PERFORMANCE

This section compares the performance of Gzip::Faster with IO::Compress::Gzip / IO::Uncompress::Gunzip and Compress::Raw::Zlib. These results are produced by the file bench/benchmarks.pl in the distribution.

Short text

This section compares the performance of Gzip::Faster and other modules on a short piece of English text. Gzip::Faster is about four times faster to load, seven times faster to compress, and twenty-five times faster to uncompress than IO::Compress::Gzip and IO::Uncompress::Gunzip. Round trips are about ten times faster with Gzip::Faster.

Compared to Compress::Raw::Zlib, load times are about one and a half times faster, round trips are about three times faster, compression is about two and a half times faster, and decompression is about six times faster.

The versions used in this test are as follows:

    $IO::Compress::Gzip::VERSION = 2.069
    $IO::Uncompress::Gunzip::VERSION = 2.069
    $Compress::Raw::Zlib::VERSION = 2.069
    $Gzip::Faster::VERSION = 0.19

The size after compression is as follows:

    IO::Compress:Gzip size is 830 bytes.
    Compress::Raw::Zlib size is 830 bytes.
    Gzip::Faster size is 830 bytes.

Here is a comparison of load times:

                Rate Load IOUG Load IOCG  Load CRZ   Load GF
    Load IOUG 25.3/s        --       -4%      -66%      -77%
    Load IOCG 26.5/s        5%        --      -65%      -76%
    Load CRZ  75.1/s      197%      184%        --      -31%
    Load GF    109/s      330%      311%       45%        --

Here is a comparison of a round-trip:

                           Rate IO::Compress::Gzip Compress::Raw::Zlib  Gzip::Faster
    IO::Compress::Gzip   1309/s                 --                -66%          -90%
    Compress::Raw::Zlib  3888/s               197%                  --          -70%
    Gzip::Faster        12929/s               888%                233%            --

Here is a comparison of gzip (compression) only:

                                    Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
    IO::Compress::Gzip            2567/s                 --                         -60%         -86%
    Compress::Raw::Zlib::Deflate  6491/s               153%                           --         -65%
    Gzip::Faster                 18338/s               614%                         183%           --

Here is a comparison of gunzip (decompression) only:

                                    Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
    IO::Uncompress::Gunzip        2818/s                     --                         -74%         -96%
    Compress::Raw::Zlib::Inflate 10997/s                   290%                           --         -84%
    Gzip::Faster                 69565/s                  2368%                         533%           --

Long text

This section compares the compression on a 2.2 megabyte file of Chinese text, which is the Project Gutenberg version of Journey to the West, http://www.gutenberg.org/files/23962/23962-0.txt, with the header and footer text removed.

The versions used in this test are as above.

The sizes are as follows:

    IO::Compress:Gzip size is 995387 bytes.
    Compress::Raw::Zlib size is 995387 bytes.
    Gzip::Faster size is 995823 bytes.

Note that the size of the file compressed with the command-line gzip, with the default compression, is identical to the size with Gzip::Faster::gzip, except for 12 extra bytes which the command-line version uses to store the file name in the gzip header:

    $ gzip --keep chinese.txt
    $ ls -l chinese.txt.gz 
    -rw-r--r--  1 ben  ben  995835 Oct 20 18:52 chinese.txt.gz
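
The 12 bytes are the 11 bytes of the name chinese.txt plus the zero byte which terminates a file name in the gzip header, as described in RFC 1952. The following sketch, which is not part of the benchmarks, compresses a short arbitrary string with and without a file name to show the same difference.

```perl
use strict;
use warnings;
use Gzip::Faster;

my $plain = 'some text ' x 100;
my $name = 'chinese.txt';
# Compress without a file name.
my $without = gzip ($plain);
# Compress the same data with a file name in the header.
my $gf = Gzip::Faster->new ();
$gf->file_name ($name);
my $with = $gf->zip ($plain);
# RFC 1952: byte 3 of the header is FLG, and bit 3 of FLG is FNAME.
my $flags = unpack ('C', substr ($with, 3, 1));

printf "without name: %d bytes\n", length ($without);
printf "with name:    %d bytes\n", length ($with);
printf "difference:   %d bytes (the name is %d bytes, plus a zero byte)\n",
    length ($with) - length ($without), length ($name);
printf "FNAME flag is %s\n", ($flags & 0x08) ? 'set' : 'not set';
```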

Here is a comparison of a round-trip:

                          Rate IO::Compress::Gzip Compress::Raw::Zlib   Gzip::Faster
    IO::Compress::Gzip  4.43/s                 --                 -3%            -8%
    Compress::Raw::Zlib 4.57/s                 3%                  --            -5%
    Gzip::Faster        4.81/s                 9%                  5%             --

Here is a comparison of gzip (compression) only:

                                   Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
    IO::Compress::Gzip           5.04/s                 --                           0%          -6%
    Compress::Raw::Zlib::Deflate 5.04/s                 0%                           --          -6%
    Gzip::Faster                 5.36/s                 6%                           6%           --

Here is a comparison of gunzip (decompression) only:

                                   Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
    IO::Uncompress::Gunzip       36.8/s                     --                         -18%         -20%
    Compress::Raw::Zlib::Inflate 45.1/s                    23%                           --          -1%
    Gzip::Faster                 45.7/s                    24%                           1%           --

For longer files, Gzip::Faster is not much faster. The underlying library's speed is the main factor.

BUGS

The module doesn't check whether the input of "gzip" is already gzipped, and it doesn't check whether the compression was effective. That is, it doesn't check whether the output of "gzip" is actually smaller than the input.
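
If these checks matter, they can be made by the caller. The following is a minimal sketch. The name gzip_if_useful is a hypothetical helper, not part of the module. It treats data beginning with the two gzip magic bytes (1f 8b, from RFC 1952) as already gzipped, which is only a heuristic, and it returns the data together with a flag saying whether the data is gzipped.

```perl
use strict;
use warnings;
use Gzip::Faster;

# Hypothetical helper: gzip the data only if that seems worthwhile.
# Returns the data and a true or false "is gzipped" flag.
sub gzip_if_useful
{
    my ($data) = @_;
    # Heuristic: gzipped data starts with the bytes 1f 8b.
    if (substr ($data, 0, 2) eq "\x1f\x8b") {
        return ($data, 1);
    }
    my $zipped = gzip ($data);
    # Keep the compressed version only if it is actually smaller.
    if (length ($zipped) < length ($data)) {
        return ($zipped, 1);
    }
    return ($data, 0);
}

my $long = 'abcdefgh' x 500;
my ($first, $first_is_gz) = gzip_if_useful ($long);
# Already gzipped, so it is returned unchanged.
my ($second, $second_is_gz) = gzip_if_useful ($first);
# Too short to benefit from compression.
my ($third, $third_is_gz) = gzip_if_useful ('hi');

printf "long input: %d -> %d bytes\n", length ($long), length ($first);
print "second pass left the data unchanged\n" if $second eq $first;
print "short input left uncompressed\n" if ! $third_is_gz;
```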

In "copy_perl_flags", only the utf8 flag is implemented. Possible other things which could be implemented are the read-only and the taint flags.

Browser bugs and Gzip::Faster

Some web browsers have bugs which may affect users of this module.

Using "copy_perl_flags" with utf8-encoded text trips a browser bug in the Firefox web browser where it produces a content encoding error message.

Using deflate rather than gzip compression trips browser bugs in older versions of Internet Explorer, which mistakenly say they can handle the deflate format, but in fact can only handle gzip format.

EXPORTS

The module exports "gzip", "gunzip", "gzip_file", "gunzip_file", and "gzip_to_file" by default. You can switch this blanket exporting off with

    use Gzip::Faster ();

or

    use Gzip::Faster 'gunzip';

whereby you only get gunzip and not the other functions exported. The functions "inflate", "deflate", "inflate_raw" and "deflate_raw" are exported on demand only. You can export all the functions from the module using

    use Gzip::Faster ':all';

DIAGNOSTICS

Data input to inflate is not in libz format

(Fatal) The data given to "gunzip", "inflate", "inflate_raw", or "unzip" was not in the compressed format.

Error opening '$file': $!

(Fatal) This may be produced by "gunzip_file", "gzip_file", or "gzip_to_file".

Error closing '$file': $!

(Fatal) This may be produced by "gunzip_file", "gzip_file", or "gzip_to_file".

wrong format: perl flags not copied: use gzip_format(1)

(Warning) The user tried to use "copy_perl_flags" together with deflate compression, which isn't possible. Use "gzip_format" with a true argument to allow "copy_perl_flags" to work.

wrong format: file name ignored: use gzip_format(1)

(Warning) The user tried to use "file_name" together with deflate compression, which isn't possible. Use "gzip_format" with a true argument to allow "file_name" to work.

Cannot set compression level to less than %d

(Warning) The user used "level" with a negative value.

Cannot set compression level to more than %d

(Warning) The user used "level" with a value greater than nine.

Cannot write file name to non-scalar reference

(Warning) The user's value for file_name in the optional argument to "gunzip_file" was not a scalar reference.

Empty input

(Warning) The user tried to compress or decompress the undefined value.

Attempt to (un)compress empty string

(Warning) The user tried to compress or decompress an empty string, as in

    my $out = inflate ('');

There are other diagnostic messages in the module to detect bugs. A list can be obtained by running the parse-diagnostics script which comes with Parse::Diagnostics on the files gzip-faster-perl.c and lib/Gzip/Faster.pm in the distribution.
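
The warnings can be collected rather than printed. The following is a minimal sketch, assuming that the module's warnings go through Perl's usual warning mechanism and so reach a local $SIG{__WARN__} handler. The number and text of the warnings collected may differ between versions of the module.

```perl
use strict;
use warnings;
use Gzip::Faster;

my @warnings;
my $out;
{
    # Collect warnings in an array instead of printing them.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $out = gunzip ('');
}
print "gunzip returned the undefined value\n" if ! defined $out;
print "Collected warning: $_" for @warnings;
```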

INSTALLATION

Installation follows standard Perl methods. Detailed instructions can be found in the file README in the distribution. The following are some extra notes for people who get stuck.

Gzip::Faster requires the compression library "zlib" (also called libz) to be installed on your computer. The following message printed during perl Makefile.PL:

    You don't seem to have zlib available on your system.

or

    Warning (mostly harmless): No library found for -lz

or the following message at run-time:

    undefined symbol: inflate

indicate that Gzip::Faster was unable to link to libz.

Ubuntu Linux

On Ubuntu Linux, you may need to install zlib1g-dev using the following command:

    sudo apt-get install zlib1g-dev

Windows

Unfortunately at this time the module doesn't seem to install on ActiveState Perl. You can check the current status at http://code.activestate.com/ppm/Gzip-Faster/. However, the module seems to install without problems on Strawberry Perl, so if you cannot install via ActiveState, you could try that instead.

SEE ALSO

About gzip

The gzip and deflate formats are closely related formats for compressing information. They are used for compressing web pages to reduce the amount of data sent, for compressing source code files, or in applications such as MATLAB files or PNG images.

These formats are formally described by RFC 1950 (ZLIB Compressed Data Format Specification), RFC 1951 (DEFLATE Compressed Data Format Specification), and RFC 1952 (GZIP file format specification). The library "zlib" implements the formats.
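
Some of the fields described in these RFCs can be seen by unpacking the first and last bytes of the module's output. The following is a minimal sketch which checks only the fields named in the comments. The input string is arbitrary.

```perl
use strict;
use warnings;
use Gzip::Faster qw(gzip deflate);

my $plain = 'hello';

# RFC 1952: a gzip member starts with ID1 = 1f, ID2 = 8b, and CM = 8
# (deflate), and ends with ISIZE, the length of the uncompressed data,
# as a four-byte little-endian number.
my $gz = gzip ($plain);
my ($id1, $id2, $cm) = unpack ('C3', $gz);
my $isize = unpack ('V', substr ($gz, -4));
printf "gzip: ID1=%02x ID2=%02x CM=%d ISIZE=%d\n", $id1, $id2, $cm, $isize;

# RFC 1950: the low four bits of CMF give the compression method (8),
# and CMF * 256 + FLG is a multiple of 31.
my $zl = deflate ($plain);
my ($cmf, $flg) = unpack ('C2', $zl);
printf "deflate: method=%d header check remainder=%d\n",
    $cmf & 0x0f, ($cmf * 256 + $flg) % 31;
```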

Alternatives

The following alternatives to this module may also be useful.

Command-line gzip

To use the command line utility gzip, use

    system ("gzip file");

or put the command in backquotes, like `gzip file`.

mod_deflate and mod_gzip

These are Apache web server modules which compress web outputs immediately after you produce them, and before sending to the user.

PerlIO::gzip

This is a Perl extension to provide a PerlIO layer to gzip/gunzip. That means you can just add :gzip when you open a file to read or write compressed files:

    use PerlIO::gzip;

    open my $in, "<:gzip", 'file.gz' or die $!;

    open my $out, ">:gzip", 'file.gz' or die $!;

and you never have to deal with the gzip format.

IO::Zlib

Compress::Zlib

Compress::Raw::Zlib

CGI::Compress::Gzip

IO::Compress::Gzip and IO::Uncompress::Gunzip

Gzip::RandomAccess - extract arbitrary bits of a gzip stream

Compress::Zopfli

This is a compress-only library by Google for gzip/deflate format compression.

EXTENDED EXAMPLES

This section gives some extended examples of the use of this module.

CGI output

The article "Compressing CGI output with Perl and Gzip::Faster" demonstrates how to use Gzip::Faster to compress the output of a web program.

Get compressed web pages

The article "Requesting compressed content from a web server with LWP::UserAgent" demonstrates how to use Gzip::Faster with LWP::UserAgent when requesting compressed content.

View the image data of a PNG

The following example demonstrates using "inflate" to view the image data within a PNG image. See "ACKNOWLEDGEMENTS" for credit.

    use File::Slurper 'read_binary';
    use FindBin '$Bin';
    use Gzip::Faster 'inflate';
    my $pngfile = "$Bin/larry-wall.png";
    my $pngdata = read_binary ($pngfile);
    # The IHDR chunk holds 13 bytes: the width, the height, the bit
    # depth, and four more one-byte fields.
    if ($pngdata !~ /IHDR(.{13})/s) {
        die "No header";
    }
    # In the PNG format, the width comes before the height.
    my ($width, $height, $bits) = unpack ("NNCCCCC", $1);
    # The four bytes before "IDAT" are the length of the compressed data.
    if ($pngdata !~ /(....)IDAT(.*)$/s) {
        die "No image data";
    }
    my $length = unpack ("N", $1);
    my $data = substr ($2, 0, $length);
    my $idat = inflate ($data);
    # The code treats each row as one filter byte followed by $width
    # one-byte pixels.
    my $stride = $width + 1;
    for my $y (0..$height - 1) {
        my $row = substr ($idat, $y * $stride, $stride);
        for my $x (1..$width - 1) {
            my $pixel = substr ($row, $x, 1);
            if (ord ($pixel) < 128) {
                print "#";
                next;
            }
            print " ";
        }
        print "\n";
    }

produces output

               ######              
             #########             
           #############           
          ###############          
          ################         
         ##################        
         ########   ########       
        #######      #######       
        ####          ######       
        ###           ######       
        ###           #######      
       ########    ##########      
       ####  ###    #  ######      
       #### # ##   #  ######       
       ####       #     ###        
        ###       #    ####        
                  ##   ###         
                  ##   ###         
              ######## ###         
             ##############        
            ##### #########        
            ## ## ##########       
             #   ##  ########      
             #       ##########    
          #####    ########### ### 
        ######     ################
      #########  ######  ##########
     ##########    ###   # ########
    # # #######    #     ##########
    #  ###### #          ##########

(This example is included as inflate.pl in the distribution.)

GLOSSARY

This section describes some of the terminology of the Gzip compression system.

deflate format

The deflate format is the same as the "gzip format" except that it does not contain the header with the additional information such as the file name and modification time. The deflate format may or may not include a checksum. If it does not include the checksum, it is the "raw deflate format". The deflate format is the format used within PNG images, for example.

gzip format

The gzip format is the same as the "deflate format" except that it includes a header which may contain such things as a file name or a modification time. The gzip format is the one used by the command-line utility gzip in such things as .tar.gz files.

raw deflate format

The raw deflate format is a form of the "deflate format" without an Adler-32 checksum. (The terminology "raw deflate" for this format is from the zlib manual and does not appear in the RFCs.)

zlib

zlib is the implementation of the gzip and deflate algorithms. zlib is necessary to install Gzip::Faster. It is described at http://zlib.net.

HISTORY

This module started as an experimental benchmark against IO::Compress::Gzip when profiling revealed that some programs were spending the majority of their time in IO::Compress::Gzip. Since I (Ben Bullock) knew that zlib was fast, I was surprised by the time the Perl code was taking. I wrote Gzip::Faster to test IO::Compress::Gzip against a simplistic gzip wrapper. I released the module to CPAN because the results were very striking. See "PERFORMANCE" above for details.

Gzip::Faster's ancestor is the example program zpipe supplied with zlib. See http://zlib.net/zpipe.c. Gzip::Faster is zpipe reading from and writing to Perl scalars.

Version 0.16 added "deflate" and related functions and the object-oriented functions.

Version 0.18 added the ability to set and get file names to the "gzip_file", "gunzip_file", and "gzip_to_file" functions, and version 0.19 added modification times.

Version 0.21 added warnings upon input of empty strings to "gzip", "gunzip", and friends.

ACKNOWLEDGEMENTS

zgrim reported an important bug related to zlib.

Aristotle Pagaltzis contributed the benchmarking code for Compress::Raw::Zlib.

The tests in t/png.t and the example "View the image data of a PNG" use material taken from Image::PNG::Write::BW by Andrea Nall (<ANALL>).

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2014-2017 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.