The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
ziplimit.txt

A) Hard limits of the Zip archive format:

   Number of entries in Zip archive:            64 k (2^16 - 1 entries)
   Compressed size of archive entry:            4 GByte (2^32 - 1 Bytes)
   Uncompressed size of entry:                  4 GByte (2^32 - 1 Bytes)
   Size of single-volume Zip archive:           4 GByte (2^32 - 1 Bytes)
   Per-volume size of multi-volume archives:    4 GByte (2^32 - 1 Bytes)
   Number of parts for multi-volume archives:   64 k (1^16 - 1 parts)
   Total size of multi-volume archive:          256 TByte (4G * 64k)

   The number of archive entries and of multivolume parts are limited by
   the structure of the "end-of-central-directory" record, where the these
   numbers are stored in 2-Byte fields.
   Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
   handling of archives with more than 64k entries.  (The information
   from "number of entries" field in the "end-of-central-directory" record
   is not really neccessary to retrieve the contents of a Zip archive;
   it should rather be used for consistency checks.)

   Length of an archive entry name:             64 kByte (2^16 - 1)
   Length of archive member comment:            64 kByte (2^16 - 1)
   Total length of "extra field":               64 kByte (2^16 - 1)
   Length of a single e.f. block:               64 kByte (2^16 - 1)
   Length of archive comment:                   64 KByte (2^16 - 1)

   Additional limitation claimed by PKWARE:
     Size of local-header structure (fixed fields of 30 Bytes + filename
      local extra field):                     < 64 kByte
     Size of central-directory structure (46 Bytes + filename +
      central extra field + member comment):  < 64 kByte

   Note:
   In 2001, PKWARE has published version 4.5 of the Zip format specification
   (together with the release of PKZIP for Windows 4.5).  This specification
   defines new extra field blocks that allow to break the size limits of the
   standard zipfile structures.  In this extended "Zip64" format, the limits
   on the size of zip entries and the size of the complete zip archive are
   extended to (2^64 - 1) Bytes; the maximum number of archive entries and
   split volumes are enlarged to (2^64 - 1) respective (2^32 - 1).
   Currently, these extensions are not yet supported by the released Info-ZIP
   software. However, new major releases (Zip 3.0 and UnZip 6.0) are under
   development and will support Zip64 archives on selected environments.
   (Beta releases are already available for Unix, VMS and Win32.)

B) Implementation limits of UnZip:

 1. Size limits caused by file I/O and decompression handling:
   Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
   Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)

   Note: On some systems, UnZip may support archive sizes up to 4 GByte.
         To get this support, the target environment has to meet the following
         requirements:
         a) The compiler's intrinsic "long" data types must be able to hold
            integer numbers of 2^32. In other words - the standard intrinsic
            integer types "long" and "unsigned long" have to be wider than
            32 bit.
         b) The system has to supply a C runtime library that is compatible
            with the more-than-32-bit-wide "long int" type of condition a)
         c) The standard file positioning functions fseek(), ftell() (and/or
            the Unix style lseek() and tell() functions) have to be capable
            to move to absolute file offsets of up to 4 GByte from the file
            start.
         On 32-bit CPU hardware, you generally cannot expect that a C compiler
         provides a "long int" type that is wider than 32-bit. So, many of the
         most popular systems (i386, PowerPC, 680x0, et. al) are out of luck.
         You may find environment that provide all requirements on systems
         with 64-bit CPU hardware. Examples might be Cray number crunchers
         or Compaq (former DEC) Alpha AXP machines.

   The number of Zip archive entries is unlimited. The "number-of-entries"
   field of the "end-of-central-dir" record is checked against the "number
   of entries found in the central directory" modulus 64k (2^16).

   Multi-volume archive extraction is not supported.

   Memory requirements are mostly independent of the archive size
   and archive contents.
   In general, UnZip needs a fixed amount of internal buffer space
   plus the size to hold the complete information of the currently
   processed entry's local header. Here, a large extra field
   (could be up to 64 kByte) may exceed the available memory
   for MSDOS 16-bit executables (when they were compiled in small
   or medium memory model, with a fixed 64kByte limit on data space).

   The other exception where memory requirements scale with "larger"
   archives is the "restore directory attributes" feature. Here, the
   directory attributes info for each restored directory has to be held
   in memory until the whole archive has been processed. So, the amount
   of memory needed to keep this info scales with the number of restored
   directories and may cause memory problems when a lot of directories
   are restored in a single run.

C) Implementation limits of the Zip executables:

 1. Size limits caused by file I/O and compression handling:
   Size of Zip archive:                 2 GByte (2^31 - 1 Bytes)
   Compressed size of archive entry:    2 GByte (2^31 - 1 Bytes)
   Uncompressed size of entry:          2 GByte (2^31 - 1 Bytes),
                                        (could/should be 4 GBytes...)
   Multi-volume archive creation is not supported.

 2. Limits caused by handling of archive contents lists

 2.1. Number of archive entries (freshen, update, delete)
     a) 16-bit executable:              64k (2^16 -1) or 32k (2^15 - 1),
                                        (unsigned vs. signed type of size_t)
     a1) 16-bit executable:             <16k ((2^16)/4)
         (The smaller limit a1) results from the array size limit of
         the "qsort()" function.)
         32-bit executables             <1G ((2^32)/4)
         (usual system limit of the "qsort()" function on 32-bit systems)

     b) stack space needed by qsort to sort list of archive entries

     NOTE: In the current executables, overflows of limits a) and b) are NOT
           checked!

     c) amount of free memory to hold "central directory information" of
        all archive entries; one entry needs:
        96 bytes (32-bit) resp. 80 bytes (16-bit)
        + 3 * length of entry name
        + length of zip entry comment (when present)
        + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
        + some bytes for book-keeping of memory allocation

   Conclusion:
     For systems with limited memory space (MSDOS, small AMIGAs, other
     environments without virtual memory), the number of archive entries
     is most often limited by condition c).
     For example, with approx. 100 kBytes of free memory after loading and
     initializing the program, a 16-bit DOS Zip cannot process more than 600
     to 1000 (+) archive entries.  (For the 16-bit Windows DLL or the 16-bit
     OS/2 port, limit c) is less important because Windows or OS/2 executables
     are not restricted to the 1024k area of real mode memory.  These 16-bit
     ports are limited by conditions a1) and b), say: at maximum approx.
     16000 entries!)


 2.2. Number of "new" entries (add operation)
     In addition to the restrictions above (2.1.), the following limits
     caused by the handling of the "new files" list apply:

     a) 16-bit executable:              <16k ((2^64)/4)

     b) stack size required for "qsort" operation on "new entries" list.

     NOTE: In the current executables, the overflow checks for these limits
           are missing!

     c) amount of free memory to hold the directory info list for new entries;
        one entry needs:
        24 bytes (32-bit) resp. 22 bytes (16-bit)
        + 3 * length of filename

D) Some technical remarks:

 1. The 2GByte size limit on archive files is a consequence of the portable
    C implementation of the Info-ZIP programs.
    Zip archive processing requires random access to the archive file for
    jumping between different parts of the archive's structure.
    In standard C, this is done via stdio functions fseek()/ftell() resp.
    unix-io functions lseek()/tell(). In many (most?) C implementations,
    these functions use "signed long" variables to hold offset pointers
    into sequential files. In most cases, this is a signed 32-bit number,
    which is limited to ca. 2E+09. There may be specific C runtime library
    implementations that interpret the offset numbers as unsigned, but for
    us, this is not reliable in the context of portable programming.

 2. The 2GByte limit on the size of a single compressed archive member
    is again a consequence of the implementation in C.
    The variables used internally to count the size of the compressed
    data stream are of type "long", which is guaranted to be at least
    32-bit wide on all supported environments.

    But, why do we use "signed" long and not "unsigned long"?

    Throughout the I/O handling of the compressed data stream, the
    sign bit of the "long" numbers is (mis-)used as a kind of overflow
    detection. In the end, this is caused by the fact that standard C
    lacks any overflow checking on integer arithmetics and does not
    support access to the underlying hardware's overflow detection
    (the status bits, especially "carry" and "overflow" of the CPU's
    flags-register) in a system-independent manner.

    So, we "misuse" the most-significant bit of the compressed data
    size counters as carry bit for efficient overflow/underflow detection.
    We could change the code to a different method of overflow detection,
    by using a bunch of "sanity" comparisons (kind of "is the calculated
    result plausible when compared with the operands"). But, this would
    "blow up" the code of the "inner loop", with remarkable loss of
    processing speed. Or, we could reduce the amount of consistency checks
    of the compressed data (e.g. detection of premature end of stream) to
    an absolute minimum, at the cost of the programs' stability when
    processing corrupted data.

    Summary: Changing the compression/decompression core routines to
    be "unsigned safe" would require excessive recoding, with little
    gain on maximum processable uncompressed size (a gain can only be
    expected for hardly compressable data), but at severe costs on
    performance, stability and maintainability.  Therefore, it is
    quite unlikely that this will ever happen for Zip/UnZip.

    The argumentation above is somewhat out-dated. The new releases
    Zip 3 and UnZip 6 will support archive sizes larger than 4GB on
    systems where the required underlying support for 64-bit file offsets
    and file sizes is available from the OS (and the C runtime environment).
    However, this new support will partially break compatibility with
    older "legacy" systems.  And it should be expected that the portability
    and readability of the UnZip and Zip code may be reduced due to the
    extensive use of non-standard language extension needed for 64-bit
    support on the major target systems.

Please report any problems to:  Zip-Bugs at www.info-zip.org

Last updated:  22 February 2005, Christian Spieler