NAME

Netscape::Cache - object class for accessing Netscape cache files

SYNOPSIS

The object oriented interface:

    use Netscape::Cache;

    $cache = new Netscape::Cache;
    while (defined($url = $cache->next_url)) {
        print $url, "\n";
    }

    while (defined($o = $cache->next_object)) {
        print
          $o->{'URL'}, "\n",
          $o->{'CACHEFILE'}, "\n",
          $o->{'LAST_MODIFIED'}, "\n",
          $o->{'MIME_TYPE'}, "\n";
    }

The TIEHASH interface:

    use Netscape::Cache;

    tie %cache, 'Netscape::Cache';
    foreach (sort keys %cache) { 
        print $cache{$_}->{URL}, "\n";
    }

DESCRIPTION

The Netscape::Cache module implements an object class for accessing the filenames and URLs of the cache files used by the Netscape web browser.

Note: You can also use the undocumented pseudo-URLs about:cache, about:memory-cache and about:global-history to access your disk cache, memory cache and global history.

There is also an interface for using tied hashes.

Netscape uses the old Berkeley DB format (version 1.85) for its cache index index.db. Berkeley DB version 2 and newer cannot read the old format (see db_intro(3)), so you must either downgrade Berkeley DB or convert the database using db_dump185 and db_load. See convert_185_2xx for an (experimental) converter function.

CONSTRUCTOR

    $cache = new Netscape::Cache(-cachedir => "$ENV{HOME}/.netscape/cache");

This creates a new instance of the Netscape::Cache object class. The -cachedir argument is optional. By default, the cache directory setting is retrieved from ~/.netscape/preferences. The index file is normally named index.db on Unix systems and FAT.DB on Microsoft systems. It may be changed with the -index argument.

If the Netscape cache index file does not exist, a warning message will be generated, and the constructor will return undef.

METHODS

The Netscape::Cache class implements the following methods:

  • rewind - reset cache index to first URL

  • next_url - get next URL from cache index

  • next_object - get next URL as a full Netscape::Cache::Object from cache index

  • get_object - get a Netscape::Cache::Object for a given URL

Each method is described separately below.

next_url

    $url = $cache->next_url;

This method returns the next URL from the cache index. Unlike Netscape::History, this method returns a string and not a URI::URL-like object.

This method is faster than next_object, since it only evaluates the URL of the cached file.

next_object

    $o = $cache->next_object;

This method returns the next URL from the cache index as a Netscape::Cache::Object object. See below for accessing the components (cache filename, content length, mime type and more) of this object.

get_object

    $o = $cache->get_object($url);

This method returns the Netscape::Cache::Object object for a given URL. If the URL does not live in the cache index, then the returned value will be undefined.
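Because a missing URL yields an undefined value, callers should test the result before dereferencing it. The following is a minimal sketch of that check; describe_url is a hypothetical helper (not part of the module), and it assumes that get_object takes the URL string as its only argument.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Netscape::Cache.
# $cache is expected to be a Netscape::Cache object; for this sketch
# anything that provides get_object($url) will do.
sub describe_url {
    my ($cache, $url) = @_;

    # Assumption: get_object() is called with the URL string.
    my $o = $cache->get_object($url);

    # get_object returns undef for URLs that are not in the index.
    return sprintf('%s: not in cache', $url) unless defined $o;

    return sprintf('%s: %s, cached as %s',
                   $url, $o->{'MIME_TYPE'}, $o->{'CACHEFILE'});
}
```

A call such as print describe_url($cache, $url), "\n"; then prints one line whether or not the URL is cached.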

delete_object

Deletes URL from cache index and the related file from the cache.

WARNING: Do not use delete_object while in a next_object loop! It is better to collect all objects to be deleted in a list and delete them after the loop; otherwise you may get strange behavior (e.g. malloc panics).
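The collect-first, delete-later approach recommended in the warning could look roughly like this. delete_matching is a hypothetical helper, and the sketch assumes that delete_object accepts the URL string of the entry to remove; the text above does not show its exact call signature.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Netscape::Cache.
# Deletes every entry for which $want->($object) returns true and
# returns the number of entries handed to delete_object.
sub delete_matching {
    my ($cache, $want) = @_;

    # Pass 1: iterate over the index and only remember the URLs.
    my @doomed;
    $cache->rewind;
    while (defined(my $o = $cache->next_object)) {
        push @doomed, $o->{'URL'} if $want->($o);
    }

    # Pass 2: the iteration is over, so deleting is safe now.
    # Assumption: delete_object() takes the URL string.
    $cache->delete_object($_) for @doomed;

    return scalar @doomed;
}
```

For example, delete_matching($cache, sub { ($_[0]{'MIME_TYPE'} // '') eq 'image/gif' }) would remove all cached GIF images.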

rewind

    $cache->rewind();

This method is used to move the internal pointer of the cache index to the first URL in the cache index. You do not need to bother with this if you have just created the object, but it does not harm anything if you do.

get_object_by_cachefile

    $o = $cache->get_object_by_cachefile($cachefile);

Finds the corresponding entry for a cache file and returns the object, or undef if there is no such $cachefile. This is useful if you find something in your cache directory by using grep and you want to know the URL and other attributes of this file.

WARNING: Do not use this method while iterating with next_url, next_object or each, because this method iterates over the index itself and would disturb the iteration already in progress.

get_url_by_cachefile

    $url = $cache->get_url_by_cachefile($cachefile);

Finds the corresponding URL for a cache file. This method is implemented using get_object_by_cachefile.

convert_185_2xx

    $newindex = Netscape::Cache::convert_185_2xx($origindex [, $tmploc])

This is an (experimental) utility for converting index.db to the new Berkeley DB 2.x.x format. Note that this function does not overwrite the original index.db; it writes the converted index to $tmploc or, if $tmploc is not given, to /tmp/index.$$.db. convert_185_2xx returns the filename of the newly created index file. The converted index is only a temporary copy, so write access to it is useless.

Usage example:

    my $newindex = Netscape::Cache::convert_185_2xx($indexfile);
    my $o = new Netscape::Cache -index => $newindex;

Netscape::Cache::Object

next_object and get_object return an object of the class Netscape::Cache::Object. This object is simply a hash whose members have to be accessed directly (there are no accessor methods).

An example:

    $o = $cache->next_object;
    print $o->{'URL'}, "\n";

URL

The URL of the cached object

COMPLETE_URL

The complete URL with the query string attached (only Netscape 4.x).

CACHEFILE

The filename of the cached URL in the cache directory. To construct the full path use ($cache is a Netscape::Cache object and $o a Netscape::Cache::Object object)

    $cache->{'CACHEDIR'} . "/" . $o->{'CACHEFILE'}

CACHEFILE_SIZE

The size of the cache file.

CONTENT_LENGTH

The length of the cached document as specified in the HTTP response header. In general, CACHEFILE_SIZE and CONTENT_LENGTH are equal. If you interrupt the transfer of a file, only the first part of the file is written to the cache, resulting in a CACHEFILE_SIZE smaller than CONTENT_LENGTH.
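Comparing the two fields is a simple way to spot entries that may stem from an interrupted transfer. The sketch below only reports whether the sizes differ; size_mismatch is a hypothetical helper, and treating a missing field as "no mismatch" is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Netscape::Cache.
# Returns 1 if both sizes are known and differ, 0 otherwise.
sub size_mismatch {
    my ($o) = @_;
    my ($on_disk, $announced) = @{$o}{qw(CACHEFILE_SIZE CONTENT_LENGTH)};

    # Assumption: without both values there is nothing to compare.
    return 0 unless defined $on_disk && defined $announced;

    return $on_disk != $announced ? 1 : 0;
}
```

Inside a next_object loop, print $o->{'URL'}, "\n" if size_mismatch($o); would list the suspicious entries.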

LAST_MODIFIED

The date of last modification of the URL as unix time (seconds since epoch). Use

    scalar localtime $o->{'LAST_MODIFIED'}

to get a human readable date.
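If a sortable, timezone-independent representation is preferred, the same value can be formatted with strftime from the core POSIX module. This is only a sketch; iso_time is a hypothetical helper name, and the choice of UTC and of this particular format string is an assumption.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format seconds since the epoch as a UTC
# timestamp of the form YYYY-MM-DD HH:MM:SS.
sub iso_time {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch));
}
```

The same helper works for LAST_VISITED and, when it is defined, EXPIRE_DATE, since all of them are given as seconds since the epoch.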

LAST_VISITED

The date of last visit.

EXPIRE_DATE

If defined, the date of expiry for the URL.

MIME_TYPE

The MIME type of the URL (e.g. text/html or image/jpeg).

ENCODING

The encoding of the URL (e.g. x-gzip for gzipped data).

CHARSET

The charset of the URL (e.g. iso-8859-1).

NS_VERSION

The version of Netscape which created this cache file (3 for Netscape 2.x and 3.x, 4 for Netscape 4.0x and 5 for Netscape 4.5).
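The mapping given above can be turned into a small lookup when a readable label is wanted in reports. browser_for is a hypothetical helper; returning 'unknown' for any other value is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Netscape::Cache.
# Translates the NS_VERSION field into a browser label.
sub browser_for {
    my ($o) = @_;

    # Values as listed in the NS_VERSION description.
    my %label = (
        3 => 'Netscape 2.x or 3.x',
        4 => 'Netscape 4.0x',
        5 => 'Netscape 4.5',
    );

    my $v = $o->{'NS_VERSION'};
    return 'unknown' unless defined $v && exists $label{$v};
    return $label{$v};
}
```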

AN EXAMPLE PROGRAM

This program loops through all cache objects and prints an HTML-ified list. The list is sorted by URL, but you can sort it by last visit date, MIME type or size, too.

    use Netscape::Cache;

    $cache = new Netscape::Cache;

    while ($o = $cache->next_object) {
        push(@url, $o);
    }
    # sort by name
    @url = sort {$a->{'URL'} cmp $b->{'URL'}} @url;
    # sort by visit time
    #@url = sort {$b->{'LAST_VISITED'} <=> $a->{'LAST_VISITED'}} @url;
    # sort by mime type
    #@url = sort {$a->{'MIME_TYPE'} cmp $b->{'MIME_TYPE'}} @url;
    # sort by size
    #@url = sort {$b->{'CACHEFILE_SIZE'} <=> $a->{'CACHEFILE_SIZE'}} @url;

    print "<ul>\n";
    foreach (@url) {
        print
          "<li><a href=\"file:",
          $cache->{'CACHEDIR'}, "/", $_->{'CACHEFILE'}, "\">",
          $_->{'URL'}, "</a> ",
          scalar localtime $_->{'LAST_VISITED'}, "<br>",
          "type: ", $_->{'MIME_TYPE'}, 
          ",size: ", $_->{'CACHEFILE_SIZE'}, "\n";
    }
    print "</ul>\n";

FORMAT OF index.db

Here is a short description of the format of index.db. All integers are in VAX byte order (little endian). Time is specified as seconds since epoch.

    Key:

    Offset  Type/Length  Description

    0       long         Length of key entry
    4       long         Length of URL with trailing \0
    8       string       URL (null-terminated)
    +0      string       filled with \0

    Value:

    Offset  Type/Length  Description

    0       long         Length of value entry
    4       long         A version number (see NS_VERSION)
    8       long         Last modified
    12      long         Last visited
    16      long         Expire date
    20      long         Size of cachefile
    24      ...          Unknown
    29      long         Length of cache filename with trailing \0
    33      string       Cache filename (null-terminated)
    +0      ...          Unknown
    +33     long         Length of mime type with trailing \0
    +37     string       Mime type (null-terminated)
    +0      long         Length of content encoding with trailing \0
    +4      string       Content encoding (null-terminated)
    +0      long         Length of charset with trailing \0
    +4      string       Charset (null-terminated)
    +0      ...          Unknown
    +1      long         Content length
    +5      long         Length of the complete URL with trailing \0
    +9      string       Complete URL (null-terminated)
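To illustrate the key layout, the following sketch extracts the URL from a raw key record with unpack. It relies only on the three leading fields described in the Key table; url_from_key is a hypothetical helper and is not how the module itself is necessarily implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the URL from a raw index.db key.
# Returns undef (in scalar context) if the record is too short or
# the stored URL length does not fit into it.
sub url_from_key {
    my ($key) = @_;
    return if length($key) < 8;

    # Two little-endian 32-bit integers ('V'):
    # offset 0 = length of key entry, offset 4 = URL length incl. \0.
    my ($entry_len, $url_len) = unpack('V V', $key);
    return if $url_len < 1 || 8 + $url_len > length($key);

    # 'Z*' reads up to the first \0 and drops the terminator.
    return unpack('Z*', substr($key, 8, $url_len));
}
```

The value record could be decoded in the same way, but its unknown fields make the offsets less certain, so it is left out here.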

ENVIRONMENT

The Netscape::Cache module examines the following environment variables:

HOME

Home directory of the user, used to find Netscape's preferences ($HOME/.netscape). If HOME is not set, the home directory is retrieved from the passwd file.
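A lookup of this kind is commonly written with getpwuid as the fallback. The sketch below shows the general idiom on Unix-like systems; home_dir is a hypothetical helper, and the module's own code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer $HOME, fall back to the passwd entry.
sub home_dir {
    return $ENV{HOME} if defined $ENV{HOME};

    # Element 7 of getpwuid's return list is the home directory;
    # $< is the real user ID of the running process.
    return (getpwuid($<))[7];
}
```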

BUGS

There are still some unknown fields (_XXX_FLAG_{2,3,4}).

You can't use delete_object while looping with next_object. See the question "What happens if I add or remove keys from a hash while iterating over it?" in perlfaq4.

keys() or each() on the tied hash are slower than the object oriented equivalents next_object or next_url.

SEE ALSO

Netscape::History

AUTHOR

Slaven Rezic <eserte@cs.tu-berlin.de>

Thanks to: Fernando Santagata <lac0658@iperbole.bologna.it>

COPYRIGHT

Copyright (c) 1997 Slaven Rezic. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.