NAME
Image::Delivery - Efficient transformation and delivery of web images
INTRODUCTION
Many web applications generate or otherwise deliver graphics as part of
their interface. Getting the delivery of these images right is tricky,
and developers usually need to make trade-offs in order to get a usable
mechanism.
Image::Delivery is an extremely sophisticated module for delivering
these generated images. It is designed to be powerful, flexible,
extensible, scalable, secure, stable and correct, while using a minimum
of resources.
DESIGN
Because it can take a little bit of work to set up Image::Delivery, we
will start with a quick once-over of the design of the API, and the
reasons and use cases that drove it.
Preventing Multiple Server Calls
Use Case 1: CVS Monitor
The initial idea for Image::Delivery came from some problems with the
design of CVS Monitor (http://ali.as/devel/cvsmonitor/), an advanced
but extremely resource-hungry MVC CGI application. Many of the CVS Monitor
views have a single large graph on them, which requires a second call to the
server that starts just before the previous call ends. Generating the graph
takes minimal extra effort, but the overhead of starting another process and
loading another 100MB of data means the server takes a double hit.
Ideally, both would be generated at once, and the browser would fetch the
image without another CGI hit.
The solution to this problem, and the primary mechanism that
Image::Delivery implements, could be called "Static Delivery via Cached
Disk". It is best demonstrated with the diagram in General Structure
below.
Use Case 2: Thumbnails
One problem with thumbnailing is the sheer number of thumbnails that need
to be generated. If they are generated on demand by the image requests
themselves, you end up with large numbers of processes working at once.
The usual solution is to pre-generate the thumbnails, which can pollute
the image directories.
Image::Delivery stores all images in one central cache, so the original
image directories are left untouched.
General Structure
      Image Provider
            |
            | BLOB + TransformPath
            |
           \1/
      Image::Delivery
         |        \
         |         |
        \2/        |
       Hard Disk   |
        /5\  |     | URI
         |   |     |
         |  \6/    |
       Web Server  |
        /4\  |     |
         |   |gzip |
         |  \7/   \3/
       Web Browser
1) Image Data pulled from Object/Provider
An Object, or a Provider that accesses the data from outside the API,
generates or obtains the image data and various metadata that describes
the image data.
2) Image Written to File-System
Image::Delivery writes the image to the filesystem with a specific file
name.
3) URI sent to Browser in HTML
Image::Delivery determines the matching URI that points to the location
of the written file, and provides it to be used in an "img" tag in the
generated HTML page.
4) Web Browser Requests Image
Having received the HTML, the browser requests the image from the web
server.
5) Web Server Finds Image File
The web server receives the image request and finds the file that was
written in step 2.
6) Web Server Retrieves Image File
The web server reads the file like any other plain file.
7) Web Server Sends File to Browser
The web server sends the file to the browser.
Digest::TransformPath
Image::Delivery is built around source objects. Each source object may
work with more than one image, and each image may need to come in several
different versions. In short, there can be many variations of each image.
To handle this, we use (or SHOULD use) Digest::TransformPath to help
identify the images, with a 10-digit digest built into the filename.
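The following is a minimal sketch of the idea only, not the Digest::TransformPath API. It assumes a transform path can be represented as a list of strings (a source identifier followed by each transform applied), hashes them with the core Digest::MD5 module, and keeps the first 10 hex digits. The helper `transform_digest` and step strings such as `scale:100x100` are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper, NOT the Digest::TransformPath API.
# Joins the source identifier and transform steps with NUL bytes (so that
# ('ab', 'c') and ('a', 'bc') produce different input strings), hashes the
# result, and keeps the first 10 hex digits for use in a filename.
sub transform_digest {
    my @steps = @_;
    return substr( md5_hex( join( "\0", @steps ) ), 0, 10 );
}

# The same source with different transforms yields different digests,
# and the same transform path always yields the same one.
my $original  = transform_digest('photo:42');
my $thumbnail = transform_digest( 'photo:42', 'scale:100x100' );
print "original:  $original\n";
print "thumbnail: $thumbnail\n";
```

Because a digest like this depends only on the source and the transforms, not on the image bytes, it can be computed without loading any image data.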
Might as Well Cache Them
Since we went to all the effort of writing the file, it's relatively easy
to add caching. But if we are going to cache, the most important thing is
a good file naming scheme.
Image::Delivery Naming Scheme
In order to make this all work, the naming scheme is critical.
The basic path format is:
$ROOT/Object.id/checksum.type
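A minimal sketch of composing that path with the core File::Spec module. The helper `cache_path` and the sample root, object id and checksum values are hypothetical illustrations, not part of Image::Delivery.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: builds $ROOT/Object.id/checksum.type
sub cache_path {
    my ( $root, $object_id, $checksum, $type ) = @_;
    return File::Spec->catfile( $root, $object_id, "$checksum.$type" );
}

my $path = cache_path( '/var/www/site/cache', '42', '3f2a9c81d0', 'png' );
print "$path\n";    # on Unix-like systems: /var/www/site/cache/42/3f2a9c81d0.png
```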
Object.id
A source object may have any number of Image fields, each of which may
have any number of scaled/rotated/morphed/derived images. When the source
object is updated, some or all of these need to be cleared. Keeping them
all in one directory named for the object makes them easy to find and
clear together.
checksum
The checksum calculated from the TransformPath does not describe any of
the data, only the data source and modifications to it. This means that
it is possible to cheaply test if the image for a particular transform
has already been created, without having to access any of the data in
the actual images.
type
Because we accept image data in a variety of formats, it's not possible
to know in advance what type any given cached image will be. So when
testing, we simply check each possible type until we find one.
Generally, rather than test 10-15 types, the Provider will tell us which
types to expect. :)
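The probing described above can be sketched with core Perl file tests. The helper `find_cached` is hypothetical, and the candidate list (png, gif, jpg) stands in for whatever types a Provider would actually report.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: tries each candidate type in turn and returns the
# first cached file that exists, or undef if none do. Only the file names
# are checked; no image data is read.
sub find_cached {
    my ( $dir, $checksum, @types ) = @_;
    foreach my $type (@types) {
        my $file = File::Spec->catfile( $dir, "$checksum.$type" );
        return $file if -f $file;
    }
    return undef;
}

# Demo: create an empty stand-in for a cached GIF in a temporary directory.
my $dir = tempdir( CLEANUP => 1 );
my $gif = File::Spec->catfile( $dir, '3f2a9c81d0.gif' );
open my $fh, '>', $gif or die "Cannot write $gif: $!";
close $fh;

# Types the Provider is assumed to have told us to expect.
my $found = find_cached( $dir, '3f2a9c81d0', qw(png gif jpg) );
print defined $found ? "hit: $found\n" : "miss\n";
```

Putting the most likely type first keeps the common case to a single file test.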
Operation Profile
All of this gives the module the following properties:
- Intrinsically supports all major image types
- No pre-generation of images; everything is generated on the fly
- Image names are secure and can't be predicted
- All images for any page are processed in one process hit
- Cache checking is extremely quick
- Never touches image source data when not filling the cache
- Handles many images; storage can be extended to support thousands to
millions of individual images
- Multiple hosts can work with the same image cache
- Images can be delivered by a different web server from the one running
the application
DESCRIPTION
Image::Delivery is very powerful, but setting it up may take a little
bit of work.
Setting up the URI <-> path mapping
First, you need to become acquainted with HTML::Location. This is used as
the basis for the mapping between the disk and a URI.
You should also make sure that whatever process will be running will
have write permissions to the appropriate directory.
For starters, we would suggest creating the cache directory just under
the root of a website, at "$ROOT/cache", which will be linked to
"http://yourwebsite.com/cache/".
This will let you create your HTML::Location.
use HTML::Location;

# Set up the location of the cache
my $ROOT     = '/path/to/website/root';    # adjust for your site
my $Location = HTML::Location->new(
    "$ROOT/cache",
    "http://yourwebsite.com/cache",
);
This gives you the absolute minimum Image::Delivery itself needs to get
rolling. With a location to manage, you can then start to fire images at
it, and it will store them and hand you back an HTML::Location for the
actual file.
use Image::Delivery;

# Create the Image::Delivery object
my $Delivery = Image::Delivery->new(
    Location => $Location,
);
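An HTML::Location pairs the cache directory with the URI it is served at. The following sketch shows only the mapping such a pair implies, using plain string operations. `path_to_uri` is a hypothetical helper, not the HTML::Location API, and it assumes Unix-style paths whose names need no URI escaping.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the path <-> URI mapping: a file under
# the cache directory maps to the same relative path under the cache URI.
# Returns undef for files outside the cache directory.
sub path_to_uri {
    my ( $root_path, $root_uri, $file ) = @_;
    my $prefix = $root_path =~ m{/\z} ? $root_path : "$root_path/";
    return undef unless index( $file, $prefix ) == 0;
    my $relative = substr( $file, length $prefix );
    $root_uri =~ s{/\z}{};    # avoid a doubled slash in the result
    return "$root_uri/$relative";
}

my $uri = path_to_uri(
    '/var/www/site/cache',
    'http://yourwebsite.com/cache/',
    '/var/www/site/cache/42/3f2a9c81d0.png',
);
print "$uri\n";    # http://yourwebsite.com/cache/42/3f2a9c81d0.png
```

Since the cache's file names are made of digests and simple type extensions, escaping is rarely an issue there, but a general mapping would need to URI-escape each path segment.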
However, the trickiest part is probably setting up your Provider class.
Although the abstract class implements much of the detail and defaults
for you, you will probably still need to do some work to tie the two
together.
STATUS
While the concept and design are fairly well understood and unlikely to
change, there is an unfortunate situation with regard to the Cache::
family of modules.
Although originally written to live at Cache::Web and to be a little
more general, it was felt by the maintainer that Cache::Web would
represent the module as being a full member of the Cache:: family, which
it is not.
However, during the first few releases I hope to at least try to move
the API of Image::Delivery as close to Cache:: as possible, possibly
under a common Cache::Interface class, to gain some potential benefits
from code written on top of it.
Until these comments are updated, you should assume that the API may
undergo some changes.
METHODS
new %params
The "new" constructor creates a new Image::Delivery object. It takes a
number of required and optional parameters, provided as a set of
key/value pairs.
Location
The required Location parameter is the HTML::Location that maps the
cache directory to the URI it is served at.
Location
The "Location" method returns the HTML::Location that was used when
creating the Image::Delivery.
filename $TransformPath | $Provider
The "filename" method determines, for a given $TransformPath or
$Provider, the file name that the Image should be written to, excluding
the file type.
This is the method most likely to be overloaded, to enable a different
naming scheme.
exists $TransformPath | $Provider
For a given Digest::TransformPath, or a ::Provider which contains one,
the "exists" method checks whether a file for it already exists in the
cache.
Returns the HTML::Location of the image if it exists, false if it does
not exist, or "undef" on error.
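A sketch of that three-way return contract, using core Perl only. `cached_uri` is a hypothetical helper: it takes the pieces of the naming scheme directly instead of a TransformPath or Provider, returns a plain URI string in place of an HTML::Location, and treats a checksum that is not 10 hex digits (an assumed format) as the error case. The check is a single file test and never reads the image data.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the "exists" contract:
#   URI string - the image is already cached
#   ''         - false: not cached
#   undef      - error (here: checksum not in the assumed 10-hex-digit form)
sub cached_uri {
    my ( $root_path, $root_uri, $object_id, $checksum, $type ) = @_;
    return undef unless defined $checksum && $checksum =~ /\A[0-9a-f]{10}\z/;
    my $file = File::Spec->catfile( $root_path, $object_id, "$checksum.$type" );
    return '' unless -f $file;
    return "$root_uri/$object_id/$checksum.$type";
}

# Demo: a temporary cache root holding one image for object 42.
my $root   = tempdir( CLEANUP => 1 );
my $base   = 'http://yourwebsite.com/cache';
my $subdir = File::Spec->catdir( $root, '42' );
mkdir $subdir or die "Cannot create $subdir: $!";
my $file = File::Spec->catfile( $subdir, '3f2a9c81d0.png' );
open my $fh, '>', $file or die "Cannot write $file: $!";
close $fh;

my $hit   = cached_uri( $root, $base, '42', '3f2a9c81d0',   'png' );
my $miss  = cached_uri( $root, $base, '42', 'aaaaaaaaaa',   'png' );
my $error = cached_uri( $root, $base, '42', 'not-a-digest', 'png' );
print "hit: $hit\n";
```

Callers must test with "defined" to tell "not cached" apart from "error", since both are false.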
get $TransformPath | $Provider
The "get" method gets the contents of a cached file from the cache, if
it exists. You should generally check that the image "exists" first
before trying to get it.
Returns a reference to a SCALAR containing the image data if the image
exists. Returns "undef" if the image does not exist, or some other error
occurs.
set $Provider
The "set" method stores an image in the cache, shortcutting if the image
has already been stored.
Returns the HTML::Location of the stored image on success, or "undef" on
error.
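A sketch of that shortcut-then-write behaviour using core modules. `store_cached` is hypothetical: it takes a file name and a reference to the image data, and returns the file name rather than an HTML::Location. Writing to a temporary file and renaming it into place is a common technique swapped in here so that readers never see a half-written image; it is not a description of how Image::Delivery itself writes files.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(mkpath);
use File::Temp qw(tempdir tempfile);
use File::Basename qw(dirname);

# Hypothetical stand-in for "set": store $$data_ref at $file unless it is
# already there. Returns the file name on success, undef on error.
sub store_cached {
    my ( $file, $data_ref ) = @_;
    return $file if -f $file;                   # shortcut: already cached
    my $dir = dirname($file);
    eval { mkpath($dir); 1 } or return undef;   # create Object.id directory
    # Write to a temporary file in the same directory, then rename it into
    # place; rename within one filesystem replaces the name in one step.
    my ( $fh, $tmp ) = eval { tempfile( DIR => $dir, UNLINK => 0 ) };
    return undef unless $fh;
    binmode $fh;
    my $ok = ( print {$fh} $$data_ref ) && close($fh);
    # tempfile creates owner-only files; let the web server read the image.
    $ok &&= chmod( 0644, $tmp );
    $ok &&= rename( $tmp, $file );
    unless ($ok) {
        unlink $tmp;
        return undef;
    }
    return $file;
}

# Demo: the second call finds the file already cached and leaves it alone.
my $root   = tempdir( CLEANUP => 1 );
my $file   = File::Spec->catfile( $root, '42', '3f2a9c81d0.png' );
my $first  = store_cached( $file, \'first bytes' );
my $second = store_cached( $file, \'second bytes' );
print "stored at $first\n";
```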
clear $TransformPath
The "clear" method allows you to explicitly delete an image from the
cache. This would generally be done for security purposes, as the cache
cleaners will generally harvest files directly, rather than going via
TransformPaths.
Returns true if the image was removed, or did not exist. Returns "undef"
on error.
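A sketch of those return values with core Perl; `clear_cached` is a hypothetical helper that takes a file name rather than a TransformPath. Treating a missing file as success makes the call idempotent, so clearing the same image twice is harmless.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for "clear": true if the file was removed or was
# not there to begin with, undef if the delete failed.
sub clear_cached {
    my ($file) = @_;
    return 1 unless -e $file;            # nothing to remove counts as success
    return unlink($file) ? 1 : undef;    # unlink returns the number deleted
}

# Demo: remove a stand-in cached image, then clear it again.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, '3f2a9c81d0.png' );
open my $fh, '>', $file or die "Cannot write $file: $!";
close $fh;

my $removed = clear_cached($file);
my $again   = clear_cached($file);
print "removed: $removed, again: $again\n";    # removed: 1, again: 1
```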
TO DO
- Add ability to mask indexes with empty HTML files
- Add cache clearing capabilities
- Add file locking to prevent race conditions in the cache
- Add pluggable cache cleaners
SUPPORT
All bugs should be filed via the bug tracker at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Image-Delivery>
For other issues, contact the author.
AUTHORS
Adam Kennedy <adamk@cpan.org>
COPYRIGHT
Copyright 2004 - 2007 Adam Kennedy.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included
with this module.