
CGI::Uploader - Manage CGI uploads using an SQL database

# Create an upload object
# -----------------------
my($u) = CGI::Uploader -> new # Mandatory.
(
    dbh      => $dbh,  # Optional. Or specify in call to upload().
    dsn      => [...], # Optional. Or specify in call to upload().
    manager  => $obj,  # Optional. Or specify in call to upload().
    query    => $q,    # Optional.
    temp_dir => $t,    # Optional.
);

# Upload N files
# --------------
my($meta_data) = $u -> upload # Mandatory.
(
    form_field_1 => # An arrayref of hashrefs. The keys are CGI form field names.
    [
        { # First, mandatory, set of options for storing the uploaded file.
            column_map    => {...}, # Optional.
            dbh           => $dbh,  # Optional. But one of dbh or dsn is
            dsn           => [...], # Optional. mandatory if no manager.
            file_scheme   => $s,    # Optional.
            manager       => $obj,  # Optional. If present, all other params are optional.
            sequence_name => $s,    # Optional, but mandatory if Postgres and no manager.
            table_name    => $s,    # Optional if manager, but mandatory if no manager.
            transform     => sub()  # Optional.
        },
        { # Second, etc, optional sets of options for storing copies of the file.
        },
    ],
    form_field_2 => [...], # Another arrayref of hashrefs.
);

# Delete N files for each uploaded file
# -------------------------------------
my($report) = $u -> delete # Optional.
(
    column_map => {...}, # Mandatory.
    dbh        => $dbh,  # Optional. But one of dbh or dsn is
    dsn        => [...], # Optional. mandatory.
    id         => $id,   # Mandatory.
    table_name => $s,    # Mandatory.
);

# Generate N files from each uploaded file
# ----------------------------------------
$u -> generate # Optional.
(
    form_field_1 => [...], # Mandatory. An arrayref of hashrefs.
    form_field_2 => [...], # Mandatory. Another arrayref of hashrefs.
);

The simplest option, then, is to use
CGI::Uploader -> new() -> upload(file_name => [{dbh => $dbh, table_name => 'uploads'}]);
and let CGI::Uploader do all the work.
For Postgres, make that
CGI::Uploader -> new() -> upload(file_name => [{dbh => $dbh, sequence_name => 'uploads_id_seq', table_name => 'uploads'}]);

CGI::Uploader is a pure Perl module.

The API for CGI::Uploader version 3 is not compatible with the API for version 2.
This is because V 3 is a complete rewrite of the code, taking into account all the things learned from V 2.

new() returns a CGI::Uploader object.
This is the class's constructor.
You must pass a hash to new().
Options:

dbh
This key may be specified globally or in the call to upload().
See Details for an explanation, including how this key interacts with dsn.
This key (dbh) is optional.

dsn
This key may be specified globally or in the call to upload().
See Details for an explanation, including how this key interacts with dbh.
This key (dsn) is optional.

manager
This key may be specified globally or in the call to upload().
This object is used to handle the transfer of meta-data into the database. See Meta-data.
This key (manager) is optional.

query
Use this to pass in a query object.
This object is expected to belong to one of these classes:
If not provided, an object of type CGI will be created and used to do the uploading.
If you want to use a different type of object, just ensure it has these CGI-compatible methods:
This is only called if something goes wrong.
Warning: CGI::Simple cannot be supported. See this ticket, which is not resolved:
http://rt.cpan.org/Ticket/Display.html?id=14838
There is a comment in the source code of CGI::Simple about this issue. Search for 14838.
This key (query) is optional.

temp_dir
Note the spelling of temp_dir.
If not provided, an object of type File::Spec will be created and its tmpdir() method called.
This key (temp_dir) is optional.
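
The temp_dir fallback described above can be pictured with a minimal sketch. The helper name resolve_temp_dir is hypothetical and is not part of CGI::Uploader; only the use of File::Spec's tmpdir() comes from the text.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: prefer the caller's temp_dir, otherwise fall back
# to the directory reported by File::Spec's tmpdir().
sub resolve_temp_dir
{
    my(%option) = @_;

    return defined $option{temp_dir} ? $option{temp_dir} : File::Spec -> tmpdir;
}

print resolve_temp_dir(temp_dir => '/var/tmp/uploads'), "\n";
print resolve_temp_dir(), "\n";
```

The second call prints whatever temporary directory File::Spec finds on the current machine.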

Transform is an optional component in the call to upload().
Generate() is a separate method.
This section discusses these 2 processes.
This means transformation takes exactly 1 input file.
This means transformation outputs exactly 1 file.
Transformation is a 2-stage process.
The comments here apply to both (a) using the transform key in the call to upload(), and (b) to the subroutine names in the parameters passed in to generate().
Assume you use the transform key like this: transform => sub_name(params). Then:
This subroutine must return an anonymous subroutine (i.e. a subref) which is a closure.
That is, as upload() processes your options, the fact that the transform key is present as one of your options causes the subref to be called.
Here are the 2 examples I used in testing:
transform => CGI::Uploader::Transform::ImageMagick::transformer(height => 400, width => 500)
transform => CGI::Uploader::Transform::Imager::transformer(ypixels => 400, xpixels => 500)
As you can see, CGI::Uploader ships with 2 sample transformers, one using Image::Magick and one using Imager. Both of these do no more than resize the image.
However, you can use this code as a guide to write any sort of transformer.
It's vital that your code complies with the design of these transformers, with respect to the mandatory input parameters and the list of return values.
Input parameters:
This is the name of the (temporary) file which has been uploaded, and is to be transformed.
This is the file's extension, determined by MIME::Types.
Some transformers, e.g. Imager, need this parameter.
Return values:
This is the name of the (temporary) file output by your transformation process.
Returning the extension means your transformer can, say, convert a GIF file to a PNG.
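
As a rough illustration of the 2-stage design, here is a sketch of a transformer which needs no image library: it upper-cases a text file. The sub name upper_case_transformer, its extension parameter, and the variable names are assumptions made for this example; the shipped transformers remain the reference for the exact parameter and return lists.

```perl
use strict;
use warnings;
use File::Temp;

# Stage 1: called while the options are being built. It captures its
# parameters and returns a closure.
sub upper_case_transformer
{
    my(%param)         = @_;
    my($new_extension) = $param{extension} || 'txt';

    # Stage 2: the closure is called later with the temporary file's name
    # and extension, and returns the new file's name and extension.
    return sub
    {
        my($old_file_name, $old_extension) = @_;

        # UNLINK => 0 keeps the output on disk after $out goes out of scope.
        my($out) = File::Temp -> new(SUFFIX => ".$new_extension", UNLINK => 0);

        open(my $in, '<', $old_file_name) || die "Can't open $old_file_name: $!\n";

        while (my $line = <$in>)
        {
            print {$out} uc $line;
        }

        close($in);
        $out -> close;

        return ($out -> filename, $new_extension);
    };
}

# Usage: build a small input file, then run the closure on it.
my($input) = File::Temp -> new(SUFFIX => '.txt');
print {$input} "hello\n";
$input -> close;

my($subref)                    = upper_case_transformer(extension => 'txt');
my($new_name, $new_extension) = $subref -> ($input -> filename, 'txt');

print "$new_name ($new_extension)\n";
unlink $new_name;
```

Because the closure captures $new_extension, the same factory can be called several times with different parameters, which is why the factory is invoked with parentheses in the options.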
This means generation takes exactly 1 input file.
So this input file was, presumably, uploaded at some time in the past, and may have been transformed at that time.
That is, you specify a set of options which control the generation of 1 new file.
This means generation outputs N >= 1 new files.
A typical use of generation would be to produce thumbnails of large images.

Note: Methods are listed here in alphabetical order. So delete() comes before upload(). Nevertheless, the most detailed explanations of options are under upload(), with only brief notes here under delete().
You must pass a hash to delete().
delete(%hash) deletes everything associated with a given database table id.
The keys of this hash are reserved words, and the values are your options.

column_map
See Details for a discussion of column_map.
Note: If your column map does not contain the server_file_name key, delete(%hash) will do nothing because it won't be able to find any file names to delete.
This key (column_map) is optional.

dbh
This key may be specified globally or in the call to delete().
See Details for an explanation, including how this key interacts with dsn.
This key (dbh) is optional.

dsn
This key may be specified globally or in the call to delete().
See Details for an explanation, including how this key interacts with dbh.
This key (dsn) is optional.

id
This is the (primary) key of the database table which will be processed.
To specify a column name other than id, use the column_map option.
This key (id) is mandatory.

table_name
This is the name of the database table.
This key (table_name) is mandatory.

There is no manager key because there is no point in you passing all these options to delete(%hash) just so this method can pass them all back to your manager.
The items deleted are:
They can be identified because their parent_id column matches $id, and their file names come from the server_file_name column.
It can be identified because its id column matches $id, and its file name comes from the server_file_name column.
delete(%hash) returns an array ref of hashrefs.
Each hashref has 2 keys and 2 values:
$id is the value of the (primary) key column of a deleted file.
One of these $id values will be the $id you passed in to delete(%hash).
$string is the name of a deleted file.

You must pass a hash to generate().
The keys to this hash are:

column_map
The default column_map is documented under Details.
This key (column_map) is optional.

dbh
Dbh is documented under Details.
At least one of dbh and dsn must be provided.

dsn
Dsn is documented under Details.
At least one of dbh and dsn must be provided.

file_scheme
File_scheme is documented under Details.
File_scheme defaults to simple.
This key (file_scheme) is optional.

manager
Manager is documented under Details.
This key (manager) is optional.

path
Path is documented under Details.
This key (path) is mandatory.

records
Records specifies which (primary) keys in the table are used to find files to process.
These files are input files, and the values pointed to by those keys specify how to use these files to generate output files.
The keys in the hashref are the keys in the database table. E.g.:
records => {1 => [...], 99 => [...]}
specifies that only records with ids of 1 and 99 are to be processed.
The name of the (primary) key column defaults to id, but you can use column_map to change that.
The name of the input file comes from the server_file_name column of the table. Use column_map to change that column name.
The arrayrefs are used to specify N >= 1 output files for each input file.
So, each arrayref contains N >= 1 subroutine names, and each subroutine specifies how to generate 1 output file. E.g.:
records => {1 => [sub_1(...), sub_2(...)], 99 => [sub_n(...)]}
This says use id 1 to generate 2 output files, and use id 99 to generate 1 output file.
To make life easier, if you only wish to generate a single output file, you can reduce this:
records => {99 => [sub_1(...)]}
to this:
records => {99 => sub_1(...)}
The format of the subroutine names is exactly the same as for the transform key. See Transformation Subroutines.
CGI::Uploader takes care of the meta-data for each generated file. See Meta-data.
Sample code: See CGI::Uploader::Test, which uses CGI::Uploader::Transform::ImageMagick and CGI::Uploader::Transform::Imager. CGI::Uploader ships with these 3 modules.
This key (records) is mandatory.

sequence_name
Sequence_name is documented under Details.
This key is mandatory if you are using Postgres, and optional if not.

table_name
This key (table_name) is mandatory.

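
The single-subref shorthand for records suggests that both forms end up being treated alike. The following sketch shows one way such a normalization could be written; normalize_records is a hypothetical name and this is not taken from the CGI::Uploader source.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap any value which is not already an arrayref,
# so that every id points to an arrayref of subrefs.
sub normalize_records
{
    my($records) = @_;
    my(%normal);

    for my $id (keys %$records)
    {
        my($value)   = $$records{$id};
        $normal{$id} = ref $value eq 'ARRAY' ? $value : [$value];
    }

    return \%normal;
}

# Placeholder subrefs stand in for the closures returned by real transformers.
my($records) = normalize_records({1 => [sub{}, sub{}], 99 => sub{} });

for my $id (sort{$a <=> $b} keys %$records)
{
    print "Id $id: ", scalar @{$$records{$id} }, " output file(s)\n";
}
```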
Note: generate() returns a hashref of arrayrefs, where the keys of the hashref are the ids provided in the records hashref, and the arrayrefs list the ids of the files generated.
You can use this data, e.g., to read the meta-data from the database and populate form fields to inform the user of the results of the generation process.
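
To make the shape of that return value concrete, here is a small sketch which walks a hashref of arrayrefs. The ids are made up for illustration, and describe_result is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the hashref of arrayrefs into report lines.
sub describe_result
{
    my($result) = @_;
    my(@line);

    for my $id (sort{$a <=> $b} keys %$result)
    {
        push @line, "Record $id generated ids: " . join(', ', @{$$result{$id} });
    }

    return @line;
}

# Made-up data in the documented shape: input id => [generated ids].
my($result) = {1 => [10, 11], 99 => [12]};

print "$_\n" for describe_result($result);
```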

You must pass a hash to upload().
The keys of this hash are CGI form field names (where the fields are of type file).
CGI::Uploader cycles through these keys, using each one in turn to drive a single upload.
Note: upload() returns an arrayref of hashrefs, one hashref for each uploaded file stored.
The hashrefs returned are not the meta-data associated with each uploaded file, but more like status reports.
These status reports are explained here, and the meta-data is explained in the next section.
The structure of these status hashrefs is 2 keys and 2 values:
You can use this data, e.g., to read the meta-data from the database and populate form fields to inform the user of the results of the upload.

Meta-data associated with each uploaded file is accumulated while upload() works.
Meta-data is a hashref, with these keys:

client_file_name
The client_file_name is the name supplied by the web client to CGI::Uploader. It may or may not have path information prepended, depending on the web client.

date_stamp
This value is the string 'now()', until the meta-data is saved in the database.
At that time, the value of the function now() is stored, except for SQLite, which just stores the string 'now()'.
Date_stamp has an underscore in it in case your database regards datestamp as a reserved word.

extension
This is provided by the MIME::Types module.
The extension is a string without the leading dot.
If an extension cannot be determined, the value will be '', the empty string.

height
This is provided by the Image::Size module, if it recognizes the type of the file.
For non-image files, the value will be 0.

id
The id is (presumably) the primary key of your table.
This value is 0 until the meta-data is saved in the database.
In the case of Postgres, it will be populated by the sequence named with the sequence_name key.

mime_type
This is provided by the MIME::Types module, if it can determine the type.
If not, it is '', the empty string.

parent_id
This is populated when a file is generated from the uploaded file. Its value will be the id of the uploaded file's record.
For the uploaded file itself, the value will be 0.

server_file_name
The server_file_name is the name under which the file is finally stored on the file system of the web server.
It is not one of the temporary file names used during the upload or transformation processes.

size
This is the size in bytes of the uploaded or transformed file.

width
This is determined by the Image::Size module, if it recognizes the type of the file.
For non-image files, the value will be 0.
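
Putting those defaults together, the meta-data for a freshly uploaded non-image file of unknown type might look like the hashref built below. This is only a sketch: initial_meta_data is a hypothetical helper, and the empty server_file_name before the file is stored is an assumption, since the text does not state that key's initial value.

```perl
use strict;
use warnings;

# Hypothetical helper: build a meta-data hashref using the documented
# pre-save defaults.
sub initial_meta_data
{
    my(%param)     = @_;
    my($meta_data) =
    {
        client_file_name => $param{client_file_name},
        date_stamp       => 'now()', # Replaced when saved in the database.
        extension        => '',      # Not determined.
        height           => 0,       # Not an image.
        id               => 0,       # Until saved in the database.
        mime_type        => '',      # Not determined.
        parent_id        => 0,       # An uploaded file, not a generated one.
        server_file_name => '',      # Assumption: empty until the file is stored.
        size             => $param{size},
        width            => 0,       # Not an image.
    };

    return $meta_data;
}

my($meta_data) = initial_meta_data(client_file_name => 'notes', size => 1234);

print "$_ => '$$meta_data{$_}'\n" for sort keys %$meta_data;
```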

A mini-synopsis:
$u -> upload
(
    file_name_1 =>
    [
        {First set of storage options for this file},
        {Second set of storage options for the same file},
        {...},
    ],
);
upload() calls do_upload() to do the work of uploading the caller's file to a temporary file.
This is done once, whereas the following steps are done once for each hashref of storage options you specify in the arrayref pointed to by the 'current' CGI form field's name.
do_upload() returns a hashref of meta-data associated with the file.
If requested, call the sub pointed to by the transform option.
upload() calls the do_insert() method on the manager object to insert the meta-data into the database.
The default manager is CGI::Uploader itself.
do_insert() saves the last insert id from that insert in the meta-data hashref.
upload() calls copy_temp_file() to save the file permanently.
copy_temp_file() saves the permanent file name in the meta-data hashref.
upload() calls the get_size() method to get the image size, which delegates the work to Image::Size.
get_size() saves the image's dimensions in the meta-data hashref.
upload() calls the do_update() method on the manager object to put the permanent file's name into the database record, along with the height and width.
Each key in the hash passed in to upload() points to an arrayref of options which specifies how to process the form field.
Use multiple elements in the arrayref to store multiple sets of meta-data, all based on the same uploaded file.
Each hashref contains 1 or more of the following keys:

column_map
This hashref maps column names used by CGI::Uploader to column names used by your database table.
The default column_map is:
{
    client_file_name => 'client_file_name',
    date_stamp       => 'date_stamp',
    extension        => 'extension',
    height           => 'height',
    id               => 'id',
    mime_type        => 'mime_type',
    parent_id        => 'parent_id',
    server_file_name => 'server_file_name',
    size             => 'size',
    width            => 'width',
}
If you supply a different column map, the values on the right-hand side are the ones you change.
Points to note:
If you omit any keys from your map, the corresponding meta-data will not be saved.
This key (column_map) is optional.
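
The effect of a custom column_map, including the dropping of omitted keys, can be sketched as follows. The helper map_columns and the column names upload_id and bytes are hypothetical; this is an illustration of the documented behaviour, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: translate meta-data keys into table column names.
# Keys missing from the map are simply not copied, so they are not saved.
sub map_columns
{
    my($meta_data, $column_map) = @_;
    my(%row);

    for my $key (keys %$column_map)
    {
        $row{$$column_map{$key} } = $$meta_data{$key};
    }

    return \%row;
}

my($row) = map_columns
(
    {id => 5, size => 100, width => 0},
    {id => 'upload_id', size => 'bytes'}, # width is omitted on purpose.
);

print "$_ => $$row{$_}\n" for sort keys %$row;
```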

dbh
This is a database handle for use by the default manager class (which is just CGI::Uploader) discussed below, under manager.
This key is optional if you use the manager key, since in that case you can do anything in your own storage manager code.
If you do provide the dbh key, it is passed in to your manager just in case you need it.
Also, if you provide dbh, the dsn key, below, is ignored.
If you do not provide the dbh key, the default manager uses the dsn arrayref to create a dbh via DBI.

dsn
This key is optional if you use the manager key, since in that case you can do anything in your own storage manager code.
If you do provide the dsn key, it is passed in to your manager just in case you need it.
Using the default manager, this key is ignored if you provide a dbh key, but it is mandatory when you do not provide a dbh key.
The elements in the arrayref are:
A connection string. E.g.: 'dbi:Pg:dbname=test'. This element is mandatory.
A username. This element is mandatory, even if it's just the empty string.
A password. This element is mandatory, even if it's just the empty string.
A connection attributes hashref. This element is optional.
The default manager class calls DBI -> connect(@$dsn) to connect to the database, i.e. in order to generate a dbh, when you don't provide a dbh key.
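
The interaction between dbh and dsn under the default manager can be summarized in a short sketch. The helper connection_source is hypothetical and only checks the options; it does not connect. The sample credentials are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: report which option would supply the database handle.
sub connection_source
{
    my(%option) = @_;

    # A dbh wins, and any dsn is then ignored.
    return 'dbh' if ($option{dbh});

    my($dsn) = $option{dsn};

    die "One of dbh or dsn is mandatory\n" if (! $dsn);
    die "dsn needs 3 or 4 elements\n"     if (@$dsn < 3 || @$dsn > 4);

    # At this point the default manager would call DBI -> connect(@$dsn).
    return 'dsn';
}

print connection_source(dsn => ['dbi:Pg:dbname=test', 'some_user', '']), "\n";
print connection_source(dbh => 'a handle', dsn => ['ignored', '', '']), "\n";
```

The string 'a handle' merely stands in for a real database handle so that the sketch runs without a database.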

file_scheme
File_scheme controls how files are stored on the web server's file system.
All files are stored in the directory specified by the path option.
Each file name has the appropriate extension appended (as determined by MIME::Types).
The possible values of file_scheme are:

md5
The file name is determined like this:
Use the (primary key) id (returned by storing the meta-data in the database) to seed the Digest::MD5 module.
Use the first 3 digits of the hex digest of the id to generate 3 levels of sub-directories.
The file name is the (primary key) id.

simple
The file name is the (primary key) id.
Simple is the default.

This key (file_scheme) is optional.
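
The scheme which uses Digest::MD5 can be approximated with the sketch below. It assumes one hex digit per directory level and a file name of the form id.extension; md5_file_name is a hypothetical helper and the real module may differ in detail.

```perl
use strict;
use warnings;
use Digest::MD5 'md5_hex';
use File::Spec;

# Hypothetical helper: build path/d1/d2/d3/id.extension, where d1..d3 are
# the first 3 hex digits of the MD5 digest of the id.
sub md5_file_name
{
    my($path, $id, $extension) = @_;
    my(@level) = split(//, substr(md5_hex($id), 0, 3) );
    my($name)  = length $extension ? "$id.$extension" : $id;

    return File::Spec -> catfile($path, @level, $name);
}

print md5_file_name('/var/www/uploads', 1, 'png'), "\n";
```

Spreading files over sub-directories like this keeps any single directory from accumulating a very large number of entries.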

manager
This is an instance of your class which will manage the transfer of meta-data to a database table.
In the case you provide the manager key, your object is responsible for saving (or discarding!) the meta-data.
If you provide an object here, CGI::Uploader will call $object -> do_insert($field_name, $meta_data, $store_option).
Parameters are:
$field_name will be the 'current' CGI form field.
Remember, upload() is iterating over all your CGI form field parameters at this point.
$meta_data will be a hashref of options generated by the uploading process.
See Meta-data, for the definition of meta-data.
$store_option will be the 'current' hashref of storage options, one of the arrayref elements associated with the 'current' form field.
If you do not provide the manager key, CGI::Uploader will do the work itself.
Later, CGI::Uploader will call $object -> do_update($field_name, $meta_data, $store_option), as explained above, under Processing Steps.
This key (manager) is optional.
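
A manager class only needs the two methods named above. The sketch below keeps its rows in memory instead of a database, purely to show the calling convention; the class name My::Manager and its internal structure are hypothetical.

```perl
use strict;
use warnings;

package My::Manager; # Hypothetical class name.

sub new
{
    my($class) = @_;

    return bless {next_id => 1, row => []}, $class;
}

# Store the meta-data and record the new row's id in the meta-data hashref.
sub do_insert
{
    my($self, $field_name, $meta_data, $store_option) = @_;

    $$meta_data{id} = $$self{next_id}++;

    push @{$$self{row} }, {field_name => $field_name, %$meta_data};
}

# Called later, once the permanent file name and image size are known.
sub do_update
{
    my($self, $field_name, $meta_data, $store_option) = @_;
    my(@key) = qw/server_file_name height width/;

    for my $row (@{$$self{row} })
    {
        next if ($$row{id} != $$meta_data{id});

        @$row{@key} = @$meta_data{@key};
    }
}

package main;

# Simulate the two calls for one uploaded file.
my($manager)   = My::Manager -> new;
my($meta_data) = {client_file_name => 'x.png', id => 0};

$manager -> do_insert('file_name', $meta_data, {});

@$meta_data{qw/server_file_name height width/} = ('/tmp/1.png', 400, 500);

$manager -> do_update('file_name', $meta_data, {});

print "Stored id $$meta_data{id} as $$manager{row}[0]{server_file_name}\n";
```

An instance of such a class would be passed as the value of the manager key, either to new() or in a hashref of storage options.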

path
This is a path on the web server's file system where a permanent copy of the uploaded file will be saved.
This key (path) is mandatory.

sequence_name
This is the name of the sequence used to generate values for the primary key of the table.
You would normally only need this when using Postgres.
This key is optional if you use the manager key, since in that case you can do anything in your own storage manager code. If you do provide the sequence_name key, it is passed in to your manager just in case you need it.
This key is mandatory if you use Postgres and do not use the manager key, since without the manager key, sequence_name must be passed in to the default manager (CGI::Uploader).

table_name
This is the name of the table into which to store the meta-data.
This key is optional if you use the manager key, since in that case you can do anything in your own storage manager code. If you do provide the table_name key, it is passed in to your manager just in case you need it.
This key is mandatory if you do not use the manager key, since without the manager key, table_name must be passed in to the default manager (CGI::Uploader).

transform
This key points to a subroutine (not method) which is used to help transform the uploaded file.
As stated above, transformation takes 1 file being uploaded, transforms it, saves the transformed file, and discards the uploaded file.
See Transformation Subroutines for details.
This key (transform) is optional.

Most of the features in CGI::Uploader are demonstrated in samples shipped with the distro:
Patch lib/CGI/Uploader/.ht.cgi.uploader.conf as desired.
This is used by CGI::Uploader::Config and hence by CGI::Uploader::Test.
Copy the directory htdocs/uploads/ to the doc root of your web server.
Copy the files in cgi-bin/ to your cgi-bin directory.
As explained above, don't expect use.cgi.simple.pl to work.
Also, use.cgi.uploader.v2.pl will not run if you have installed V 3 over the top of V 2.
Point your web client at:
You can enter 1 or 2 file names in each CGI form.
The code executed is actually in CGI::Uploader::Test.
See the method use_cgi_uploader_v3() in that module for one way of utilizing the data returned by upload().
The scripts/ directory contains various sample programs.
In particular, see scripts/test.generate.pl.
Note: to run this program you will have already uploaded one or more files, and Apache will have created a directory structure according to your path option, and will own that path.
So, you may need to use sudo to run scripts/test.generate.pl, since it will write temporary files to the same path.

Both Build.PL and Makefile.PL list the modules used by CGI::Uploader.
Further to those, user options can trigger the use of these modules:

Config::IniFiles
If you use CGI::Uploader::Test, it uses CGI::Uploader::Config, which uses Config::IniFiles.

DBD::Pg
I used Postgres when writing and testing V 3, and hence I used DBD::Pg.
Examine lib/CGI/Uploader/.ht.cgi.uploader.conf for details. This file is read in by CGI::Uploader::Config.

DBD::SQLite
A quick test with SQLite worked, too.
The test only requires changing .ht.cgi.uploader.conf and re-running scripts/create.table.pl. E.g.:
dsn=dbi:SQLite:dbname=/tmp/test
password=
table_name=uploads
username=
Also, after running scripts/create.table.pl, use 'chmod a+w /tmp/test' so that the Apache daemon can write to the database.
One last thing. SQLite does not interpret the function now(); it just puts that string in the date_stamp column. Oh, well.

DBI
If you do not specify a manager object, CGI::Uploader uses DBI.

DBIx::Admin::CreateTable
If you use CGI::Uploader::Test to create the table, via scripts/create.table.pl, you'll need DBIx::Admin::CreateTable.

Digest::MD5
If you set the file_scheme option to md5, you'll need Digest::MD5.

HTML::Template
If you want to run any of the test scripts in cgi-bin/, you'll need HTML::Template.

Image::Magick
The test module CGI::Uploader::Test uses Image::Magick.

Imager
The test module CGI::Uploader::Test uses Imager.

This feature is not provided, for various reasons.
One problem is sabotage.
Another problem is users specifying characters which are illegal in file names on the server.
In other words, this feature was considered and rejected.
API changes between V 2 and V 3 are obviously enormous. A direct comparison doesn't make much sense.
However, here are some things to watch out for:
Under V 2, a file called 'x' would be saved by force with a name of 'x.bin'.
V 3 does not change file names, so 'x' will be stored in the database as 'x'.
Under V 2, a file called 'x.png' would have '.png' stored in the extension column of the database.
V 3 only stores 'png'.
Under V 2, various mechanisms were used to retrieve the last insert id.
V 3 calls $dbh -> last_insert_id(), unless of course you've circumvented this by supplying your own manager object.
Under V 2, the permanent file name was not stored as part of the meta-data.
V 3 stores this information.
Under V 2, the datestamp of when the file was uploaded was not saved.
V 3 stores this information.
Errr, it's been renamed to delete() and upload().

See Changes and Changelog.ini. The latter is machine-readable, using Module::Metadata::Changes.

V 3 is available from github: git:github.com/ronsavage/cgi--uploader.git

V 2 was written by Mark Stosberg <mark@summersault.com>.
V 3 was written by Ron Savage <ron@savage.net.au>.
Ron's home page: http://savage.net.au/index.html

Artistic.