NAME

Apache::Request::I18N - Internationalization extension to Apache::Request

SYNOPSIS

  use Apache::Request::I18N;
  my $apr = Apache::Request::I18N->new($r, DECODE_PARMS => 'utf-8');

Or, add something like this to your Apache httpd.conf:

  PerlModule Apache::Request::I18N

  <Location ...>
  SetHandler  perl-script
  PerlHandler Apache::Request::I18N <your other handlers ...>
  PerlSetVar  DecodeParms  utf-8
  </Location>

DESCRIPTION

Apache::Request::I18N extends Apache::Request with transparent support for internationalized GET/POST parameters. Form field names and values are automatically decoded and converted either to Perl's internal UTF-8 format, or to another character encoding.

Since this module inherits from Apache::Request, it can be used as a drop-in replacement. (It is not a perfect replacement, though; see "COMPATIBILITY ISSUES" below.) It can also be used in a PerlHandler directive, in which case all subsequent handlers will -- if they play nicely -- automatically see the converted names and values.

CONSTRUCTORS

new( REQ [, OPTIONS ] )

Creates and returns a new Apache::Request::I18N object. REQ is the Apache or Apache::Request object associated with the current request.

OPTIONS is an optional list of name/value pairs. Each option also has a corresponding mod_perl variable (listed in parentheses) that can be set via PerlSetVar in httpd.conf. Values in OPTIONS take precedence. The available options are:

DECODE_PARMS (DecodeParms)

Required. Declares the character encoding that will be used by default when decoding form field names and values. This character encoding must be supported by the Encode module (see Encode::Supported for more details).

ENCODE_PARMS (EncodeParms)

Declares the character encoding that will be used to re-encode form field names and values. If omitted, names and values will be in Perl's own internal UTF-8 format.

Apache::Request options can also be included (although they will be ignored if REQ is already an Apache::Request object).
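The effect of the two options can be approximated with the core Encode module. The following is only a sketch of the conversion, not the module's actual implementation; convert_parm() is a hypothetical helper, and the sample value is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Apache::Request::I18N): approximates
# what DECODE_PARMS and ENCODE_PARMS do to one raw form value.
sub convert_parm {
    my ($bytes, $decode_parms, $encode_parms) = @_;

    # DECODE_PARMS: raw request bytes -> Perl character string.
    my $chars = decode($decode_parms, $bytes);

    # No ENCODE_PARMS: keep Perl's internal character string.
    return $chars unless defined $encode_parms && length $encode_parms;

    # ENCODE_PARMS: character string -> bytes in the target encoding.
    return encode($encode_parms, $chars);
}

my $raw    = "caf\xC3\xA9";                              # UTF-8 bytes
my $chars  = convert_parm($raw, 'utf-8');                # character string
my $latin1 = convert_parm($raw, 'utf-8', 'iso-8859-1');  # Latin-1 bytes

printf "%d bytes in, %d characters decoded, %d bytes re-encoded\n",
    length($raw), length($chars), length($latin1);
```

With ENCODE_PARMS omitted, the caller receives a character string; with ENCODE_PARMS set, the caller receives bytes in that encoding.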

instance( REQ [, OPTIONS ] )

Equivalent to the instance() method in Apache::Request, except that this method returns an Apache::Request::I18N object. Subsequent calls to Apache::Request->instance() will also return the same object. Apache::Request->instance() may safely be called beforehand.
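A downstream handler might use instance() along these lines. This is a minimal sketch that assumes a mod_perl 1 environment; the package name My::Handler and the field name "name" are hypothetical, and the code cannot run outside of Apache.

```perl
package My::Handler;    # hypothetical package name

use strict;
use warnings;
use Apache::Constants qw(OK);
use Apache::Request::I18N;

sub handler {
    my $r = shift;

    # Returns the object already attached to this request, if any;
    # otherwise creates one with the options given here.
    my $apr = Apache::Request::I18N->instance(
        $r,
        DECODE_PARMS => 'utf-8',
        ENCODE_PARMS => 'utf-8',
    );

    # List context; 'name' is an example field name only.
    my ($name) = $apr->param('name');

    $r->send_http_header('text/plain; charset=utf-8');
    $r->print(defined $name ? "Hello, $name\n" : "Hello\n");
    return OK;
}

1;
```

ENCODE_PARMS is set to utf-8 here so that the value can be printed directly as bytes in the response.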

METHODS

Almost all Apache::Request methods are supported (see "COMPATIBILITY ISSUES" below for a list of exceptions), and will properly return values according to ENCODE_PARMS. (Apache methods, like args(), are not affected by this module.)

All arguments passed to a method must be encoded to ENCODE_PARMS beforehand, unless ENCODE_PARMS is empty. This also applies to each key/value of any Apache::Table passed to parms().
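As an illustration of this rule, arguments held as Perl character strings can be encoded with the core Encode module before being handed to a method. This is a sketch only; encode_args() is a hypothetical helper, and the commented-out param() call assumes an existing $apr object.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode character strings to ENCODE_PARMS so they
# can be passed to methods such as param(). With an empty ENCODE_PARMS,
# the arguments are returned unchanged.
sub encode_args {
    my ($encode_parms, @args) = @_;
    return @args unless defined $encode_parms && length $encode_parms;
    return map { my $s = $_; encode($encode_parms, $s) } @args;
}

# A field name containing U+00E9, as a character string.
my ($key) = encode_args('utf-8', "pr\x{e9}nom");

printf "encoded key is %d bytes long\n", length($key);

# The encoded key would then be used like this:
#   my @values = $apr->param($key);
```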

Additional methods

decode_parms()
encode_parms()

Returns the current DECODE_PARMS or ENCODE_PARMS value.

FILE UPLOADS

Uploads returned by the upload() method are Apache::Upload::I18N objects; they behave like Apache::Upload objects, and their name() and filename() methods will return values according to ENCODE_PARMS.

(This is not the case within an upload hook, however; see "BUGS" below.)

HANDLER

This module provides a simple Apache handler that can be used in a PerlHandler directive. This is useful when used in combination with other handlers, which will then automatically access the decoded values. (This works as long as each handler takes care to call instance() instead of creating a new object.)

For example, you can use this module in combination with Mason:

  SetHandler  perl-script
  PerlHandler +Apache::Request::I18N +HTML::Mason::ApacheHandler
  PerlSetVar  DecodeParms  EUC-JP

Each Mason component will now see its arguments as true Perl character strings instead of EUC-JP byte strings.
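The difference between the two kinds of string can be seen with the core Encode module alone. The sample text below is an assumption chosen for illustration; it is written with escapes so that the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three Japanese characters (U+65E5 U+672C U+8A9E).
my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Roughly what arrives from an EUC-JP form once URL-unescaped:
# a byte string.
my $bytes = encode('EUC-JP', $text);

# Roughly what a component sees once the value has been decoded
# from EUC-JP: a character string.
my $chars = decode('EUC-JP', $bytes);

printf "byte string length: %d, character string length: %d\n",
    length($bytes), length($chars);
```

Functions such as length(), substr() and regular expressions operate on whole characters in the second case, rather than on individual bytes.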

COMPATIBILITY ISSUES

  • Calling parms() is not supported if ENCODE_PARMS is empty, as Apache::Table cannot handle character strings. This also applies to calling param() in scalar context.

  • Query parameter keys may or may not be case-insensitive, depending on their contents and on ENCODE_PARMS.

  • Calling next() on an upload object is not currently supported.
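The dependence of key case-insensitivity on ENCODE_PARMS can be illustrated as follows. This sketch assumes that the underlying table folds only ASCII letters when comparing keys, which is an assumption about Apache::Table and not something this module guarantees; keys_collide() is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two distinct katakana characters used as field names.
my ($key_a, $key_b) = ("\x{30A2}", "\x{30C2}");

# Hypothetical helper: would ASCII-only case folding treat the encoded
# forms of two keys as the same key?
sub keys_collide {
    my ($encoding, $x, $y) = @_;
    my ($bx, $by) = map { my $s = $_; encode($encoding, $s) } ($x, $y);

    # lc() on a plain byte string folds only ASCII letters here.
    return lc($bx) eq lc($by) ? 1 : 0;
}

for my $encoding ('shiftjis', 'utf-8') {
    printf "%-8s %s\n", $encoding,
        keys_collide($encoding, $key_a, $key_b) ? 'collide' : 'distinct';
}
```

In Shift_JIS the second byte of each character falls in the ASCII letter range, so the two keys differ only by "case"; in UTF-8 every byte of a non-ASCII character is above 0x7F, so they remain distinct.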

BUGS

  • When using the multipart/form-data encoding, the proper encoding of form field names and filenames as specified by RFC 2184 is currently not supported. (This is due to a limitation in libapreq.)

    Conversely, since some user-agents are known to encode such values via RFC 2047, we attempt decoding if possible. This means that a value supplied by a standards-compliant user-agent may be wrongly decoded.

  • When using the multipart/form-data encoding, each form field value may have its character encoding specified via the charset parameter of its Content-Type header. This value is currently ignored. (This is due to a limitation in libapreq.)

    Similarly, the Content-Transfer-Encoding header is also ignored.

  • When using upload hooks, the upload object supplied to UPLOAD_HOOK will not have had its name() and filename() decoded yet.

  • When using the multipart/form-data encoding, this module will get confused if a form field appears in both the query string and the request body. In other words, don't try to do this:

      <FORM METHOD=post ENCTYPE="multipart/form-data"
            ACTION=".../my_script?foo=1">
      <INPUT NAME="foo" ...>
      ...

    You should also avoid mixing file uploads and regular input within a single field name. In other words, don't try this either:

      <INPUT TYPE=text NAME="foo">
      <INPUT TYPE=file NAME="foo">

  • Since all query parameter keys are stored in encoded form within an Apache::Table (which is case-insensitive), it is possible for two distinct keys to be fused together if their encoded representations are similar.
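The RFC 2047 ambiguity mentioned in the first item of this list can be reproduced with the MIME-Header encoding that ships with Encode. This is a sketch of the general idea, not the module's actual code; maybe_decode_2047() and the sample filenames are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a field name or filename if it contains
# what looks like an RFC 2047 encoded-word, otherwise return it as-is.
sub maybe_decode_2047 {
    my ($value) = @_;
    return $value unless $value =~ /=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/;
    return decode('MIME-Header', $value);
}

my $plain   = maybe_decode_2047('report.txt');
my $decoded = maybe_decode_2047('=?ISO-8859-1?Q?caf=E9.txt?=');

printf "plain value: %d characters\n", length($plain);
printf "encoded-word value: %d characters after decoding\n", length($decoded);
```

A file whose name really is the literal text "=?ISO-8859-1?Q?caf=E9.txt?=" cannot be told apart from an encoded one, which is why a value from a standards-compliant user-agent may be decoded by mistake.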

TODO

  • Allow changing DECODE_PARMS and ENCODE_PARMS after the object has been created.

  • Automatically decode the contents of a text/* file upload if a charset has been provided.

  • Allow for more than one DECODE_PARMS, and try to guess which one is appropriate.

  • Use the User-Agent header to figure out how far from the standards we must stray.

  • Write a short text about the various standards and issues.

SEE ALSO

 <http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html>

 RFC 1522 - MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text
 RFC 1806 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header [2.3]
 RFC 1866 - Hypertext Markup Language - 2.0 [8.2.1]
 RFC 1867 - Form-based File Upload in HTML [3.3, 5.11]
 RFC 2047 - MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text [5]
 RFC 2070 - Internationalization of the Hypertext Markup Language [5.2]
 RFC 2183 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field [2, 2.3]
 RFC 2231 - MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
 RFC 2388 - Returning Values from Forms: multipart/form-data

AUTHOR

Frédéric Brière, <fbriere@fbriere.net>

COPYRIGHT AND LICENSE

Copyright (C) 2005, 2006 by Frédéric Brière

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.
