=head1 NAME
Class::DBI::utf8 - A Class:::DBI subclass that knows about UTF8
=head1 SYNOPSIS
Exactly as Class::DBI:
package Foo;
use base qw( Class::DBI );
use Class::DBI::utf8;
...
__PACKAGE__->utf8_columns(qw( text ));
...
# create an object with a nasty character.
my $foo = Foo->create({
text => "a \x{2264} b for some a",
});
=head1 DESCRIPTION
Rather than have to think about things like character sets, I prefer to
have my objects just Do The Right Thing. I also want utf-8 encoded byte
strings in the database whenever possible. Using this subclass of Class::DBI,
I can just put perl strings into the properties of an object, and the
right thing will always go into the database and come out again.
This module requires perl 5.8.0 or later - if you're still using 5.6, and
you want to use unicode, I suggest you don't. It's not nice.
This module _assumes_ that the underlying DBD driver doesn't know anything
about character sets, and is just storing bytess. If you're using something
like DBD::Pg, which _does_ know about charsets, you don't need this module.
Having said, that, I've tried to write it in such a way that if the DBD
driver _does_ know about charsets, we won't break. (see L<BUGS>)
Internally, the module is futzing with the _utf8_on and _utf8_off methods.
If you don't know _why_ doing that is probably a bad idea, you should read
into it before you start trying to do this sort of thing yourself. I'd
_prefer_ to use encode_utf8 and decode_utf8, but I have my reasons for
doing it this way - mostly, it's so that we can allow for DBD drivers that
know about charsets.
=head1 BUGS
I've attempted to make the module _keep_ doing the Right Thing even when
the DBD driver for the database _does_ know what it's doing, ie, if you
give it sensible perl strings it'll store the right thing in the database
and recover the right thing from the database. However, I've been forced
to assume that, in this eventuality, the database driver will hand back
strings that already have the utf-8 bit set. If they don't, things _will_
break. On the bright side, they'll break really fast.
=head1 AUTHOR
Tom Insam <tinsam@fotango.com>
=cut
package Class::DBI::utf8;
use warnings;
use strict;
use base qw( Exporter );
our @EXPORT = (qw( _do_search ));
our $VERSION = 0.1;
use Encode qw( encode_utf8 decode_utf8 );
use utf8;
sub import {
# export functions as normal
__PACKAGE__->export_to_level(1, @_);
my $caller = caller;
unless (UNIVERSAL::isa($caller, "Class::DBI")) {
warn "caller $caller is not a Class::DBI subclass, using "
.__PACKAGE__." here is probably not what you want.\n";
return;
}
$caller->add_trigger($_ => sub {
my ($self) = @_;
for ($self->columns('All')) {
next if ref($self->{$_});
utf8::upgrade( $self->{$_} ) if defined($self->{$_});
}
}) for qw( before_create before_update );
$caller->add_trigger(select => sub {
my ($self) = @_;
for ($self->columns('All')) {
next if ref($self->{$_});
# flip the bit..
Encode::_utf8_on($self->{$_}) if defined($self->{$_});
# ..sanity check
if (defined($self->{$_}) and !utf8::valid($self->{$_})) {
# if we're in an eval, let's at least not _completely_ stuff
# the process. Turn the bit off again.
Encode::_utf8_off($self->{$_});
# ..and die
$self->_croak("Invalid UTF8 from database in column '$_'");
}
}
});
}
# ick, we have to override this so we can utf8-encode the search
# terms. Bug in Class::DBI?
sub _do_search {
my ($proto, $search_type, @args) = @_;
@args = %{ $args[0] } if ref $args[0] eq "HASH";
for (my $i = 1; $i < @args; $i += 2) { # odd elements only
utf8::upgrade($args[$i]) unless ref($args[$i]);
}
# SUPER is handled at compile-time. Bugger.
$proto->Class::DBI::_do_search($search_type, @args);
}
1;