HTTP-Response-Encoding


Changes

# Revision history for HTTP-Response-Encoding
#
# $Id: Changes,v 0.6 2009/07/28 21:25:25 dankogai Exp dankogai $
# 
$Revision: 0.6 $ $Date: 2009/07/28 21:25:25 $
! lib/HTTP/Response/Encoding.pm t/01-file.t
  Addressed RT#47033:
    new libwww-perl-5.827 release from 15.06.2009 breaks all tests
  (Tested both on lwp5.826 and lwp5.830)
  http://rt.cpan.org/Ticket/Display.html?47033

0.05 2007/05/12 09:24:15
! lib/HTTP/Response/Encoding.pm
  removed method
  - decoded_content() because HTTP::Message already has that
  added methods
  + charset() -- returns the charset as-is
  + encoder() -- encoding object that can be used to decode

0.4 2007/04/20 05:40:37
! lib/HTTP/Response/Encoding.pm
  When you require Carp, you should surround arguments with ().
  Message-Id: <200704200454.l3K4sMHL008173@franz.ak.mind.de>

0.03 2007/04/18 04:50:40
! MANIFEST
+ t/t-null.html
  forgot to add. sorry.

0.02 2007/04/17 13:56:12
! lib/HTTP/Response/Encoding.pm
  t/01-file.t
  + more descriptive error message for decoded_content().
  + test case for failure added

0.01 2007/04/17 13:14:24
+ *
  First version.

MANIFEST

Changes
MANIFEST
META.yml # Will be created by "make dist"
Makefile.PL
README
lib/HTTP/Response/Encoding.pm
t/00-load.t
t/01-file.t
t/boilerplate.t
t/pod-coverage.t
t/pod.t
t/t-euc-jp.html
t/t-iso-2022-jp.html
t/t-null.html
t/t-shiftjis.html
t/t-utf-8.html

META.yml

--- #YAML:1.0
name:               HTTP-Response-Encoding
version:            0.06
abstract:           Adds encoding() to HTTP::Response
author:
    - Dan Kogai <dankogai@dan.co.jp>
license:            unknown
distribution_type:  module
configure_requires:
    ExtUtils::MakeMaker:  0
build_requires:
    ExtUtils::MakeMaker:  0
requires:
    Encode:          2
    HTTP::Response:  0
    Test::More:      0
no_index:
    directory:
        - t
        - inc
generated_by:       ExtUtils::MakeMaker version 6.54
meta-spec:
    url:      http://module-build.sourceforge.net/META-spec-v1.4.html
    version:  1.4

Makefile.PL

use 5.008001;
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME                => 'HTTP::Response::Encoding',
    AUTHOR              => 'Dan Kogai <dankogai@dan.co.jp>',
    VERSION_FROM        => 'lib/HTTP/Response/Encoding.pm',
    ABSTRACT_FROM       => 'lib/HTTP/Response/Encoding.pm',
    PL_FILES            => {},
    PREREQ_PM => {
        'Encode'         => 2.00,
        'Test::More'     => 0,
	# 'HTTP::Message'  => 5.827,
        'HTTP::Response' => 0,
      },
    dist                => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
    clean               => { FILES => 'HTTP-Response-Encoding-*' },
);

README

NAME
    HTTP::Response::Encoding - Adds encoding() to HTTP::Response

VERSION
    $Id: README,v 0.2 2007/05/12 09:24:15 dankogai Exp $

SYNOPSIS
      use LWP::UserAgent;
      use HTTP::Response::Encoding;

      my $ua = LWP::UserAgent->new();
      my $res = $ua->get("http://www.example.com/");
      warn $res->encoding;

EXPORT
    Nothing.

METHODS
    This module adds the following methods to HTTP::Response objects.

    "$res->charset"
      Tells the charset *exactly as it appears* in the "Content-Type:" header.
      Note that the presence of a charset does not guarantee that the
      response content is decodable via Encode.

      To normalize this, you should try

        $res->encoder->mime_name; # with Encode 2.21 or above

      or

        use I18N::Charset;
        # ...
        mime_charset_name($res->encoding);

    "$res->encoder"
      Returns the corresponding Encode encoding object, or undef if none
      can be found.

    "$res->encoding"
      Returns the canonical Encode name of the content encoding, or undef
      if it cannot be determined.

      In most cases you are more likely to find the encoding after a GET
      than after a HEAD. HTTP::Response is smart enough to parse

        <meta http-equiv="Content-Type" content="text/html; charset=whatever"/>

      but it needs the content to do so. If you don't want to retrieve the
      whole content but are interested in its encoding, try something like
      this:

        my $req = HTTP::Request->new(GET => $uri);
        $req->headers->header(Range => "bytes=0-4095"); # just 1st 4k
        my $res = $ua->request($req);
        warn $res->encoding;

    "$res->decoded_content"
      Discontinued since HTTP::Message already has this method.

      See HTTP::Message for details.
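
    As a rough illustration of how charset, encoder and encoding relate,
    the lookup can be approximated with core Encode alone. This is only a
    sketch under stated assumptions, not the module's code:
    canonical_encoding is a hypothetical helper, and it looks only at a
    Content-Type header value, whereas the module may also take the
    content into account.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of HTTP::Response::Encoding): map a
# Content-Type header value to Encode's canonical encoding name.
# Returns undef when no charset is present or Encode does not know it.
sub canonical_encoding {
    my ($content_type) = @_;
    return unless defined $content_type;

    # List-context match: $charset is undef when the pattern fails.
    my ($charset) = $content_type =~ /charset=([A-Za-z0-9_\-]+)/i
        or return;

    # find_encoding() returns an encoding object, or undef if unknown.
    my $enc = Encode::find_encoding($charset) or return;
    return $enc->name;
}

for my $ct ('text/html; charset=UTF-8', 'text/html') {
    my $name = canonical_encoding($ct);
    printf "%-26s => %s\n", $ct, defined $name ? $name : 'undef';
}
```

    The first header resolves to "utf-8-strict", the name the test suite
    pairs with UTF-8; the second has no charset and yields undef.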

INSTALLATION
    To install this module, run the following commands:

        perl Makefile.PL
        make
        make test
        make install

AUTHOR
    Dan Kogai, "<dankogai at dan.co.jp>"

BUGS
    Please report any bugs or feature requests to
    "bug-http-response-encoding at rt.cpan.org", or through the web
    interface at
    <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTTP-Response-Encoding>.
    I will be notified, and then you'll automatically be notified of
    progress on your bug as I make changes.

SUPPORT
    You can find documentation for this module with the perldoc command.

        perldoc HTTP::Response::Encoding

    You can also look for information at:

    *   AnnoCPAN: Annotated CPAN documentation

        <http://annocpan.org/dist/HTTP-Response-Encoding>

    *   CPAN Ratings

        <http://cpanratings.perl.org/d/HTTP-Response-Encoding>

    *   RT: CPAN's request tracker

        <http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTTP-Response-Encoding>

    *   Search CPAN

        <http://search.cpan.org/dist/HTTP-Response-Encoding>

ACKNOWLEDGEMENTS
    GAAS for LWP.

    MIYAGAWA for suggestions.

COPYRIGHT & LICENSE
    Copyright 2007 Dan Kogai, all rights reserved.

    This program is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

lib/HTTP/Response/Encoding.pm

package HTTP::Response::Encoding;
use warnings;
use strict;
our $VERSION = sprintf "%d.%02d", q$Revision: 0.6 $ =~ /(\d+)/g;

sub HTTP::Response::charset {
    my $self = shift;
    return $self->{__charset} if exists $self->{__charset};
    if ($self->can('content_charset')){
        # Suppress the warning content_charset() can trigger here:
        # "Parsing of undecoded UTF-8 will give garbage when decoding entities"
        local $SIG{__WARN__} = sub {};
        my $charset = $self->content_charset;
        $self->{__charset} = $charset;
        return $charset;
    }

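    # Fallback when HTTP::Message does not provide content_charset().
    my $content_type = $self->headers->header('Content-Type');
    return unless $content_type;
    # Match in list context so a failed match yields undef, not a stale $1.
    my ($charset) = $content_type =~ /charset=([A-Za-z0-9_\-]+)/i;
    return $self->{__charset} = $charset;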
}

sub HTTP::Response::encoder {
    require Encode;
    my $self = shift;
    return $self->{__encoder} if exists $self->{__encoder};
    my $charset = $self->charset or return;
    my $enc = Encode::find_encoding($charset);
    $self->{__encoder} = $enc;
}

sub HTTP::Response::encoding {
    my $enc = shift->encoder or return;
    $enc->name;
}

=head1 NAME

HTTP::Response::Encoding - Adds encoding() to HTTP::Response

=head1 VERSION

$Id: Encoding.pm,v 0.6 2009/07/28 21:25:25 dankogai Exp dankogai $

=cut

=head1 SYNOPSIS

  use LWP::UserAgent;
  use HTTP::Response::Encoding;

  my $ua = LWP::UserAgent->new();
  my $res = $ua->get("http://www.example.com/");
  warn $res->encoding;

=head1 EXPORT

Nothing.

=head1 METHODS

This module adds the following methods to L<HTTP::Response> objects.

=over 2

=item C<< $res->charset >>

Tells the charset I<exactly as it appears> in the C<Content-Type:> header.
Note that the presence of a charset does not guarantee that the
response content is decodable via Encode.

To normalize this, you should try

  $res->encoder->mime_name; # with Encode 2.21 or above

or

  use I18N::Charset;
  # ...
  mime_charset_name($res->encoding);

=item C<< $res->encoder >>

Returns the corresponding L<Encode> encoding object, or undef if none
can be found.

=item C<< $res->encoding >>

Returns the canonical L<Encode> name of the content encoding, or undef
if it cannot be determined.

In most cases you are more likely to find the encoding after a GET
than after a HEAD.  HTTP::Response is smart enough to parse

  <meta http-equiv="Content-Type" content="text/html; charset=whatever"/>

but it needs the content to do so.
If you don't want to retrieve the whole content but are interested in
its encoding, try something like this:

  my $req = HTTP::Request->new(GET => $uri);
  $req->headers->header(Range => "bytes=0-4095"); # just 1st 4k
  my $res = $ua->request($req);
  warn $res->encoding;

=item C<< $res->decoded_content >>

Discontinued since HTTP::Message already has this method.

See L<HTTP::Message> for details.

=back

=head1 INSTALLATION

To install this module, run the following commands:

    perl Makefile.PL
    make
    make test
    make install

=head1 AUTHOR

Dan Kogai, C<< <dankogai at dan.co.jp> >>

=head1 BUGS

Please report any bugs or feature requests to
C<bug-http-response-encoding at rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTTP-Response-Encoding>.
I will be notified, and then you'll automatically be notified of progress on
your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc HTTP::Response::Encoding

You can also look for information at:

=over 4

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/HTTP-Response-Encoding>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/HTTP-Response-Encoding>

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTTP-Response-Encoding>

=item * Search CPAN

L<http://search.cpan.org/dist/HTTP-Response-Encoding>

=back

=head1 ACKNOWLEDGEMENTS

GAAS for L<LWP>.

MIYAGAWA for suggestions.

=head1 COPYRIGHT & LICENSE

Copyright 2007 Dan Kogai, all rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=cut

1; # End of HTTP::Response::Encoding

t/00-load.t

#!perl -T

use Test::More tests => 1;

BEGIN {
	use_ok( 'HTTP::Response::Encoding' );
}

diag( "Testing HTTP::Response::Encoding $HTTP::Response::Encoding::VERSION, Perl $], $^X" );

t/01-file.t

#!perl -T

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response::Encoding;
use File::Spec;
use Encode;
use Cwd;
use URI;
use Test::More tests => 13;

my $ua = LWP::UserAgent->new;
my $cwd = getcwd;

#BEGIN{
#    package LWP::Protocol;
#    $^W = 0;
#}

for my $meth (qw/charset encoder encoding decoded_content/){
    can_ok('HTTP::Response', $meth);
}

my %charset = qw(
		 UTF-8        utf-8-strict
		 EUC-JP       EUC-JP
		 Shift_JIS    SHIFT_JIS
		 ISO-2022-JP  ISO-2022-JP
	       );

my %filename = qw(
	      UTF-8        t-utf-8.html
	      EUC-JP       t-euc-jp.html
	      Shift_JIS    t-shiftjis.html
	      ISO-2022-JP  t-iso-2022-jp.html
	     );

for my $charset (sort keys %charset){
    my $uri = URI->new('file://');
    $uri->path(File::Spec->catfile($cwd, "t", $filename{$charset}));
    my $res;
    {
	local $^W = 0; # to quiet LWP::Protocol
	$res = $ua->get($uri);
    }
    die unless $res->is_success;
    is $res->charset, $charset, "\$res->charset eq '$charset'";
    my $canon = find_encoding($charset)->name;
    is $res->encoding, $canon, "\$res->encoding eq '$canon'"; 
}

my $uri = URI->new('file://');
$uri->path(File::Spec->catfile($cwd, "t", "t-null.html"));
my $res = $ua->get($uri);
die unless $res->is_success;
if (defined $res->encoding){
    is $res->encoding, "ascii", "res->encoding is ascii";
}else{
    ok !$res->encoding, "res->encoding is undef";
}

t/boilerplate.t

#!perl -T

use strict;
use warnings;
use Test::More tests => 3;

sub not_in_file_ok {
    my ($filename, %regex) = @_;
    open my $fh, "<", $filename
        or die "couldn't open $filename for reading: $!";

    my %violated;

    while (my $line = <$fh>) {
        while (my ($desc, $regex) = each %regex) {
            if ($line =~ $regex) {
                push @{$violated{$desc}||=[]}, $.;
            }
        }
    }

    if (%violated) {
        fail("$filename contains boilerplate text");
        diag "$_ appears on lines @{$violated{$_}}" for keys %violated;
    } else {
        pass("$filename contains no boilerplate text");
    }
}

not_in_file_ok(README =>
    "The README is used..."       => qr/The README is used/,
    "'version information here'"  => qr/to provide version information/,
);

not_in_file_ok(Changes =>
    "placeholder date/time"       => qr(Date/time)
);

sub module_boilerplate_ok {
    my ($module) = @_;
    not_in_file_ok($module =>
        'the great new $MODULENAME'   => qr/ - The great new /,
        'boilerplate description'     => qr/Quick summary of what the module/,
        'stub function definition'    => qr/function[12]/,
    );
}

module_boilerplate_ok('lib/HTTP/Response/Encoding.pm');

t/pod-coverage.t

#!perl -T

use Test::More;
eval "use Test::Pod::Coverage 1.04";
plan skip_all => "Test::Pod::Coverage 1.04 required for testing POD coverage" if $@;
all_pod_coverage_ok();

t/pod.t

#!perl -T

use Test::More;
eval "use Test::Pod 1.14";
plan skip_all => "Test::Pod 1.14 required for testing POD" if $@;
all_pod_files_ok();

t/t-euc-jp.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP"/>
<title>Test</title>
</head>
<body>
<p>´Á»ú¡¢¥«¥¿¥«¥Ê¡¢¤Ò¤é¤¬¤Ê¤ÎÆþ¤Ã¤¿html.</p>
</body>
</html>

t/t-iso-2022-jp.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-2022-JP"/>
<title>Test</title>
</head>
<body>
<p>$B4A;z!"%+%?%+%J!"$R$i$,$J$NF~$C$?(Bhtml.</p>
</body>
</html>

t/t-null.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Test</title>
</head>
<body>
<p>The quick brown fox jumps over the black lazy dog.</p>
</body>
</html>

t/t-shiftjis.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"/>
<title>Test</title>
</head>
<body>
<p>Š¿ŽšAƒJƒ^ƒJƒiA‚Ђ炪‚È‚Ì“ü‚Á‚½html.</p>
</body>
</html>

t/t-utf-8.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Test</title>
</head>
<body>
<p>漢字、カタカナ、ひらがなの入ったhtml.</p>
</body>
</html>
