
NAME

utf8::all - turn on Unicode - all of it

VERSION

version 0.023

SYNOPSIS

use utf8::all;                      # Turn on UTF-8, all of it.

open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
print length 'føø bār';             # 7 characters (10 bytes as UTF-8)
my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)
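For comparison, the sketch below approximates the SYNOPSIS using only core Perl: the utf8 and open pragmas plus Encode. It is an assumption-based approximation of the observable behaviour, not utf8::all's actual implementation, and it does not cover the readdir, readlink, and glob overrides.

```perl
use strict;
use warnings;
use utf8;                               # program text is UTF-8
use open qw(:std :encoding(UTF-8));     # UTF-8 layer on new handles and STDIN/STDOUT/STDERR
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Decode command-line arguments from UTF-8 bytes into character strings,
# dying on malformed input (mirroring the fatal default described below).
@ARGV = map { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) } @ARGV;

my $len = length 'føø bār';             # counts characters, not bytes
print "$len\n";
```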

DESCRIPTION

The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions.
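The lexical reach of use utf8 on its own can be seen by measuring the same literal inside and outside its scope. A minimal sketch, assuming the file itself is saved as UTF-8:

```perl
use strict;
use warnings;

my ($chars, $bytes);
{
    use utf8;                    # literals in this block are decoded as UTF-8
    $chars = length 'føø bār';   # counted as characters
}
{
    no utf8;                     # literals here stay as raw UTF-8 bytes
    $bytes = length 'føø bār';   # counted as bytes
}
print "$chars $bytes\n";
```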

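utf8::all goes further: in addition to use utf8, it makes UTF-8 the default encoding for files opened in its lexical scope and, when used from the main package, converts @ARGV to UTF-8, opens STDIN, STDOUT, and STDERR with UTF-8 encoding, and makes readdir, readlink, and glob return UTF-8 strings.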

Lexical Scope

The pragma is lexically scoped, so you can do the following if you have some reason to:

{
    use utf8::all;
    open my $out, '>', 'outfile';
    my $utf8_str = 'føø bār';
    print length $utf8_str, "\n"; # 7
    print $out $utf8_str;         # out as utf8
}
open my $in, '<', 'outfile';      # in as raw
my $text = do { local $/; <$in>};
print length $text, "\n";         # 10, not 7!
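The 10 versus 7 difference comes from the I/O layer: inside the block the file is written through a UTF-8 encoding layer, while outside it is read back as raw bytes. The sketch below reproduces this without utf8::all and without touching the disk, using an explicit :encoding(UTF-8) layer on an in-memory file handle as a stand-in for 'outfile'.

```perl
use strict;
use warnings;
use utf8;

my $str = 'føø bār';

# Write through an explicit UTF-8 layer into an in-memory buffer.
open my $out, '>:encoding(UTF-8)', \my $buffer or die "open: $!";
print {$out} $str;
close $out;

# Read the buffer back with no encoding layer, i.e. as raw bytes.
open my $in, '<', \$buffer or die "open: $!";
my $text = do { local $/; <$in> };
close $in;

printf "%d %d\n", length $str, length $text;
```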

Instead of relying on lexical scoping, you can also write no utf8::all to turn the effects off.

Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file handles is always global and cannot be undone!

Enabling/Disabling Global Features

As described above, when utf8::all is used from the main package, its default behaviour is to convert @ARGV, open the STDIN, STDOUT, and STDERR file handles with UTF-8 encoding, and override the readlink and readdir functions and the glob operator.
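Overriding readdir amounts to decoding each directory entry name from UTF-8 bytes into a character string. The sketch below shows that idea with a hypothetical helper, decoded_dir_entries, written for illustration only; it is not utf8::all's code, and it assumes a filesystem that stores names as UTF-8 bytes (e.g. a typical Linux system).

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: list a directory and decode each entry name
# from UTF-8 bytes to characters, dying on malformed names.
sub decoded_dir_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @names = map  { decode('UTF-8', $_, FB_CROAK | LEAVE_SRC) }
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @names;
}

# Create a file whose name is written to disk as UTF-8 bytes.
my $dir  = tempdir(CLEANUP => 1);
my $name = 'føø.txt';                                   # character string
my $path = File::Spec->catfile($dir, encode('UTF-8', $name));
open my $fh, '>', $path or die "create $path: $!";
close $fh;

my @entries = decoded_dir_entries($dir);
print length $entries[0], "\n";                         # length in characters
```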

If you want to disable these features even when utf8::all is used from the main package, add the option NO-GLOBAL (or LEXICAL-ONLY) to the use line. E.g.:

use utf8::all 'NO-GLOBAL';

If, on the other hand, you want to enable these global effects even when utf8::all is used from a package other than main, use the option GLOBAL on the use line:

use utf8::all 'GLOBAL';

UTF-8 Errors

utf8::all treats invalid code points (i.e., UTF-8 that does not map to a valid Unicode character) as a fatal error.

For glob, readdir, and readlink, this behaviour can be changed by setting the attribute $utf8::all::UTF8_CHECK.

ATTRIBUTES

$utf8::all::UTF8_CHECK

By default utf8::all treats decoding errors as fatal (the default value of this setting is Encode::FB_CROAK). You can change this by setting $utf8::all::UTF8_CHECK: Encode::FB_WARN reports decoding errors as warnings, and Encode::FB_DEFAULT raises no error at all, replacing malformed data with the substitution character (U+FFFD). Please see Encode for details. Note: Encode::LEAVE_SRC is always enforced.

Important: this setting only controls the handling of decoding errors in glob, readdir, and readlink.
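To see what these values mean in practice, the sketch below calls Encode::decode directly on a malformed byte string with each of the three settings, always combined with Encode::LEAVE_SRC. It only illustrates Encode's fallback modes; it does not set $utf8::all::UTF8_CHECK or go through utf8::all.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK FB_WARN FB_DEFAULT LEAVE_SRC);

my $bad = "caf\xE9";    # Latin-1 byte 0xE9 on its own is not valid UTF-8

# FB_CROAK (utf8::all's default): decoding dies.
my $croaked = eval { decode('UTF-8', $bad, FB_CROAK | LEAVE_SRC); 1 } ? 0 : 1;

# FB_WARN: emits a warning and returns what was decoded before the error.
my @warnings;
my $partial = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    decode('UTF-8', $bad, FB_WARN | LEAVE_SRC);
};

# FB_DEFAULT: no error; malformed bytes become U+FFFD.
my $replaced = decode('UTF-8', $bad, FB_DEFAULT | LEAVE_SRC);

printf "croaked=%d warnings=%d partial=%s has_fffd=%d\n",
    $croaked, scalar @warnings, $partial, ($replaced =~ /\x{FFFD}/ ? 1 : 0);
```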

INTERACTION WITH AUTODIE

If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 and GH #7.
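One way to enforce the minimum autodie version is to put it on the use line, so that Perl refuses to compile against an older release. The sketch below uses the core open pragma as a stand-in for the layers utf8::all sets (an assumption for illustration) and checks with PerlIO::get_layers that an autodie-wrapped open still picks up the UTF-8 layer.

```perl
use strict;
use warnings;
use open qw(:encoding(UTF-8));   # stand-in for the layers utf8::all sets
use autodie 2.12;                # compile-time failure if autodie is older
use File::Temp qw(tempfile);

# A throwaway file to open; removed automatically at exit.
my (undef, $path) = tempfile(UNLINK => 1);

open my $fh, '<', $path;         # autodie-wrapped open
my @layers = PerlIO::get_layers($fh);
close $fh;

print "@layers\n";
```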

BUGS

Please report any bugs or feature requests on the bugtracker website.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

COMPATIBILITY

The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8, so the readlink and readdir functions and the glob operator are not replaced on these systems.
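A platform check of this kind usually keys off $^O, whose values for these systems are 'dos', 'MSWin32', and 'os2'. The helper below, filesystem_lacks_utf8, is hypothetical and written only to illustrate the rule stated above; the module's own check may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: true on DOS, Windows, and OS/2, where the
# readdir/readlink/glob overrides would be skipped.
sub filesystem_lacks_utf8 {
    my ($os) = @_;
    return $os =~ /\A(?:dos|MSWin32|os2)\z/ ? 1 : 0;
}

printf "%s: %s\n", $^O,
    filesystem_lacks_utf8($^O) ? 'overrides skipped' : 'overrides installed';
```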

SEE ALSO

AUTHORS

COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Michael Schwern mschwern@cpan.org; he originated it.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.