Module Version: 0.95 (latest release: MARC-Charset-1.35)

NAME

MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS

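    use strict;
    use warnings;

    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for UTF-8 output
    binmode(STDOUT, ':utf8');

    # a MARC-8 string (plain ASCII is valid MARC-8)
    my $marc8_string = 'Hello, world';

    # print it out as UTF-8
    print marc8_to_utf8($marc8_string), "\n";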

DESCRIPTION

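MARC::Charset converts MARC-8 encoded strings into UTF-8 strings. MARC-8 is a character encoding for MARC bibliographic records that predates Unicode; it uses escape sequences to switch between character sets, which is how non-Roman scripts are represented in a record. Most MARC-8 character sets are single-byte, although the East Asian (CJK) set uses three bytes per character. The MARC specifications, including the MARC-8 character sets, are available at: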

    http://www.loc.gov/marc/specifications/spechome.html
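As a rough illustration of typical use, the sketch below wraps marc8_to_utf8 in a small filter. The helper name convert_stream is hypothetical, and the sketch assumes the MARC-8 input is line-oriented text (for example, field values dumped one per line) rather than a raw ISO 2709 record file, whose structure it does not understand.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper: copy MARC-8 text from $in to $out as UTF-8,
# one line at a time. Assumes the input is line-oriented text.
sub convert_stream {
    my ($in, $out) = @_;
    binmode($in,  ':raw');    # MARC-8 input is read as raw bytes
    binmode($out, ':utf8');   # converted characters are written as UTF-8
    while (defined(my $line = <$in>)) {
        chomp $line;
        # Convert only the line content; the newline is re-added here
        # so line terminators never pass through the converter.
        print {$out} marc8_to_utf8($line), "\n";
    }
    return;
}
```

Called as convert_stream(\*STDIN, \*STDOUT), it behaves like a simple command-line filter.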

EXPORTS

marc8_to_utf8()

Converts a MARC-8 encoded string to UTF-8.

    my $utf8 = marc8_to_utf8($marc8);

To ignore conversion errors, pass a true value as the second parameter:

    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

utf8_to_marc8()

Attempts to convert a UTF-8 string to MARC-8.

    my $marc8 = utf8_to_marc8($utf8);

To ignore errors, including characters that cannot be represented in MARC-8, pass a true value as the second parameter:

    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
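Because not every UTF-8 character has a MARC-8 equivalent, it can be useful to check whether a string survives conversion. The sketch below is one way to do that; the helper survives_round_trip is hypothetical, and it assumes that characters dropped or substituted under 'ignore-errors' will make the round-trip comparison fail. Both sides are compared after NFD normalization (Unicode::Normalize, a core Perl module), on the assumption that converting back from MARC-8 may yield decomposed characters, since MARC-8 encodes diacritics as separate combining characters.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: true if $utf8 survives a UTF-8 -> MARC-8 -> UTF-8
# round trip. Comparison is done in NFD so that precomposed and
# decomposed forms of the same accented character compare equal.
sub survives_round_trip {
    my ($utf8) = @_;
    # 'ignore-errors' keeps unconvertible characters from stopping the
    # check; we assume they are dropped or replaced, so the comparison
    # below fails for such input.
    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
    my $back  = marc8_to_utf8($marc8, 'ignore-errors');
    return NFD($back) eq NFD($utf8);
}
```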

DEFAULT CHARACTER SETS

To change the default character sets, set $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 to the appropriate character set codes:

    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
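Assigning to these package variables changes the defaults for the whole program. If only one piece of code needs different defaults, Perl's local can scope the change. The helper with_default_charsets below is hypothetical; it assumes that conversions read $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 at call time, as the assignment example above implies.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: run $code with the default G0/G1 character sets
# overridden. `local` restores the previous values when the helper
# returns (or dies), so the change does not leak into other callers.
sub with_default_charsets {
    my ($g0, $g1, $code) = @_;
    local $MARC::Charset::DEFAULT_G0 = $g0;
    local $MARC::Charset::DEFAULT_G1 = $g1;
    return $code->();
}
```

For example, my $utf8 = with_default_charsets(BASIC_ARABIC, EXTENDED_ARABIC, sub { marc8_to_utf8($marc8) }); converts one string with Arabic defaults and leaves the global defaults untouched afterward.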

SEE ALSO

AUTHOR

Ed Summers (ehs@pobox.com)
