MARC::Charset - convert MARC-8 encoded strings to UTF-8
# import the marc8_to_utf8 function
use MARC::Charset 'marc8_to_utf8';

# prepare STDOUT for utf8
binmode(STDOUT, 'utf8');

# print out some marc8 as utf8
print marc8_to_utf8($marc8_string);
MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte character encoding that predates Unicode and allows you to put non-Roman scripts in MARC bibliographic records.
http://www.loc.gov/marc/specifications/spechome.html
marc8_to_utf8() converts a MARC-8 encoded string to UTF-8.
my $utf8 = marc8_to_utf8($marc8);
If you'd like to ignore errors, pass in a true value as the second parameter:
my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
utf8_to_marc8() attempts to translate a UTF-8 string into MARC-8.
my $marc8 = utf8_to_marc8($utf8);
If you'd like to ignore errors, or skip characters that can't be converted to MARC-8, pass in a true value as the second parameter:
my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
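As a minimal sketch of the two functions working together, the snippet below round-trips a plain ASCII string. It assumes utf8_to_marc8 can be imported by name in the same way as marc8_to_utf8, and that ASCII text, which belongs to the default G0 set, comes back unchanged. The sample string is only an illustration.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Illustrative input: plain ASCII, which needs no escape sequences in MARC-8.
my $marc8 = 'Moby Dick';

# MARC-8 -> UTF-8, then back again.
my $utf8 = marc8_to_utf8($marc8);
my $back = utf8_to_marc8($utf8);

# Check defensively rather than assuming the conversions succeeded.
if (defined $back && $back eq $marc8) {
    print "round trip ok\n";
}
else {
    print "round trip changed the string\n";
}
```

Strings containing diacritics or non-Roman scripts may not round-trip byte for byte, so a check like this is best limited to input you already know is ASCII.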
If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:
use MARC::Charset::Constants qw(:all);
$MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
$MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
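Because these are package variables, assigning to them changes the defaults for every later conversion in the program. One way to limit the change, sketched here under the assumption that the module reads the defaults each time a conversion function is called, is to set them with Perl's local inside a small wrapper. The helper name decode_arabic_marc8 is hypothetical and not part of MARC::Charset.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8);
use MARC::Charset::Constants qw(:all);

# Hypothetical helper (not part of MARC::Charset): decode a single string
# with Arabic default character sets, leaving the global defaults untouched.
sub decode_arabic_marc8 {
    my ($marc8) = @_;

    # local restores the previous values when this sub returns.
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8, 'ignore-errors');
}
```

Callers would then write my $utf8 = decode_arabic_marc8($marc8); and other conversions in the same program keep using the original defaults.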
MARC::Charset::Constants
MARC::Charset::Table
MARC::Charset::Code
MARC::Charset::Compiler
MARC::Record
MARC::XML
Ed Summers (ehs@pobox.com)