Module Version: 1.08   Source   Latest Release: MARC-Errorchecks-1.18

JUNK CODE

 ####################
 ####################
 ###
 ### Add below where it belongs. Also, use
 ### in individual scripts where needed.
 ###
 ### #!/usr/bin/perl -w
 ### # use strict;
 ### # $| = 1;
 ### # use MARC::File;
 ###
 ####################
 ####################

NAME

MARC::BBMARC

SYNOPSIS

Basic list of subs. For individual use, see the descriptions/POD-like info above each sub.

  use MARC::Field;
  use MARC::File;
  use MARC::BBMARC;
  MARC::BBMARC::as_formatted2();
  MARC::BBMARC::recas_formatted();
  MARC::BBMARC::skipget();
  MARC::BBMARC::getthreedigits();
  MARC::BBMARC::getindicators();
  MARC::BBMARC::updated_record_array();
  MARC::BBMARC::read_controlnos();
  MARC::BBMARC::readcodedata();
  MARC::BBMARC::parse008date($field008string);
  MARC::BBMARC::updated_record_hash();

DESCRIPTION

Collection of methods and subroutines, add-ons to MARC::Record, MARC::File, MARC::Field.

Subroutines include:

as_formatted2(), add-on to MARC::Field, which pretty-prints fields, separating subfields by tabs, rather than line breaks.

recas_formatted(), add-on to MARC::Record, which is the same as as_formatted, but uses as_formatted2() instead.

skipget(), add-on to MARC::File, which returns the next raw marc record from a file.

updated_record_array(), which creates an array of control numbers (001) from input file. Used with merge marc script. Call to initialize updated record array variable prior to entering loop. Accepts passed file name, or prompts for one.

read_controlnos(), reads a file of control numbers and returns the control numbers (or lines of the file) as an array. Accepts a passed file name, or prompts for one.

getthreedigits(), reads a 3 digit number from input (without printing a prompt); if one is not received, asks to try again.

getindicators(), prompts for 1st and 2nd indicator values. Parses for legitimate values. Returns 2 array references: \@indicators, \@indicatortypes. The first contains the values of the indicators. The second contains the types of those indicators. The first element in each is 'empty', so numbering matches indicator 1 and indicator 2. Indicator types: 'digit', 'blank', or 'any'.

readcodedata(), subroutine for reading data to build an array of country codes, geographic area codes, and language codes, valid and obsolete, for use in validate008 (in MARC::Errorchecks) and 043 validation (in Lintadditions, which uses its own, similar subroutine).

parse008date($field008string), preliminary version of code to parse a 6 digit date in the form yymmdd into yyyy\tmm\tdd\t$errors. It is from validate008, and that subroutine might be (or has been?) cleaned by calling parse008date($field008string).

counting_print( $number ), prints out running count ('$number') passed in, based on constant MOD_INTERVAL (if count divides evenly).

startstop_time(), returns current time in h:m:s format. If numbers are 1 digit, then only that digit appears (may be fixed later).

Also includes code to wrap around scripts, integrating startstop_time and counting_print, plus elapsed time.

updated_record_hash(), similar to updated_record_array(), but stores raw USMARC record indexed (keyed) by control number. This has not been fully tested, and will likely eat massive amounts of memory, especially for large files of records.

as_array(), add-on to MARC::Field, breaks field into flat array of subfield code-data pairs. Based on example #U9 of the MARC::Doc::Tutorial.

EXPORT

None

TO DO

Figure out how to "use" and not have to put MARC::BBMARC before subroutine/method calls.
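One common way to address this item is the core Exporter module. The following is a minimal sketch, not the module's actual code: the package name My::BBMARCDemo and the subroutine startstop_demo are hypothetical stand-ins, and the package is defined inline only so the example runs as a single file.

```perl
use strict;
use warnings;

# Hypothetical stand-in package (an assumption for illustration only).
package My::BBMARCDemo;
use Exporter 'import';                 # borrow Exporter's import() method
our @EXPORT_OK = qw(startstop_demo);   # names callers may request

sub startstop_demo { return 'called without a package prefix' }

package main;

# With the package in its own .pm file this line would instead be:
#   use My::BBMARCDemo qw(startstop_demo);
My::BBMARCDemo->import('startstop_demo');

my $result = startstop_demo();         # no My::BBMARCDemo:: prefix needed
print "$result\n";
```

Listing names in @EXPORT_OK rather than @EXPORT keeps the "EXPORT: None" default and lets each script request only the subroutines it uses.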

Clean up readability of POD-like documentation.

Test and cleanup updated_record_hash()

(More to do in individual subs).

Evaluate the usefulness of parse008date, which is now duplicated in MARC::Errorchecks.

Verify each of the codes in the data against current lists and lists of changes. Maintain code list data when future changes occur.

as_formatted2()

Returns a pretty string for printing in a MARC dump. From MARC::Field.

recas_formatted()

Prints an entire record in human-readable form, using as_formatted2(). This puts each field on a single line and uses @ (at) as subfield delimiter instead of _ (underscore). Based on MARC::Record::as_formatted().

skipget()

Returns a raw MARC record string or undef.

updated_record_array()

Note: Creates an array of control numbers (001) from input file. Use with merge marc script. Call to initialize updated record array variable prior to entering loop. Prompts for updated record file. Prints running count of records based on counting_print function. Works only with USMARC input files.

read_controlnos()

Accepts passed filename as argument. If nothing is passed, asks for file path/name. Reads each line of file, and pushes it onto array, @controlnumberarray, which is returned. Lines in the file should contain only the control number.

Since it does not do anything to the line it reads, this subroutine can be used to read lines from a file and store them in an array.
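The general "read every line into an array" behavior described here can be sketched as follows. This is an illustration under assumptions, not the module's code: read_lines_demo is a hypothetical helper, the control numbers are made-up sample values, and the sketch removes trailing newlines with chomp, which the original subroutine may not do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read every line of a file into an array,
# removing only the trailing newline from each line.
sub read_lines_demo {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Write a small sample file of made-up control numbers, then read it back.
my ($out, $tempname) = tempfile(UNLINK => 1);
print {$out} "ocm00000001\n", "ocm00000002\n";
close $out;

my @controlnumberarray = read_lines_demo($tempname);
print scalar(@controlnumberarray), " control numbers read\n";
```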

To do: Modify existing scripts to clean control number, replacing spaces with underscores. Regex-ify control number to be (3 char) - (8 digit) - (space).

getthreedigits()

Looks for three digit input. Returns three digit string.
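The check itself can be sketched with a single regular expression. This is a hypothetical validator, not the module's code: the name is_three_digits_demo is invented, and accepting '.' as a wildcard is based only on the version 1.01 change note.

```perl
use strict;
use warnings;

# Hypothetical validator: true for exactly three characters, each a digit
# or the wildcard '.' (wildcards are mentioned in the 1.01 change notes).
sub is_three_digits_demo {
    my ($input) = @_;
    return 0 unless defined $input;
    return $input =~ /\A[0-9.]{3}\z/ ? 1 : 0;
}

for my $candidate ('245', '6..', '24', '2455', 'abc') {
    printf "%-5s %s\n", $candidate,
        is_three_digits_demo($candidate) ? 'accepted' : 'try again';
}
```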

getindicators()

Gets 1st and 2nd indicator values. Parses for legitimate values. Returns 2 array references: \@indicators, \@indicatortypes. The first contains the values of the indicators. The second contains the types of those indicators. The first element in each is 'empty', so numbering matches indicator 1 and indicator 2. Indicator types: 'digit', 'blank', or 'any'.

Get indicator additional info

 ##################################
 ##################################
 ### Synopsis/Calling procedure ###
 ##################################

 my ($gotindicators, $gotindicatortypes) = getindicators();
 print join ("\n", "indarray", @$gotindicators, "\n");
 print join ("\n", "indtypes", @$gotindicatortypes, "\n");
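The shape of the two returned arrays can be illustrated without any prompting. The sketch below is not the module's code: indicator_type_demo is a hypothetical classifier, and treating a single space as the blank indicator and everything else as 'any' is an assumption.

```perl
use strict;
use warnings;

# Hypothetical classifier. Assumption: a space means a blank indicator,
# and anything that is not a digit or a space is treated as 'any'.
sub indicator_type_demo {
    my ($value) = @_;
    return 'digit' if $value =~ /\A[0-9]\z/;
    return 'blank' if $value eq ' ';
    return 'any';
}

# Index 0 holds 'empty' so indexes 1 and 2 match indicator 1 and 2.
my @indicators     = ('empty', '1', ' ');
my @indicatortypes = ('empty', map { indicator_type_demo($_) } @indicators[1, 2]);

for my $position (1, 2) {
    print "Indicator $position: '$indicators[$position]' is $indicatortypes[$position]\n";
}
```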

readcodedata()

readcodedata() -- Read Country, Geographic Area Code, Language Data

DESCRIPTION (readcodedata())

Subroutine for reading data to build an array of country codes, geographic area codes, and language codes, valid and obsolete, for use in validate008 (in MARC::Errorchecks) and 043 validation (in MARC::Lintadditions).

SYNOPSIS (readcodedata())

 my @dataarray = MARC::BBMARC::readcodedata();
## or 
 #MARC::BBMARC::readcodedata();
 #my @countrycodes = split "\t", $MARC::BBMARC::dataarray[1];
 
 my @countrycodes = split "\t", $dataarray[1];
 my @oldcountrycodes = split "\t", $dataarray[3];
 my @geogareacodes = split "\t", $dataarray[5];
 my @oldgeogareacodes = split "\t", $dataarray[7];
 my @languagecodes = split "\t", $dataarray[9];
 my @oldlanguagecodes = split "\t", $dataarray[11];

DATA Outline

 Data lines:
 0: __CountryCodes__
 1: countrycodes (tab-delimited)
 2: __ObsoleteCountry__
 3: oldcountrycodes (tab-delimited)
 4: __GeogAreaCodes__
 5: gacodes (tab-delimited)
 6: __ObsoleteGeogAreaCodes__
 7: oldgacodes (tab-delimited)
 8: __LanguageCodes__
 9: languagecodes (tab-delimited)
 10: __ObsoleteLanguageCodes__
 11: oldlanguagecodes (tab-delimited)
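Because each odd-numbered element is one tab-delimited string, a caller that validates many codes may prefer lookup hashes over repeated list scans. The following is a sketch under assumptions: @dataarray here is a tiny stand-in rather than the real return value of readcodedata(), the obsolete codes are made-up placeholders, and country_status_demo is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the array returned by readcodedata(). Only the elements used
# below are filled in, and the values are a small illustrative sample.
my @dataarray;
$dataarray[1] = "xxu\txxk";   # valid country codes (sample)
$dataarray[3] = "zz1\tzz2";   # obsolete country codes (made-up placeholders)

# Turn each tab-delimited string into a hash for constant-time lookups.
my %valid_country    = map { $_ => 1 } split /\t/, $dataarray[1];
my %obsolete_country = map { $_ => 1 } split /\t/, $dataarray[3];

# Hypothetical helper: classify a code as valid, obsolete, or unknown.
sub country_status_demo {
    my ($code) = @_;
    return 'valid'    if $valid_country{$code};
    return 'obsolete' if $obsolete_country{$code};
    return 'unknown';
}

for my $code ('xxu', 'zz1', 'qqq') {
    print "$code: ", country_status_demo($code), "\n";
}
```

The same pattern applies to elements 5, 7, 9, and 11 for geographic area and language codes.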

parse008date($field008string)

Subroutine parse008date returns four-digit year, two-digit month, and two-digit day. It requires an 008 string at least 6 bytes long.

SYNOPSIS (parse008date($field008string))

 my ($earlyyear, $earlymonth, $earlyday);
 print ("What is the earliest create date desired (008 date, in yymmdd)? ");
 while (my $earlydate = <>) {
     chomp $earlydate;
     my $field008 = $earlydate;
     my $yyyymmdderr = MARC::BBMARC::parse008date($field008);
     my @parsed008date = split "\t", $yyyymmdderr;
     $earlyyear = shift @parsed008date;
     $earlymonth = shift @parsed008date;
     $earlyday = shift @parsed008date;
     my $errors = join "\t", @parsed008date;
     if ($errors) {
         if ($errors =~ /is too short/) {
             print "Please enter a longer date, $errors\nEnter date (yymmdd): ";
         }
         else {print "$errors\nEnter valid date (yymmdd): ";}
     } #if errors
     else {last;}
 }
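The tab-delimited return format (yyyy, mm, dd, then any errors) can be sketched as below. This is a hypothetical re-creation, not the code in MARC::BBMARC: parse_yymmdd_demo and its error messages are invented, and the rule for choosing the century (two-digit years above a pivot of 50 become 19yy, the rest 20yy) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the return format, not the module's code.
# Assumption: two-digit years above $pivot are 19yy, the rest are 20yy.
sub parse_yymmdd_demo {
    my ($field008, $pivot) = @_;
    $pivot = 50 unless defined $pivot;

    return join "\t", '', '', '', '008 date is too short'
        if length($field008) < 6;

    # Only the first six bytes (yymmdd) are examined.
    my ($yy, $mm, $dd) = substr($field008, 0, 6) =~ /\A([0-9]{2})([0-9]{2})([0-9]{2})\z/
        or return join "\t", '', '', '', '008 date is not numeric';

    my @errors;
    push @errors, "month $mm is out of range" if $mm < 1 || $mm > 12;
    push @errors, "day $dd is out of range"   if $dd < 1 || $dd > 31;

    my $yyyy = ($yy > $pivot ? 1900 : 2000) + $yy;
    return join "\t", $yyyy, $mm, $dd, @errors;
}

for my $input ('041031', '990105', '0410', '041331') {
    my ($yyyy, $mm, $dd, @errors) = split /\t/, parse_yymmdd_demo($input);
    print "$input => ", (@errors ? "error: @errors" : "$yyyy-$mm-$dd"), "\n";
}
```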

counting_print ($modcount)

Prints a running count (when called from a loop) based on MOD_INTERVAL. The argument is the current count. Set the interval with: use constant MOD_INTERVAL => ###put number here###;
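The modulus test behind this kind of progress counter can be sketched as follows. The subroutine counting_print_demo is a hypothetical stand-in, not the module's code, and the interval of 1000 is an arbitrary illustrative value.

```perl
use strict;
use warnings;
use constant MOD_INTERVAL => 1000;   # illustrative value, pick your own

# Hypothetical stand-in for counting_print: print the count only when it
# is an exact multiple of MOD_INTERVAL. Returns 1 when it printed, else 0.
sub counting_print_demo {
    my ($count) = @_;
    return 0 if $count % MOD_INTERVAL;
    print "$count\n";
    return 1;
}

my $printed = 0;
for my $runningrecordcount (1 .. 3500) {
    $printed += counting_print_demo($runningrecordcount);
}
print "Printed $printed progress lines\n";
```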

startstop_time()

Start stop time is called when a program starts or finishes, to see how long it takes to complete. Returns time in hour:min:second format, with seconds<10 being single digit. (to fix later)
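One possible way to address the single-digit issue is to zero-pad each part with sprintf. The sketch below is a suggestion only, not the module's code; padded_time_demo is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: format a Unix timestamp (default: now) as
# zero-padded hh:mm:ss in local time.
sub padded_time_demo {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    my ($sec, $min, $hour) = localtime($epoch);
    return sprintf '%02d:%02d:%02d', $hour, $min, $sec;
}

my $stamp = padded_time_demo();
print "Started at $stamp\n";
```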

updated_record_hash()

Note: Creates a hash of control numbers (001) and associated raw MARC data from input file. Use with compare records script. Call to initialize updated record hash variable prior to entering loop. Prompts for updated record file if the name (or path) of one is not passed in. Prints running count of records based on counting_print function. Works only with USMARC input files.

WARNING (on updated_record_hash)

This may be very memory intensive as it stores raw MARC for each record in the updated (first) file, with its associated control number. 40000+ records (43815K on disk) take approximately 102,192K+ to read in and then dereference. YOU HAVE BEEN WARNED!!!

TO DO (on updated_record_hash)

Reduce memory usage, probably by learning how to tie hash to file instead of storing everything in memory.

as_array

Add-on method to MARC::Field. Breaks MARC::Field into a flat array of subfield code and subfield data pairs. Based on example 9 of the MARC::Doc::Tutorial.

Example (as_array)

 my $field043 = MARC::Field->new('043', '', '', 'a' => 'n-us---', 'a' => 'e-uk---', 'a' => 'a-th---' );

 my $field043_arrayref = $field043->as_array();
 my @field043_array = @$field043_arrayref;

 # @field043_array is: ('a', 'n-us---', 'a', 'e-uk---', 'a', 'a-th---')

TO DO (as_array)

Add ability to optionally pass in regex to find in subfields, returning positions of the matches (in a second array ref).
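Until such an option exists, a caller can scan the flat array itself. The following is a rough sketch, not part of the module: matching_positions_demo is a hypothetical helper, and the array is written out literally (matching the 043 example) so the code runs without MARC::Field.

```perl
use strict;
use warnings;

# The flat array from the 043 example, written out literally.
my @field043_array = ('a', 'n-us---', 'a', 'e-uk---', 'a', 'a-th---');

# Hypothetical helper: return a reference to the array indexes of the
# subfield data elements that match a regex.
sub matching_positions_demo {
    my ($pattern, @flat) = @_;
    my @positions;
    # Subfield codes sit at even indexes, subfield data at odd indexes.
    for (my $i = 1; $i < @flat; $i += 2) {
        push @positions, $i if $flat[$i] =~ $pattern;
    }
    return \@positions;
}

my $positions = matching_positions_demo(qr/^e-/, @field043_array);
print "Matches at index: @$positions\n";
```

Stepping by two and starting at index 1 keeps subfield codes out of the search, so a pattern cannot accidentally match a code such as 'a'.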

 ########################
 ### Program template ###
 ########################
 ###########################
 ### Initialize includes ###
 ### and basic needs     ###
 ###########################
 
 ##Time coding to wrap around program to determine how long execution takes:
 
 ##########################
 ## Time coding routines ##
 ## Print start time and ##
 ## set start variable   ##
 ##########################
 
 use Time::HiRes qw(  tv_interval );
 # measure elapsed time 
 # (could also do by subtracting 2 gettimeofday return values)
 my $t0 = [Time::HiRes::gettimeofday()];
 my $startingtime = MARC::BBMARC::startstop_time();
 # do bunch of stuff here
 #########################
 ### Start main program ##
 #########################
 ############################################
 # Set start time for main calculation loop #
 ############################################
 my $t1 = [Time::HiRes::gettimeofday()];
 my $runningrecordcount=0;
 #####################################
 ## Place the following within loop ##
 #####################################
 $runningrecordcount++;
 MARC::BBMARC::counting_print ($runningrecordcount);
 ##########################
 ### Main program done.  ##
 ### Report elapsed time.##
 ##########################
 
 my $elapsed = tv_interval ($t0);
 my $calcelapsed = tv_interval ($t1);
 print sprintf ("%.4f %s\n", "$elapsed", "seconds from execution\n");
 print sprintf ("%.4f %s\n", "$calcelapsed", "seconds to calculate\n");
 my $endingtime = MARC::BBMARC::startstop_time();
 print "Started at $startingtime\nEnded at $endingtime";
 
 print "\n\nPress Enter to quit";
 <>;
 #####################
 ### END OF PROGRAM ##
 #####################

SEE ALSO

MARC::Record -- Required for this module to work.

MARC::Lintadditions -- Extension of MARC::Lint (in the MARC::Record distribution) for checks involving individual tag checking.

MARC::Errorchecks -- Extension of MARC::Lint (in the MARC::Record distribution) for checks involving cross-field checking.

MARC pages at the Library of Congress (http://www.loc.gov/marc)

CHANGES/HISTORY

Version 1.08: Updated Oct 31, 2004. Released Dec. 5, 2004.

 -New method, as_array, an add-on to MARC::Field which breaks down a MARC::Field object into a flat array, returns a ref to that array.
 -Misc. cleanup.

Version 1.07: Updated Aug. 30-Oct. 16, 2004. Released Oct. 17, 2004.

 -Moved subroutine getcontrolstocknos() to MARC::QBIerrorchecks
 -Moved validate007() to Lintadditions.pm
 -Moved validate008() and related subs to Errorchecks.pm
 --(Left readcodedata() in BBMARC, but it is now duplicated in Errorchecks.pm, along with a modified version in Lintadditions.pm).
 --Also left parse008date, which may have uses outside of error checking.
 -Updated read_controlnos([$filename]) with minor changes. 
 --This subroutine could be rewritten in a more general way, since it simply reads all lines from a file into an array and returns that array.

Version 1.06: Updated Aug. 10-22, 2004. Released Aug. 22, 2004.

 -Implemented VERSION (uncommented)
 -Added subroutine getcontrolstocknos()
 -General readability cleanup (added tabs)
 -Bug fix in validate008 for date2 check

Version 1.05: Updated July 3-17, 2004. Released July 18, 2004

 -Cleaned some documentation
 -Added global variable in hopes of improving efficiency of language/GAC/country code validation
 -Modified validate008 and/or readcodedata() to use the new global variable.
 -Moved readcodedata() and parse008date above validate008

Version 1.04: Updated June 16, 2004, released June 20, 2004

 -Updated as_formatted2() to work with MARC::Record 1.38 (is_control_field() instead of is_control_tag())
 -Fixed bug in validate008 for visual materials running time (hyphen was not escaped, so it was being interpreted as a range indicator).
 -Added parse008date($) to allow user to enter yymmdd and get yyyy\tmm\tdd\t$error string back (for other uses).
 -Added DATA containing codes from the MARC lists for Countries, Geographic Areas, and Languages, to 2003. Each code set is separated by tabs, and Obsolete codes are given following each set of valid codes, in the same format.
 -Added readcodedata() subroutine for reading in the data and returning the data in an array for use by validation code, such as in validate008()
 -Modified validate008 subroutine to use the DATA to validate language and country codes.

Version 1.03: Updated June 10, not released.

 -Contained many of the changes in 1.04, but 1.04 contains the update to validate008, so I wanted a new version.

Version 1.02: Updated May 27, 2004, released May 31, 2004

 -added updated_record_hash() (not yet tested, highly memory intensive)
 -cleaned some documentation

Version 1.01: Updated Apr. 28, 2004, released May 1, 2004

 -Added validate008()
 -Changed as_formatted2() in attempt to remove extra spaces between subfields
 -Changed getthreedigits() to allow wildcards (.)

Version 1 (original version, lacked version designation): First release, Jan. 5, 2004

LICENSE

This code may be distributed under the same terms as Perl itself.

Please note that this module is not a product of or supported by the employers of the various contributors to the code.

AUTHOR

Bryan Baldus eijabb@cpan.org

Copyright (c) 2003-2004

 ##methodchecking code:
 ### put this in scripts if adding it to BBMARC fails:
 # else {
 # *MARC::File::skipget = *MARC::BBMARC::skipget;
 # };
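The commented-out assignment above installs a BBMARC subroutine in another package's symbol table so it can be called as a method of that class. The sketch below shows the same general idea with hypothetical stand-in packages (My::FileDemo and My::AddonDemo are invented names); it assigns a code reference to the glob rather than copying the whole glob, which is a slightly narrower variant of the technique.

```perl
use strict;
use warnings;

package My::FileDemo;     # hypothetical stand-in for a class such as MARC::File
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package My::AddonDemo;    # hypothetical stand-in for an add-on module
sub describe {
    my ($self) = @_;
    return "file: $self->{name}";
}

package main;
{
    no warnings 'once';
    # Install the add-on subroutine as a method of the other class.
    *My::FileDemo::describe = \&My::AddonDemo::describe;
}

my $file = My::FileDemo->new(name => 'records.mrc');
my $description = $file->describe();
print "$description\n";
```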

 ###########################################
 ### For Windows/DOS, end programs with: ###
 #print "Press Enter to continue"; #########
 #<>; ######################################
 ############################################