Bryan Baldus > MARC-Errorchecks-1.18 > MARC::Errorchecks

Module Version: 1.18

NAME

MARC::Errorchecks -- Collection of MARC 21/AACR2 error checks

DESCRIPTION

Module for storing MARC error checking subroutines, based on MARC 21, AACR2, and LCRIs. These are used to find errors not easily checked by the MARC::Lint and MARC::Lintadditions modules, such as those that cross field boundaries.

Each subroutine should generally be passed a MARC::Record object.

Returned warnings/errors are generated as follows:

 push @warningstoreturn, join '', ($field->tag(), ": [ERROR TEXT]\t");
 return \@warningstoreturn;

SYNOPSIS

 use MARC::Batch;
 use MARC::Errorchecks;

 #See also MARC::Lintadditions for more checks
 #use MARC::Lintadditions;

 #change file names as desired
 my $inputfile = 'marcfile.mrc';
 my $errorfilename = 'errors.txt';
 my $errorcount = 0;
 open (OUT, '>', $errorfilename) or die "Cannot open $errorfilename: $!";
 #initialize $batch as new MARC::Batch object
 my $batch = MARC::Batch->new('USMARC', $inputfile);
 #loop through batch file of records
 while (my $record = $batch->next()) {
  #add "next unless $record->field('001');" here if some records in file do not contain an '001' field
  my $controlno = $record->field('001')->as_string();

  #call MARC::Errorchecks subroutines

  my @errorstoreturn = ();

  # check everything

  push @errorstoreturn, (@{MARC::Errorchecks::check_all_subs($record)});

  # or only a few
  push @errorstoreturn, (@{MARC::Errorchecks::check_010($record)});
  push @errorstoreturn, (@{MARC::Errorchecks::check_bk008_vs_bibrefandindex($record)});

  # report results
  if (@errorstoreturn){
   #########################################
   print OUT join( "\t", "$controlno", @errorstoreturn, "\t\n");

   $errorcount++;
  }

 } #while

 close OUT;

TO DO

Maintain check-all subroutine, a wrapper that calls all the subroutines in Errorchecks, to simplify calling code in .pl.

Determine whether extra tabs are being added to warnings. Examine how warnings are returned and see if a better way is available.

Add functionality.

 -Ending punctuation (in Lintadditions.pm, and 300 dealt with here, and now 5xx (some)).
 -Matching brackets and parentheses in fields?
 -Geographical headings miscoded as subjects.
 
 Possibly rewrite as object-oriented?
 If not, optimize this and the Lintadditions.pm checks.
 Example: reduce number of repeated breaking-out of fields into subfield parts.
 So, subroutines that look for double spaces and double punctuation might be combined.

Remove local practice code or facilitate its modification/customization.

Deal with other TO DO items found below. This includes fixing problem of "bibliographical references" being required if 008 contents has 'b'.

check_all_subs

Calls each error-checking subroutine in Errorchecks. Gathers all errors and returns those errors in an array (reference).

TO DO (check_all_subs)

Make sure to update this subroutine as additional subroutines are added.

is_RDA($record)

Checks to see if record is coded as an RDA record or not (based on 040$e).

check_double_periods($record)

Looks for more than one period within subfields after 010. Exception: Exactly 3 periods together are treated as ellipses.

Looks for multiple commas.
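The period rule above can be sketched on a plain string as follows. This is only an illustration, not the module's code: find_double_punct is a hypothetical helper, it takes the tag and one subfield's data instead of a MARC::Record object, and the message wording is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: report runs of periods or commas in one subfield's data.
# A run of exactly three periods is treated as an ellipsis and allowed.
sub find_double_punct {
    my ($tag, $data) = @_;
    my @warningstoreturn = ();
    # The greedy quantifier makes each match a maximal run of 2+ periods.
    while ($data =~ /(\.{2,})/g) {
        next if length($1) == 3;    #ellipsis
        push @warningstoreturn, "$tag: Has multiple consecutive periods that do not appear to be ellipses.";
    }
    push @warningstoreturn, "$tag: Has multiple commas." if $data =~ /,{2,}/;
    return \@warningstoreturn;
}

print "$_\n" for @{ find_double_punct('245', 'A title.. with an ellipsis ... and,, commas') };
```

A run of four or more periods is reported here as well, which matches the TO DO item below about finding exceptions for longer runs.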

TO DO (check_double_periods)

Find exceptions where double periods may be allowed. Find exceptions where more than 3 periods can be next to each other. Find exceptions where double commas are allowed (URI subfields, 856 field).

Deal with the exceptions. Currently, skips 856 field completely. Needs to skip URI subfields.

check_internal_spaces($record)

Looks for more than one space within subfields after 010. Ignores 035 field, since multiple spaces could be allowed. Accounts for extra spaces between angle brackets for open date in 260c. Current version allows extra spaces in any 260 subfield containing angle brackets.
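A rough sketch of these rules on a single subfield string is given below. find_internal_spaces is a hypothetical name, the input is plain text rather than a MARC::Record, and the message wording is illustrative. The surrounding context of about 10 characters follows the reporting style described in the version history.

```perl
use strict;
use warnings;

# Hypothetical helper: report runs of 2+ spaces between non-space characters.
sub find_internal_spaces {
    my ($tag, $data) = @_;
    #035 may legitimately contain multiple spaces
    return [] if $tag eq '035';
    #open dates in 260, such as <1981-    >, may contain extra spaces
    return [] if $tag eq '260' && $data =~ /<.*>/;
    my @warningstoreturn = ();
    # Capture roughly 10 characters of context on each side of the spaces.
    while ($data =~ /(.{0,10}\S {2,}\S.{0,10})/g) {
        push @warningstoreturn, "$tag: Has multiple internal spaces ($1).";
    }
    return \@warningstoreturn;
}

print "$_\n" for @{ find_internal_spaces('245', 'Title with  two spaces') };
```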

TO DO (check_internal_spaces)

Account for non-numeric tags? Will likely complain for non-numeric tags in a record, since comparisons rely upon numeric tag checking.

check_trailing_spaces($record)

Looks for extra spaces at the end of fields greater than 010. Ignores 016 extra space at end.

TO DO (check_trailing_spaces)

Rewrite to incorporate 010 and 016 space checking.

Consider allowing trailing spaces in 035 field.

check_006($record)

Code for validating 006s in MARC records. Validates each byte of the 006, based on MARC::Errorchecks::validate008($field008, $mattype, $biblvl)

TO DO (check_006)

Use validate008 subroutine:

 -Break byte 18-34 checking into separate sub so it can be used for 006 validation as well.
 -Optimize efficiency.

check_008($record)

Code for validating 008s in MARC records. Validates each byte of the 008, based on MARC::Errorchecks::validate008($field008, $mattype, $biblvl)

TO DO (check_008)

Improve validate008 subroutine (see that sub for more information):

 -Break byte 18-34 checking into separate sub so it can be used for 006 validation as well.
 -Optimize efficiency.

Revised 12-2-2004 to use new validate008() sub.

check_010($record)

Verifies 010 subfield 'a' has proper spacing.
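As a sketch of what "proper spacing" could mean, the example below assumes an LCCN with no alphabetic prefix: an 8-digit number padded with 3 leading blanks and 1 trailing blank, and a 10-digit number padded with 2 leading blanks. check_lccn_spacing is a hypothetical helper with illustrative messages, and it leaves out the year-range validation that the real subroutine performs.

```perl
use strict;
use warnings;

# Hypothetical helper: check spacing of an all-digit LCCN from 010 $a.
# Returns an empty string if the spacing looks correct, else a message.
sub check_lccn_spacing {
    my ($subfielda) = @_;
    (my $digits = $subfielda) =~ s/ //g;
    #non-digits are reported and the spacing check is skipped
    return "010: Subfield 'a' has non-digits ($subfielda)." if $digits !~ /^\d+$/;
    my $expected =
          length($digits) == 8  ? "   $digits "    #3 blanks, 8 digits, 1 blank
        : length($digits) == 10 ? "  $digits"      #2 blanks, 10 digits
        :                         undef;
    return "010: Subfield 'a' has " . length($digits) . " digits; expected 8 or 10."
        unless defined $expected;
    return $subfielda eq $expected ? '' : "010: Subfield 'a' has improper spacing ($subfielda).";
}

print check_lccn_spacing('2004012345'), "\n";
```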

TO DO (check_010)

Compare efficiency of getting current date vs. setting global current date. Determine best way to establish global date.

Think about whether subfield 'z' needs proper spacing.

Deal with non-digit characters in original 010a field. Currently these are simply reported and the space checking is skipped.

Revise local treatment of LCCN checking (invalid 8-digits pre-1980) for more universal use.

Maintain date ranges in checking validity of numbers.

Modify date ranges according to local catalog needs.

Determine whether this subroutine can be implemented in MARC::Lintadditions/Lint--I don't remember why it is here rather than there?

local practice

 #this section could be implemented to validate 8-digit LCCN being between a specific set of years (1900-1980, for example).

 #code has been commented/podded out for general practice
            my $year = substr($subfielda, 0, 2);
            #should be old lccn, so first 2 digits are 00 or > 80
            #The 1980 limit is a local practice.
            #Change the date ranges according to local needs (e.g. if LC records back to 1900 exist in the catalog, do not implement this section of the error check)
            if (($year >= 1) && ($year < 80)) {push @warningstoreturn, ("010: First digits of LCCN are $year.");}

NAME

check_end_punct_300($record)

DESCRIPTION

Reports an error if the 300 field lacks an ending period when a 4xx field exists, or if the 300 ends with a closing parenthesis followed by a period when no 4xx field exists.
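The two cases can be sketched as below. check_300_ending is a hypothetical helper that receives the 300 field as a string plus a flag saying whether any 4xx field exists; the message wording is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string or one error message.
sub check_300_ending {
    my ($field300, $has_4xx) = @_;
    if ($has_4xx) {
        #a series statement follows, so the 300 should end with a period
        return "300: 4xx exists but 300 does not end with period."
            unless $field300 =~ /\.$/;
    }
    elsif ($field300 =~ /\)\.$/) {
        #no series statement, so the closing parenthesis should end the field
        return "300: 4xx does not exist but 300 ends with closing parens-period.";
    }
    return '';
}

print check_300_ending('xi, 250 p. ; 24 cm', 1), "\n";
```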

NAME

check_bk008_vs_300($record)

DESCRIPTION

300 subfield 'b' vs. presence of coding for illustrations in 008/18-21.

Ignores CIP records completely. Ignores non-book records completely (for the purposes of this subroutine).

If 300 'b' has wording, reports errors if matching 008/18-21 coding is not present. If 008/18-21 coding is present, but similar wording is not present in 300, reports errors.

Note: plates are an exception, since they are noted in $a rather than $b of the 300. So, they need to be checked twice--once if 'f' is the only code in the 008/18-21, and again amongst other codes.

Also checks for 'p.' or 'v.' in subfield 'a'.

LIMITATIONS

Only accounts for a single 300 field (300 was recently made repeatable).

Older/more specific code checking is limited due to lack of use (by our catalogers). For example, coats of arms, facsim., etc. are usually now given as just 'ill.' So the error check allows either the specific or just ill. for all except maps.

Depends upon 008 being coded for book monographs.

Subfield 'a' and 'c' wording checks ('p.' or 'v.'; 'cm.', 'in.', 'mm.') only look at first of each kind of subfield.

TO DO (check_bk008_vs_300($record))

Take care of case of 008 coded for serials/continuing resources.

Find exceptions to $a having 'p.' or 'v.' (and leaves, columns) for books.

Find exceptions to $c having 'cm.', 'mm.', or 'in.' preceded by digits.

Deal with other LIMITATIONS.

Account for upcoming rule change in which metric units have no punctuation. When that rule goes into effect, move 300$c checking to check_end_punct_300($record).

Reverse checks to report missing 008 code if specific wording is present in 300.

Reverse check for plates vs. 'f'

NAME

 parse008vs300b($illcodes, $field300subb)

DESCRIPTION

008 illustration parse subroutine

checks 008/18-21 code against 300 $b

WHY?

To simplify the check_bk008_vs_300($record) subroutine, which had many if-then statements. This moves the additional checking conditionals out of the way. It may be integrated back into the main subroutine once it works. This was written while constructing check_bk008_vs_300($record) as a separate script.

Synopsis/Usage description

    parse008vs300b($illcodes, $field300subb)

 #$illcodes is bytes 18-21 of 008
 #$field300subb is subfield 'b' of record's 300 field

TO DO (parse008vs300b($$))

Integrate code into check_bk008_vs_300($record)?

Verify possibilities for 300 text

Move 'm' next to 'f' since it is likely to be indicated in subfield 'e' not 'b' of the 300. Our catalogers do not generally code for sound recordings in this way in book records.

check_490vs8xx($record)

If 490 with 1st indicator '1' exists, then 8xx (800, 810, 811, 830) should exist.

check_240ind1vs1xx($record)

If 1xx exists then 240 1st indicator should be '1'. If 1xx does not exist then 240 should not be present.

However, exceptions to this rule are possible, so this should be considered an optional error.

check_245ind1vs1xx($record)

If 1xx exists then 245 1st indicator should be '1'. If 1xx does not exist then 245 1st indicator should be '0'.

However, exceptions to this rule are possible, so this should be considered an optional error.
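The rule reduces to a comparison of two values, sketched below. check_245_ind1 is a hypothetical helper taking the 245 first indicator and a flag for the presence of a 1xx field. The first message is the one quoted in this documentation; the second is an assumed wording.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string or one error message.
sub check_245_ind1 {
    my ($ind1, $has_1xx) = @_;
    return '245: Indicator is 0 but 1xx exists.'
        if $has_1xx && $ind1 eq '0';
    return '245: Indicator is 1 but 1xx does not exist.'
        if !$has_1xx && $ind1 eq '1';
    return '';
}

print check_245_ind1('0', 1), "\n";
```

Because the result is a single message, a caller that wants to treat this as an optional error could simply discard it.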

TODO (check_245ind1vs1xx($record))

Provide some way to easily turn off reporting of "245: Indicator is 0 but 1xx exists." errors. In some cases, catalogers may choose to code a 245 with 1st indicator 0 if they do not wish that 245 to be indexed. There is not likely a way to programmatically determine this choice by the cataloger, so in situations where catalogers are likely to choose not to index a 245, this error should be suppressed.

matchpubdates($record)

Date matching 008, 050, 260

Attempts to match date of publication in 008 date1, 050 subfield 'b', and 260 subfield 'c'.

Reports errors when one of the fields does not match. Reports errors if one of the dates cannot be found.

Handles cases where 050 or 260 (or 260c) does not exist.

 -Currently, if the subroutine is unable to get either the date1, any 050 with $b, or a 260 with $c, it returns (exits).
 -Future, or better, behavior might be to continue processing for the other fields.

Handles cases where 050 is different due to conference dates. Conference exception handling is currently limited to presence of 111 field or 110$d.

For RDA, checks 264 _1 $c as well as 1st 260$c.
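The date extraction might look something like the sketch below, which works on plain strings. extract_pub_dates and dates_match are hypothetical helpers; the conference exception and the RDA 264 handling are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a 4-digit year out of each of the three sources.
sub extract_pub_dates {
    my ($field008, $sub050b, $sub260c) = @_;
    my $date008 = substr($field008, 7, 4);          #008 date1, bytes 07-10
    my ($date050) = $sub050b =~ /(\d{4})\D*$/;      #4 digits that are the last digits in 050 $b
    my ($date260) = $sub260c =~ /(\d{4})/;          #first 4-digit date in 260 $c
    return ($date008, $date050, $date260);
}

# Returns 1 only if all three dates were found and are identical.
sub dates_match {
    my @dates = @_;
    return 0 if grep { !defined $_ } @dates;
    return ($dates[0] eq $dates[1] && $dates[1] eq $dates[2]) ? 1 : 0;
}

my $field008 = '000101s20002000nyu' . (' ' x 17) . 'eng d';
print dates_match(extract_pub_dates($field008, '.B35 2000', 'c2000.')), "\n";
```

The 050 pattern only finds a date when it is the last set of digits in the subfield, which mirrors one of the known problems listed below.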

KNOWN PROBLEMS

May not deal well with serial records (problem not even approached).

Only examines 1st 260, does not account for more than one 260 (recent addition).

Relies upon 260$c date being the first date in the last 260$c subfield.

Has problem finding 050 date if it is not last set of digits in 050$b.

Process of getting 008date1 duplicates similar check in validate008 subroutine.

TO DO

Improve Conference publication checking (limited to 111 field or 110$d being present for this version). This may include comparing 110$d or 111$d vs. 050, and then comparing 008date1 vs. 260$c.

Fix parsing for 050$bdate.

For CIP, if 260 does not exist, compare only 050 and 008date1. Currently, CIP records without 260 are skipped.

Account for undetermined dates, e.g. [19--?] in 260 and 008.

Account for older 050s with no date present.

check_bk008_vs_bibrefandindex($record)

 Ignores non-book records (other than cartographic materials).
 For cartographic materials, checks only for index coding (not bib. refs.).

 Examines 008 book-contents (bytes 24-27) and book-index (byte 31).
 Compares with 500 and 504 fields.
 Reports error if 008contents has 'b' but 504 does not have "bibliographical references."
 Reports error if 504 has "bibliographical references" but no 'b' in 008contents.
 Reports error if 008index has 1 but no 500 or 504 with "Includes .* index."
 Reports error if a 500 or 504 has "Includes .* index" but 008index is 0. 
 Reports error if "bibliographical references" appears in 500.
 Allows "bibliographical reference."
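A condensed sketch of these comparisons follows. check_bibrefs_and_index is a hypothetical helper that takes the 008 string and array references holding the text of the 504 and 500 fields. The messages are illustrative, and the book/cartographic material test is omitted.

```perl
use strict;
use warnings;

# Hypothetical helper: compare 008 contents/index coding with 500/504 notes.
sub check_bibrefs_and_index {
    my ($field008, $notes504, $notes500) = @_;
    my $contents = substr($field008, 24, 4);    #008/24-27, contents
    my $index    = substr($field008, 31, 1);    #008/31, index
    #accepts "reference" or "references", but not "referenced"
    my $bibrefs  = qr/bibliographical references?\b/i;

    my $has504bibrefs = grep { $_ =~ $bibrefs } @$notes504;
    my $has500bibrefs = grep { $_ =~ $bibrefs } @$notes500;
    my $hasindexnote  = grep { /Includes .*index/ } (@$notes500, @$notes504);

    my @warningstoreturn = ();
    push @warningstoreturn, "008: Coded 'b' but 504 lacks bibliographical references."
        if $contents =~ /b/ && !$has504bibrefs;
    push @warningstoreturn, "504: Has bibliographical references but 008 is not coded 'b'."
        if $has504bibrefs && $contents !~ /b/;
    push @warningstoreturn, "500: Has bibliographical references."
        if $has500bibrefs;
    push @warningstoreturn, "008: Index is coded 1 but no index note was found."
        if $index eq '1' && !$hasindexnote;
    push @warningstoreturn, "5xx: Has index note but 008 index is coded 0."
        if $hasindexnote && $index eq '0';
    return \@warningstoreturn;
}

my $field008 = ' ' x 40;
substr($field008, 24, 1) = 'b';
substr($field008, 31, 1) = '1';
my $note = 'Includes bibliographical references (p. 200-210) and index.';
print scalar(@{ check_bibrefs_and_index($field008, [$note], []) }), " error(s)\n";
```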

TO DO/KNOWN PROBLEMS

 As with other subroutines, this one treats all 008 as being coded for monographs.
 Serials are ignored for the moment.

 Account for records with "Bibliography" or other wording in place of "bibliographical references."
 Currently 'b' in 008 must match with "bibliographical reference" or "bibliographical references" in 504 (or 500--though that reports an error).

 Reverse check for other wording (or subject headings) vs. 008 'b' in contents.

 Check for other 008contents codes.

 Check for misspelled "bibliographical references."

 Check spacing if pagination is given in 504.

check_041vs008lang($record)

Compares first code in subfield 'a' of 041 vs. 008 bytes 35-37.
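In outline, the comparison is between two 3-character codes. check_041_vs_008_lang below is a hypothetical helper working on strings, with an illustrative message.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string or one error message.
sub check_041_vs_008_lang {
    my ($field008, $sub041a) = @_;
    my $lang008 = substr($field008, 35, 3);    #008 bytes 35-37
    my $lang041 = substr($sub041a, 0, 3);      #first code in 041 $a
    return '' if $lang008 eq $lang041;
    return "041: First code ($lang041) does not match 008 bytes 35-37 ($lang008).";
}

my $field008 = '000101s20002000nyu' . (' ' x 17) . 'eng d';
print check_041_vs_008_lang($field008, 'fre'), "\n";
```

Taking only the first three characters also covers older records in which several codes are run together in a single subfield 'a'.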

check_5xxendingpunctuation($record)

Validates punctuation in various 5xx fields.

Currently checks 500, 501, 504, 505, 508, 511, 538, 546.

For 586, see check_nonpunctendingfields($record)

TO DO (check_5xxendingpunctuation)

Add checks for the other 5xx fields.

Verify rules for these checks (particularly 505).

findfloatinghyphens($record)

Looks at various fields and reports fields with space-hyphen-space as errors.
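The pattern involved is simple, as the sketch below suggests. find_floating_hyphens_in is a hypothetical helper applied to a field's text, and the message wording is illustrative. It reports up to 10 characters on either side of each floating hyphen, in line with the reporting described in the version history.

```perl
use strict;
use warnings;

# Hypothetical helper: report each space-hyphen-space with some context.
sub find_floating_hyphens_in {
    my ($tag, $data) = @_;
    my @warningstoreturn = ();
    while ($data =~ /(.{0,10} - .{0,10})/g) {
        push @warningstoreturn, "$tag: May have a floating hyphen ($1).";
    }
    return \@warningstoreturn;
}

print "$_\n" for @{ find_floating_hyphens_in('260', 'New York : Publisher, 1950 - 1960.') };
```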

TO DO (findfloatinghyphens($record))

Find exceptions.

check_floating_punctuation($record)

 Looks at each non-control tag and reports an error if a floating period, comma, or question mark is found.

Example:

    245 _aThis has a floating period .

Ignores double dash-space when preceded by a non-space (example-- [where functioning as ellipsis replacement])

TODO (check_floating_punctuation($record))

 -Add other undesirable floating punctuation.

 -Look for exceptions where floating punctuation should be allowed.

 -Merge functionality with findfloatinghyphens($record) (to reduce number of runs through the same record, especially).

 -Improve reporting. Current version reports approximately 10 characters before and after the floating text for fields longer than 80 characters, or the full field otherwise, to provide context, particularly in the case of multiple instances.

video007vs300vs538($record)

Comparison of 007 coding vs. 300abc subfield data and vs. 538 data for video records (VHS and DVD).

DESCRIPTION

Focuses on videocassettes (VHS) and videodiscs (DVD and Video CD). Does not consider coding for motion pictures.

If LDR/06 is 'g' (projected medium) and 007 is present, at least one 007 should start with 'v'. Records with any other LDR/06 value are skipped.

If 007/01 is 'd', 300a should have 'videodisc(s)' and 300c should have 4 3/4 in. Also, 538 should have 'DVD'.

If 007/01 is 'f', 300a should have 'videocassette(s)' and 300c should have 1/2 in. Also, 538 should have 'VHS format' or 'VHS hi-fi format' (case insensitive on hi-fi), plus a playback mode.

LIMITATIONS

Checks only videocassettes (1/2) and videodiscs (4 3/4). Current version reports problems with other forms of videorecordings.

Accounts for existence of only 1 300 field.

Looks at only 1st subfield 'a' and 'c' of 1st 300 field.

TO DO

Account for motion pictures and videorecordings not on DVD (4 3/4 in.) or VHS cassettes.

Check proper plurality of 300a (1 videodiscs -> error; 5 videocassette -> error)

Monitor need for changes to sizes, particularly 4 3/4 in. DVDs.

Expand allowed terms for 538 as needed and revise current VHS allowed terms.

Update to allow SMDs of conventional terminology ('DVD') if such a rule passes.

Deal with multiple 300 fields.

Check GMD in 245$h

Clean up redundant code.

ldrvalidate($record)

Validates bytes 5, 6, 7, 17, and 18 of the leader against MARC code list valid characters.

DESCRIPTION

Checks bytes 5, 6, 7, 17, and 18.

$ldrbytes{$key} has keys "\d\d", "\d\dvalid" for each of the bytes checked (05, 06, 07, 17, 18)

"\d\dvalid" is a hash ref containing valid code linked to the meaning of that code.

For example:

 print $ldrbytes{'05valid'}->{'a'}, "\n";

yields: 'Increase in encoding level'
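To make the shape of the hash concrete, here is a cut-down sketch covering only byte 05. The codes and meanings are the MARC 21 record status values. Storing the byte's name under the "\d\d" key is an assumption, and check_leader_byte is a hypothetical helper with illustrative message wording.

```perl
use strict;
use warnings;

# Cut-down version of the hash: only leader byte 05 is included here.
my %ldrbytes = (
    '05'      => 'Record status',    #assumed: name of the byte
    '05valid' => {
        'a' => 'Increase in encoding level',
        'c' => 'Corrected or revised',
        'd' => 'Deleted',
        'n' => 'New',
        'p' => 'Increase in encoding level from prepublication',
    },
);

# Hypothetical helper: $position is a two-digit key such as '05'.
sub check_leader_byte {
    my ($leader, $position) = @_;
    my $byte = substr($leader, $position, 1);
    return '' if exists $ldrbytes{"${position}valid"}{$byte};
    return "LDR: Byte $position ($ldrbytes{$position}) has invalid code '$byte'.";
}

print check_leader_byte('00050xam', '05'), "\n";
```

A calling program that builds its own hash could restrict or extend the valid codes without touching the checking code, which is the idea raised in the TO DO list below.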

TO DO (ldrvalidate)

Customize (comment or uncomment) bytes according to local needs. Perhaps allow %ldrbytes to be passed into ldrvalidate($record) so that that hash may be created by a calling program, rather than relying on the preset MARC 21 values. This would facilitate adding valid OCLC-MARC bytes such as byte 17--I, K, M, etc.

Examine other Lintadditions/Errorchecks subroutines using the leader to see if duplicate checks are being done.

Move or remove such duplicate checks.

Consider whether %ldrbytes needs full text of meaning of each byte.

geogsubjvs043($record)

Reports absence of 043 if 651 or 6xx subfield z is present.

TO DO (geogsubjvs043)

Update/maintain list of exceptions (in the hash, %geog043exceptions).

findemptysubfields($record)

 Looks for empty subfields.
 Skips 037 in CIP-level records and tags < 010.

check_040present($record)

Reports error if 040 is not present. Cannot use Lintadditions check_040 for this since that relies upon field existing before the check is executed.

check_nonpunctendingfields($record)

Checks for presence of punctuation in the fields listed below. These fields are not supposed to end in punctuation unless the data ends in abbreviation, ___, or punctuation.

Ignores initialisms such as 'Q.E.D.' Certain abbreviations and initialisms are explicitly coded.

Fields checked: 240, 246, 440, 490, 586.

TO DO (check_nonpunctendingfields)

Add exceptions--abbreviations--or deal with them. Currently all fields ending in period are reported.

check_fieldlength($record)

Reports error if field is longer than 1870 bytes. (1879 is actual limit, but I wanted to leave some extra room in case of miscalculation.)

This check relates to certain system limitations.

Also reports records with more than 50 fields.
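A minimal sketch of the length test is shown below, assuming the field's content is held as a Perl character string and is stored as UTF-8. Both points are assumptions, since the encoding depends on the records and the system. check_length is a hypothetical helper with an illustrative message.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns an empty string or one error message.
sub check_length {
    my ($tag, $fielddata) = @_;
    #count bytes, not characters, since the limit is stated in bytes
    my $bytes = length(encode('UTF-8', $fielddata));
    return $bytes > 1870 ? "$tag: Field is too long ($bytes bytes)." : '';
}

print check_length('520', 'a' x 2000), "\n";
```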

TO DO (check_fieldlength($record))

Use directory information in raw MARC to get the field lengths.

Add new subs with code below.

sub {

    #get passed MARC::Record object

    my $record = shift;

    #declaration of return array

    my @warningstoreturn = ();

    push @warningstoreturn, ("");

    return \@warningstoreturn;

} #

_validate006($field006)

Internal sub that checks the validity of 006 bytes. Used by the check_006 method for 006 validation.

DESCRIPTION

Checks the validity of 006 bytes. Continuing resources/serials 006 may not work (not thoroughly tested, since 008 would usually be coded for serials, with 006 for other material types?).

OTHER INFO

Current version implements material specific validation through internal subs for each material type. Those internal subs allow for checking either 006 or 008 material specific bytes.

NAME

parse008date($field008string)

DESCRIPTION

Subroutine parse008date returns four-digit year, two-digit month, and two-digit day. It requires an 008 string at least 6 bytes long. Also checks current year, month, day vs. 008 creation date, reporting an error if creation date appears to be later than local time. Assumes 008 dates of 00mmdd to 70mmdd represent post-2000 dates.

Relies upon internal _get_current_date().
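The century rule can be sketched as follows. expand_008_date is a hypothetical helper that returns a list (year, month, day, error text); the real subroutine's return format differs, and the comparison with the current date is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the yymmdd at the start of an 008 string.
sub expand_008_date {
    my ($field008) = @_;
    return (undef, undef, undef, '008 string is too short')
        if length($field008) < 6;
    my ($yy, $mm, $dd) = $field008 =~ /^(\d\d)(\d\d)(\d\d)/
        or return (undef, undef, undef, '008 date has non-digits');
    #years 00 to 70 are treated as 20xx, all others as 19xx
    my $yyyy = ($yy <= 70 ? 2000 : 1900) + $yy;
    return ($yyyy, $mm, $dd, '');
}

my ($year, $month, $day, $error) = expand_008_date('850615s1985');
print "$year-$month-$day\n";
```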

SYNOPSIS

 my ($earlyyear, $earlymonth, $earlyday);
 print ("What is the earliest create date desired (008 date, in yymmdd)? ");
 while (my $earlydate = <>) {
     chomp $earlydate;
     my $field008 = $earlydate;
     my $yyyymmdderr = MARC::Errorchecks::parse008date($field008);
     my @parsed008date = split "\t", $yyyymmdderr;
     $earlyyear = shift @parsed008date;
     $earlymonth = shift @parsed008date;
     $earlyday = shift @parsed008date;
     my $errors = join "\t", @parsed008date;
     if ($errors) {
         if ($errors =~ /is too short/) {
             print "Please enter a longer date, $errors\nEnter date (yymmdd): ";
         }
         else {print "$errors\nEnter valid date (yymmdd): ";}
     } #if errors
     else {last;}
 }

TODO parse008date

Remove local practice or revise for easier updating/customization.

validate008 reworked

Reworking of the validate008 sub. Revised to work more like other Errorchecks and Lintadditions checks. Returns array ref of errors. Previous version returned hash ref of 008 byte key-value pairs, array ref of cleaned bytes, and scalar ref of errors. New version returns only an array ref of errors.

validate008 ($field008, $mattype, $biblvl)

Checks the validity of 008 bytes. Used by the check_008 method for 008 validation.

DESCRIPTION

Checks the validity of 008 bytes. Depends upon 008 being based upon LDR/06, so continuing resources/serials records may not work. Checks LDR/07 for 's' for serials before checking material specific bytes.

OTHER INFO

Character positions 00-17 and 35-39 are defined the same across all types of material, with special consideration for position 06.

Current version implements material specific validation through internal subs for each material type. Those internal subs allow for checking either 006 or 008 material specific bytes.
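The byte layout described above can be illustrated by splitting a 40-character 008 string into its common and material specific parts. split_008 is a hypothetical helper and the hash key names are made up for this sketch. The 17-byte 'matspecific' string is the portion that would be handed to the internal subs for each material type.

```perl
use strict;
use warnings;

# Hypothetical helper: break an 008 string into named parts.
# Returns a hash ref, or nothing if the string is not 40 characters long.
sub split_008 {
    my ($field008) = @_;
    return unless length($field008) == 40;
    return {
        dateentered => substr($field008, 0, 6),      #bytes 00-05
        datetype    => substr($field008, 6, 1),      #byte 06
        date1       => substr($field008, 7, 4),      #bytes 07-10
        date2       => substr($field008, 11, 4),     #bytes 11-14
        pubctry     => substr($field008, 15, 3),     #bytes 15-17
        matspecific => substr($field008, 18, 17),    #bytes 18-34
        langcode    => substr($field008, 35, 3),     #bytes 35-37
        modrec      => substr($field008, 38, 1),     #byte 38
        catsource   => substr($field008, 39, 1),     #byte 39
    };
}

my $parts = split_008('000101s20002000nyu' . (' ' x 17) . 'eng d');
print "$parts->{date1} $parts->{pubctry} $parts->{langcode}\n";
```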

Synopsis

 use MARC::Record;
 use MARC::Errorchecks;

 #$mattype and $biblvl are from LDR/06 and LDR/07
 my $leader = '00050nam';    #or: my $leader = $record->leader();
 my $mattype = substr($leader, 6, 1);
 my $biblvl = substr($leader, 7, 1);
 #my $field008 = $record->field('008')->as_string();
 my $field008 = '000101s20002000nyu                 eng d';
 my @warningsfrom008 =  @{MARC::Errorchecks::validate008($field008, $mattype, $biblvl)};

 print join "\t", @warningsfrom008, "\n";

TO DO (validate008)

 Add requirement that 40 char string needs to be passed in.
 Add error checking for less than 40 char string.
 --Partially done--Less than 40 characters leads to error.
 Verify datetypes that allow multiple dates.

 Verify continuing resource checking (not thoroughly tested).

 Determine proper values for date type 'e'.

SKIP CODE for SERIALS

 ### This is not here for any particular reason,
 ### I just wanted to save it for future use if I needed it.
 #stop checking if record is not coded 'm', monograph
 unless ($biblvl eq 'm') {
     push @warningstoreturn, ("LDR: Record coded $biblvl, not monograph. Further parsing of 008 will not be done for this record.");
     return (\@warningstoreturn);
 } #unless bib level is 'm'

TEST CODE

 #test code
 use MARC::Errorchecks;
 use MARC::Record;
 my $leader = '00050nam';
 my $field008 = '000101s20002000nyu                 eng d';
 my $mattype = substr($leader, 6, 1); 
 my $biblvl = substr($leader, 7, 1);

 print "$field008\n";
 my @warningsfrom008 =  @{MARC::Errorchecks::validate008($field008, $mattype, $biblvl)};

 print join "\t", @warningsfrom008, "\n";

_check_cont_res_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Continuing Resources.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_book_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Books.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_electronic_resources_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Electronic Resources.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_cartographic_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Cartographic Materials.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_music_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Music and Sound Recordings.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_visual_material_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Visual Materials.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_check_mixed_material_bytes($mattype, $biblvl, $bytes)

 Internal sub to check 008 bytes 18-34 or 006 bytes 01-17 for Mixed Materials.

 Receives material type, bibliographic level, and a 17-byte string to be validated. The bytes should be bytes 18-34 of the 008, or bytes 01-17 of the 006.

_get_current_date()

Internal sub for use with validate008($field008, $mattype, $biblvl) (actually with parse008date($field008string)). Returns the current year-month-day, in the form yyyymmdd.

Also used by check_010($record).

CHANGES/VERSION HISTORY

Version 1.18: Updated Oct. 8, 2012 to June 22, 2013. Released , 2013.

 -Updated _check_music_bytes for MARC Update 16 (Sept. 2012), adding 'l' as valid for 008/20.

Version 1.17: Updated Oct. 8, 2012 to June 22, 2013. Released June 23, 2013.

 -Updated check_490vs8xx($record) to look only for 800, 810, 811, 830 rather than any 8XX.
 -Added functionality to deal with RDA records.
 -Updated parse008vs300b($illcodes, $field300subb, $record_is_RDA) to pass 3rd variable, "$record_is_RDA".
 -Updated _check_music_bytes for MARC Update 15 (Sept. 2012), adding 'k' as valid for 008/20.

Version 1.16: Updated May 16-Nov. 14, 2011. Released .

 -Turned off check_fieldlength($record) in check_all_subs()
 -Turned off checking of floating hyphens in 520 fields in findfloatinghyphens($record)
 -Updated validate008 subs (and 006) related to 008/24-27 (Books and Continuing Resources) for MARC Update no. 10, Oct. 2009 and Update no. 11, 2010; no. 12, Oct. 2010; and no. 13, Sept. 2011.
 -Updated %ldrbytes with leader/18 'c' and redefinition of 'i' per MARC Update no. 12, Oct. 2010.

Version 1.15: Updated June 24-August 16, 2009. Released , 2009.

 -Updated checks related to 300 to better account for electronic resources.
 -Revised wording in validate008($field008, $mattype, $biblvl) language code (008/35-37) for '   '/zxx.
 -Updated validate008 subs (and 006) related to 008/24-27 (Books and Continuing Resources) for MARC Update no. 9, Oct. 2008.
 -Updated validate008 sub (and 006) for Books byte 33, Literary form, invalidating code 'c' and referring it to 008/24-27 value 'c' .
 -Updated video007vs300vs538($record) to allow Blu-ray in 538 and 's' in 007/04.

Version 1.14: Updated Oct. 21, 2007, Jan. 21, 2008, May 20, 2008. Released May 25, 2008.

 -Updated %ldrbytes with leader/19 per Update no. 8, Oct. 2007. Check for validity of leader/19 not yet implemented.
 -Updated _check_book_bytes with code '2' ('Offprints') for 008/24-27, per Update no. 8, Oct. 2007.
 -Updated check_245ind1vs1xx($record) with TODO item and comments
 -Updated check_bk008_vs_300($record) to allow "leaves of plates" (as opposed to "leaves", when no p. or v. is present), "leaf", and "column"(s).

Version 1.13: Updated Aug. 26, 2007. Released Oct. 3, 2007.

 -Uncommented valid MARC 21 leader values in %ldrbytes to remove local practice. Libraries wishing to restrict leader values should comment out individual bytes to enable errors when an unwanted value is encountered.
 -Added ldrvalidate.t.pl and ldrvalidate.t tests.
 -Includes version 1.18 of MARC::Lint::CodeData.

Version 1.12: Updated July 5-Nov. 17, 2006. Released Feb. 25, 2007.

 -Updated check_bk008_vs_300($record) to look for extra p. or v. after parenthetical qualifier.
 -Updated check_bk008_vs_300($record) to look for missing period after 'col' in subfield 'b'.
 -Replaced $field->tag() with $tag in error message reporting in check_nonpunctendingfields($record).
 -Turned off 50-field limit check in check_fieldlength($record).
 -Updated parse008vs300b($illcodes, $field300subb) to look for /map[ \,s]/ rather than just 'map' when 008 is coded 'b'.
 -Updated check_bk008_vs_bibrefandindex($record) to look for spacing on each side of parenthetical pagination.
 -Updated check_internal_spaces($record) to report 10 characters on either side of each set of multiple internal spaces.
 -Uncommented level-5 and level-7 leader values as acceptable. Level-3 is still commented out, but could be uncommented for libraries that allow it.
 -Includes version 1.14 of MARC::Lint::CodeData.

Version 1.11: Updated June 5, 2006. Released June 6, 2006.

 -Implemented check_006($record) to validate 006 (currently only does length check).
 --Revised validate008($field008, $mattype, $biblvl) to use internal sub for material specific bytes (18-34)
 -Revised validate008($field008, $mattype, $biblvl) language code (008/35-37) to report new 'zxx' code availability when '   ' is the code in the record.
 -Added 'mgmt.' to %abbexceptions for check_nonpunctendingfields($record).

Version 1.10: Updated Sept. 5-Jan. 2, 2006. Released Jan. 2, 2006.

 -Revised validate008($field008, $mattype, $biblvl) to use internal subs for material specific byte checking.
 --Added: 
 ---_check_cont_res_bytes($mattype, $biblvl, $bytes),
 ---_check_book_bytes($mattype, $biblvl, $bytes),
 ---_check_electronic_resources_bytes($mattype, $biblvl, $bytes),
 ---_check_cartographic_bytes($mattype, $biblvl, $bytes),
 ---_check_music_bytes($mattype, $biblvl, $bytes),
 ---_check_visual_material_bytes($mattype, $biblvl, $bytes),
 ---_check_mixed_material_bytes,
 ---_reword_008(@warnings), and
 ---_reword_006(@warnings).
 --Updated Continuing resources byte 20 from ISSN center to Undefined per MARC 21 update of Oct. 2003.
 -Updated wording in findfloatinghyphens($record) to report 10 chars on either side of floaters and check_floating_punctuation($record) to report some context if the field in question has more than 80 chars.
 -check_bk008_vs_bibrefandindex($record) updated to check for 'p. ' following bibliographical references when pagination is present.
 -check_5xxendingpunctuation($record) reports question mark or exclamation point followed by period as error.
 -check_5xxendingpunctuation($record) now checks 505.
 -Updated check_nonpunctendingfields($record) to account for initialisms with interspersed periods.
 -Added check_floating_punctuation($record) looking for unwanted spaces before periods, commas, and other punctuation marks.
 -Renamed findfloatinghyphens($record) to fix spelling.
 -Revised check_bk008_vs_300($record) to account for textual materials on CD-ROM.
 -Added abstract to name.

Version 1.09: Updated July 18, 2005. Released July 19, 2005 (Aug. 14, 2005 to CPAN).

 -Added check_010.t (and check_010.t.pl) tests for check_010($record).
 -check_010($record) revisions.
 --Turned off validation of 8-digit LCCN years. Code commented-out.
 --Modified parsing of numbers to check spacing for 010a with valid non-digits after valid numbers.
 --Validation of 10-digit LCCN years is based on current year.
 -Fixed bug of uninitialized values for matchpubdates($record) 050 and 260 dates.
 -Corrected comparison for year entered < 1980.
 -Removed AutoLoader (which was a remnant of the initial module creation process)

Version 1.08: Updated Feb. 15-July 11, 2005. Released July 16, 2005.

 -Added 008errorchecks.t (and 008errorchecks.t.txt) tests for 008 validation
 -Added check of current year, month, day vs. 008 creation date, reporting error if creation date appears to be later than local time. Assumes 008 dates of 00mmdd to 70mmdd represent post-2000 dates.
 --This is a change from previous range, which gave dates as 00-06 as 200x, 80-99 as 19xx, and 07-79 as invalid. 
 -Added _get_current_date() internal sub to assist with check of creation date vs. current date.
 -findemptysubfields($record) also reports error if period(s) and/or space(s) are the only data in a subfield.
 -Revised wording of error messages for validate008($field008, $mattype, $biblvl)
 -Revised parse008date($field008string) error message wording and bug fix.
 -Bug fix in video007vs300vs538($record) for gathering multiple 538 fields.
 -Added check in check_5xxendingpunctuation($record) for space-semicolon-space-period at the end of 5xx fields.
 -Added field count check for more than 50 fields to check_fieldlength($record).
 -Added 'webliography' as acceptable 'bibliographical references' term in check_bk008_vs_bibrefandindex($record), even though it is discouraged. Consider adding an error message indicating that the term should be 'bibliographical references'?
 -Code indenting changed from tabs to 4 spaces per tab.
 -Misc. bug fixes including changing '==' to 'eq' for tag numbers, bytes in 008, and indicators.
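
The creation-date check described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the helper names expand_008_date, creation_date_in_future, and today_yyyymmdd are hypothetical, and treating years 71-99 as 19xx is an assumption (the changelog states only that 00-70 are taken as post-2000).

```perl
use strict;
use warnings;

# Hypothetical helper: expand a 6-digit 008 creation date (yymmdd) to
# yyyymmdd. Years 00-70 become 20xx; 71-99 become 19xx (an assumption).
sub expand_008_date {
    my ($yymmdd) = @_;
    return unless defined $yymmdd && $yymmdd =~ /^(\d{2})(\d{2})(\d{2})$/;
    my ($yy, $mm, $dd) = ($1, $2, $3);
    my $century = ($yy <= 70) ? '20' : '19';
    return "$century$yy$mm$dd";
}

# Hypothetical helper: true if the creation date is later than $today
# (yyyymmdd). Both values are fixed-width digit strings, so string
# comparison orders them chronologically.
sub creation_date_in_future {
    my ($yymmdd, $today) = @_;
    my $created = expand_008_date($yymmdd);
    return 0 unless defined $created;
    return ($created gt $today) ? 1 : 0;
}

# Current local date as yyyymmdd.
sub today_yyyymmdd {
    my @t = localtime;
    return sprintf '%04d%02d%02d', $t[5] + 1900, $t[4] + 1, $t[3];
}

print "creation date is in the future\n"
    if creation_date_in_future('050711', today_yyyymmdd());
```

Passing the current date in as an argument, rather than reading it inside the comparison, keeps the check easy to test against fixed dates.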

Version 1.07: Updated Dec. 11-Feb. 2005. Released Feb. 13, 2005.

 -check_double_periods() skips field 856, where multiple punctuation is possible for URIs.
 -Added code in check_internal_spaces() to account for spaces between angle brackets in open dates in field 260c.
 -Updated various subs to verify that 008 exists and quietly return if not (check_008 will report the error).
 -Changed #! line, removed -w, replaced with use warnings.
 -Added error message to check_bk008_vs_bibrefandindex($record) if 008 book index byte is not 0 or 1. This will result in duplicate errors if check_008 is also called on the record.

Version 1.05 and 1.06: Updated Dec. 6-7. Released Dec. 6-7, 2004.

 -CPAN distribution fix.

Version 1.04: Updated Nov. 4-Dec. 4, 2004. Released Dec. 5, 2004.

 -Updated validate008() to use MARC::Lint::CodeData.
 -Removed DATA section, since this is now in MARC::Lint::CodeData.
 -Updated check_008() to use the new validate008().
 -Revised bib. refs. check to require 'reference' to be followed by optional 's', optional period, and word boundary (to catch things like 'referenced').
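
The revised bib. refs. pattern can be approximated with a short regular expression. This is a sketch of the rule as worded above, not the pattern actually used in the module; the helper name mentions_references and the case-insensitive match are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text contains 'reference' followed by
# an optional 's', an optional period, and then a word boundary.
# 'referenced' fails because no word boundary follows 'reference'.
sub mentions_references {
    my ($text) = @_;
    return ($text =~ /\breferences?\.?\b/i) ? 1 : 0;
}

for my $note (
    'Includes bibliographical references (p. 123-130) and index.',
    'Sources referenced in text.',
) {
    printf "%-62s %s\n", $note, mentions_references($note) ? 'match' : 'no match';
}
```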

Version 1.03: Updated Aug. 30-Oct. 16, 2004. Released Oct. 17. First CPAN version.

 -Moved subs to MARC::QBIerrorchecks
 --check_003($record)
 --check_CIP_for_stockno($record)
 --check_082count($record)
 -Fixed bug in check_5xxendingpunctuation for first 10 characters.
 -Moved validate008() and parse008date() from MARC::BBMARC (to make MARC::Errorchecks more self-contained).
 -Moved readcodedata() from BBMARC (used by validate008)
 -Moved DATA from MARC::BBMARC for use in readcodedata() 
 -Removed dependency on MARC::BBMARC
 -Added duplicate comma check in check_double_periods($record)
 -Misc. bug fixes
 Planned (future versions):
 -Account for undetermined dates in matchpubdates($record).
 -Cleanup of validate008
 --Standardization of error reporting
 --Material specific byte checking (bytes 18-34) abstracted to allow 006 validation.

Version 1.02: Updated Aug. 11-22, 2004. Released Aug. 22, 2004.

 -Implemented VERSION (uncommented)
 -Added check for presence of 040 (check_040present($record)).
 -Added check for presence of 2 082s in full-level, 1 082 in CIP-level records (check_082count($record)).
 -Added temporary (test) check for trailing punctuation in 240, 586, 440, 490, 246 (check_nonpunctendingfields($record))
 --which should not end in punctuation except when the data ends in such.
 -Added check_fieldlength($record) to report fields longer than 1870 bytes.
 --This should be rewritten to use the length in the directory of the raw MARC.
 -Fixed workaround in check_bk008_vs_bibrefandindex($record) (Thanks again to Rich Ackerman).

Version 1.01: Updated July 20-Aug. 7, 2004. Released Aug. 8, 2004.

 -Temporary (or not) workaround for check_bk008_vs_bibrefandindex($record) and bibliographies.
 -Removed variables from some error messages and cleanup of messages.
 -Code readability cleanup.
 -Added subroutines:
 --check_240ind1vs1xx($record)
 --check_041vs008lang($record)
 --check_5xxendingpunctuation($record)
 --findfloatinghypens($record)
 --video007vs300vs538($record)
 --ldrvalidate($record)
 --geogsubjvs043($record)
 ---has list of exceptions (e.g. English-speaking countries)
 --findemptysubfields($record)
 -Changed subroutines:
 --check_bk008_vs_300($record): 
 ---added cross-checking for codes a, b, c, g (ill., map(s), port(s)., music)
 ---added checking for 'p. ' or 'v. ' or 'leaves ' in subfield 'a'
 ---added checking for 'cm.', 'mm.', 'in.' in subfield 'c'
 --parse008vs300b
 ---revised check for 'm', phono. (which our catalogers don't currently use)
 --Added check in check_bk008_vs_bibrefandindex($record) for 'Includes index.' (or indexes) in 504
 ---This has a workaround I would like to figure out how to fix

Version 1.00 (update to 0.95): First release July 18, 2004.

 -Fixed bugs causing check_003 and check_010 subroutines to fail (Thanks to Rich Ackerman)
 -Added to documentation
 -Misc. cleanup
 -Added skip of 787 fields to check_internal_spaces
 -Added subroutines:
 --check_end_punct_300($record)
 --check_bk008_vs_300($record)
 ---parse008vs300b
 --check_490vs8xx($record)
 --check_245ind1vs1xx($record)
 --matchpubdates($record)
 --check_bk008_vs_bibrefandindex($record)

Version 1 (original version (actually version 0.95)): First release, June 22, 2004

SEE ALSO ^

MARC::Record -- Required for this module to work.

MARC::Lint -- In the MARC::Record distribution and basis for this module.

MARC::Lintadditions -- Extension of MARC::Lint for checks involving individual tags (vs. cross-field checking covered in this module). Available at http://home.inwave.com/eija (and may be merged into MARC::Lint).

MARC pages at the Library of Congress (http://www.loc.gov/marc)

Anglo-American Cataloguing Rules, 2nd ed., 2002 revision, plus updates.

Library of Congress Rule Interpretations to AACR2.

MARC Report (http://www.marcofquality.com) -- More full-featured commercial program for validating MARC records.

LICENSE ^

This code may be distributed under the same terms as Perl itself.

Please note that this module is not a product of or supported by the employers of the various contributors to the code.

AUTHOR ^

Bryan Baldus eijabb@cpan.org

Copyright (c) 2003-2013
