=head1 NAME
Catmandu::MARC::Tutorial - A documentation-only module for new users of Catmandu::MARC
=head1 SYNOPSIS
perldoc Catmandu::MARC::Tutorial
=head1 READING
=head2 Convert MARC21 records into JSON
The command below converts file data.mrc into JSON:
$ catmandu convert MARC to JSON < data.mrc
=head2 Convert MARC21 records into MARC-XML
$ catmandu convert MARC to MARC --type XML < data.mrc
=head2 Convert UNIMARC records into JSON, XML, ...
To read UNIMARC records use the RAW parser to get the correct character
encoding.
$ catmandu convert MARC --type RAW to JSON < data.mrc
$ catmandu convert MARC --type RAW to MARC --type XML < data.mrc
=head2 Create a CSV file containing all the titles
To extract data from a MARC record one needs a Fix routine. Fix
is a small language for manipulating data. In the example below
we extract all 245 fields from MARC:
$ catmandu convert MARC to CSV --fix 'marc_map(245,title); retain(title)' < data.mrc
The Fix C<marc_map> puts the MARC 245 field in the C<title> field.
The Fix C<retain> makes sure only the title field ends up in the
CSV file.
=head2 Create a CSV file containing only the 245$a and 245$c subfields
The C<marc_map> Fix accepts one or more subfield codes to extract from a MARC field:
$ catmandu convert MARC to CSV --fix 'marc_map(245ac,title); retain(title)' < data.mrc
=head2 Create a CSV file which contains a repeated field
In the example below the 650a field can be repeated in some MARC records.
We will join all the repetitions into a comma-delimited list for each record.
$ catmandu convert MARC to CSV --fix 'marc_map(650a,subject,join:","); retain(subject)' < data.mrc
=head2 Create a list of all ISBN numbers in the data
In the previous example we saw how all subjects can be printed using a few Fix commands.
When a subject is repeated in a record, it will be written on one line joined by a comma:
subject1
subject2, subject3
subject4
In the example above, record 1 contained 'subject1', record 2 contained 'subject2' and 'subject3', and
record 3 contained 'subject4'. What if we want every value on its own line, as one long list?
In the example below we print all ISBN numbers in a batch of MARC records as one long list
using the Text exporter:
$ catmandu convert MARC to Text --field_sep "\n" --fix 'marc_map(020a,isbn.\$append); retain(isbn)' < data.mrc
The first new thing is the C<$append> in the C<marc_map>. This creates in C<isbn> a
list of all ISBN numbers found in the C<020a> field. Because C<$> signs have a special meaning on
the command line they need to be escaped with a backslash C<\>. The C<Text> exporter with the C<field_sep>
option makes sure that every item of the list in the C<isbn> field is written on a new line.
=head2 Create a list of all unique ISBN numbers in the data
Given the result of the previous command, it is now easy to create a unique list of ISBN numbers
with the UNIX C<sort> and C<uniq> commands. C<uniq> only collapses adjacent duplicate lines, so the
output is sorted first:
$ catmandu convert MARC to Text --field_sep "\n" --fix 'marc_map(020a,isbn.\$append); retain(isbn)' < data.mrc | sort | uniq
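The difference between C<uniq> on its own and C<sort> followed by C<uniq> can be checked without any MARC data. The sketch below uses three made-up ISBN lines (an assumption, not output of the command above) in which the duplicate is not adjacent:

```shell
# Hypothetical sample: one ISBN per line, with a non-adjacent duplicate.
isbns='9780596000271
9781449303587
9780596000271'

# uniq alone compares only neighbouring lines, so the repeated value survives.
adjacent_only=$(printf '%s\n' "$isbns" | uniq | wc -l | tr -d ' ')

# Sorting first brings duplicates together; sort -u does both steps at once.
truly_unique=$(printf '%s\n' "$isbns" | sort -u | wc -l | tr -d ' ')

echo "uniq alone: $adjacent_only lines; sort -u: $truly_unique lines"
```

C<sort -u> is a shorter equivalent of C<sort | uniq> when no other C<uniq> options are needed.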
=head2 Create a list of the number of subjects per record
We will create a list of subjects (650a) and count the number of items
in this list for each record. The CSV file will contain the C<_id> (record
identifier) and C<subject>, which holds the number of 650a fields.
Writing all Fixes on the command line can become tedious. In Catmandu it is possible
to create a Fix script which contains all the Fix commands.
Open a text editor and create the C<myfix.fix> file with content:
marc_map(650a,subject.$append)
count(subject)
retain(_id, subject)
And execute the command:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
=head2 Create a list of all ISBN numbers for records with type 920a == book
In this example we need an extra condition to match the content of the
920a field against the string C<book>.
Open a text editor and create the C<myfix.fix> file with content:
marc_map(020a,isbn.$append)
marc_map(920a,type)
select all_match(type,"book") # select only the books
select exists(isbn) # select only the records with ISBN numbers
retain(isbn) # only keep this field
All the text after a C<#> sign is an inline code comment.
And run the command:
$ catmandu convert MARC to Text --field_sep "\n" --fix myfix.fix < data.mrc
=head2 Show which MARC records don't contain a 900a field matching some list of values
First we need to create a list of keys that need to be matched against our MARC records.
In the example below we create a CSV file with a C<key> , C<value>
header and all the keys that are OK:
$ cat /tmp/mylist.txt
key,value
book,OK
article,OK
journal,OK
Next we create a Fix script that maps the MARC 900a field to a field called
C<type>. We then look up this C<type> field in the C<mylist.txt> file. If a match
is found, the C<type> field will contain the value from the list (OK). When
no match is found, C<type> will keep its original value. We reject
all records that have OK as C<type> and keep only the ones that weren't matched
in the file.
Open a text editor and create the C<myfix.fix> file with content:
marc_map(900a,type)
lookup(type,'/tmp/mylist.txt')
reject all_match(type,OK)
retain(_id,type)
And now run the command:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
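The lookup-then-reject logic can be tried out with standard tools before running it on real records. The sketch below is not Catmandu code: it assumes a made-up list of record id and 900a pairs, and uses C<awk> to keep only the rows whose type is absent from a list laid out like C<mylist.txt>.

```shell
# Hypothetical allow-list in the same key,value layout as mylist.txt.
list=$(mktemp)
printf 'key,value\nbook,OK\narticle,OK\njournal,OK\n' > "$list"

# Hypothetical (record id, 900a value) pairs standing in for the MARC data.
records='rec1,book
rec2,thesis
rec3,journal
rec4,map'

# While reading the list file (NR==FNR), remember every key.
# While reading the records from stdin, print rows whose type is not a known key.
unmatched=$(printf '%s\n' "$records" |
    awk -F, 'NR==FNR { ok[$1] = $2; next } !($2 in ok)' "$list" -)

echo "$unmatched"
rm -f "$list"
```

Only the rows with types that are missing from the list remain, which mirrors what the C<reject> line in the Fix script is meant to achieve.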
=head2 Create a CSV file of all ISSN numbers found at any MARC field
To process this information we need to create a Fix script like the
one below (line numbers are added here to explain the working of this script
but don't need to be included in the script):
01: marc_map('***',text.$append)
02:
03: filter(text,'(\b\d{4}-?\d{3}[\dxX]\b)')
04: replace_all(text.*,'.*(\b\d{4}-?\d{3}[\dxX]\b).*',$1)
05:
06: do list(path:text)
07: unless is_valid_issn(.)
08: reject()
09: end
10: end
11:
12: vacuum()
13:
14: select exists(text)
15:
16: join_field(text,' ; ')
17:
18: retain(_id,text)
On line 01 all the text in the MARC record is mapped into a C<text> array.
On line 03 we keep only the elements of this array that contain an ISSN string,
using a regular expression.
On line 04 C<replace_all> is used to delete everything in the C<text>
array that isn't an ISSN number.
On lines 06-10 we go over every ISSN string, check whether it has a valid checksum,
and erase it when it does not.
On line 12 we use the C<vacuum> function to remove any remaining empty fields.
On line 14 we select only the records that contain a valid ISSN number.
On line 16 the ISSNs are joined by a semicolon ';' into one long string.
On line 18 we keep only the record id and the ISSNs for the report.
Run this Fix script (without the line numbers) using this command:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
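To see what a checksum test such as the one on lines 06-10 involves, here is a minimal shell sketch of the standard ISSN check-digit rule: the first seven digits are weighted 8 down to 2, the check character counts as its digit value (or 10 for X), and the total must be divisible by 11. This is an illustration only and is not taken from the C<is_valid_issn> implementation; the function name C<issn_checksum_ok> and the sample values are assumptions.

```shell
# Return success when the ISSN in $1 has a valid check digit.
# Hypothetical helper, not part of Catmandu.
issn_checksum_ok() {
    # Drop the optional hyphen and normalise a lowercase x.
    issn=$(printf '%s' "$1" | sed -e 's/-//g' -e 's/x/X/')
    # Must be seven digits followed by a digit or X.
    printf '%s' "$issn" | grep -Eq '^[0-9]{7}[0-9X]$' || return 1
    sum=0
    weight=8
    # Weight the first seven digits 8, 7, ... 2.
    for digit in $(printf '%s' "$issn" | cut -c1-7 | sed 's/./& /g'); do
        sum=$((sum + digit * weight))
        weight=$((weight - 1))
    done
    check=$(printf '%s' "$issn" | cut -c8)
    [ "$check" = "X" ] && check=10
    # Valid when the weighted sum plus the check value is a multiple of 11.
    [ $(( (sum + check) % 11 )) -eq 0 ]
}

for candidate in 0378-5955 1050-124X 0378-5954; do
    if issn_checksum_ok "$candidate"; then
        echo "$candidate passes the checksum"
    else
        echo "$candidate fails the checksum"
    fi
done
```

A string can match the regular expression on line 03 and still fail this test, which is why the script validates each candidate separately.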
=head2 Create a MARC validator
For this example we need a Fix script that contains the validation rules we want to
check. For instance, we require every record to have a 245 field and a 008 control
field with a date filled in. This can be coded as:
# Check if a 245 field is present
unless marc_has('245')
log("no 245 field",level:ERROR)
end
# Check if there is more than one 245 field
if marc_has_many('245')
log("more than one 245 field?",level:ERROR)
end
# Check if 008 positions 7 to 10 contain a 4-digit number ('\d' means digit)
unless marc_match('008/07-10','\d{4}')
log("no 4-digit year in 008 position 7 -> 10",level:ERROR)
end
Put this Fix script in a file C<myfix.fix> and execute the Catmandu command
with the "-D" option for logging and the Null exporter to discard the normal
output:
$ catmandu -D convert MARC to Null --fix myfix.fix < data.mrc
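The position arithmetic behind the C<008/07-10> check can be illustrated on a single string. The sketch below assumes a made-up 008 value and relies on the MARC convention that character positions count from 0, whereas C<cut> counts from 1:

```shell
# Hypothetical 008 value; only positions 07-10 matter for this check.
field008='860513s1987    enk                 eng d'

# MARC positions 07-10 (zero-based) are characters 8-11 for cut (one-based).
year=$(printf '%s' "$field008" | cut -c8-11)

if printf '%s' "$year" | grep -Eq '^[0-9]{4}$'; then
    echo "year found: $year"
else
    echo "no 4-digit year in 008 position 7 -> 10"
fi
```

Values such as C<19uu> or four blanks in those positions would take the second branch, which is the case the C<marc_match> rule is meant to log.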
=head1 TRANSFORMING
=head2 Add a new MARC field
In the example below we add a new 856 field to the record with a $u subfield containing
the Google homepage:
marc_add(856,u,"http://www.google.com")
A control field can be added by using the '_' subfield:
marc_add(009,_,0123456789)
Maybe you want to copy the data from one subfield to another. Use the marc_map to
store the data first in a temporary field and add it later to the new field:
# copy the 001 control field into a temporary field
marc_map(001,tmp)
# maybe process the data a bit
append(tmp,"-mytest")
# add the contents of the tmp field to the new 009 field
marc_add(009,_,$.tmp)
=head2 Set a MARC subfield
Set the $h subfield to a new value (or create it when it doesn't exist yet):
marc_set(100h, test123)
Only set the 100 field if the first indicator is 3:
marc_set(100[3]h, test123)
=head2 Remove a MARC (sub)field
Remove all fields starting with 5 (500, 501, and so on):
marc_remove(5**)
Remove all 245h fields:
marc_remove(245h)
=head2 Append text to a MARC field
Append a period to the 500 field if there isn't one already:
do marc_each()
unless marc_match(500, "\.$") # Only if the current field 500 doesn't end with a period
marc_append(500,".") # Add to the current 500 field a period
end
end
Use the L<Catmandu::Fix::Bind::marc_each> Bind to loop over all MARC fields. In the
context of the C<do -- end> only one MARC field at a time is visible for the C<marc_*> fixes.
=head2 The marc_each binder
All C<marc_*> fixes will operate on all MARC fields matching a MARC path. For example,
marc_remove(856)
will remove all 856 MARC fields. In some cases you may want to change only some of the fields
in a record. You could write:
if marc_match(856u,"google")
marc_remove(856)
end
in the hope it would remove the 856 fields that contain the text "google" in the $u subfield.
Alas, this is not what will happen. The C<if> condition will match when the record contains one or
more 856u fields containing "google". The C<marc_remove> Fix will delete B<all> 856 fields. To
correctly remove only the 856 fields in the context of the C<if> statement the C<marc_each> binder
is required:
do marc_each()
if marc_match(856u,"google")
marc_remove(856)
end
end
The C<marc_each> will loop over all MARC fields one at a time. The if statement will only match when
the current MARC field is 856 and the $u field contains "google". The C<marc_remove(856)> will only
delete the current 856 field.
Inside the C<marc_each> binder every Fix sees only one field of the record at a time.
This Fix will not work:
do marc_each()
if marc_match(856u,"google")
marc_remove(900) # <-- there is only a 856 field in the current context
end
end
=head2 marc_copy, marc_cut and marc_paste
The L<Catmandu::Fix::marc_copy>, L<Catmandu::Fix::marc_cut> and L<Catmandu::Fix::marc_paste> Fixes
are useful when complicated edits are needed in a MARC record.
The C<marc_copy> will copy parts of a MARC record matching a MARC_PATH to a temporary variable.
This temporary variable will contain an ARRAY of HASHes containing the content of the MARC field.
For instance,
marc_copy(650, tmp)
The C<tmp> will contain something like:
tmp:[
{
"subfields" : [
{
"a" : "Perl (Computer program language)"
}
],
"ind1" : " ",
"ind2" : "0",
"tag" : "650"
},
{
"ind1" : " ",
"subfields" : [
{
"a" : "Web servers."
}
],
"tag" : "650",
"ind2" : "0"
}
]
This structure can be edited with all the Catmandu fixes. For instance you can set the first
indicator to '1':
set_field(tmp.*.ind1 , 1)
The JSON path C<tmp.*.ind1> will match all the first indicators. The JSON path
C<tmp.*.tag> will match all the MARC tags. The JSON path C<tmp.*.subfields.*.a> will
match all the $a subfields. For instance, to change all 'Perl' into 'Python' in the $a subfield
use this Fix:
replace_all(tmp.*.subfields.*.a,"Perl","Python")
When the fields need to be placed back into the record the C<marc_paste> command can be used:
marc_paste(tmp)
This will add all 650 fields in the C<tmp> temporary variable at the B<end> of the record. You can
change the MARC fields in place using the C<marc_each> binder:
do marc_each()
# Select only the 650 fields
if marc_has(650)
# Create a working copy
marc_copy(650,tmp)
# Change some fields
set_field(tmp.*.ind1 , 1)
# Paste the result back
marc_paste(tmp)
end
end
The C<marc_cut> Fix works like C<marc_copy> but will delete the matching MARC field from the record.
=head2 Rename MARC subfields
In the example below we rename each $1 subfield in the MARC record to $0 using
the L<Catmandu::Fix::marc_cut>, L<Catmandu::Fix::marc_paste> and L<Catmandu::Fix::rename>
fixes:
# For each marc field...
do marc_each()
# Cut the field into tmp..
marc_cut(***,tmp)
# Rename every 1 subfield to 0
rename(tmp.*.subfields.*,1,0)
# And paste it back
marc_paste(tmp)
end
The C<marc_each> bind will loop over all the MARC fields. With C<marc_cut> we
store any field (C<***> matches every field) into a C<tmp> field. The C<marc_cut>
creates an array structure in C<tmp> which is easy to process using the Fix
language. Using the C<rename> function we go over all the subfields and rename
every subfield code matching the regular expression C<1> to C<0>. At the end, we paste
the C<tmp> field back into the record.
=head1 WRITING
=head2 Convert a MARC record into a MARC record (do nothing)
$ catmandu convert MARC to MARC < data.mrc > output.mrc
=head2 Add a 900a field with value 'checked' to all records
$ catmandu convert MARC to MARC --fix 'marc_add("900",a,"checked")' < data.mrc > output.mrc
=head2 Delete the 024 fields from all MARC records
$ catmandu convert MARC to MARC --fix 'marc_remove("024")' < data.mrc > output.mrc
=head2 Set the 650p field to 'test' for all records
$ catmandu convert MARC to MARC --fix 'marc_set("650p","test")' < data.mrc > output.mrc
=head2 Select only the records with 900a == book
$ catmandu convert MARC to MARC --fix 'marc_map(900a,type); select all_match(type,book)' < data.mrc > output.mrc
The C<all_match> condition also accepts a regular expression:
$ catmandu convert MARC to MARC --fix 'marc_map(900a,type); select all_match(type,"[Bb]ook")' < data.mrc > output.mrc
=head2 Select only the records with 900a values in a given CSV file
Create a CSV file with name,value pairs (two columns are required):
$ cat values.csv
name,values
book,1
journal,1
movie,1
$ catmandu convert MARC to MARC --fix myfixes.txt < data.mrc > output.mrc
with myfixes.txt like:
do marc_each()
marc_map(900a,test)
lookup(test,values.csv,default:0)
select all_match(test,1)
remove_field(test)
end
We use a "do marc_each() ... end" loop because 900a fields can be repeated. If a
MARC tag isn't repeatable this loop isn't needed. With marc_map we first copy
the value of a MARC subfield to a 'test' field. We then look up this 'test' value in
the CSV file. Finally, we select only the records whose value is found in the CSV file
(and for which the lookup returns 1).