parsepica - fetch, parse and transform PICA+ data
parsepica [options] [input file(s) or SRU-Server(s) and queries(s)]
This script provides a simple command line client to fetch and transform PICA+ records.
You can parse and transform local files (compressed .gz files can be read directly) or query records from a server via various protocols.
You can also specify a configuration file for PICA::Source that points to an SRU or unAPI source.
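The input conventions described above (plain files, .gz files read directly, '-' for STDIN) can be illustrated with a minimal Python sketch. It is not how parsepica itself is implemented; the function name `open_input` and the UTF-8 encoding are assumptions made for the example.

```python
import gzip
import sys


def open_input(path):
    """Open a local input source as a text stream (hypothetical helper).

    '-' means STDIN, a name ending in '.gz' is decompressed on the fly,
    anything else is opened as a plain text file.
    """
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        # gzip.open in text mode decompresses transparently while reading
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")
```

Choosing the reader from the file name keeps the rest of a parser unaware of whether its input was compressed.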
The records can then be written to a file or to STDOUT in PICA+ or PICA/XML format. Instead of writing full records, you can select single PICA+ fields. Selecting fields with parsepica is around half as fast as using grep, but grep does not parse the records or check them for well-formedness.
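To make the difference between a grep-style match and a parsing selection concrete, here is a small Python sketch. It assumes a simplified plain-text PICA+ line syntax (a tag, an optional /occurrence, a space, then `$`-prefixed subfields) and ignores escaped `$` characters in values; the function names are hypothetical and this is not the code parsepica uses.

```python
import re

# Assumed line syntax: TAG, optional /OCCURRENCE, one space, one or more $-subfields
FIELD_RE = re.compile(
    r"^([0-2][0-9]{2}[A-Z@])(?:/([0-9]{2,3}))? ((?:\$[A-Za-z0-9][^$]*)+)$"
)
SUBFIELD_RE = re.compile(r"\$([A-Za-z0-9])([^$]*)")


def parse_field(line):
    """Return (tag, occurrence, [(code, value), ...]) or None if malformed."""
    m = FIELD_RE.match(line)
    if m is None:
        return None
    tag, occurrence, body = m.groups()
    return tag, occurrence, SUBFIELD_RE.findall(body)


def select(lines, tag):
    """Yield parsed fields with the given tag, skipping malformed lines."""
    for line in lines:
        field = parse_field(line.rstrip("\n"))
        if field is not None and field[0] == tag:
            yield field


def grep_like(lines, tag):
    """Plain prefix match without any validation, similar to grep '^TAG'."""
    return [line for line in lines if line.startswith(tag)]


sample = [
    "003@ $012345",
    "021A $aA title$hAn author",
    "021A broken line without subfields",
]
```

With this sample, `grep_like(sample, "021A")` returns both lines starting with 021A, including the broken one, while `select(sample, "021A")` yields only the well-formed field.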
By default, input is read from STDIN and output is written to STDOUT (both denoted by '-') without logging. On request, logging information is printed to STDOUT or to a specified logfile. Records that cannot be parsed produce error messages on STDERR.
 -input FILE     file with input files on each line ('-': STDIN)
 -files FILE     read input files from another file ('-': STDIN)
 -output FILE    print all valid records to a given file ('-': STDOUT)
 -xml [FILE]     print records in XML
 -pxml [FILE]    print records in pretty XML (with linebreaks)
 -pretty [FILE]  print records in pretty format
 -null           suppress record output
 -quiet          suppress logging
 -select FIELD   select a specific field or subfield (not with XML output)
 -count          print simple statistics
 -stats 0|1|2    print full statistics (1: fields, 2: subfields)
 -config FILE    read configuration from a file ('-': search default file)
 -auto           use default config file $PICASOURCE or ./pica.conf
 -log [FILE]     print logging to a given file ('-': STDOUT, default)
 -help           brief help message
 -limit N        limit the result set to N records (only for SRU)
 -man            full documentation with examples
Read from 'file1' and print parseable records to 'file2'.
Parse from 'file1' and pretty print XML format to 'file2.xml'.
Get records with ISBN 3-423-31039-1 via SRU.
Get records with ISBN 3-423-31039-1 via SRU if the default config file contains
Select all fields '021A' from 'picadata' and write to STDOUT.
Parse from 'file1' and count fields.
Parse from 'file1' and print detailed statistics.
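The kind of counting behind field and subfield statistics can be sketched in Python as follows. This is only an illustration: the `level` argument mirrors the 1 (fields) and 2 (subfields) levels of the -stats option, but the function name, the simplified line syntax and the returned data structure are assumptions, and the actual output format of parsepica is not reproduced here.

```python
import re
from collections import Counter

# Assumed line syntax: TAG, optional /OCCURRENCE, one space, then the subfields
LINE_RE = re.compile(r"^([0-2][0-9]{2}[A-Z@])(?:/[0-9]{2,3})? (.*)$")
# Simplification: every '$' followed by a letter or digit starts a subfield
SUBFIELD_RE = re.compile(r"\$([A-Za-z0-9])")


def field_stats(lines, level=1):
    """Count fields (level 1) and additionally subfields (level 2)."""
    fields, subfields = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines that do not look like a field
        tag, body = m.groups()
        fields[tag] += 1
        if level >= 2:
            for code in SUBFIELD_RE.findall(body):
                subfields[f"{tag}${code}"] += 1
    return fields, subfields


records = [
    "003@ $012345",
    "021A $aFirst title$hAn author",
    "003@ $067890",
    "021A $aSecond title",
]
```

Calling `field_stats(records, level=2)` on this sample counts each tag twice and distinguishes `021A$a` (two occurrences) from `021A$h` (one occurrence).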
Error handling for broken records is not fully implemented. If you want to parse PICA+ records downloaded via WinIBW, you may need to first clean them with the script winibw2pica.
The limit parameter should also be implemented for sources other than SRU, and an offset parameter would be useful. Fetching records via protocols other than SRU has not been tested. The statistics method can be improved a lot.
Jakob Voß <firstname.lastname@example.org>
This software is copyright (c) 2012 by Verbundzentrale Goettingen (VZG) and Jakob Voss.