
NAME

sasbactrl.pl - command line interface to SeeAlso::Source::BeaconAggregator and auxiliary classes

SYNOPSIS

DESCRIPTION

This module allows a collection of BEACON files (cf. http://de.wikipedia.org/wiki/Wikipedia:BEACON) to be used as a SeeAlso::Source (probably in the context of a SeeAlso::Server application). To that end, it implements the four methods documented in SeeAlso::Source.

The BEACON files (lists of non-local identifiers of a certain type documenting the coverage of a given online database plus means for access) are imported by the methods provided by SeeAlso::Source::BeaconAggregator::Maintenance.pm, usually by employing the script sasbactrl.pl as command line client.

Serving formats other than SeeAlso, or providing a BEACON file for this SeeAlso service, is achieved by using SeeAlso::Source::BeaconAggregator::Publisher.

USAGE

Use the new() method inherited from SeeAlso::Source::BeaconAggregator to access an existing database or create a new one.

Database Methods

init( [ %options] )

Sets up and initializes the database structure for the object. This has to be done once after creating a new database and after upgrading this module.

Valid options include:

verbose
prepareRedirs
identifierClass

The repos table contains as columns all valid beacon fields plus the following administrative fields which have to be prefixed with "_" in the interface:

seqno

Sequence number: incremented on every successful load.

alias

Unique key: on update, older sequences with the same alias are automatically discarded. Most methods take an alias as argument, thus obviating the need to determine the sequence number.

sort

optional sort key

uri

Overrides the #FEED header for updates

ruri

Real uri from which the last instance was loaded

ftime

Fetch time: Timestamp as to when this instance was loaded

Clear this or mtime to force automatic reload.

fstat

Short statistics line of last successful reload on update.

mtime

Modification time: Timestamp of the file / HTTP object from which this instance was loaded. Identical to ftime if no timestamp is provided

Clear this or ftime to force automatic reload on update.

utime

Timestamp of last update attempt

ustat

Short status line of last update attempt.

counti

Identifier count

countu

Unique identifier count

admin

Just to store some remarks.

The beacons table stores the individual beacon entries from the input files. Its columns are:

hash
 Identifier. If an instance of SeeAlso::Source::Identifier (or of a subclass) is provided,
 it will be transformed by the hash() method.
seqno
 Sequence number of the beacon file in the database
altid
 optional identifier from an alternative identifier system for use
 with ALTTARGET templates.
hits
 optional number of hits for this identifier in the given resource
info
 optional information text
link
 optional explicit URL

The osd table contains key, val pairs for various metadata concerning the collection as such, notably the values needed for the Open Search Description and the Header fields needed in case of publishing a beacon file for this collection.

The admin table stores (unique) key, val pairs for general persistent data. Currently the following keys are defined:

DATA_VERSION

Integer version number to migrate database layout.

IDENTIFIER_CLASS

Name of the Identifier class to be used.

REDIRECTION_INDEX

Controls the creation of an additional index for the altid column (facilitates reverse lookups as needed for clustering).

deflate()

Maintenance action: performs VACUUM, REINDEX and ANALYZE on the database.

Handling of beacon files

loadFile ( $file, $fields, %options )

Reads a physical beacon file and stores it with a new Sequence number in the database.

Returns a triple:

 my ($seqno, $rec_ok, $message) = $db->loadFile($file, $fields, %options);

$seqno is undef on error

If no action was taken, $seqno and $rec_ok are zero and $message contains an explanation.

$seqno is a positive integer if something was loaded: the "Sequence Number" (internal unique identifier) of the beacon file's representation in the database.

$file

File to read: Must be a beacon file

$fields

Hashref with additional meta and admin fields to store

Supported options:
 verbose => (0|1)
 force => (0|1)   process unconditionally without timestamp comparison
 nostat => (0|1)  don't refresh global identifier counters

If the file does not contain a minimal correct header (e.g. it is an empty file or an HTML error page fetched by accident), no action is performed.

Otherwise, a fresh SeqNo (sequence number) is generated and meta and BEACON-Lines are stored in the appropriate tables in the database.

If the _alias field is provided, existing database entries for this alias are updated; identifiers that are no longer accounted for are eventually discarded.
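The three documented outcomes of loadFile() can be told apart by inspecting $seqno. The following is a minimal sketch only: the helper describe_load_result is hypothetical and not part of the module, and $rec_ok is assumed here to be a count of accepted records, which the text above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): turns the triple returned
# by loadFile() into a one-line summary, following the rules stated above.
sub describe_load_result {
    my ($seqno, $rec_ok, $message) = @_;
    $message = '' unless defined $message;
    $rec_ok  = 0  unless defined $rec_ok;    # assumption: a record count

    return "error: $message"     unless defined $seqno;   # undef => failure
    return "no action: $message" if $seqno == 0;          # zero  => nothing loaded
    return "loaded as sequence $seqno ($rec_ok records)"; # positive => stored
}

# With an existing object $db this might be used as:
#   print describe_load_result($db->loadFile($file, { _alias => 'foo' })), "\n";
print describe_load_result(7, 1200, ''), "\n";
```

Checking for undef first matters, because a numeric comparison against an undefined $seqno would otherwise be indistinguishable from the "no action" case.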

processbeaconheader($self, $fieldref, [ %options] )

Internal subroutine used by loadFile().

$fieldref

Hash with raw fields.

Supported options:
 verbose => (0|1)

Shows the seqnos of old instances that are matched by the alias.

update ($sq_or_alias, $params, %options)

Loads a beacon file into the database, possibly replacing a previous instance.

Some magic is employed to autoconvert ISO-8859-1 or doubly UTF-8 encoded files back to UTF-8.

Returns undef if something goes wrong or if the file has not been modified since the last load; otherwise returns a pair (new sequence number, number of lines imported).

$sq_or_alias

Sequence number or alias: Used to determine an existing instance.

$params

Hashref, containing

  agent => LWP::UserAgent to use
  _uri  => Feed URL to load from

%options

Hash, propagated to loadFile()

 verbose => (0|1)
 force => (0|1)   process unconditionally without timestamp comparison
 nostat => (0|1)  don't refresh global identifier counters

Incorporates a new beacon source from a URI in the database or updates an existing one. For HTTP URIs care is taken not to reload an unmodified BEACON feed (unless the 'force' option is provided).

If the feed appears to be newer than the previously loaded version, it is fetched, UTF-8 adjustments are performed if necessary, and the result is stored in a temporary file, from which it is finally processed by the loadFile() method above.

The URI to load is determined by the following order of precedence:

  1. _uri Option

  2. admin field uri stored in the database

  3. meta field #FEED taken from the database
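The order of precedence above can be sketched as a small lookup. This is an illustration, not the module's code: the helper pick_uri and its three hashref arguments are hypothetical stand-ins for the $params argument of update(), the stored admin fields and the stored meta fields, and the key name FEED for the #FEED header is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the precedence list above.
sub pick_uri {
    my ($params, $admin, $meta) = @_;
    # Candidates in documented order: _uri, admin field uri, meta field #FEED
    my @candidates = ($params->{_uri}, $admin->{uri}, $meta->{FEED});
    for my $candidate (@candidates) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;    # nothing to load from
}

my $uri = pick_uri(
    {},                                              # no _uri passed
    { uri  => 'http://example.org/override.txt' },   # admin field wins ...
    { FEED => 'http://example.org/beacon.txt' },     # ... over the #FEED header
);
print "$uri\n";
```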

Typical use is with an alias, not with a sequence number:

 $db->update('whatever');

Can be used to initially load beacon files from URIs:

 $db->update("new_alias", {_uri => $file_uri} );

unload ( [ $seqno_or_alias, %options ] )

Deletes the sequence(s).

$seqno_or_alias
 numeric sequence number, alias or SQL pattern.

Supported options:
 force => (0|1)

Needed to purge the complete database ($seqno_or_alias empty) or to purge more than one sequence ($seqno_or_alias yields more than one seqno).

purge ( $seqno_or_alias[, %options ] )

Deletes from the database all identifiers belonging to the sequences that match the given pattern, but leaves the stored header information intact so that the sequences can be updated automatically.

$seqno_or_alias
  Pattern

Supported options:
 force => (0|1)

Allow purging of more than one sequence.

Methods for headers

($rows, @oldvalues) = headerfield ( $sq_or_alias, $key [, $value] )

Gets or sets a meta or admin entry for the constituent file indicated by $sq_or_alias.

($resultref, $metaref) = headers ( [ $seqno_or_alias ] )

Iterates over all sequences, or over those selected by $seqno_or_alias.

For each iteration returns two hash references:

1. all official beacon fields
2. all administrative fields (_alias, ...)

listCollections ( [ $seqno_or_alias ] )

Iterates over all Sequences and returns on each call an array of

  Seqno, Alias, Uri, Modification time, Identifier Count and Unique identifier count

Returns undef if done.
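A loop over listCollections() might look like the sketch below. So that it runs without a real database, it uses a hypothetical stand-in class (My::FakeDb, not part of the module) that models only the calling convention described above; the rows are invented sample data. The loop treats both an empty list and an undefined first element as the end of the iteration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database object (assumption, for illustration).
{
    package My::FakeDb;
    sub new {
        my ($class, @rows) = @_;
        return bless { rows => [@rows] }, $class;
    }
    sub listCollections {
        my ($self) = @_;
        my $row = shift @{ $self->{rows} };
        return $row ? @$row : undef;    # undef signals "done"
    }
}

my $db = My::FakeDb->new(
    [ 1, 'foo', 'http://example.org/foo.txt', 1300000000, 120, 100 ],
    [ 2, 'bar', 'http://example.org/bar.txt', 1300000500,  45,  45 ],
);

my @report;
while ( my ($seqno, $alias, $uri, $mtime, $counti, $countu) = $db->listCollections() ) {
    last unless defined $seqno;    # iteration finished
    push @report, sprintf "%-3d %-8s %6d ids (%d unique) %s",
        $seqno, $alias, $counti, $countu, $uri;
}
print "$_\n" for @report;
```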

Statistics

idStat ( [ $seqno_or_alias, %options ] )

Count identifiers for the given pattern.

Supported options:
 distinct => (0|1)

Count multiple occurrences only once

 verbose => (0|1)

idCounts ( [ $pattern, %options ] )

Iterates through the entries according to the optional id filter expression.

For each iteration the call returns a triple consisting of (identifier, number of rows, and sum of all individual counts).

Supported options:
 distinct => (0|1)

Count multiple occurrences in one beacon file only once.

idList ( [ $pattern ] )

Iterates through the entries according to the optional selection.

For each iteration the call returns a tuple consisting of the identifier and a list of array references (Seqno, Hits, Info, explicit Link, AltId), or the empty list if finished.

Hits, Info, Link and AltId are normalized to the empty string if undefined (or < 2 for hits).

It is important to finish all iterations before calling this method for "new" arguments:

 1 while $db->idList();  # flush pending results
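Consuming the tuples returned by idList() could be sketched as follows. The class My::FakeIdDb is a hypothetical stand-in (not part of the module) that models only the documented return shape, and the identifiers and rows are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns (identifier, [Seqno, Hits, Info, Link, AltId], ...)
# per call, or the empty list when all entries have been delivered.
{
    package My::FakeIdDb;
    sub new {
        my ($class, @entries) = @_;
        return bless { queue => [@entries] }, $class;
    }
    sub idList {
        my ($self) = @_;
        my $entry = shift @{ $self->{queue} };
        return $entry ? @$entry : ();
    }
}

my $db = My::FakeIdDb->new(
    [ '118540238', [ 1, '', '', '', '' ], [ 2, 5, 'five works', '', '' ] ],
    [ '118559796', [ 2, '', '', 'http://example.org/x', '' ] ],
);

my %rows_for;    # identifier => number of rows returned for it
while ( my ($id, @rows) = $db->idList() ) {
    $rows_for{$id} = scalar @rows;
    for my $row (@rows) {
        my ($seqno, $hits, $info, $link, $altid) = @$row;
        print "$id: seqno=$seqno hits='$hits' info='$info'\n";
    }
}
```

Because the loop runs until the empty list is returned, no pending results are left over and no extra flush is needed afterwards.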

Manipulation of global metadata: Open Search Description

setOSD ( $field, @values )

Sets the field $field of the OpenSearchDescription to @value(s).

clearOSD ( $field )

Clears the field $field of the OpenSearchDescription.

addOSD ( $field, @values )

Adds more @value(s) as (repeatable) field $field of the OpenSearchDescription.
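The difference between setting, adding and clearing a repeatable field can be pictured with a small in-memory model. This is only a model of the semantics described above, not of the module's storage (which uses the osd table); the function names and the field names used here are hypothetical.

```perl
use strict;
use warnings;

# In-memory model (assumption, for illustration): field name => list of values
my %osd;

sub model_set   { my ($field, @values) = @_; $osd{$field} = [@values]; return }
sub model_add   { my ($field, @values) = @_; push @{ $osd{$field} }, @values; return }
sub model_clear { my ($field) = @_; delete $osd{$field}; return }

model_set('ShortName',   'My Aggregator');
model_set('Description', 'temporary text');
model_set('Example',     '118540238');
model_add('Example',     '118559796');    # repeatable field: now two values
model_set('ShortName',   'Beacon Hub');   # set replaces the previous value
model_clear('Description');               # field is gone afterwards

for my $field (sort keys %osd) {
    print "$field: @{ $osd{$field} }\n";
}
```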

Manipulation of global metadata: Beacon Metadata

These headers are used when publishing a beacon file for the collection.

setBeaconMeta ( $field, $value )

Sets the field $field of the Beacon meta table (used to generate a BEACON file for this service) to $value.

clearBeaconMeta ( $field )

Deletes the field $field of the Beacon meta table.

addBeaconMeta ( $field, $value )

Appends $value to the field $field of the Beacon meta table.

admin ( [$field, [$value]] )

Manipulates the admin table.

Yields a hashref to the admin table if called without arguments.

If called with $field, returns the current value, and sets the table entry to $value if defined.

AUTHOR

    Thomas Berger
    CPAN ID: THB
    gymel.com
    THB@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.