NAME
[![Build Status](https://travis-ci.org/thedatahub/Datahub-Factory.svg?branch=master)](https://travis-ci.org/thedatahub/Datahub-Factory)
Datahub::Factory - A conveyor belt which transports data from a data
source to a data sink.
SYNOPSIS
dhconveyor [ARGUMENTS] [OPTIONS]
DESCRIPTION
Datahub::Factory is a command line conveyor belt which automates three
tasks:
Data is fetched automatically from a local or remote data source.
Data is converted to an exchange format.
The output is pushed to a data sink.
Datahub::Factory fetches data from several sources, as specified by the
Importer settings, transforms it by executing a Fix, and sends it to a
data sink configured by the Exporter settings. Several importer and
exporter modules are supported.
Datahub::Factory includes Log4perl support for monitoring conveyor belt
operations.
Note: This is not a generic tool. It has been tailored to the
functional requirements of the Flemish Art Collection use case.
CONFIGURATION
Datahub::Factory uses a general configuration file called settings.ini.
It can be located at /etc/datahub-factory/settings.ini or
conf/settings.ini. The one in /etc takes priority. An example file is
provided at conf/settings.example.ini
<https://github.com/thedatahub/Datahub-Factory/blob/master/conf/settings.example.ini>.
It is in INI format
<http://search.cpan.org/~sherzodr/Config-Simple-4.59/Simple.pm#INI-FILE>.
The file has two parts: a [General] block that contains generic
options, and (optionally) multiple module-specific blocks called
[module_Module_name]. For a list of module options, see the
documentation of each module.
Supported modules
PIDS
General options
log_level
Sets the log level. Takes a numeric parameter. Supported levels are 1
(WARN), 2 (INFO) and 3 (DEBUG). WARN (1) is the default.
Example
[General]
# 1 => WARN; 2 => INFO; 3 => DEBUG
log_level = 1
[module_PIDS]
username = username
api_key = api_key
COMMANDS
help COMMAND
Documentation about command line options.
It is possible to provide either all importer and/or exporter options
on the command line, or to create a pipeline configuration file that
sets those options.
transport [OPTIONS]
Fetch data from a local or remote source, convert it to an exchange
format and push the data to a Datahub instance.
Command line options
--importer NAME
The importer which fetches data from a Collection Registration
system. Currently only "Adlib" and "TMS" are supported options. All
--oimport arguments are tied to the specific importer used.
--fixes PATH
The path to the Catmandu Fix files to transform the data.
--exporter NAME
The exporter that will do something with your data. It is possible to
print to STDOUT in a specific format ("YAML" and "LIDO" are
supported) or to export to a Datahub instance. All --oexport
arguments are tied to the specific exporter used.
--oimport file_name=PATH
The path to a flat file containing data. This option is only relevant
when the input is an Adlib XML export file.
--oimport db_user=VALUE
The database user. This option is only relevant when the input is a
TMS database.
--oimport db_password=VALUE
The database user's password. This option is only relevant when the
input is a TMS database.
--oimport db_name=VALUE
The database name. This option is only relevant when the input is a
TMS database.
--oimport db_host=VALUE
The database host. This option is only relevant when the input is a
TMS database.
--oexport datahub_url=VALUE
The URL of the Datahub instance. This should be a FQDN, e.g.
http://datahub.lan/
--oexport oauth_client_id=VALUE
The client public ID. Used for OAuth authentication of the Datahub
endpoint.
--oexport oauth_client_secret=VALUE
The client secret passphrase. Used for OAuth authentication of the
Datahub endpoint.
--oexport oauth_username=VALUE
The username of the Datahub user. Used for OAuth authentication of
the Datahub endpoint.
--oexport oauth_password=VALUE
The password of the Datahub user. Used for OAuth authentication of
the Datahub endpoint.
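For example, a transport run that reads a TMS database and pushes the
converted records to a Datahub instance could be started as follows
(all values below are illustrative placeholders, not real credentials):

dhconveyor transport --importer TMS --fixes /tmp/tms.fix \
  --exporter Datahub \
  --oimport db_user=tms --oimport db_password=secret \
  --oimport db_name=tms --oimport db_host=localhost \
  --oexport datahub_url=http://datahub.lan/ \
  --oexport oauth_client_id=client_id \
  --oexport oauth_client_secret=client_secret \
  --oexport oauth_username=username --oexport oauth_password=password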
Pipeline configuration file
The pipeline configuration file is in the INI format
<http://search.cpan.org/~sherzodr/Config-Simple-4.59/Simple.pm#INI-FILE>
and its location is provided to the application using the --pipeline
switch.
The file is broadly divided into two parts: the first (shortest) part
configures the pipeline itself and sets the plugins to use for the
import, fix and export actions. The second part sets options specific
to the plugins used.
Pipeline configuration
This part has three sections: [Importer], [Fixer] and [Exporter]. Each
section has just one option: plugin. Set it to the plugin you want to
use for that action.
All currently supported plugins are in the Importer and Exporter
folders. For the [Fixer], only the Fix plugin is supported.
Supported Importer plugins:
TMS
Adlib
KMSKA
MSK
VKC
Supported Exporter plugins:
Datahub
LIDO
YAML
Plugin configuration
All plugins have their own configuration options in sections called
[plugin_type_name], where type can be importer, exporter or fixer and
name is the name of the plugin (see above).
For a list of supported and required options, see the plugin
documentation.
Example configuration file
[Importer]
plugin = Adlib
[Fixer]
plugin = Fix
[Exporter]
plugin = Datahub
[plugin_importer_Adlib]
file_name = '/tmp/adlib.xml'
data_path = 'recordList.record.*'
[plugin_fixer_Fix]
fix_file = '/tmp/msk.fix'
[plugin_exporter_Datahub]
datahub_url = my.thedatahub.io
datahub_format = LIDO
oauth_client_id = datahub
oauth_client_secret = datahub
oauth_username = datahub
oauth_password = datahub
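Assuming this configuration is saved as /tmp/pipeline.ini (the path is
illustrative), the conveyor belt can then be started with:

dhconveyor transport --pipeline /tmp/pipeline.ini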
AUTHORS
Pieter De Praetere <pieter@packed.be>
Matthias Vandermaesen <matthias.vandermaesen@vlaamsekunstcollectie.be>
COPYRIGHT
Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw
LICENSE
This library is free software; you can redistribute it and/or modify it
under the terms of the GPLv3.