NAME

Datahub::Factory::Command::transport - Implements the 'transport' command.

DESCRIPTION

This command allows data managers to (a) fetch data from a (local) source, (b) transform the data to LIDO using a fix, and (c) upload the LIDO-transformed data to a Datahub instance.

COMMAND LINE INTERFACE

--pipeline

Location of the pipeline configuration file.

--general

Location of the general configuration file.

--importer

Location of the importer configuration file.

--fixer

Location of the fixer configuration file.

--exporter

Location of the exporter configuration file.

--verbose

Set this flag for pretty output of the ETL processing.

Pipeline configuration file

The pipeline configuration file is in the INI format and its location is provided to the application using the --pipeline switch.

The file is broadly divided into two parts: the first (shortest) part configures the pipeline itself and sets the plugins to use for the import, fix and export actions. The second part sets the options specific to the plugins in use.

Pipeline configuration

This part has three sections: [Importer], [Fixer] and [Exporter]. Every section has a plugin option: set this to the plugin you want to use for that action. The [Importer] section also takes the id_path option, described under "Importer plugin".

All currently supported plugins are in the Importer and Exporter folders. For the [Fixer], only the Fix plugin is supported.

Supported Importer plugins:

TMS
Adlib
OAI

Supported Exporter plugins:

Datahub
LIDO
YAML

Plugin configuration

    [Importer]
    plugin = OAI
    id_path = 'lidoRecID.0._'

    [plugin_importer_OAI]
    endpoint = https://oai.my.museum/oai

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    file_name = '/home/datahub/my.fix'

    [Exporter]
    plugin = YAML

    [plugin_exporter_YAML]

All plugins have their own configuration options in sections called [plugin_type_name] where type can be importer, exporter or fixer and name is the name of the plugin.

Each plugin defines its own options, which are passed as parameters to that plugin. Every parameter the plugin accepts is a valid item in its configuration section.

If a plugin requires no options, you still need to create the (empty) configuration section (e.g. [plugin_exporter_YAML] in the above example).
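The naming rule for plugin sections can be checked mechanically. The following is a minimal sketch in Python, not part of Datahub::Factory: the helper name missing_plugin_sections is hypothetical, and it assumes that section names are built from the lower-cased stage name and the plugin name exactly as described above. It reports which [plugin_type_name] sections a pipeline file still lacks.

```python
import configparser

# Hypothetical helper, not part of Datahub::Factory: it only checks the
# section layout rules described in this document.
STAGES = ("Importer", "Fixer", "Exporter")


def missing_plugin_sections(ini_text):
    """Return the [plugin_type_name] sections the pipeline expects but lacks."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    missing = []
    for stage in STAGES:
        # Values may be quoted in the INI file; strip the quotes if present.
        plugin = parser.get(stage, "plugin").strip("'\"")
        section = f"plugin_{stage.lower()}_{plugin}"
        if not parser.has_section(section):
            missing.append(section)
    return missing


# Same pipeline as the example above, with the empty exporter section left out.
EXAMPLE = """
[Importer]
plugin = OAI
id_path = 'lidoRecID.0._'

[plugin_importer_OAI]
endpoint = https://oai.my.museum/oai

[Fixer]
plugin = Fix

[plugin_fixer_Fix]
file_name = '/home/datahub/my.fix'

[Exporter]
plugin = YAML
"""

print(missing_plugin_sections(EXAMPLE))  # ['plugin_exporter_YAML']
```

Appending an empty [plugin_exporter_YAML] section to the text makes the function return an empty list.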

Importer plugin

The id_path option contains the path (in Fix syntax) of the identifier of each record in your data after the fix has been applied, but before it is submitted to the Exporter. It is used for reporting and logging.
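To illustrate how a path such as 'lidoRecID.0._' points at a value, here is a minimal sketch in Python. It is not how the application resolves paths internally: the function resolve_path and the record shape are assumptions made for this example, and only plain keys and numeric list indices are handled.

```python
def resolve_path(record, path):
    """Follow a dotted path through nested dicts and lists.

    Simplified assumption: only plain keys and numeric list indices are
    handled; wildcards such as '*' are not.
    """
    current = record
    for step in path.split("."):
        if isinstance(current, list):
            # A numeric step selects an element of a list.
            current = current[int(step)]
        else:
            current = current[step]
    return current


# Hypothetical record shape, chosen to match the id_path 'lidoRecID.0._'.
record = {"lidoRecID": [{"type": "local", "_": "museum-0001"}]}

print(resolve_path(record, "lidoRecID.0._"))  # museum-0001
```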

Fixer plugin

    [plugin_fixer_Fix]
    condition = record.institution_name
    fixers = FOO, BAR

    [plugin_fixer_Fix]
    file_name = /home/datahub/my.fix

The [plugin_fixer_Fix] section can be configured in one of two ways, shown in the two alternative blocks above. It can load a single fix file directly (via the file_name option), or it can select a fix file conditionally, which supports multiple fix files for the same data stream (e.g. when two institutions with different data models use the same API endpoint). The conditional form is enabled by setting the condition and fixers options.

Conditional fixers

    [plugin_fixer_Fix]
    condition = record.institution_name
    fixers = FOO, BAR

    [plugin_fixer_FOO]
    condition = 'Museum of Foo'
    file_name = '/home/datahub/foo.fix'

    [plugin_fixer_BAR]
    condition = 'Museum of Bar'
    file_name = '/home/datahub/bar.fix'

If you want to separate the data stream into multiple (smaller) streams with a different fix file for each stream, you can do this by setting the appropriate options in the [plugin_fixer_Fix] block. Note that id_path is still mandatory.

Set condition to the Fix-compatible path in the original stream that holds the value you want to use to split the stream.

Provide a comma-separated list of fixer plugins in fixers.

For every fixer plugin in fixers, create a configuration block called [plugin_fixer_name] and provide the following options:

condition

The value that the condition from [plugin_fixer_Fix] must have for the record to belong to this block.

file_name

The location of the fix file that must be executed for every record in this block.
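The selection rule described above can be sketched as follows. This is an illustration in Python, not the actual implementation: the names select_fix_file, lookup and unquote are hypothetical, the record shape is assumed, and the sketch simply returns None when no block matches (the behaviour of the real command in that case is not stated here).

```python
import configparser


def unquote(value):
    """Strip surrounding whitespace and quotes from an INI value."""
    return value.strip().strip("'\"")


def lookup(record, path):
    """Follow a dotted path through nested dicts (plain keys only)."""
    current = record
    for key in path.split("."):
        current = current[key]
    return current


def select_fix_file(parser, record):
    """Return the fix file that applies to this record, or None."""
    main = parser["plugin_fixer_Fix"]
    if "condition" not in main:
        # Unconditional form: a single fix file for the whole stream.
        return unquote(main["file_name"])
    value = lookup(record, unquote(main["condition"]))
    for name in main["fixers"].split(","):
        block = parser[f"plugin_fixer_{name.strip()}"]
        if unquote(block["condition"]) == value:
            return unquote(block["file_name"])
    return None  # assumption: no matching block means no fix file


CONFIG = """
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR

[plugin_fixer_FOO]
condition = 'Museum of Foo'
file_name = '/home/datahub/foo.fix'

[plugin_fixer_BAR]
condition = 'Museum of Bar'
file_name = '/home/datahub/bar.fix'
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Hypothetical record; only the field named by condition matters here.
record = {"record": {"institution_name": "Museum of Bar"}}
print(select_fix_file(parser, record))  # /home/datahub/bar.fix
```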

Example configuration file

  [Importer]
  plugin = Adlib
  id_path = 'record.id'

  [Fixer]
  plugin = Fix

  [Exporter]
  plugin = Datahub

  [plugin_importer_Adlib]
  file_name = '/tmp/adlib.xml'
  data_path = 'recordList.record.*'

  [plugin_fixer_Fix]
  file_name = '/tmp/msk.fix'

  [plugin_exporter_Datahub]
  datahub_url = https://my.thedatahub.io
  datahub_format = LIDO
  oauth_client_id = datahub
  oauth_client_secret = datahub
  oauth_username = datahub
  oauth_password = datahub

AUTHORS

Matthias Vandermaesen <matthias.vandermaesen@vlaamsekunstcollectie.be>

Pieter De Praetere <pieter@packed.be>

COPYRIGHT

Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw

LICENSE

This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.