Datahub::Factory::Command::transport - Implements the 'transport' command.
This command allows data managers to (a) fetch data from a (local) source, (b) transform the data to LIDO using a fix, and (c) upload the transformed LIDO data to a Datahub instance.
--pipeline
Location of the pipeline configuration file.
--general
Location of the general configuration file.
--importer
Location of the importer configuration file.
--fixer
Location of the fixer configuration file.
--exporter
Location of the exporter configuration file.
--verbose
Set this flag for pretty output of the ETL processing.
The pipeline configuration file is in the INI format and its location is provided to the application using the --pipeline switch.
The file is broadly divided into two parts: the first (shorter) part configures the pipeline itself and sets the plugins to use for the import, fix and export actions. The second part sets options specific to the plugins in use.
The first part has three sections: [Importer], [Fixer] and [Exporter]. Every section has just one option: plugin. In each section, set it to the plugin you want to use for that action.
All currently supported plugins are in the Importer and Exporter folders. For the [Fixer], only the Fix plugin is supported.
Supported Importer plugins:
Supported Exporter plugins:
    [Importer]
    plugin = OAI
    id_path = 'lidoRecID.0._'

    [plugin_importer_OAI]
    endpoint = https://oai.my.museum/oai

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    file_name = '/home/datahub/my.fix'

    [Exporter]
    plugin = YAML

    [plugin_exporter_YAML]
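As a rough illustration of the layout described above, the following Python sketch parses a configuration shaped like the example and reads the plugin chosen for each stage. It is not how Datahub::Factory itself reads the file. The helper names read_stages and unquote are hypothetical, and stripping the quotes around values is an assumption about how quoted values are meant to be read.

```python
import configparser

# Hypothetical pipeline configuration, modeled on the example above.
PIPELINE_INI = """
[Importer]
plugin = OAI
id_path = 'lidoRecID.0._'

[plugin_importer_OAI]
endpoint = https://oai.my.museum/oai

[Fixer]
plugin = Fix

[plugin_fixer_Fix]
file_name = '/home/datahub/my.fix'

[Exporter]
plugin = YAML

[plugin_exporter_YAML]
"""


def unquote(value):
    """Strip one pair of matching surrounding quotes, if present (assumption)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def read_stages(ini_text):
    """Return {stage: plugin name} for the three pipeline stages."""
    # Only '=' is treated as a delimiter so that URLs containing ':' stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(ini_text)
    stages = {}
    for stage in ("Importer", "Fixer", "Exporter"):
        if not parser.has_option(stage, "plugin"):
            raise ValueError(f"[{stage}] must set the 'plugin' option")
        stages[stage] = unquote(parser.get(stage, "plugin"))
    return stages


stages = read_stages(PIPELINE_INI)
print(stages)
```

A check like this can be handy before a long-running transport, since a missing plugin option is caught before any data is fetched.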
All plugins have their own configuration options in sections called [plugin_type_name] where type can be importer, exporter or fixer and name is the name of the plugin.
All plugins define their own options as parameters to the respective plugin. All possible parameters are valid items in the configuration section.
If a plugin requires no options, you still need to create the (empty) configuration section (e.g. [plugin_exporter_YAML] in the above example).
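The naming rule can be sketched in Python as follows. This is only an illustration under stated assumptions: plugin_options is a hypothetical helper, and raising an error for a missing section is one possible way to enforce the rule that even an empty section must exist.

```python
import configparser


def plugin_options(parser, stage):
    """Return (section name, options) for the plugin selected in a stage.

    `stage` is 'Importer', 'Fixer' or 'Exporter'. The section name follows
    the [plugin_type_name] pattern, with the type in lower case.
    """
    plugin = parser.get(stage, "plugin")
    section = f"plugin_{stage.lower()}_{plugin}"
    if not parser.has_section(section):
        # Assumption: a missing section is treated as a configuration error.
        raise KeyError(f"missing configuration section [{section}]")
    return section, dict(parser.items(section))


parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string("""
[Exporter]
plugin = YAML

[plugin_exporter_YAML]
""")

section, options = plugin_options(parser, "Exporter")
print(section, options)
```

Here the exporter section is present but empty, so the lookup succeeds and yields no options.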
The id_path option contains the path (in Fix syntax) of the identifier of each record in your data after the fix has been applied, but before it is submitted to the Exporter. It is used for reporting and logging.
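To make the idea of a path into a record concrete, here is a minimal Python sketch that resolves a dotted path against a nested record. It is a simplification: it handles only hash keys and numeric list indices, not wildcards or other special path elements, and the function name resolve_path and the sample record are hypothetical.

```python
def resolve_path(record, path):
    """Walk a dotted path through nested dicts and lists.

    Simplified: supports hash keys and numeric list indices only.
    Returns None when the path does not exist in the record.
    """
    current = record
    for part in path.split("."):
        if isinstance(current, list):
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        else:
            return None
    return current


# Hypothetical fixed record, shaped to match id_path = 'lidoRecID.0._'.
record = {"lidoRecID": [{"_": "museum:0001", "type": "local"}]}
print(resolve_path(record, "lidoRecID.0._"))
```

With a record of this shape, the path selects the text of the first lidoRecID entry, which is the kind of value that would show up in reports and logs.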
    [plugin_fixer_Fix]
    condition = record.institution_name
    fixers = FOO, BAR

    [plugin_fixer_Fix]
    file_name = /home/datahub/my.fix
The [plugin_fixer_Fix] section can either load a fix file directly (via the file_name option) or be configured to load a different fix file conditionally. The latter supports multiple fix files for the same data stream (e.g. when two institutions with different data models use the same API endpoint). This is done by setting the condition and fixers options.
    [plugin_fixer_Fix]
    condition = record.institution_name
    fixers = FOO, BAR

    [plugin_fixer_FOO]
    condition = 'Museum of Foo'
    file_name = '/home/datahub/foo.fix'

    [plugin_fixer_BAR]
    condition = 'Museum of Bar'
    file_name = '/home/datahub/bar.fix'
If you want to separate the data stream into multiple (smaller) streams with a different fix file for each stream, you can do this by setting the appropriate options in the [plugin_fixer_Fix] block. Note that id_path is still mandatory.
Set condition to the Fix-compatible path in the original stream that holds the condition you want to use to split the stream.
Provide a comma-separated list of fixer plugins in fixers.
For every fixer plugin in fixers, create a configuration block called [plugin_fixer_name] and provide the following options:
condition
The value that the condition from [plugin_fixer_Fix] must have for the record to belong to this block.
file_name
The location of the fix file that must be executed for every record in this block.
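The selection logic described above can be sketched in Python as follows. This is an illustration, not the plugin's actual implementation: select_fix_file, lookup and unquote are hypothetical helpers, the sample record is made up, and returning None when no block matches is an assumption, since the behaviour for unmatched records is not stated here.

```python
import configparser

# Configuration modeled on the conditional example above.
CONFIG = """
[plugin_fixer_Fix]
condition = record.institution_name
fixers = FOO, BAR

[plugin_fixer_FOO]
condition = 'Museum of Foo'
file_name = '/home/datahub/foo.fix'

[plugin_fixer_BAR]
condition = 'Museum of Bar'
file_name = '/home/datahub/bar.fix'
"""


def unquote(value):
    """Strip one pair of matching surrounding quotes, if present (assumption)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def lookup(record, path):
    """Resolve a dotted path of hash keys in a nested dict (simplified)."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select_fix_file(parser, record):
    """Pick the fix file for one record, or None if no block matches."""
    main = parser["plugin_fixer_Fix"]
    if "condition" not in main:
        # Unconditional case: a single fix file for the whole stream.
        return unquote(main["file_name"])
    actual = lookup(record, main["condition"])
    for name in main["fixers"].split(","):
        block = parser[f"plugin_fixer_{name.strip()}"]
        if unquote(block["condition"]) == actual:
            return unquote(block["file_name"])
    return None


parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(CONFIG)

# Hypothetical record from the original (unfixed) stream.
record = {"record": {"institution_name": "Museum of Bar", "id": "42"}}
print(select_fix_file(parser, record))
```

The condition is evaluated against the original stream, so each record is routed to its fix file before any fix has been applied.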
    [Importer]
    plugin = Adlib
    id_path = 'record.id'

    [Fixer]
    plugin = Fix

    [Exporter]
    plugin = Datahub

    [plugin_importer_Adlib]
    file_name = '/tmp/adlib.xml'
    data_path = 'recordList.record.*'

    [plugin_fixer_Fix]
    file_name = '/tmp/msk.fix'

    [plugin_exporter_Datahub]
    datahub_url = https://my.thedatahub.io
    datahub_format = LIDO
    oauth_client_id = datahub
    oauth_client_secret = datahub
    oauth_username = datahub
    oauth_password = datahub
Matthias Vandermaesen <matthias.vandermaesen@vlaamsekunstcollectie.be>
Pieter De Praetere <pieter@packed.be>
Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw
This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.