
Download:
File-Namaste-1.04.tar.gz


NAME

nam - command to set, get, and remove Namaste tag files

SYNOPSIS

nam [options] add N string [[maxlen] ellipsis]
nam [options] set N string [[maxlen] ellipsis]
nam [options] get [N ...]
nam [options] rm [N ...]
nam [options] rmall
nam [options] elide string [[maxlen] ellipsis]

DESCRIPTION

The nam command manages Namaste (Name-as-text) tag files, which are useful for describing directories. A Namaste tag file holds a single metadata string, and its name is a filesystem-safe derivation of that string. The name of the file consists of an integer N, an '=', and the derived string.
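Because the derived string may itself contain '=' (the conversion of '/' described below produces names such as "4=ark:=13960=t50g49p5m"), a program that reads tag filenames should split only on the first '='. The following Python sketch shows one way to do that; parse_tag_filename is a hypothetical helper, not part of the File::Namaste API.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: one or more digits, '=', then the derived string.
# The digits stop at the first '=', so any later '=' stays in the derived part.
TAG_NAME = re.compile(r"([0-9]+)=(.*)", re.DOTALL)


def parse_tag_filename(filename: str) -> Optional[Tuple[int, str]]:
    """Return (N, derived_string) for a tag filename, or None otherwise."""
    match = TAG_NAME.fullmatch(filename)
    if match is None:
        return None  # not a Namaste tag file (e.g., "m_meta.xml")
    return int(match.group(1)), match.group(2)
```

For instance, parse_tag_filename("0=oca_book_1.1") yields (0, "oca_book_1.1"), while an ordinary file such as "m.pdf" yields None.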

For example, consider a large collection of publicly downloadable digital objects, each one in a directory that looks something like this:

  $ ls
  m_abbyy.gz  m_djvu.txt   m_jp2.zip         m_meta.xml
  m_bw.pdf    m_djvu.xml   m_marc.xml        m_orig_jp2.tar
  m_dc.xml    m_files.xml  m_meta.mrc        m.pdf
  m.djvu      m.gif        m_metasource.xml  

The directory layout reveals little to someone not already familiar with this kind of digital object. But if Namaste tags were added, a visitor who asks for a directory listing could be greeted by this instead:

  $ ls
  0=oca_book_1.1          m_abbyy.gz  m_djvu.xml   m_meta.mrc
  1=Carmichael, Orton H.  m_bw.pdf    m_files.xml  m_metasource.xml
  2=Lincoln's Gettysbu..  m_dc.xml    m.gif        m_meta.xml
  3=1917                  m.djvu      m_jp2.zip    m_orig_jp2.tar
  4=ark:=13960=t50g49p5m  m_djvu.txt  m_marc.xml   m.pdf

In the first column of the listing, the filenames themselves carry abbreviated metadata. They are designed to let someone with no training in this collection or in bibliographic description (e.g., an end user or a system administrator) quickly form a mental picture of the object and start a discussion about it (e.g., when using the content for schoolroom instruction or when notifying the collection manager of a system exception).

The integers correspond to simple metadata elements, mostly following the Dublin Core Kernel, roughly as follows:

  0   dir_type   directory type (e.g., bagit_0.97)
  1    who       creator (or contributor or publisher)
  2    what      title (human-oriented name or identifier)
  3    when      date of creation or collection of content
  4    where     machine-oriented identifier

The Namaste tags above were set with

  $ nam set 0 'oca_book_1.1' 20
  $ nam set 1 'Carmichael, Orton H.' 20
  $ nam set 2 "Lincoln's Gettysburg address" 20
  $ nam set 3 '1917' 20
  $ nam set 4 'ark:/13960/t50g49p5m' 20

Transforming the given metadata values into tag filenames may involve converting unsafe characters (e.g., '/' becomes '=') and truncation. The optional "20" at the end of each command above specifies the maximum length of the derived string in a created tag name (the leading "N=" is not counted). Any longer string is truncated, and the missing part is replaced by an ellipsis, which can be given as a final optional argument. The maximum length (default 16) can be adjusted to suit the desired "greeting" experience, or given as 0 to prevent truncation. For example, changing the "20" to "16m" in all the settings above would leave more display space for the directory's other files:

  $ ls
  0=oca_book_1.1      m_abbyy.gz  m_djvu.xml   m_meta.mrc
  1=Carmic...rton H.  m_bw.pdf    m_files.xml  m_metasource.xml
  2=Lincol...address  m_dc.xml    m.gif        m_meta.xml
  3=1917              m.djvu      m_jp2.zip    m_orig_jp2.tar
  4=ark:=1...0g49p5m  m_djvu.txt  m_marc.xml   m.pdf

In this case, the "m" in "16m" specifies truncation in the middle of the string, as opposed to "s" (start) or "e" (end, the default). The ellipsis normally defaults to "..", but for middle truncation it defaults to "...". Tags in the same directory can be created with different truncation policies. Some values, such as many identifiers, carry their more specific or interesting information toward the end of the string rather than the beginning. If tag 4 above had been created with "16s" truncation, the last line of the listing would look like this:

  4=..3960=t50g49p5m  m_djvu.txt  m_marc.xml   m.pdf
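The length specs, truncation policies, and ellipsis defaults can be summarized in a short Python sketch. The function name elide and its handling of the spec string are assumptions modeled on the examples on this page, not the module's actual code. In particular, the rule that the tail keeps the extra character when an odd number of characters survives is inferred from the "16m" listing. Like the elide subcommand, it works on raw strings and applies no character mapping.

```python
import re
from typing import Optional


def elide(s: str, spec: str = "16", ellipsis: Optional[str] = None) -> str:
    """Shorten s according to a length spec such as "20", "16m", or "24s".

    Hypothetical sketch, not the File::Namaste implementation. The spec is a
    maximum length (0 = no truncation) plus an optional policy letter:
    "e" (end, default), "s" (start), or "m" (middle). The ellipsis defaults
    to "..." for "m" and to ".." otherwise.
    """
    match = re.fullmatch(r"([0-9]+)([sme]?)", spec)
    if match is None:
        raise ValueError(f"bad length spec: {spec!r}")
    maxlen = int(match.group(1))
    policy = match.group(2) or "e"
    if ellipsis is None:
        ellipsis = "..." if policy == "m" else ".."
    if maxlen == 0 or len(s) <= maxlen:
        return s  # truncation disabled, or already short enough
    keep = max(maxlen - len(ellipsis), 0)  # characters of s that survive
    if policy == "e":
        return s[:keep] + ellipsis
    if policy == "s":
        return ellipsis + s[len(s) - keep:]
    # Middle truncation: when keep is odd, the tail gets the extra
    # character, which reproduces the "16m" listing above.
    head = keep // 2
    return s[:head] + ellipsis + s[len(s) - (keep - head):]
```

For instance, elide("Lincoln's Gettysburg address", "16m") gives "Lincol...address", matching tag 2 in the "16m" listing.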

Additional tags for an existing tag number can be created with add, whereas set replaces all tags for that number:

  $ nam add 1 Lennon
  $ nam add 1 McCartney
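The difference between set and add can be modeled with a few filesystem calls. This Python sketch is an illustration under simplifying assumptions, not the module's code: set_tag, add_tag, and derive are hypothetical names, derive applies only the '/' to '=' conversion with no truncation, and the sketch writes the verbatim value as the file content without a trailing newline.

```python
import tempfile
from pathlib import Path


def derive(value: str) -> str:
    # Hypothetical, simplified derivation: only '/' -> '=' is applied;
    # truncation and the other character mappings are left out.
    return value.replace("/", "=")


def add_tag(directory: str, n: int, value: str) -> Path:
    """Create one more tag file for number n, keeping any existing ones."""
    path = Path(directory) / f"{n}={derive(value)}"
    # The verbatim (untransformed) value becomes the file's content.
    path.write_text(value, encoding="utf-8")
    return path


def set_tag(directory: str, n: int, value: str) -> Path:
    """Replace all existing tag files for number n with a single new one."""
    # "1=*" matches "1=Lennon" but not "10=...", since '=' must follow the
    # number. The matches are listed first so deletion doesn't disturb the scan.
    for old in list(Path(directory).glob(f"{n}=*")):
        old.unlink()
    return add_tag(directory, n, value)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        set_tag(d, 1, "Carmichael, Orton H.")
        add_tag(d, 1, "Lennon")
        add_tag(d, 1, "McCartney")
        print(sorted(p.name for p in Path(d).iterdir()))
```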

The verbatim metadata value (unabbreviated and not transformed to comply with filesystem naming rules) is stored as the content of the corresponding file, where it can be conveniently retrieved by tag number,

  $ nam get 2
  Lincoln's Gettysburg address

or all at once with

  $ nam get
  oca_book_1.1
  Carmichael, Orton H.
  Lincoln's Gettysburg address
  1917
  ark:/13960/t50g49p5m
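Since the verbatim value lives in each file's content, retrieval only has to find the tag files and read them. The Python sketch below assumes that values are returned in order of tag number, as in the example output. get_tags is a hypothetical name, and the ordering of several tags with the same number (here by filename) is an assumption.

```python
import re
from pathlib import Path
from typing import List, Optional, Sequence

# A tag filename is one or more digits, '=', then the derived string.
TAG_NAME = re.compile(r"([0-9]+)=.*", re.DOTALL)


def get_tags(directory: str, numbers: Optional[Sequence[int]] = None) -> List[str]:
    """Return verbatim tag values ordered by tag number, then by filename.

    Hypothetical sketch: if numbers is None, every tag file is read;
    otherwise only files whose tag number is listed. Other files are ignored.
    """
    found = []
    for path in Path(directory).iterdir():
        match = TAG_NAME.fullmatch(path.name)
        if match is None or not path.is_file():
            continue
        n = int(match.group(1))
        if numbers is None or n in numbers:
            found.append((n, path.name, path.read_text(encoding="utf-8")))
    return [value for _, _, value in sorted(found)]
```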

A fully labeled record can be retrieved by specifying a format such as JSON, XML, or ANVL:

  $ nam --format anvl get > Namaste.txt
  $ cat Namaste.txt
  dir_type: oca_book_1.1
  who: Carmichael, Orton H.
  what: Lincoln's Gettysburg address
  when: 1917
  where: ark:/13960/t50g49p5m
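The labeled output pairs each value with the element name from the numbering table earlier in this page. The Python sketch below produces simple "label: value" lines like the ANVL example. to_anvl is a hypothetical name. Falling back to the bare number for unlabeled tags is an assumption. Values containing newlines, which ANVL would need to write as continuation lines, are not handled.

```python
from typing import List, Tuple

# Element labels for tag numbers 0-4, taken from the numbering table.
LABELS = {0: "dir_type", 1: "who", 2: "what", 3: "when", 4: "where"}


def to_anvl(tags: List[Tuple[int, str]]) -> str:
    """Format (tag number, verbatim value) pairs as "label: value" lines.

    Hypothetical sketch: unknown numbers are labeled with the number itself,
    and multi-line values are not handled.
    """
    lines = [f"{LABELS.get(n, str(n))}: {value}" for n, value in tags]
    return "\n".join(lines) + "\n"
```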

Tag files can always be removed with rm(1), but it is much more convenient to use nam with a tag number, as in,

  $ nam rm 3

or to remove them all at once with

  $ nam rmall
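Both forms of removal come down to matching tag filenames and deleting them, while leaving the directory's other files alone. In this Python sketch, remove_tags is a hypothetical name. Passing no numbers removes every tag file, mirroring rmall.

```python
import re
from pathlib import Path
from typing import Optional, Sequence

# A tag filename is one or more digits, '=', then the derived string.
TAG_NAME = re.compile(r"([0-9]+)=.*", re.DOTALL)


def remove_tags(directory: str, numbers: Optional[Sequence[int]] = None) -> int:
    """Delete tag files for the given numbers, or all tag files if None.

    Hypothetical sketch. Returns how many files were removed; files that are
    not tag files (e.g., "m.pdf") are never touched.
    """
    removed = 0
    # Materialize the listing first so deleting entries can't affect the scan.
    for path in list(Path(directory).iterdir()):
        match = TAG_NAME.fullmatch(path.name)
        if match is None or not path.is_file():
            continue
        if numbers is None or int(match.group(1)) in numbers:
            path.unlink()
            removed += 1
    return removed
```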

Use elide for raw access to the same general-purpose string elision functionality described above, without any filesystem-safe character transformations. It does not interact with the filesystem at all.

  $ nam elide 'The question is this: why and/or how?' 24s '**'
  ** this: why and/or how?

Portability

In creating filesystem-safe derivations of metadata values, lossy transformations may occur. Since the primary beneficiaries of tag filenames are human, the default mapping for Unix systems tries to convert as few characters as possible. It converts '/' to '=', runs of newlines and other whitespace to a single SPACE, and control characters to '?'.
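The three Unix conversions can be written as successive substitutions, as in the Python sketch below. The order matters. Newlines and tabs are themselves control characters, so whitespace runs have to be collapsed before the remaining control characters become '?'. Several points are assumptions, not documented behavior: the name unix_safe, treating only ASCII whitespace as "whitespace", and treating only ASCII control characters (0x00-0x1F and DEL) as control characters.

```python
import re


def unix_safe(value: str) -> str:
    """Sketch of the Unix filename mapping described above.

    Assumes ASCII whitespace and ASCII control characters only.
    """
    # 1. Collapse runs of whitespace (spaces, tabs, newlines, ...) to one space.
    value = re.sub(r"[ \t\n\r\f\v]+", " ", value)
    # 2. Replace any remaining control characters with '?'.
    value = re.sub(r"[\x00-\x1f\x7f]", "?", value)
    # 3. '/' cannot appear in a Unix filename, so map it to '='.
    return value.replace("/", "=")
```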

The default mapping for Windows systems is more lossy but more portable than that for Unix. Filenames created with it will remain unchanged when transferred between Windows and Unix systems. In addition to the above mappings, it converts the characters

    " * : < > ? \ |

to '.' (period). To request the more portable mapping explicitly, use the --portable option.
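The portable mapping can be sketched as the Unix mapping followed by one more substitution. The Python below is an illustration, not the module's code. The name portable_safe is hypothetical. Applying the extra step last is an assumption chosen so that the '?' produced for control characters also becomes '.', which keeps the result free of characters Windows rejects.

```python
import re

# The extra characters converted by the portable mapping: " * : < > ? \ |
_PORTABLE_CHARS = re.compile(r'["*:<>?\\|]')


def portable_safe(value: str) -> str:
    """Sketch of the more portable (Windows-safe) filename mapping."""
    # Same steps as the Unix mapping: whitespace runs -> one space,
    # ASCII control characters -> '?', '/' -> '='.
    value = re.sub(r"[ \t\n\r\f\v]+", " ", value)
    value = re.sub(r"[\x00-\x1f\x7f]", "?", value)
    value = value.replace("/", "=")
    # Then map the Windows-reserved characters (including any '?') to '.'.
    return _PORTABLE_CHARS.sub(".", value)
```

Under this sketch, the identifier from the example above becomes "ark.=13960=t50g49p5m" rather than the Unix form "ark:=13960=t50g49p5m".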

OPTIONS

-d directory, --directory directory

Use directory instead of the current directory to look for tag files.

-m format, --format format

Output in the given format, currently one of "ANVL", "XML", "JSON", or "Plain" (default).

--portable

Request the most portable transformation of metadata values into tag filenames.

-v, --verbose

Output ancillary information (the tag filename itself) as a comment.

-h, --help

Print extended help documentation.

--man

Print full documentation.

--version

Print the current version number and exit.

SEE ALSO

Directory Description with Namaste Tags https://confluence.ucop.edu/display/Curation/Namaste

Dublin Core Kernel Metadata https://confluence.ucop.edu/display/Curation/ERC

rm(1)

AUTHOR

John Kunze jak at ucop dot edu

COPYRIGHT

Copyright 2009-2010 UC Regents. Open source BSD license.
