The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Data::Schema::Manual::Schema

VERSION

version 0.136

OVERVIEW

This document is explains the syntax of Data::Schema schema.

NAME

Data::Schema::Manual::Schema - Data::Schema schema reference

FORMS

Data::Schema schema is just a normal data structure: Perl scalars, arrays, and hashes.

There are three forms of schema. These different forms are supported for the convenience of schema writers. Internally all schemas and subschemas will be converted ("normalized") to the third form (HASH).

First Form (SCALAR)

 TYPE

The simplest form of schema is just a scalar (string) specifying type name. This states that the data must be of specified type

With this first form you cannot add any other value restrictions or anything else, so this form is very limited.

Example:

 "int"

The schema says that data must be an integer. Examples of valid data:

 5
 -2

Example of invalid data:

 "int"  # not an integer, but a string
 [1]    # not an integer, an array
 {}     # not an integer, an empty hash

TYPE can also be the name of another schema. For example if you already have defined a schema with name 'short_array' with this definition:

 [array => {maxlen: 10}]

Then you can also have a schema that says just:

 short_array

and it will also mean that the data must satisfy the 'short_array' schema.

Second Form (ARRAY)

 [TYPE, ATTRHASH, ATTRHASH, ...]

The second form is the array form. The first element of the array is required, the type name (or schema name). The rest is a list of attribute hashes, and is optional.

The first form is actually equivalent to this second form:

 [TYPE]

in which no attribute hashes are specified.

Attribute hash is a mapping of attribute names and values. This further limits the range of data values possible. Each type has its own set of known attributes, for example all numeric types (like int and float) has the min, max, et al. Most types have a one_of attribute to limit values to the list of values we specify, etc.

For type validation to succeed, the type requirement *as well as* the requirements of all attributes (from all attribute hashes) must be satisfied.

For more details on attribute hashes, see ATTRHASH section below.

Example:

 [str => {one_of => [qw/A B O AB/]}]

This schema states that data must be a string, and it must either be "A", "B", "O", or "AB". Examples of invalid data:

 []    # does not satisfy type requirement, not a string
 "C"   # a string value, but does not satisfy the one_of attribute

Another example:

 ["int", {min=>0, divisible_by=>2}, {divisible_by=>3}]

The schema effectively says that the data must be positive and divisible by 6 (since it must be divisible by 2 AND 3). Examples of valid data:

 6
 12

Examples of invalid data:

 -6      # an int, satisfies all divisible_by attributes, but not the min

If you specify a schema name as the first element, then the attributes will be of the base type of the schema. Example:

 # schema with name = 'even'
 [int => {divisible_by=>2}]

 # our schema
 [even => {min=>20}]

Our schema in effect says that the data must be an even number greater or equal than 20. Since our schema is based on the even schema, the attributes we can specify is that of the int type, since even is defined as an int.

Third Form (HASH)

 {type=>TYPE OR SCHEMA,
  attrs=>ATTRHASH, attr_hashes=>[ATTRHASH, ...],
  def=>SCHEMADEFS,
  ...}

The third form (HASH) is the most complete form where you can specify everything. The type key is required, while the rest are optional.

The second form is equivalent to this third form:

 {type=>TYPE, attr_hashes=>[ATTRHASH, ...]}

where nothing but type name and attribute hashes are specified.

The first form is equivalent to this third form:

 {type=>TYPE}

where nothing but type name is specified.

You can specify attribute hashes in attr_hashes key, or if you want to specify just one attribute hash, you can use the attrs key. If they are both present, attribute hashes from both will be used.

This third form allows us to define other schemas inside our schema, using the def keys, which must be a hashref of schema name and definition. This is a way to break down or organize a complex schema into several pieces.

Example:

 {
  def => {
      single_dice_throw    => [int => {one_of => [1,2,3,4,5,6]}],
      sdt                  => "single_dice_throw", # short notation
      dice_pair_throw      => [array => {len=>2, elems=>["sdt", "sdt"]}],
      dpt                  => "dice_pair_throw",   # short notation
      throw                => [either => {of => ["sdt", "dpt"]}],
      throws               => [array => {of => "throw"}],
  },
  type => "throws"
 }

This schema specifies that we are accepting a list of dice throws (throws). Each throw can be a single dice throw (sdt) which is a number between 1 and 6, OR a throw of two dices (dpt) which is a 2-element array (where each element is a number between 1 and 6).

Examples of valid data:

 [1, [1,3], 6, 4, 2, [3,5]]

Examples of invalid data:

 [1, [2, 3], 0]     # the third throw is invalid
 [1, [2,0,4], 4, 5] # the second throw (a dice pair throw) is invalid

TYPE

Data::Schema comes with several types out of the box, for example: bool, int, float, str, array, hash, etc.

Each type is handled by a type handler, which is a Perl module.

For more details on each type, refer to its handler module documentation. For example, for hash type, see Data::Schema::Type::Hash.

You can write your own type handler. For more information on how to write a type handler, see Data::Schema::Manual::TypeHandler.

ATTRHASH

An attribute hash is a mapping of attribute names and values:

 {
   "PrefixNameSuffix" => value,
   ...
 }

Example:

 {
   ^min => 0,
   "min:errmsg" => "Only positive numbers are accepted",
   max => 100,
 }

Each type has its own set of known attribute names. To see what attributes a type supports, see type handler module documentation. For example, for hash type, see Data::Schema::Type::Hash.

A schema can specify more than one attribute hashes, in which each attribute hash will be evaluated in order. However, if a key on one attribute hash contains a prefix (see Attribute prefix section below), merging will occur (see Merging of attribute hashes section below).

Attribute name

Attribute names must begin with letter/underscore and contain letters/numbers/underscores only. All attributes which begin with an underscore will be ignored.

Attribute prefix

Attribute prefix is one of these characters:

 + - . ! *

prepended to the attribute name.

These will affect merging behaviour of attribute hashes.

The first attribute hash in the schema is not allowed to have attribute prefixes on its keys.

Attribute suffix

Attribute suffix is the colon character (":") followed by one of these:

 err warn
 errmsg warnmsg
 comment note

They give additional information/instruction associated with the attribute. They are not necessarily passed to the type attribute handler sub (handle_attr_ATTRNAME()) of the type handler but can be useful only to the validator.

Validation will fail if an unknown suffix is specified.

err

When attribute checking fails, raise an error. This is the default behaviour.

warn

When attribute checking fails, raise a warning instead of an error. You can use this to add warnings which do not make the validation fail.

 [str=>{minlen        =>4,
        minlen:warn   =>8}]

In the above example, validation fails when data is shorter than 4 characters. When the data is between 4-7 characters, validation succeeds with a warning.

errmsg

This attribute suffix is used to supply custom error message. For example:

 [str=>{regex=>'^\w{4,8}$',
        regex:errmsg=>'4-8 alphanumeric characters only!'}]

When validating the regex attribute fails, instead of the default error message from type handler, validator will use the custom error message giving clearer information to the user.

Note: if gettext_function configuration is set, this message will be passed to the function first before being returned. See Data::Schema::Config for more on configuration.

warnmsg

Just like errmsg but for warnings. For example:

 [str=>{minlen        =>4,
        minlen:warn   =>8,
        minlen:warnmsg=>'password shorter than 8 letters is ok, but not recommended'}]

comment and note

They will be ignored during validation. You can use it to document your schema if you want.

 [hash,
  { ":comment" => "this schema validates event record",
    required_keys => [qw/time place parties/],
    "required_keys:note" => "at least date, time, place, and participants must be specified",
    keys => {
      date => [datetime, {set=>1}],
      ...
    },
  }]

Merging of attribute hashes

Given several attribute hashes in the schema like:

 [TYPE, AH1, AH2, AH3]

all AH1, AH2, and AH3 will be evaluated in that order:

 eval(AH1)
 eval(AH2)
 eval(AH3)

However, if AH2 keys contain prefixes, AH1 will be merged with AH2 first before evaluated. (Illustration: "*" notation indicates the presence of merge prefix and "|" notation indicates merging).

 eval(AH1|*AH2)
 eval(AH3)

If AH3 instead contains merge prefixes then AH1 will be evaluated, and then AH2 is merged first with AH3:

 eval(AH1)
 eval(AH2|*AH3)

If AH2 as well as AH3 contains merge prefixes, then the three will be merged first before evaluating:

 eval(AH1|*AH2|*AH3)

So in short, unless the right hand side is devoid of merge prefixes, merging will be done first from left to right.

Data::Schema uses Data::ModeMerge to do the merging. Data::ModeMerge style of merging allows keys on the left side to replace but also add, subtract, remove keys from the left side. This allows schema definition to add attributes (restrict types even more), or replace attributes (change type restriction) as well as delete attributes (relax type restriction).

Examples:

 [int => {divisible_by=>2}, {  divisible_by =>3}] # must be divisible by 2 & 3

 [int => {divisible_by=>2}, {'*divisible_by'=>3}] # will be merged and become:
 [int => {divisible_by=>3}                      ] # must be divisible by 3 ONLY

 [int => {divisible_by=>2}, {'!divisible_by'=>0}] # will be merged and become:
 [int => {}                                     ] # need not be divisible at all

 [int => {one_of=>[1,2,3,4,5]}, {  one_of =>[6]}] # impossible to satisfy

 [int => {one_of=>[1,2,3,4,5]}, {'+one_of'=>[6]}] # will be merged and become:
 [int => {one_of=>[1,2,3,4,5,6]}                ]

 [int => {one_of=>[1,2,3,4,5]}, {'-one_of'=>[4]}] # will be merged and become:
 [int => {one_of=>[1,2,3,  5]}                  ]

Refer to Data::ModeMerge for details on merging syntax and behaviour.

Merging and hash keys (and regexes)

If you use attribute hash merging, certain first character of hash keys, i.e. ^, +, -, ., and * will be treated as special and will be processed, and eventually removed, except the keep mode prefix ^. See Data::ModeMerge for more details on how merging works.

Since this is done recursively, you have to be careful when you have hash keys that might start with one of those characters. Particularly regex keys (e.g. when using type attributes like hash's keys_regex or array's element_regex) because regex often contains ^ at the beginning. One workaround is to use \A instead of ^.

NAMING SCHEMAS FOR USE IN OTHER SCHEMAS

Schemas can be defined for use in other schemas. Example:

 {
  def => {
      single_dice_throw    => [int => {one_of => [1,2,3,4,5,6]}],
      sdt                  => "single_dice_throw", # short notation
      dice_pair_throw      => [array => {len=>2, elems=>["sdt", "sdt"]}],
      dpt                  => "dice_pair_throw",   # short notation
      throw                => [or => {alts => ["sdt", "dpt"]}],
      throws               => [array => {of => "throw"}],
  },
  type => "throws"
 }

The above schema defines six other schemas (subschemas?). These subschemas will not be available outside of this schema.

Another way is by putting schemas in Perl hash or in YAML files and then loading them using DSP::LoadSchema::Hash or DSP::LoadSchema::YAMLFile.

When evaluating "schema types" (schema that is used as type), the schema type is expanded, and the resulting attribute hashes are merged when necessary. For example, if we have:

 [uint => {divisible_by => 2}]

where uint is defined as:

 [int => {min => 0}]

Then when evaluating the first schema, it will be expanded into:

 [int => {min => 0} => {divisible_by => 2}]

Another example:

 [special_provinces => { "+one_of" => ["DKI"] }]

where special_provinces is:

 [str => { one_of => ["Aceh", "Djogjakarta"] }]

will become:

 [str => { one_of => ["Aceh", "Djogjakarta"] } => { "+one_of" => ["DKI"] }]

when merged will become:

 [str => { one_of => ["Aceh", "Djogjakarta", "DKI"] }]

AUTHOR

  Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.