NAME ^

Sah - Schema for data structures (specification)

VERSION ^

This document describes version 0.9.31 of Sah (from Perl distribution Sah), released on 2014-10-23.

SPECIFICATION VERSION ^

0.9

STATUS ^

In the 0.9.0 series, there will probably still be incompatible syntax changes between revisions before the spec stabilizes into the 1.0 series.

ABOUT ^

This document specifies Sah, a schema language for validating data structures.

In this document, schemas and data structures are mostly written in pseudo-JSON (JSON with comments // ..., ellipsis ..., or some JavaScript).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

SCHEMA ^

Although it can contain extra information, a schema is essentially a type definition, stating a set of valid values for data.

Sah schemas are regular data structures, specifically arrays:

 [TYPE_NAME, CLAUSE_SET, EXTRAS]

TYPE_NAME is a string, CLAUSE_SET is a hash of clauses, and EXTRAS is a hash and is optional. Some examples:

 ["int", {"min": 0, "max": 100}]

 // a definition of pos_even (positive even numbers). "pos" is defined
 // in the EXTRAS part.
 ["pos", {"div_by": 2}, {"def": {"pos": ["int", {"min": 0}]}}]

A shortcut string form containing only the type name is allowed when there are no clauses. It will be normalized into the array form:

 "int"

The type name can have a * suffix as a shortcut for the "req": 1 clause. This shortcut exists because stating something is required is very common.

 "int*"

 // equivalent to
 ["int", {"req": 1}]

 ["int*", {"min": 0}]

 // equivalent to
 ["int", {"req": 1, "min": 0}]

A flattened array form is also supported when there are no EXTRAS. It will be normalized into the non-flattened form. This shortcut exists to save a couple of keystrokes and to reduce the number of nested structures, which can get a bit unwieldy for complex schemas.

 ["int", "min", 1, "max", 10]

 // is equivalent to
 ["int", {"min": 1, "max": 10}]
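The shortcut forms above (string form, * suffix, flattened array) could be normalized roughly as follows. This is a minimal sketch and not the reference implementation: the function name normalize_schema is hypothetical, always emitting an empty EXTRAS hash is an assumption, and letting the * suffix override an explicit "req" clause is also an assumption, since the text does not say which one wins.

```python
def normalize_schema(schema):
    """Normalize a schema shortcut into [TYPE_NAME, CLAUSE_SET, EXTRAS]."""
    if isinstance(schema, str):
        # String form: only a type name, no clauses.
        type_name, clause_set, extras = schema, {}, {}
    elif isinstance(schema, list) and schema and isinstance(schema[0], str):
        type_name, rest = schema[0], schema[1:]
        if not rest:
            clause_set, extras = {}, {}
        elif isinstance(rest[0], dict):
            # Regular array form, with optional EXTRAS as third element.
            if len(rest) > 2:
                raise ValueError("array form takes at most 3 elements")
            clause_set = dict(rest[0])
            extras = dict(rest[1]) if len(rest) == 2 else {}
        else:
            # Flattened form: name, value, name, value, ...
            if len(rest) % 2:
                raise ValueError("flattened form needs name/value pairs")
            clause_set = {rest[i]: rest[i + 1] for i in range(0, len(rest), 2)}
            extras = {}
    else:
        raise ValueError("schema must be a string or a non-empty array")

    # "int*" is a shortcut for the "req": 1 clause (assumed to override).
    if type_name.endswith("*"):
        type_name = type_name[:-1]
        clause_set["req"] = 1
    return [type_name, clause_set, extras]
```

For instance, normalize_schema(["int*", {"min": 0}]) yields the type name "int" with the clauses "req": 1 and "min": 0.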

TYPE ^

A type classifies data and specifies its possible values.

Sah defines several standard types like bool, int, float, str, array, hash, and a few others. Please see Sah::Type for the complete list.

A type name must match this regular expression:

 \A[A-Za-z_][A-Za-z0-9_]+(::[A-Za-z_][A-Za-z0-9_]+)*\z
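As a quick illustration, the pattern above can be tried out in Python. The helper name is_valid_type_name is hypothetical. Python spells Perl's \z anchor as \Z, so the pattern is adapted accordingly. Note that, as written, the pattern requires each ::-separated part to have at least two characters.

```python
import re

# Same pattern as above; Perl's \z (end of string) is \Z in Python.
TYPE_NAME_RE = re.compile(
    r"\A[A-Za-z_][A-Za-z0-9_]+(::[A-Za-z_][A-Za-z0-9_]+)*\Z"
)


def is_valid_type_name(name):
    """Return True if name matches the type name pattern."""
    return TYPE_NAME_RE.match(name) is not None
```

Here "int" and "Foo::Bar" match, while "1abc" and "int*" do not (the * suffix is a schema shortcut, not part of the type name).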

A type can have clauses. Most clauses declare constraints (thus, constraint clauses). Constraint clauses are like functions: they accept an argument, are evaluated against data, and return a value. The returned value need not strictly be boolean, but for the clause to succeed, the return value must evaluate to true. The notion of true/false follows Perl's notion: undefined value, empty string (""), the string "0", and number 0 are considered false. Everything else is true.

For the schema to succeed, all constraint clauses must evaluate to true.
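For implementers working outside Perl, the truth rule above might be sketched like this. The names sah_true and schema_succeeds are hypothetical, and mapping Perl's undefined value to Python's None (and treating Python booleans at face value) is an assumption of this sketch.

```python
def sah_true(value):
    """Apply Perl-style truthiness to a clause's return value."""
    if value is None:                      # stands in for Perl's undef
        return False
    if isinstance(value, bool):            # assumption: booleans as-is
        return value
    if isinstance(value, str):
        return value not in ("", "0")      # only "" and "0" are false
    if isinstance(value, (int, float)):
        return value != 0
    return True                            # everything else is true


def schema_succeeds(clause_results):
    """A schema succeeds only if every constraint clause is true."""
    return all(sah_true(result) for result in clause_results)
```

Note that, unlike in Python, an empty list and the string "0.0" count as true under this rule.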

Aside from declaring constraints, clauses can also declare other things. There is the default clause, which specifies a default value. There are metadata clauses, which specify metadata, e.g. the summary, description, and tags clauses.

Aside from clauses, a type can also have type properties. Properties are different from clauses in the following ways: 1) they are used to find out something about the data, not to test/validate data; 2) they are allowed to not accept any argument. A type can have a property and a clause with the same name. For example, the str type has a len clause to test its length against an integer, as well as a len property which returns its length. Properties are differentiated from clauses so that compilers to human text can generate a description like "string where its length is at least 1".

Type properties can be validated against a schema using the prop or if clause.

Base schema. You can define a schema, declare it as a new type, and then write subsequent schemas against that type, along with additional clauses. This is very much like subtyping. See "BASE SCHEMA" for more information.

BASE SCHEMA ^

As mentioned before, you can define a schema as a type and then write other schemas against that type. For example:

 // defined as pos_int type
 ["int", {"min": 0}]

and later:

 // a positive integer, divisible by 5
 ["pos_int", {"div_by": 5}]

During data validation, base schemas will be replaced by their original definitions, and all the clause sets will be evaluated. This is illustrated here with the plus sign:

 ["int", {"min": 0} + {"div_by": 5}]
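The replacement of a base schema by its original definition could be sketched as follows. This is an illustration under assumptions, not the reference implementation: expand_base_schemas and the defined_types registry are hypothetical names, schemas are assumed to be already in [TYPE_NAME, CLAUSE_SET] form, and merge prefixes are not handled.

```python
def expand_base_schemas(schema, defined_types):
    """Follow base schemas down to a non-schema type.

    Returns (type_name, clause_sets), with the clause sets ordered from
    the innermost base schema to the schema given.
    """
    type_name, clause_set = schema
    clause_sets = [clause_set]
    seen = set()
    while type_name in defined_types:
        if type_name in seen:
            raise ValueError("circular base schema: %s" % type_name)
        seen.add(type_name)
        type_name, base_clause_set = defined_types[type_name]
        # The base schema's clauses are evaluated before the child's.
        clause_sets.insert(0, base_clause_set)
    return type_name, clause_sets


# Hypothetical registry holding the pos_int definition from above.
defined_types = {"pos_int": ["int", {"min": 0}]}
expanded = expand_base_schemas(["pos_int", {"div_by": 5}], defined_types)
```

Here expanded holds the type "int" together with the two clause sets {"min": 0} and {"div_by": 5}, matching the plus-sign illustration.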

You can also declare base schemas/types locally using the def key in EXTRAS, for example:

 ["throws", {},
  {
      "def": {
          "single_dice_throw":  ["int", {"in": [1, 2, 3, 4, 5, 6]}],
          "sdt":                "single_dice_throw", // short notation
          "dice_pair_throw":    ["array", {"len": 2, "elems": ["sdt", "sdt"]}],
          "dpt":                "dice_pair_throw",   // short notation
          "throw":              ["any", {"of": ["sdt", "dpt"]}],
          "throws":             ["array", {"of": "throw"}]
      }
  }
 ]

The above schema describes a list of dice throws (throws). Each throw can be a single dice throw (sdt), which is a number between 1 and 6, or a throw of two dice (dpt), which is a 2-element array (where each element is a number between 1 and 6).

Examples of valid data for this schema:

 [1, [1,3], 6, 4, 2, [3,5]]

Examples of invalid data:

 1                  // not an array
 [1, [2, 3], 0]     // the third throw is invalid
 [1, [2, 0, 4], 4]  // the second throw is invalid
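To make the meaning of the schema concrete, here is a hand-written Python equivalent of what it accepts. This is only an illustration of the intended semantics; the function names mirror the local type names, and a real Sah validator would derive these checks from the schema rather than hard-code them.

```python
def is_sdt(value):
    """single_dice_throw: an integer from 1 to 6."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and 1 <= value <= 6)


def is_dpt(value):
    """dice_pair_throw: a 2-element array of single dice throws."""
    return (isinstance(value, list) and len(value) == 2
            and all(is_sdt(elem) for elem in value))


def is_throw(value):
    """throw: either a single dice throw or a dice pair throw."""
    return is_sdt(value) or is_dpt(value)


def is_throws(value):
    """throws: an array in which every element is a throw."""
    return isinstance(value, list) and all(is_throw(elem) for elem in value)
```

Calling is_throws on the valid example above returns True, and it returns False for each of the three invalid examples.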

All the base schema names (throw, throws, sdt, etc.) are only declared locally and are unknown outside the schema. You can even nest this.

Optional/conditional definition

If you put a ? suffix after the definition name then it means that the definition is optional and can be skipped if the type is already defined, e.g.:

  "def": {
      "emailaddr?": ["str", {"req": 1, "match": ".+\@.+"}],
      "username":   ["str", {"req": 1, "match": "^[a-z0-9_]+$"}]
  }

In the above example, if there is already an emailaddr type defined at that time, the definition will be skipped instead of a "cannot redefine type" error being generated.

Optional definitions are useful if you want to provide some defaults (e.g. a rudimentary validation for email address) but don't mind if the validator already has something probably better (a stricter or more precise definition of email address).
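The handling of the ? suffix might look like the following sketch. The function name apply_defs and the known_types registry are hypothetical, and the exact error raised on redefinition is an assumption modeled on the "cannot redefine type" message mentioned above.

```python
def apply_defs(known_types, defs):
    """Return a new type registry with the definitions in defs applied.

    A name with a "?" suffix is skipped if the type already exists;
    a name without it raises an error on redefinition.
    """
    result = dict(known_types)
    for name, schema in defs.items():
        optional = name.endswith("?")
        if optional:
            name = name[:-1]
        if name in result:
            if optional:
                continue  # keep the existing, presumably better, definition
            raise ValueError("cannot redefine type %r" % name)
        result[name] = schema
    return result
```

With an emailaddr type already registered, applying the def hash above leaves it untouched and only adds username.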

CLAUSE AND CLAUSE SET ^

A clause set is a defhash (see DefHash) that maps clause names to clause values and clause attribute names to clause attribute values. Defhash properties map to Sah clauses, while defhash property attributes map to Sah clause attributes.

 {
     "CLAUSENAME1": CLAUSEVALUE,
     "CLAUSENAME1.ATTRNAME1": ATTRVALUE1,
     "CLAUSENAME1.ATTRNAME2": ATTRVALUE2,
     "CLAUSENAME1.ATTRNAME1.SUBATTR1": ...,
     ...
     "_IGNORED": ...,
     "CLAUSENAME1._IGNORED": ...
 }

For convenience, there are also some shortcuts:

Every clause has a priority between 0 and 100 to determine the order of evaluation (the lower the number, the higher the priority and the earlier the clause is evaluated). Most constraint clauses are at priority 50 (normal) so the order does not matter, but some clauses are early (like default and prefilters) and some are late (like postfilters). Variables mentioned in expressions also determine ordering, for example:

 ["int", {"min=": "0.5*$clause:max", "max": 10}]

In the above example, although max and min are both at priority 50, max needs to be evaluated first because min refers to it (XXX syntax of variable not yet finalized).
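The priority part of the ordering rule can be sketched as a stable sort. The specific priority numbers below are hypothetical (the text only says that default and prefilters are early, postfilters are late, and most constraints sit at 50), and this sketch deliberately ignores the reordering caused by variables in expressions.

```python
# Hypothetical priorities for illustration; lower means evaluated earlier.
PRIORITY = {
    "default": 10,
    "prefilters": 10,
    "postfilters": 90,
}
NORMAL_PRIORITY = 50  # most constraint clauses


def evaluation_order(clause_set):
    """Return clause names sorted by priority (stable for equal priority)."""
    return sorted(
        clause_set,
        key=lambda name: PRIORITY.get(name, NORMAL_PRIORITY),
    )
```

Because Python's sort is stable, clauses that share a priority keep the order in which they appear in the clause set.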

Clause name

This specification comes from DefHash: Clause names must begin with a letter or underscore and contain only letters, numbers, and underscores. All clauses which begin with an _ (underscore) are ignored. You can use this to embed extra data for other purposes.

Clause attribute

This specification comes from DefHash: Attribute names must also contain only letters, numbers, and underscores, but an attribute name can be a dot-separated series of parts, e.g. alt.lang.id_ID. As with clauses, clause attributes which begin with _ (underscore) are ignored. You can use this to embed extra data.

Currently known general attributes:

Aside from the above general attributes, each clause might recognize its own specific attributes. See documentation of respective clauses.
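The naming rules for clauses and attributes could be applied to a clause set as in the sketch below. It is an illustration only: split_clause_set is a hypothetical name, shortcut keys are not handled, and only an underscore at the very start of the attribute name is treated as "ignored", since the text does not say how an underscore in a later dotted part should be treated.

```python
import re

CLAUSE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
ATTR_PART_RE = re.compile(r"[A-Za-z0-9_]+")


def split_clause_set(clause_set):
    """Split a clause set into (clauses, attributes per clause)."""
    clauses, attrs = {}, {}
    for key, value in clause_set.items():
        name, dot, attr = key.partition(".")
        # Clauses and attributes starting with "_" are ignored.
        if name.startswith("_") or attr.startswith("_"):
            continue
        if not CLAUSE_NAME_RE.fullmatch(name):
            raise ValueError("invalid clause name: %r" % name)
        if not dot:
            clauses[name] = value
            continue
        parts = attr.split(".")
        if not all(ATTR_PART_RE.fullmatch(part) for part in parts):
            raise ValueError("invalid attribute name: %r" % attr)
        attrs.setdefault(name, {})[attr] = value
    return clauses, attrs
```

For example, a key such as "summary.alt.lang.id_ID" ends up as the attribute "alt.lang.id_ID" of the summary clause, while "_note" and "min._x" are dropped.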

Clause set merging

Clause set merging happens when a schema is based on another schema and the child schema's clause set contains merge prefixes (explained later) in its keys. For example:

 // schema1
 [TYPE1, CLSET1]

 // schema2, based on schema1
 [schema1, CLSET2]

 // schema3, based on schema2
 [schema2, CLSET3]

When compiling/evaluating schema2, Sah will check against TYPE1 and CLSET1 and then CLSET2. However, when CLSET2 contains a merge prefix (marked with an asterisk here for illustration), then Sah will check against TYPE1 and merge(CLSET1, *CLSET2).

When compiling/evaluating schema3, Sah will check against TYPE1 and CLSET1 and then CLSET2 and then CLSET3. However, when CLSET2 contains a merge prefix, then Sah will check against TYPE1, merge(CLSET1, *CLSET2), and then CLSET3. When CLSET2 and CLSET3 both contain merge prefixes, Sah will check against TYPE1 and merge(CLSET1, *CLSET2, *CLSET3). So merging will be done from left to right.

The base schema's clause set must not contain any merge prefixes.
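The left-to-right rule can be pictured as grouping the chain of clause sets before any merging happens. In this sketch, group_clause_sets is a hypothetical name, a merge prefix is assumed to be any key starting with "merge.", and a clause set with merge prefixes is assumed to merge into whatever group precedes it (the text only spells out the cases where the preceding group starts at CLSET1).

```python
def has_merge_prefix(clause_set):
    """True if any key carries a merge prefix such as 'merge.add.'."""
    return any(key.startswith("merge.") for key in clause_set)


def group_clause_sets(clause_sets):
    """Group clause sets (base first) into lists to be merged together.

    Each returned group is merged left to right into one clause set;
    the groups themselves are then evaluated one after another.
    """
    if clause_sets and has_merge_prefix(clause_sets[0]):
        raise ValueError("base clause set must not contain merge prefixes")
    groups = []
    for clause_set in clause_sets:
        if groups and has_merge_prefix(clause_set):
            groups[-1].append(clause_set)   # merged into the previous group
        else:
            groups.append([clause_set])     # evaluated on its own
    return groups
```

With CLSET2 carrying a merge prefix and CLSET3 not, the result is two groups: CLSET1 with CLSET2, and CLSET3 alone.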

Merging is done using Data::ModeMerge, with merge prefixes changed to 'merge.add.', 'merge.delete.' and so on. In merging, Data::ModeMerge allows keys in the right-hand hash not only to replace keys in the left-hand hash, but also to add to, subtract from, or remove them. This is powerful because it allows a schema definition to not only add clauses (restrict types even more), but also replace clauses (change type restriction) as well as delete clauses (relax type restriction). For more information, refer to the Data::ModeMerge documentation.

Illustration:

 int + {"div_by": 2} + {"div_by": 3}               // must be divisible by 2 & 3

 int + {"div_by": 2} + {"merge.normal.div_by": 3} // will be merged and become:
 int + {"div_by": 3}                              // must be divisible by 3 ONLY

 int + {"div_by": 2} + {"merge.delete.div_by": 0}  // will be merged and become:
 int + {}                                          // need not be divisible

 int + {"in": [1,2,3,4,5]} + {"in": [6]}           // impossible to satisfy

 int + {"in": [1,2,3,4,5]} + {"merge.add.in": [6]} // will be merged and become:
 int + {"in": [1,2,3,4,5,6]}

 int + {"in": [1,2,3,4,5]} + {"merge.subtract.in": [4]} // will be merged and become:
 int + {"in": [1,2,3,  5]}

Merging is performed before the schema is normalized.

Merging is not recursive.
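The effect of the four merge modes in the illustration can be imitated with a few lines of Python. This is a simplified stand-in and not Data::ModeMerge itself: merge_clause_sets is a hypothetical name, add and subtract are only implemented for arrays, and letting an unprefixed key in the right-hand clause set replace the left-hand value is an assumption.

```python
def merge_clause_sets(left, right):
    """Merge right into left, honoring 'merge.MODE.NAME' keys in right."""
    result = dict(left)
    for key, value in right.items():
        if not key.startswith("merge."):
            result[key] = value                 # assumption: plain replace
            continue
        _, mode, name = key.split(".", 2)
        if mode == "normal":
            result[name] = value                # replace the clause
        elif mode == "delete":
            result.pop(name, None)              # remove the clause
        elif mode == "add":
            result[name] = list(result.get(name, [])) + list(value)
        elif mode == "subtract":
            result[name] = [x for x in result.get(name, []) if x not in value]
        else:
            raise ValueError("unknown merge mode: %r" % mode)
    return result
```

Merging is not recursive here either: only top-level keys of the clause set are examined.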

EXPRESSION ^

XXX: Syntax of variables not yet finalized.

Sah supports expressions, using the Language::Expr minilanguage. See Language::Expr::Manual::Syntax for details on the syntax. You can specify an expression in the check clause, e.g.:

 ["int", {"check": "$_ >= 4"}]

Alternatively, an expression can also be specified in any clause's attribute:

 ["int", {"min=": "floor(4.9)"}]

The schemas above are equivalent to:

 ["int", {"min": 4}]

Expressions can refer to elements of data and of the (normalized) schema, and can call functions, enabling more complex schemas to be defined, for example:

 ["array*", {"len": 2, "elems": [
   ["str*", {"match": "^\w+$"}],
   ["str*", {"match=": "${../../0/clause_sets/0/match}",
             "min_len=": "2*length(${data:../0})"}]
 ]}]

The above schema requires data to be a two-element array containing strings, where the length of the second string has to be at least twice the length of the first. Both strings have to comply with the same regex, ^\w+$ (which is declared in the first string's clause and referred to in the second string's clause).
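Spelled out by hand in Python, the constraints described above amount to the following check. This is only a restatement of the prose, with a hypothetical function name; it does not evaluate Sah expressions or resolve the variable references, whose syntax is not yet finalized.

```python
import re

WORD_RE = re.compile(r"^\w+$")  # the regex shared by both strings


def check_string_pair(data):
    """Check the two-element string array described by the schema."""
    if not (isinstance(data, list) and len(data) == 2):
        return False
    if not all(isinstance(elem, str) for elem in data):
        return False
    first, second = data
    if not (WORD_RE.search(first) and WORD_RE.search(second)):
        return False
    # The second string must be at least twice as long as the first.
    return len(second) >= 2 * len(first)
```

For example, ["ab", "abcd"] passes, while ["ab", "abc"] fails the length requirement.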

FUNCTION ^

Functions can be used in expressions. The syntax for calling a function is:

 func()
 func(ARG, ...)

Functions in Sah can sometimes accept several types of arguments, e.g. len(ARRAY) will return the number of elements in the ARRAY, while len(STR) will return the number of characters in the string. However, when an inappropriate argument is given, an exception will be thrown.
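The behavior described for len could be mirrored like this. The name sah_len is hypothetical, and using TypeError for the thrown exception is an assumption; the text only says that an exception will be thrown.

```python
def sah_len(arg):
    """Return the element count of an array or the character count of a string."""
    if isinstance(arg, (list, str)):
        return len(arg)
    # Inappropriate argument: raise instead of guessing a result.
    raise TypeError("len() expects an array or a string, got %s"
                    % type(arg).__name__)
```

Calling sah_len with a number, for instance, raises the exception rather than returning a value.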

EXTRAS ^

The extras part of a schema (the third element) contains various additional information. It is a DefHash that can contain these keys:

HISTORY ^

2012-07-21 split specification to Sah

2011-11-23 Data::Sah

2009-03-30 Data::Schema (first CPAN release)

Previous incarnation as Schema-Nested (internal)

SEE ALSO ^

DefHash

Sah::Type, Sah::FAQ

HOMEPAGE ^

Please visit the project's homepage at https://metacpan.org/release/Sah.

SOURCE ^

Source repository is at https://github.com/perlancar/perl-Sah.

BUGS ^

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Sah

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR ^

perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE ^

This software is copyright (c) 2014 by perlancar@cpan.org.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
