NAME

Sah::FAQ - Frequently asked questions

VERSION

version 0.9.5

GENERAL

Why use a schema (a.k.a. "Turing tarpit")? Why not use pure Perl?

A schema language is a specialized language (DSL) that should be more concise to write than the equivalent Perl code for common validation tasks. Its goal is never to be as powerful as Perl.

90% of the time, my schemas are variations of simple cases like:

 "str*"
 ["str",   {"len_between": [1, 10], "match": "some regex"}]
 ["str",   {"in": ["a", "b", "c", ...]}]
 ["array", {"of": "some_other_type"}]
 ["hash",  {"keys": {"key1": "some schema", ...}, "req_keys": [...], ...}]

Writing schemas like these is faster and less tedious and error-prone than writing the equivalent Perl code, and Data::Sah can also generate JavaScript code and human description text for me. For more complex validation I stay with Sah until it starts to get unwieldy. It can usually go pretty far, since I can add functions and custom clauses to its types; only for very complex and dynamic validation needs do I go pure Perl. Your mileage may vary.
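To make the conciseness argument concrete, here is a rough hand-written counterpart of a schema like ["str*", {"len_between": [1, 10], "match": "^[a-z]+$"}]. It is written in Python purely for illustration; the function name, the error messages, and the regex are assumptions, and this is not code generated by Data::Sah.

```python
import re


def validate_short_word(value):
    """Return None when the value is valid, otherwise an error message.

    Hypothetical hand-written equivalent of:
        ["str*", {"len_between": [1, 10], "match": "^[a-z]+$"}]
    """
    if value is None:                      # the "*" makes the value required
        return "value is required"
    if not isinstance(value, str):         # the "str" type check
        return "value must be a string"
    if not 1 <= len(value) <= 10:          # the len_between clause
        return "length must be between 1 and 10"
    if re.search(r"^[a-z]+$", value) is None:   # the match clause
        return "value must match ^[a-z]+$"
    return None
```

Even for this small case the hand-written version needs four separate checks, each of which can be forgotten or ordered wrongly, while the schema is a single line.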

What does "Sah" mean?

Sah is an Indonesian word meaning "valid" or "legal". It was picked because it is short.

The previous incarnation of this module used the namespace Data::Schema, which was started in 2009 and deprecated in 2011 in favor of "Sah".

Comparison to JSON schema?

  • JSON schema limits its type system to that supported by JSON/JavaScript.

  • JSON schema's syntax is simpler.

    Its metaschema (the schema for the schema) is only about 130 lines. There are no shortcut forms.

  • JSON schema's features are more limited.

    No expressions, no functions.

WRITING SCHEMAS

What is the difference between the keys and req_keys clauses?

req_keys requires the listed keys to exist, while the value of each key is governed by its schema in keys. There are four possible combinations, each shown with its schema:

To require a hash key to exist while allowing its value to be undef:

 ["hash", "keys", {"a": "int"}, "req_keys", ["a"]]

To allow a hash key to not exist, but when it exists it must not be undef:

 ["hash", "keys", {"a": "int*"}]

To allow a hash key to not exist, or its value to be undef when it exists:

 ["hash", "keys", {"a": "int"}]

To require a hash key to exist and its value to not be undef:

 ["hash", "keys", {"a": "int*"}, "req_keys", ["a"]]
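The four combinations can be modeled with two independent switches. The following Python sketch is only an illustration of the rules described above, not how Data::Sah implements them; the function and parameter names are hypothetical, and Python's None stands in for Perl's undef.

```python
def check_hash(data, key_required, value_required, key="a"):
    """Model the keys/req_keys combinations for a single integer key.

    key_required   -- the key is listed in req_keys
    value_required -- the key's schema is "int*" rather than "int"
    """
    if key not in data:
        # A missing key is only a problem when req_keys lists it.
        return not key_required
    value = data[key]
    if value is None:
        # An undefined value is only a problem when the schema has "*".
        return not value_required
    # Otherwise the value must be an integer (bool is excluded on purpose).
    return isinstance(value, int) and not isinstance(value, bool)
```

For example, check_hash({"a": None}, key_required=True, value_required=False) passes, because the key exists and "int" without "*" tolerates an undefined value.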

How to express "not-something"? Why isn't there a not or not_in clause?

There are generally no not_CLAUSE clauses. Instead, a generic !CLAUSE syntax is provided. Examples:

 // an integer that is not 0
 ["int", {"!is": 0}]
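One way to picture the generic !CLAUSE syntax is as a prefix that inverts the result of an ordinary clause. The Python sketch below assumes that reading; the clause table and function name are hypothetical and only cover two clauses for illustration.

```python
# Hypothetical clause table: each clause maps (value, argument) to True/False.
CLAUSES = {
    "is": lambda value, arg: value == arg,
    "in": lambda value, arg: value in arg,
}


def check_clause(name, arg, value):
    """Evaluate one clause, inverting the result when the name starts with "!"."""
    negated = name.startswith("!")
    base_name = name[1:] if negated else name
    result = CLAUSES[base_name](value, arg)
    return (not result) if negated else result
```

With this model no separate not_is or not_in entry is needed: check_clause("!is", 0, 5) reuses the "is" clause and flips its answer.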

How to state in as well as !in in the same clause set?

You can't do this since it will cause a conflict:

 ["str", {"in": ["a","b","c"], "!in": ["x","y","z"]}]

However, you can do this:

 ["str", {"cset&": [{"in": ["a","b","c"]}, {"!in": ["x","y","z"]}]}]
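The cset& form can be read as "every listed clause set must pass", with each clause set evaluated on its own so that in and !in never share one set. Here is a minimal Python sketch under that assumption; the function names are hypothetical and only the in clause is supported.

```python
def check_clause_set(clause_set, value):
    """Return True when the value passes every clause in one clause set."""
    for name, arg in clause_set.items():
        negated = name.startswith("!")
        base_name = name[1:] if negated else name
        if base_name != "in":
            raise ValueError("unsupported clause: " + name)
        passed = value in arg
        # A plain clause must pass; a negated clause must not.
        if passed == negated:
            return False
    return True


def check_cset_and(clause_sets, value):
    """Return True only when the value passes all of the clause sets."""
    return all(check_clause_set(cs, value) for cs in clause_sets)
```

Usage: check_cset_and([{"in": ["a", "b", "c"]}, {"!in": ["x", "y", "z"]}], "a") evaluates the two sets separately and combines the results.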

How to express mutual failure ("if A fails, B must also fail")?

You can use the if_clause clause and negate both clauses. For example:

 "if_clause": ["!div_by", 2, "!div_by", 5]
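Assuming if_clause means "when the first clause holds, the second must hold too", negating both clauses gives the implication "if div_by 2 fails, div_by 5 must also fail". The Python sketch below spells out that logic for the example; the function name is hypothetical.

```python
def mutual_failure(value):
    """Check "if div_by 2 fails, then div_by 5 must also fail" for an integer."""
    a_fails = value % 2 != 0   # corresponds to "!div_by", 2
    b_fails = value % 5 != 0   # corresponds to "!div_by", 5
    # An implication "a_fails implies b_fails" is false only when
    # a_fails is true and b_fails is false.
    return (not a_fails) or b_fails
```

Under this reading, 3 passes (both clauses fail), 10 passes (the condition does not apply), and 5 is rejected (div_by 2 fails while div_by 5 succeeds).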

Merging and hash keys?

XXX (Turn off hash merging using the '' Data::ModeMerge options key.)

General advice when writing schemas?

  • Avoid any or all if you know that the data is of a certain type

    For performance and ease of reflection, it is better to create a custom clause than to use the any type, especially with a long list of alternatives. An example:

     // dns_record is either a_record, mx_record, ns_record, cname_record, ...
     ["any", "of", [
             "a_record",
             "mx_record",
             "ns_record",
             "cname_record",
             ...
         ]
     ]
    
     // base_record
     ["hash", "keys", {
         "owner": "str*",
         "ttl": "int",
     }]

     // a_record
     ["hash", "[merge]keys", {
         "type": ["str*", "is", "A"],
         "address": "str*"
     }]

     // mx_record
     ["hash", "[merge]keys", {
         "type": ["str*", "is", "MX"],
         "host": "str*",
         "prio": "int"
     }]
    
     ...

    As the declarations above show, every record is a hash, so it is better to declare dns_record as a hash instead of an any. We still need to select a different schema based on the type key. We can develop a custom clause like this:

     ["hash", "select_schema_on_key", ["type", {
         "A": "a_record",
         "MX": "mx_record",
         "NS": "ns_record",
         "CNAME": "cname_record",
         ...
     }]]

    This will be faster, since only the schema selected by the type key is checked instead of each alternative in turn.
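The difference between the two approaches can be sketched in Python as follows. This is an illustration of the idea only: the record checkers are simplified stand-ins for a_record and mx_record, and validate_any and validate_by_key are hypothetical names, not Data::Sah functions.

```python
def is_a_record(rec):
    """Simplified stand-in for the a_record schema."""
    return rec.get("type") == "A" and isinstance(rec.get("address"), str)


def is_mx_record(rec):
    """Simplified stand-in for the mx_record schema."""
    return rec.get("type") == "MX" and isinstance(rec.get("host"), str)


ALTERNATIVES = [is_a_record, is_mx_record]
BY_TYPE = {"A": is_a_record, "MX": is_mx_record}


def validate_any(rec):
    """Like the any type: try each alternative until one accepts the record."""
    return any(check(rec) for check in ALTERNATIVES)


def validate_by_key(rec):
    """Like select_schema_on_key: look up one checker from the type key."""
    check = BY_TYPE.get(rec.get("type"))
    return check is not None and check(rec)
```

Both functions accept and reject the same records here, but validate_by_key does a single dictionary lookup followed by one check, while validate_any may run every alternative before giving up.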

AUTHOR

Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.