NAME

Data::Schema::Manual::Tutorial - Introduction to and using Data::Schema

VERSION

version 0.136

OVERVIEW

This document is intended as a first read for anyone who wants to learn about and use Data::Schema (DS). It explains what DS is and what it is for, how to write DS schemas, and how to validate data structures with DS.

INTRODUCTION

Often you want to be certain that a piece of data (a scalar, an array, or perhaps a hash of arrays of hashes) has a specific range of values, shape, or structure. For example, you might want to make sure that the argument to your function is an array of numbers only, or that your command-line arguments are valid email addresses no longer than 64 characters. Data validation is needed so often that you are probably sick of writing code like this:

 if (!defined($arg)) { die "Please specify an argument!" }
 if (ref($arg) ne 'ARRAY') { die "Argument is not an array!" }
 if (!@$arg) { die "Argument is an empty array!" }
 for my $i (0..$#$arg) {
     if (!defined($arg->[$i])) { die "Element #$i is undefined!" }
     if ($arg->[$i] !~ /^-?\d+\.?\d*$/) { die "Element #$i is not a number!" }
 }

This is where schemas help. A schema is a data structure, not code, that declaratively specifies this kind of validation. The checks above can be replaced with this DS schema:

 my $schema = [array=>{required=>1, of=>'float', minlen=>1}];

And the final code becomes:

 use Data::Schema;   # exports ds_validate() by default
 my $res = ds_validate($arg, $schema);
 if (!$res->{success}) { die join("\n", @{ $res->{errors} }), "\n" }

See how much shorter and simpler it becomes?

DS schemas are less tedious to write than manual validation code, and therefore less error-prone. Because DS schemas are just ordinary data structures, they can be reused across programming languages and can themselves be validated (using DS schemas, of course).

You can also compile a schema into Perl code that runs without DS (code generation for other languages may come in the future). This is useful for improving performance, or when you want only the validation code, and not DS itself, in a production environment.
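
To give a feel for what turning a schema into code means, here is a minimal sketch built around a hypothetical compile_schema() function. It is not DS's code generator: it supports only the int, float, and array types and the required, min, max, minlen, and of attributes, and it assumes that an undefined value passes unless required is set. It only illustrates the idea of turning a schema into a standalone closure that needs no DS at runtime.

```perl
use strict;
use warnings;

# Hypothetical type checks for the tiny subset this sketch supports.
my %TYPE_CHECK = (
    int   => sub { !ref $_[0] && $_[0] =~ /^-?\d+$/ },
    float => sub { !ref $_[0] && $_[0] =~ /^-?(?:\d+\.?\d*|\.\d+)$/ },
    array => sub { ref $_[0] eq 'ARRAY' },
);

# Turn a TYPE or [TYPE, ATTRIBUTES] schema into a closure returning true/false.
# Supported attributes: required, min, max (int/float), minlen, of (array).
sub compile_schema {
    my ($schema) = @_;
    my ($type, $attrs) = ref $schema eq 'ARRAY' ? @$schema : ($schema);
    my %a = %{ $attrs || {} };
    my $is_type = $TYPE_CHECK{$type} or die "unsupported type '$type'\n";
    my $elem = defined $a{of} ? compile_schema($a{of}) : undef;   # compiled once

    return sub {
        my ($v) = @_;
        return !$a{required} unless defined $v;   # assumption: undef passes unless required
        return 0 unless $is_type->($v);
        if ($type eq 'array') {
            return 0 if defined $a{minlen} && @$v < $a{minlen};
            return 0 if $elem && grep { !$elem->($_) } @$v;
        } else {
            return 0 if defined $a{min} && $v < $a{min};
            return 0 if defined $a{max} && $v > $a{max};
        }
        return 1;
    };
}

# The schema from the introduction, compiled once and reused for every check.
my $check = compile_schema([array => {required => 1, of => 'float', minlen => 1}]);
my @samples = ([1, 2.5, -3], [], ['x'], undef);
printf "%s\n", $check->($_) ? 'valid' : 'invalid' for @samples;
# prints valid, invalid, invalid, invalid (one per line)
```

Because the schema is interpreted only once, at compile time, each later call runs plain Perl comparisons, which is one way generated code can outperform interpreting the schema on every validation.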

In short, DS lets you write validation that is shorter, simpler, and more portable across platforms and languages than hand-written Perl.

WRITING SCHEMAS

The simplest form of a schema is just a string specifying a type:

 TYPE

Example:

 int

or

 hash

To restrict the values that the data may contain, add one or more type attributes. The schema then becomes a two-element array, with the type as the first element and a hash of attributes as the second:

 [ TYPE, ATTRIBUTES ]

Example:

 [ str => {minlen=>4, maxlen=>8} ]

or:

 [ hash => {required_keys=>[qw/name age address/]} ]
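
The two forms are easy to tell apart in code. A minimal sketch, using a hypothetical parse_schema() helper that is not part of DS, which splits either form into a type name and an attribute hash:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DS): accept either TYPE or
# [TYPE, ATTRIBUTES] and return the type name plus an attribute hashref.
sub parse_schema {
    my ($schema) = @_;
    return ($schema, {}) unless ref $schema;    # short form: just a type name
    die "expected [TYPE, ATTRIBUTES]\n" unless ref $schema eq 'ARRAY';
    my ($type, $attrs) = @$schema;
    return ($type, $attrs || {});
}

my ($type, $attrs) = parse_schema([str => {minlen => 4, maxlen => 8}]);
print "$type: ", join(', ', map { "$_=$attrs->{$_}" } sort keys %$attrs), "\n";
# prints "str: maxlen=8, minlen=4"
```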

For the list of available types and their attributes, see the documentation of the Data::Schema::Type::* modules. Available types include hash, array, int, float, bool, str, and object, among others.

You can even write your own types in Perl; see Data::Schema::Manual::TypeHandler.

VALIDATING USING DS

The simplest way is to use the ds_validate() function, which is exported by default. The syntax is:

 ds_validate($data, $schema)

Example:

 use Data::Schema;
 my $res = ds_validate(12, [int => {min=>10}]);
 die "Invalid!" unless $res->{success};

The result ($res) is a hashref:

 {success=>(0 or 1), errors=>[...]}

The 'success' key is set to 1 if validation succeeds and 0 if it fails. The 'errors' key holds a list of error messages that you can inspect to find out why validation failed. Each message is prefixed with path-like positions in the data and in the schema, to help you pinpoint where the failure occurred.

The second way, OO-style, provides more control and options:

 use Data::Schema;
 my $validator = Data::Schema->new;

You can set configuration using:

 $validator->config->CONFIGVAR('VALUE');

You can also load plugins:

 $validator->register_plugin('Data::Schema::Plugin::WHATEVER');

You can then validate using:

 my $res = $validator->validate($data, $schema);

The result is the same hashref described above.

Refer to Data::Schema for details on available configuration and other methods.

ANY AND ALL

Schemas can be as simple or as complex as you want.

To require that data be of some type OR of some other type, you can write something like this:

 [
  "any",
  {of => ["array", "hash"]},
 ]

This says that your data can be an array(ref) or a hash(ref). any is a "virtual" type that lets you specify several alternatives. Another example:

 [
  "any",
  {of => [
    [int => {min=>1, max=>10}],
    [int => {min=>101, max=>110}],
    [int => {min=>1001, max=>1010}],
  ]},
 ]

The above says that you want an int between 1-10, OR between 101-110, OR between 1001-1010.
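
The "any" logic is an OR over the alternatives. A plain-Perl sketch of what the schema above checks, assuming min and max are inclusive bounds; matches_any() and @alternatives are hypothetical names, not DS internals:

```perl
use strict;
use warnings;

# The three alternatives from the example, as inclusive [min, max] pairs
# (assumption: DS's min and max are inclusive bounds).
my @alternatives = ([1, 10], [101, 110], [1001, 1010]);

# "any": the value must be an int and satisfy at least one alternative.
sub matches_any {
    my ($v) = @_;
    return 0 unless defined $v && $v =~ /^-?\d+$/;
    for my $range (@alternatives) {
        my ($min, $max) = @$range;
        return 1 if $v >= $min && $v <= $max;   # stop at the first alternative that passes
    }
    return 0;
}

printf "%-5s %s\n", $_, matches_any($_) ? 'ok' : 'rejected' for 5, 50, 105, 1010;
```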

There is also the all virtual type that requires the data to satisfy ALL requirements instead of just one. For example:

 [
  "all",
  {of => [
    [str => {match=>'^\w+$'}],
    [str => {match=>'(.)\1'}],
    [str => {match=>'[aeiou]'}],
  ]},
 ]

The above says that you need a string composed only of word characters (letters, digits, and underscore) that contains two identical characters in a row and at least one vowel. For example, "google" validates, but "foo bar" (contains a space), "bing" (no doubled character), and "http" (no vowel) do not.
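
To check the description against the patterns, here is the same "all" logic in plain Perl. Only the first pattern is anchored: "contains two identical characters in a row" and "contains a vowel" must be unanchored, i.e. (.)\1 and [aeiou]. The matches_all() name is hypothetical:

```perl
use strict;
use warnings;

# A string must match every one of these patterns ("all").
my @patterns = (qr/^\w+$/, qr/(.)\1/, qr/[aeiou]/);

sub matches_all {
    my ($s) = @_;
    for my $re (@patterns) {
        return 0 unless $s =~ $re;   # fail as soon as one requirement is not met
    }
    return 1;
}

for my $s ('google', 'foo bar', 'bing', 'http') {
    printf "%-8s %s\n", $s, matches_all($s) ? 'valid' : 'invalid';
}
```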

DEFINING SCHEMAS IN TERMS OF OTHER SCHEMAS

Schemas can actually be defined in terms of other schemas. For example:

 my $schema = {
     def => {
         even => [int => {divisible_by => 2}],
         odd  => [int => {mod => [2, 1]}],
         alt_array => [array => {elem_regex => {"[02468]\$"=>"even", "[13579]\$"=>"odd"}}],
     },
     type => "alt_array",
 };

 my $res;
 $res = ds_validate([2, 3, 8, -7, 10], $schema); # success
 $res = ds_validate([2, 2, 7, -7, 10], $schema); # fail on 2nd and 3rd element

The above schema says that you want an array of integers in which elements at even indexes (0, 2, 4, ...) are even and elements at odd indexes are odd, i.e. alternating even and odd integers. even and odd can be regarded as subschemas, which the alt_array subschema uses.
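
Here elem_regex maps a regex, matched against each element's index (as the alternating example implies), to the schema that element must satisfy. A hand-written sketch of that logic in plain Perl; %check, %elem_regex, and failing_indexes() are hypothetical stand-ins, not DS's own handling:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the "even" and "odd" subschemas.
# Perl's % with a positive right operand is never negative, so -7 % 2 == 1.
my %check = (
    even => sub { defined $_[0] && $_[0] =~ /^-?\d+$/ && $_[0] % 2 == 0 },
    odd  => sub { defined $_[0] && $_[0] =~ /^-?\d+$/ && $_[0] % 2 == 1 },
);

# Index regex => name of the subschema that elements at matching indexes must satisfy.
my %elem_regex = ('[02468]$' => 'even', '[13579]$' => 'odd');

# Returns the 0-based indexes of elements that fail their subschema.
sub failing_indexes {
    my ($array) = @_;
    my @bad;
    for my $i (0 .. $#$array) {
        for my $re (keys %elem_regex) {
            next unless $i =~ /$re/;
            push @bad, $i unless $check{ $elem_regex{$re} }->($array->[$i]);
        }
    }
    return @bad;
}

my @failed = failing_indexes([2, 2, 7, -7, 10]);
print "failing indexes: @failed\n";   # failing indexes: 1 2
```

Indexes 1 and 2 are the "2nd and 3rd element" from the example above.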

Of course you can also write the schema in "one go":

 $schema = [
   array => {
     elem_regex => {
       "[02468]\$"=>[int => {divisible_by => 2}],
       "[13579]\$"=>[int => {mod => [2, 1]}],
     }
   }
 ];

but breaking a complex schema down into named pieces can make it easier to understand.

EXTERNAL SCHEMAS

Aside from putting subschemas in a schema, you can also put schemas in a separate hash:

 my $schema_types = {
     even          => [int   => {divisible_by => 2}],
     positive_even => [even  => {min => 0}],
     array_of_ints => [array => {of => 'int'}],

     address       => [
         "hash",
         {
          required_keys => [qw/line1 line2 city province country postcode/],
          keys => {
              line1    => ["str", {required=>1}],
              line2    =>  "str",
              city     => ["str", {required=>1}],
              province => ["str", {required=>1}],
              country  => ["str", {match=>'^[A-Z]{2}$', required=>1}],
              postcode => ["str", {minlen=>4, maxlen=>15}],
          }
         }
     ],
 };

 my $validator = Data::Schema->new;
 $validator->register_plugin('Data::Schema::Plugin::LoadSchema::Hash');
 $validator->config->schema_search_path($schema_types);

 my $res;
 $res = $validator->validate(4, 'positive_even');              # success
 $res = $validator->validate(4, [positive_even => {min=>10}]); # fail: less than 10

The address schema above validates an address "record" (or "form"). All the schemas in $schema_types, including address, are made available by the Data::Schema::Plugin::LoadSchema::Hash plugin.
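
A named type such as positive_even is resolved by following the chain of definitions down to a built-in type. The rough sketch below assumes that every level's attributes apply, which is consistent with the "fail: less than 10" example. It uses a hypothetical resolve() function and a hypothetical list of built-in type names, is not how Data::Schema::Plugin::LoadSchema::Hash is implemented, and also rejects circular definitions, which would otherwise loop forever:

```perl
use strict;
use warnings;

# Hypothetical set of built-in type names that end the lookup.
my %builtin = map { $_ => 1 } qw(int float str bool array hash object);

# Follow named types down to a built-in type, collecting every level's
# attribute hash (innermost first). Dies on unknown or circular definitions.
sub resolve {
    my ($schema, $types) = @_;
    my ($type, $attrs) = ref $schema eq 'ARRAY' ? @$schema : ($schema);
    my @attr_sets = $attrs ? ($attrs) : ();
    my %seen;
    until ($builtin{$type}) {
        die "unknown type '$type'\n" unless exists $types->{$type};
        die "circular definition of '$type'\n" if $seen{$type}++;
        my ($base, $base_attrs) =
            ref $types->{$type} eq 'ARRAY' ? @{ $types->{$type} } : ($types->{$type});
        unshift @attr_sets, $base_attrs if $base_attrs;
        $type = $base;
    }
    return ($type, \@attr_sets);
}

my %types = (
    even          => [int  => {divisible_by => 2}],
    positive_even => [even => {min => 0}],
);
my ($base_type, $sets) = resolve([positive_even => {min => 10}], \%types);
print "$base_type with ", scalar(@$sets), " attribute sets\n";   # int with 3 attribute sets
```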

Another option is to put schemas in YAML files, one schema per file:

 # in schemadir/address.yaml
 - hash
 - allowed_keys: [line1, line2, city, province, country, postcode]
   keys:
     line1:    [str, {required: 1}]
     line2:     str
     city:     [str, {required: 1}]
     province: [str, {required: 1}]
     country:  [str, {match: '^[A-Z]{2}$', required: 1}]
     postcode: [str, {minlen: 4, maxlen: 15}]
   deps:
     - [country, [str, {set: 1, is: US}], postcode, [str, {match: '^[0-9]{5}$'}]]
     - [country, [str, {set: 1, is: ID}], postcode, [str, {match: '^[0-9]{5}$'}]]
     # add postcode rules for more countries

 # in schemadir/us_address.yaml
 - address
 - allow_extra_keys: 1
   keys:
     country: [str, {is: US}]

 # in schemadir/even.yaml
 - int
 - divisible_by: 2

 # in your code
 my $validator = Data::Schema->new;
 $validator->register_plugin('Data::Schema::Plugin::LoadSchema::YAMLFile');
 $validator->config->schema_search_path(["schemadir"]);

 my $res;
 $res = $validator->validate(4, 'even');              # success
 $res = $validator->validate(4, [even => {min=>10}]); # fail: less than 10
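
The deps attribute in address.yaml expresses conditional rules: each entry reads "if country matches this schema, then postcode must match that schema". A plain-Perl sketch of the same two rules, with a hypothetical dep_errors() function; this reading of deps is inferred from the example, not taken from the DS documentation:

```perl
use strict;
use warnings;

my $five_digits = sub { defined $_[0] && $_[0] =~ /^[0-9]{5}$/ };

# Each rule: [if_key, if_check, then_key, then_check]
my @deps = (
    [country => sub { defined $_[0] && $_[0] eq 'US' }, postcode => $five_digits],
    [country => sub { defined $_[0] && $_[0] eq 'ID' }, postcode => $five_digits],
);

sub dep_errors {
    my ($rec) = @_;
    my @errors;
    for my $rule (@deps) {
        my ($if_key, $if_ok, $then_key, $then_ok) = @$rule;
        next unless $if_ok->($rec->{$if_key});   # condition not met: rule does not apply
        push @errors, "$then_key is invalid when $if_key is $rec->{$if_key}"
            unless $then_ok->($rec->{$then_key});
    }
    return @errors;
}

print "$_\n" for dep_errors({country => 'US', postcode => '1234'});
```

Records for countries without a rule (for example, GB) pass unchanged, matching the "add postcode rules for more countries" comment.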

MORE EXAMPLES

For now, please see the t/schemas/ directory in the distribution.

SEE ALSO

Data::Schema::Manual::Schema, Data::Schema::Manual::TypeHandler, Data::Schema::Manual::Plugin

AUTHOR

  Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.