The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
# just to make PodWeaver happy at the moment
package Data::Schema::Manual::Tutorial;


__END__
=pod

=head1 NAME

Data::Schema::Manual::Tutorial

=head1 VERSION

version 0.136

=head1 OVERVIEW

This document is meant to be first reading for people wanting to know
and use Data::Schema (DS). It explains what DS is and what it is for,
how to write DS schemas, and how to validate data structures using DS.

=head1 NAME

Data::Schema::Manual::Tutorial - Introduction to and using Data::Schema

=head1 INTRODUCTION

Often you want to be certain that a piece of data (a scalar, an array, or
perhaps a hash of arrays of hashes, etc.) is of specific range
of values/shape/structure. For example, you might want to make sure that the
argument to your function is an array of just numbers, or that your command line
arguments are valid email addresses that are no longer than 64 characters, and
so on. In fact, data validation happens so often that you're totally sick of
writing code like this:

 if (!defined($arg)) { die "Please specify an argument!" }
 if (ref($arg) ne 'ARRAY') { die "Argument is not an array!" }
 if (!@$arg) { die "Argument is empty array!" }
 for my $i (0..@$arg-1) {
     if (!defined($arg)) { die "Element #$i is undefined!" }
     if ($arg->[$i] !~ /^-?\d+\.?\d*$/) { die "Element #$i is not a number!" }
 }

This is why schemas are a good thing. A schema is a data structure, not code,
that declaratively specifies this kind of validation. The functionality of the
above code can be replaced with this DS schema:

 my $schema = [array=>{required=>1, of=>'float', minlen=>1}];

And the final code becomes:

 my $res = ds_validate($arg, $schema);
 if (!$res->{success}) { die ... }

See how much shorter and simpler it becomes?

DS schemas are less tedious to write, and thus less boring and less error-prone
than manual data validation. Because DS schemas are just normal data structures,
they are reusable across programming languages and can also be validated
themselves (using, why, DS schemas of course).

You can also turn the schema into Perl code (which can run without DS), and
other languages' code too in the future. This can be useful for increasing
performance, or when you do not want DS itself in production environment and
just want the validation code.

So essentially, DS is a way of writing validation code that is shorter/simpler
and more cross-platform/cross-language than writing directly in Perl.

=head1 WRITING SCHEMAS

The simplest form of a schema is just a string specifying a type:

 TYPE

Example:

 int

or

 hash

If you want to restrict the values that the data can contain, you can add one or
more type attributes. The schema becomes a two-element array with the type in
the first element, and the hash of attributes as the second element:

 [ TYPE, ATTRIBUTES ]

Example:

 [ str => {minlen=>4, maxlen=>8} ]

or:

 [ hash => {required_keys=>[qw/name age address/]} ]

For a list of available types and their respective attributes, see the
documentation for B<Data::Schema::Type::*> modules. There are hash, array,
int, float, bool, str, and object types, among others.

If you want, you can even write your own type in Perl.

=head1 VALIDATING USING DS

The simplest way is just by using the ds_validate() function. It is exported
by default. The syntax is:

 ds_validate($data, $schema)

Example:

 use Data::Schema;
 my $res = ds_validate(12, [int => {min=>10}]);
 die "Invalid!" unless $res->{success};

The result ($res) is a hashref:

 {success=>(0 or 1), errors=>[...]}

The 'success' key will be set to 1 if validation is successful, or 0 if not.
The 'errors' keys are each a list of errors provided should you want to
check for details why the validation fails. Each error message is prefixed
with data and schema path-like position to help you pinpoint where in the
data and schema the validation fails.

The second way, OO-style, provides more control and options:

 use Data::Schema;
 my $validator = new Data::Schema;

You can set configuration using:

 $validator->config->CONFIGVAR('VALUE');

You can also load plugins:

 $validator->register_plugin('Data::Schema::Plugin::WHATEVER');

You can then validate using:

 my $res = $validator->validate($data, $schema);

The result is the same hashref described above.

Refer to L<Data::Schema> for details on available configuration and other
methods.

=head1 ANY AND ALL

Schemas can be as simple or as complex as you want.

To require that data be of some type OR of some other type, you can write
something like this:

 [
  "any",
  of => ["array", "hash"],
 ]

This says that your data can be an array(ref) or a hash(ref). B<any> is some
"virtual" type that allows you to specifying several alternatives. Another
example:

 [
  "any",
  of => [
    [int => {min=>1, max=>10}],
    [int => {min=>101, max=>110}],
    [int => {min=>1001, max=>1010}],
  ]
 ]

The above says that you want an int between 1-10, OR between 101-110, OR between
1001-1010.

There is also the B<all> virtual type that requires the data to satisfy ALL
requirements instead of just one. For example:

 [
  "all",
  of => [
    [str => {match=>'^\w+$'}],
    [str => {match=>'^(.)\1$'}],
    [str => {match=>'^[aeiou]$'}],
  ]
 ]

The above says that you need a string which is composed of alphanumeric
characters only and it has a sequence of two identical characters, and also that
it has a vowel. Strings such as C<google> will validate, but these won't:
C<foo bar>, C<bing>, C<http>.

=head2 DEFINING SCHEMAS IN TERMS OF OTHER SCHEMAS

Schemas can actually be defined in terms of other schemas. For example:

 my $schema = {
     def => {
         even => [int => {divisible_by => 2}],
         odd  => [int => {mod => [2, 1]}],
         alt_array => [array => {elem_regex => {"[02468]\$"=>"even", "[13579]\$"=>"odd"}}],
     },
     type => "alt_array",
 };

 my $res;
 $res = ds_validate([2, 3, 8, -7, 10], $schema); # success
 $res = ds_validate([2, 2, 7, -7, 10], $schema); # fail on 2nd and 3rd element

The above schema says that you want an array with alternating even and
odd integers. B<even> and B<odd> can be regarded as subschemas, and
they are used by the B<alt_array> subschema.

Of course you can also write the schema in "one go":

 $schema = [
   array => {
     elem_regex => {
       "[02468]\$"=>[int => {divisible_by => 2}],
       "[13579]\$"=>[int => {mod => [2, 1]}],
     }
  }
 ];

but some of us might find breaking down a complex schema into pieces
help in better understanding it.

=head1 EXTERNAL SCHEMAS

Aside from putting subschemas in a schema, you can also put schemas in a separate
hash:

 my $schema_types = {
     even          => [int   => {divisible_by => 2}],
     positive_even => [even  => {min => 0}],
     array_of_ints => [array => {of => int}],

     address       => [
         "hash",
         {
          required_keys => [qw/line1 line2 city province country postcode/],
          keys => {
              line1    => ["str", {required=>1}],
              line2    =>  "str",
              city     => ["str", {required=>1}],
              province => ["str", {required=>1}],
              country  => ["str", {match=>'/^[A-Z]{2}$/', required=>1}],
              postcode => ["str", {minlen=>4, maxlen=>15}],
          }
         }
     ],
 };

 $validator->register_plugin('Data::Schema::Plugin::LoadSchema::Hash');
 $validator->config->schema_search_path($schema_types);

 my $res;
 $res = validate(4, 'positive_even');              # success
 $res = validate(4, [positive_even => {min=>10}]); # fail: less than 10

The above B<address> schema is for validating an address "record" (or
"form"). There are also other schema types defined in
B<$schema_types>. They are loaded using DSP::LoadSchema::Hash.

Another alternative is putting schemas in YAML files.

 # in schemadir/address.yaml
 - hash
 - allowed_keys: [line1, line2, city, province, country, postcode]
   keys:
     line1:    [str, {required: 1}]
     line2:     str
     city:     [str, {required: 1}]
     province: [str, {required: 1}]
     country:  [str, {match: '^[A-Z]{2}$', required: 1}]
     postcode: [str, {minlen: 4, maxlen: 15}]
   deps:
     - [country, [str, {set: 1, is: US}], postcode, [str, {match: '^[0-9]{5}$'}]]
     - [country, [str, {set: 1, is: ID}], postcode, [str, {match: '^[0-9]{5}$'}]]
     # add postcode rules for more countries

 # in schemadir/us_address.yaml
 - us_address
 - allow_extra_keys: 1
   keys:
     country: [str, {is: US}]

 # in schemadir/even.yaml
 - int
 - divisible_by: 2

 # in your code
 $validator->register_plugin('Data::Schema::Plugin::LoadSchema::YAMLFile');
 $validator->config->schema_search_path(["schemadir"]);

 my $res;
 $res = validate(4, 'even');              # success
 $res = validate(4, [even => {min=>10}]); # fail: less than 10

=head1 MORE EXAMPLES

For now, please see the B<t/schemas/> directory in the distribution.

=head1 SEE ALSO

L<Data::Schema::Manual::Schema>,
L<Data::Schema::Manual::TypeHandler>,
L<Data::Schema::Manual::Plugin>

=head1 AUTHOR

  Steven Haryanto <stevenharyanto@gmail.com>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut