The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=pod

=head1 NAME

UR::Object::Type::Initializer - Class definition syntax

=head1 SYNOPSIS

  UR::Object::Type->define(
      class_name => 'Namespace::MyClass',
      id_by => 'my_class_id',
      has => ['prop_a', 'prop_b']
  );

  UR::Object::Type->define(
      class_name => 'Namespace::MyChildClass',
      is => 'Namespace::MyClass',
      has => [
          'greeting' => { is => 'String', is_optional => 0,
                          valid_values => ["hello","good morning","good evening"],
                          default_value => 'hello' },
      ],
  );

  UR::Object::Type->define(
      class_name => 'Namespace::Helper',
      id_by => 'helper_id',
      has => [
          'my_class_id' => { is => 'String', is_optional => 0 },
          'my_class'    => { is => 'Namespace::MyClass', id_by => 'my_class_id' },
          'my_class_a'  => { via => 'my_class', to => 'prop_a' },
      ],
      has_optional => [
          'other_attribute' => { is => 'Integer' }
      ],
      data_source => 'Namespace::DataSource::DB',
      table_name  => 'HELPERS',
  );

  UR::Object::Type->define(
      class_name => 'Namespace::Users',
      id_by => ['uid'],
      has => [ 'login','passwd','gid','name','home','shell'],
      data_source => {
          is => 'UR::DataSource::File',
          file => '/etc/passwd',
          column_order => ['login','passwd','uid','gid','name','home','shell',
          skip_first_line => 0,
          delimiter => ':'
      },
      id_generator => '-uuid',
  );

=head1 DESCRIPTION

Defining a UR class is like drawing up a blueprint of what a particular kind
of object will look like. That blueprint includes the properties these
objects will have, what other classes the new class inherits from, and where
the source data comes from such as a database or file. 

=head2 The Simplest Class

The simplest class definition would look like this:

  use UR;
  class Thing {};

You can create an instance of this class like this:

  my $thing_object = Thing->create();

Instances of this class have no properties, no backing storage location, and
no inheritance.

Actually, none of those statements are fully true, but we'll come back to
that later... 

=head2 A Little Background

After using UR, or another class that inherits from UR::Namespace, the above
"class" syntax above can be used to define a class.

The equivalent, more plain-Perl way to define a class is like this:

  UR::Object::Type->define(
     class_name => 'Thing',
     # the remainder of the class definition would go here, if there were any
  );

Classes become instances of another class called L<UR::Object::Type>. It has
a property called class_name that contains the package that instances of
these objects are blessed into. Class properties are also instances of a
class called L<UR::Object::Property>, and those properties have properties
(also UR::Object::Properties) that describe it, such as the type of data it
holds, and its length. In fact, all the metadata about classes, properties,
relationships, inheritance, data sources and contexts are available as
instances of metadata classes. You can get information about any class
currently available from the command line with the command

  ur show properties Thing

with the caveat that "currently available" means you need to be under a
namespace directory that contains the class you're describing. 

=head2 Making Something Useful

  class Vehicle {
      id_by => 'serial_number',
      has => ['color', 'weight'],
      has_optional => ['license_plate'],
  };

Here we have a basic class definition for a thing we're calling a Vehicle.
It has 4 properties: serial_number, color, weight and license_plate. Three
of these properties are required, meaning that when you create one, you must
give a value for those properties; it is similar to a 'NOT NULL' constraint
on a database column. The serial_number property is an ID property of the
class, meaning that no two instances (objects) of that class can exist with
the same serial_number; it is similar to having a UNIQUE index on that
column (or columns) in a database. Not all vehicles have license plates, so
it is optional.

After that, you've effectively created five object instances. One
UR::Object::Type identified by its class_name being 'Vehicle', and four
UR::Object::Property objects identified by the pairs of class_name and
property_name. For these four properties, class_name is always 'Vehicle'
and property name is one each of serial_number, color, weight and
license_plate.

Objects always have one property that is called 'id'. If you have only one
property in the id_by section, then the 'id' property is effectively an alias
for it. If you have several id_by properties, then the 'id' property becomes
an amalgamation of the directly named id properties such that no two objects
of that class will have the same 'id'. If there are no id_by properties
given (including MyClass above that doesn't have _any_ properties), then an
implicit 'id' property will get created.  Instances of that class will have
an 'id' generated internally by an algorithm.  Finally, if the class has
more than one ID property, none of them may be called 'id', since that name
will be reserved for the amalgamated-value property.

You can control how IDs get autogenerated with the class' id_generator
metadata.  For classes that save their results in a relational database, it
will get new IDs from a sequence (or some equivalent mechanism for databases
that do not support sequences) based on the class' table name.  If you want
to force the system to use some specific sequence, for example if many
classes should use the same sequence generator, then put the name of this
sequence in.

If the id_generator begins with a dash (-), it indicates a method should
be called to generate a new ID.  For example, if the name is "-uuid", then the
system will call the internal method
C<$class_meta->autogenerate_new_object_id_uuid>. nd will make object IDs as hex
string UUIDs.  The default value is '-urinternal' which makes an ID string
composed of the hostname, process ID, the time the program was started and 
an increasing integer.  If id_generator is a subroutine reference, then the
sub will be called with the class metaobject and creation BoolExpr passed as
parameters.

You'll find that the parser for class definitions is pretty accepting about
the kinds of data structures it will take. The first thing after class is
used as a string to name the class. The second thing is a hashref
containing key/value pairs. If the value part of the pair is a single
string, as the id_by is in the Vehicle class definition, then one property
is created. If the value portion is an arrayref, then each member of the
array creates an additional property. 

=head2 Filling in the Details

That same class definition can be made this way:

  class Vehicle {
      id_by => [
          serial_number => { is => 'String', len => 25 },
      ],
      has => [
          color => { is => 'String' },
          weight => { is => 'Number' },
          license_plate => { is => 'String', len => 8, is_optional => 1 },
      ],
  };

Here we've more explicitly defined the class' properties by giving them a
type. serial_number and license_number are given a maximum length, and
license_number is declared as optional. Note that having a 'has_optional'
section is the same as explicitly putting 'is_optional => 1' for all those
properties. The same shortcut is supported for the other boolean properties
of UR::Object::Property, such as is_transient, is_mutable, is_abstract, etc.

The type system is pretty lax in that there's nothing stopping you from
using the method for the property to assign a string into a property
declared 'Number'.  Type, length, is_optional constraints are checked by
calling C<__errors__()> on the object, and indirectly when data is committed
back to its data source.

=head2 Inheritance

  class Car {
      is => 'Vehicle',
      has => [
          passenger_count   => { is => 'Integer', default_value => 0 },
          transmission_type => { is => 'String',
                                 valid_values => ['manual','automatic','cvt'] },
      ],
  };

  my $car = Car->create(color => 'blue',
                        serial_number => 'abc123',
                        transmission_type => 'manual');  

Here we define another class called Car. It inherits from Vehicle, meaning
that all the properties that apply to Vehicle instances also apply to Car
instances. In addition, Car instances have two new properties.
passenger_count has a default value of 0, and transmission_type is
constrained to three possible values.

=head2 More class properties

Besides property definitions, there are other things that can be specified
in a class definition.

=over 4

=item is

Used to name the parent class(es). Single inheritance can be specified by
just listing the name of the parent class as a string. Multiple inheritance
is specified by an arrayref containing the parent class names.  If no 'is'
is listed, then the class will inherit from 'UR::Entity'

=item doc

A single string to list some short, useful documentation about the class.

=item data_source

A string to list the data source ID. For classes with no data_source,
the only objects get() can return are those that had previously been
instantiated with create() or define() earlier in the program, and they do
not get saved anywhere during a commit(). They do, however, exist in the
object cache during the program's execution.

data_source can also be a hashref to define a data source in line with
the class definition.  See below for more information about
L</Inline Data Sources>.

=item table_name

When the class' data source is some kind of database, C<table_name> Specifies
the name of the table where this class' data is stored to.

=item select_hint

Some relational databases use hints as a way of telling the query optimizer
to behave differently than usual.  These hints are specified inside comments
like this:
    /* the hint */
If the class is the primary class of a query, and it has a hint, then the hint
will appear after the word 'select' in the SQL.

=item join_hint

If the class is part of a query where it is joined, then its hint will be
added to the hints already part of the query.  The primary table's hint will
be first, followed by the joined class' hints in the order they are joined.
All the hints are separated by a single space.

=item is_abstract

A flag indicating that no instances of this class may be instantiated,
instead it is used as a parent of other classes.

=item sub_classification_method_name

Holds the name of a method that is called whenever new instances of the
class are loaded from a data source. This method will be called with two
arguments: the name of the class the get() was called on, and the object
instance being loaded. The method should return the complete name of a
subclass the object should be blessed into.

=item sub_classification_property_name

Works like 'sub_classification_method_name', except that the value of the
property is directly used to subclass the loaded object. 

=back

=head2 Properties properties

C<ur show properties UR::Object::Property> will print out an exhaustive list of all
the properties of a Class Property. A class' properties are declared in the 'id_by'
or one of the 'has' sections.  Some of the more important ones:

=over 4

=item class_name

The name of the class this property belongs to.

=item property_name

The name of the property. 'property_name' and 'class_name' do not actually
appear in the hashref that defines the property inside a class definition,
though they are properties of UR::Object::Property instances.

=item is

Specifies the data type of this property. Basic types include 'String',
'Integer', 'Float'.  Relationships between classes are defined by having the
name of another class here. See the Relationships section of
L<UR::Manual::Cookbook> for more information.

Object properties do not normally hold Perl references to other objects, but you
may use 'ARRAY' or 'HASH' here to indicate that the object will store the
reference directly.  Note that these properties are not usually saveable to
outside data sources.

=item data_type

A synonym for 'is'

=item len

Specifies the maximum length of the data, usually in bytes.

=item doc

A space for useful documentation about the property

=item default_value

The value a property will have if it is not specified when the object is
created.  If used on a property that is 'via' another property (see the
Indirect Properties section below), it can trigger creation of a referent
object.

=item is_mutable

A flag indicating that this property can be changed. It is the default state
of a property.  Set this to 0 in the property definition if the property is
not changeable after the object is created.

=item is_constant

A flag indicating that the value of this property may not be changed after
the object is created. It is a synonym for having is mutable = 0

=item is_many

Indicates that this returns a list of values.  Usually used with
reverse_as properties.

=item is_optional

Indicates that this property can hold the value undef. 

=back

=head3 Calculated Properties

=over

=item is_calculated

A flag indicating that the value of this property is determined from a
function.

=item calculate_from

A listref of other property names used by the calculation

=item calculate

A specification for how the property is to be calculated in Perl.

=over 6

=item *

if the value is a coderef, it will be called when that property is accessed,
and the first argument will be the object instance being acted on.

=item *

the value may be a string containing Perl code that is eval-ed when the
accessor is called. The Perl code can refer to $self, which will hold
the correct object instance during execution of that code. Any properties
listed in the 'calculate_from' list will also be initialized 

=item *

The special value 'sum' means that the values of all the properties in the
calculate_from list are added together and returned 

=back

Any property can be effectively turned into a calculated property by defining
a method with the same name as the property. 

=back

=head3 Database-backed properties

=over 4 

=item column_name

For classes whose data is stored in a database table (meaning the class has
a data_source), the column_name holds the name of the database column in its
table. In the default case, the column_name is the same as the 'property_name'.

=item calc_sql

Specifies that this property is calculated, and its value is a string
containing SQL code inserted into that property's "column" in the SELECT
clause 

=back

=head2 Relation Properties

Some properties are not used to hold actual data, but instead describe some
kind of relationship between two classes.  For example:

  class Person {
      id_by => 'person_id',
      has => ['name'],
  };
  class Thing {
      id_by => 'thing_id',
      has => [
          owner => { is => 'Person', id_by => 'owner_id' },
      ],
  };
  $person = Person->create(person_id => 1, name => 'Bob');
  $thing = Thing->create(thing_id => 2, owner_id => 1);

Here, Thing has a property called C<owner>.  It implicitly defines a property
called C<owner_id>.  C<owner> becomes a read-only property that returns an
object of type Person by using the object's value for the C<owner_id>
property, and looking up a Person object where its ID matches.  In the above
case, C<$thing-E<gt>owner> will return the same object that C<$person>
contains.

Indirect properties can also link classes with multiple ID properties.

  class City {
      id_by => ['name', 'state']
  };
  class Location {
      has => [
         city    => { is => 'String' },
         state   => { is => 'String' },
         cityobj => { is => 'City',
                      id_by => ['city', 'state' ] },
      ],
  };

Note that the order the properties are linked must match in the relationship
property's C<id_by> and the related class's C<id_by>

=head2 Reverse Relationships

When one class has a relation property to another, the target class can also
define the converse relationship.  In this case, OtherClass is the same
as the first L</Relation Properties> example where the relationship from
OtherClass to MyClass, but we also define the relationship in the other
direction, from MyClass to OtherClass.

Many Things can point back to the same Person.

  class Person {
      id_by => 'person_id',
      has => ['name'],
      has_many => [
          things => { is => 'Thing', reverse_as => 'owner' },
      ]
  };
  class Thing {
      id_by => 'thing_id',
      has => [
          owner => { is => 'Person', id_by => 'owner_id' },
      ],
  };

Note that the value for C<reverse_as> needs to be the name of the relation
property in the related class that would point back to "me".  Yes, it's a bit
obtuse, but it's the best we have for now.

=head2 Indirect Properties

When the property of a related object has meaning to another object, that
relationship can be defined through an indirect property.  Things already
have owners, but it is also useful to know a Thing's owner's name.

  class Thing {
      id_by => 'thing_id',
      has => [
          owner => { is => 'Person', id_by => 'owner_id' },
          owner_name => { via => 'owner', to => 'name', default_value => 'No one' },
      ],
  };
  $name = $thing->owner_name();
  $name eq $person->name;  # evaluates to true

The values of indirect properties are not stored in the object.  When the
property's method is called, it looks up the related object through the
accessor named in C<via>, and on that result, returns whatever the method
named in C<to> returns.

If one of these Thing objects is created by calling Thing->create(), and 
no value is specified for owner_id, owner or owner_name, then the system will
find a Person object where its 'name' is 'No one' and assign the Thing's 
owner_id to point to that Person.  If no matching Person is found, it will
first create one with the name 'No one'.

=head2 Alias Properties

Sometimes it's useful to have a property that is an alias for another
property, perhaps as a refactoring tool or to make the API clearer.  The is
accomilished by defining an indirect property where the 'via' is __self__.

  class Thing {
      id_by => 'thing_id',
      has => [
          owner => { is => 'Person', id_by => 'owner_id' },
          titleholder => { via => '__self__', to => 'owner' },
      ]
  };

In this case, 'titleholder' is an alias for the 'owner' property.  titleholder
can be called as a method any place owner is a valid method call.  BoolExprs
may refer to titleholder, but any such references will be rewrittn to 'owner'
when they are normalized.

=head2 Subclassing Members of an Abstract Class

In some cases, objects may be loaded using a parent class, but all the
objects are binned into some other subclass.

  class Widget {
      has => [
          manufacturer => { is => 'String',
                            valid_values => ['CoolCo','Vectornox'] },
      ],
      is_abstract => 1,
      sub_classification_method_name => 'subclasser',
  };
  sub Widget::subclasser {
      my($class,$pending_object) = @_;
      my $subclass = 'Widget::' . $pending_object->manufacturer;
      return $subclass;
  }

  class Widget::CoolCo {
      is => 'Widget',
      has => 'serial_number',
  };
  class Widget::Vextornox {
      is => 'Widget',
      has => 'series_string',
  }
          
  my $cool_widget = Widget->create(manufacturer => 'CoolCo');
  $cool_widget->isa('Widget::CoolCo'); # evaluates to true
  $cool_widget->serial_number(12345);  # works
  $cool_widget->series_srting();       # dies

In the class definition for the parent class, Widget, it is marked as being
an abstract class, and the sub_classification_method_name specifies the name
of a method to call whenever a new Widget object is created or loaded.  That
method is passed the pre-subclassed object and must return the fully
qualified subclass name the object really belongs in.  All the objects
returned to the caller will be blessed into the appropriate subclass.

Alternatively, a property can be designated to hold the fully qualified
subclass name.

  class Widget {
      has => [
          subclass_name => { is => 'String',
                             valid_values => ['Widget::CoolCo',
                                              'Widget::Vectornox'] },
      ],
      is_abstract => 1,
      subclassify_by => 'subclass_name',
  }

  my $cool_widget = Widget->create(subclass_name => 'Widget::CoolCo');
  $cool_widget = Widget::CoolCo->create();  # subclass_name is automatically "Widget::CoolCo"

These subclass names will be saved to the data source if the class has a data
source.  Also, when objects of the base class are retrieved with get(), the
results will be automatically put in the appropriate child class.


=head2 Inline Data Sources

If the data_source of a class definition is a hashref instead of a simple
string, that defines an in-line data source.  The only required item in that
hashref is C<is>, which declares what class this data source will be created
from, such as "UR::DataSource::Oracle" or "UR::DataSource::File".  From
there, each type of data source will have its own requirements for what is
allowed in an inline definition.  

For L<UR::DataSource::RDBMS>-derived data sources, it accepts these keys
corresponding to the properties of the same name:

  server, user, auth, owner

For L<UR::DataSource::File> data sources:

  server, file_list, column_order, sort_order, skip_first_line,
  delimiter, record_separator

In addition, file is a synonym for server.

For L<UR::DataSource::FileMux> data sources:

  column_order, sort_order, skip_first_line, delimiter, 
  record_separator, required_for_get, constant_values, file_resolver

In addition, resolve_path_with can replace C<file_resolver> and accepts
several formats:

=over 4

=item subref

A reference to a subroutine.  In this case, C<resolve_path_with> is a
synonym for C<file_resolver>.

=item [ $subref, param1, param2, ..., paramn ]

The subref will be called to resolve the path.  Its arguments will be taken
from the values in the rule from properties mentioned.

=item [ $format, param1, param2, ..., paramn ]

$format will be interpreted as an sprintf() format.  The placeholders in the 
format will be filled in from the values in the rule from properties
mentioned.

=back

Finally, C<base_path> and C<resolve_path_with> can be used together.  In
this case, resolve_path_with is a listref of property names, base_path 
is a string specifying the first part of the pathname.  The final path
is created by joining the base_path and all the property's values together
with '/', as in
  join('/', $base_path, param1, param2, ..., paramn )

=head1 SEE ALSO

L<UR::Object::Type>, L<UR::Object::Property>, L<UR::Manual::Cookbook>