An overview of Tangence
=======================

Before I go much further, I need to write a lot more about what Tangence
is. Tangence is all of the following:

 1. A single server/multiple client protocol for sharing information
    about objects.

 2. A data model - it defines the types of values that are transmitted
    between the server and clients.

 3. An object model - it defines the abstract look-and-feel of objects 
    that are visible in the server end, and the proxies to them that
    exist in the client ends.

 4. A wire protocol - it defines the bits down the wire of some stream.

 5. A collection of Perl modules (a Perl distribution) which implements
    all of the above.

These writings may sometimes suffer from the "Java problem": the same
name being applied to too many different concepts. I'll try to make the
context or wording clear to minimise confusion.


1. Server/Client
----------------

In a Tangence system, one program is distinct in being the server. It is
the program that hosts the actual objects being considered. It is also
the program that holds the networking socket to which the clients
connect.

The other programs are all clients, which connect to the server. While
each client is notionally distinct, they all share access to the same
objects within the server. The clients are not directly aware of each
other's existence, though each one's effects on the system may be
visible to the others as a result of calling methods or altering
properties on the objects. Internally, the clients will use proxy objects
through which to access the objects in the server. There will be a
one-to-one correspondence between server objects and client proxies. Not
every server object needs to have a corresponding proxy in every client -
proxies are created lazily when they are required.


2. Data Model
-------------

Whenever a value is sent across the connection between the server and a
client, that value has a fixed type. The underlying streaming layer
recognises the following fundamental types of values. Each type has a
string that identifies it, called the signature. These are used by
introspection data; see later.

 * Booleans

   Uses the type signature "bool".

 * Integers, both signed and unsigned, in 8, 16, 32 and 64bit lengths

   An integer of unspecified size uses the type signature "int".
   Specific sized integers use the type signatures
     "s8", "s16", "s32", "s64", "u8", "u16", "u32", "u64"

 * Unicode strings

   Uses the type signature "str".

 * References to Tangence objects

   Uses the type signature "obj".

 * Lists of values

   Uses the type signature "list(T)" where T is the type signature of its
   element type.

 * Dictionaries of (string) named keys to values

   Uses the type signature "dict(T)" where T is the type signature of its
   element type.

 * For type signatures, there is also the type of "any", which allows any
   type.
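
As an illustration of how these signatures nest, here is a minimal
sketch of a signature parser. It is written in Python for brevity and is
not taken from the Perl distribution; the function name parse_signature
and the tuple representation it returns are assumptions made for this
example only.

```python
# Hypothetical helper: parse a type signature into a nested tuple.
PRIMITIVES = {"bool", "int", "str", "obj", "any",
              "s8", "s16", "s32", "s64", "u8", "u16", "u32", "u64"}

def parse_signature(sig):
    """Return ("str",) for a primitive, or ("list", <element>) for a container."""
    sig = sig.strip()
    if sig in PRIMITIVES:
        return (sig,)
    for container in ("list", "dict"):
        prefix = container + "("
        if sig.startswith(prefix) and sig.endswith(")"):
            # Everything between the outer parentheses is the element type.
            return (container, parse_signature(sig[len(prefix):-1]))
    raise ValueError("unrecognised type signature: %r" % sig)
```

For example, parse_signature("list(dict(int))") describes a list whose
elements are dictionaries of integers.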

As Tangence is primarily an interprocess-communication layer, its main
focus is that of communication. The Data Model applies transiently, to
data as it is in transit between the server and a client. A consequence
here is that it only considers the surface value of the types of data,
rather than any deeper significance. It does not preserve self-referential
data, nor can it cope with cyclic structures. More complex shaped data
should be represented by real Tangence objects.

[
 There is an unresolved issue here to deal with export to other languages.
 This is whether a new type of "structure" should be created, which allows
 for a heterogeneous collection.
]


3. Object Model
---------------

In Tangence, the primary item of interaction is an object. Tangence
objects exist in the server, most likely bearing at least some
relationship to some native objects in the server implementation (though
if and when the occasion ever arises that a C program can host a Tangence
server, obviously this association will be somewhat looser).

In the server, two special objects exist - one is the Root object, the
other is the Registry. These are the only two well-known objects that
the client knows always exist. All the other objects are initially
accessed via these.

The client(s) interact with the server almost entirely by performing
operations on objects. When the client connects to the server, two special
object proxies are constructed in the client, to represent the Root and
Registry objects. These are the base through which all the other
interactions are performed. Other object proxies may only be obtained by
the return values of methods on existing objects, arguments passed in
events from them, or retrieved as the value of properties on objects.

Each object is an instance of some particular class. The class provides
all of the typing information for that instance. Principally, that class
defines a name, and the collection of methods, events, and properties that
exist on instances of that class. Each class may also name other classes
as parents; recursively merging the interface of all those named.

Tangence concerns itself with the interface of and ways to interact with
the objects in the server, and not with any ways in which the objects
themselves are actually implemented. The class inheritance therefore only
applies to the interface, and does not directly relate to any
implementation behaviour the server might implement.
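
The recursive merging of parent interfaces can be sketched as follows.
This is an illustration in Python, not code from the Perl distribution.
The function name merged_interface and the dictionary layout are
assumptions, and letting a class's own definitions take precedence over
inherited ones is also an assumption, since the text above does not
specify how name clashes are resolved.

```python
def merged_interface(name, classes):
    """Return the full interface of class `name`, including inherited members.

    `classes` maps a class name to a dict with optional keys
    "methods", "events", "properties" and "isa" (list of parent names).
    """
    cls = classes[name]
    merged = {"methods": {}, "events": {}, "properties": {}}
    # Merge parents first so the class's own definitions win (an assumption).
    for parent in cls.get("isa", []):
        inherited = merged_interface(parent, classes)
        for kind in merged:
            merged[kind].update(inherited[kind])
    for kind in merged:
        merged[kind].update(cls.get(kind, {}))
    return merged
```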

3.1. Methods

Each object class may define named methods that clients can invoke on
objects in the server. Each method has:

  + a name
  + argument types
  + a return type

The arguments to a method are positional. The return is a single value
(not a list of values, such as Perl could represent).

Methods on objects in the server may be invoked by clients. Once a
method is invoked by a client, the client must wait until it returns
before it can send any other request to the server.

3.2. Events

Each object class may define named events that objects may emit. Each
event has:

    + a name
    + argument types

Like methods, the arguments to an event are positional.

Events do not have return types, as they are simple notifications from the
server to the client, to inform them that some event happened. Clients are
not automatically informed of every event on every object. Instead, the
client must specifically register interest in specific events on specific
objects.

3.3. Properties

Each object class may define named properties that the object has. Each
object in the class will have a value for the property. Each property has:

    + a name
    + a dimension - scalar, queue, array, hash or object set
    + a type
    + a boolean indicating if it is "smashed"

Properties do not have arguments. A client can request the current value
of a property on an object, or set a new value. It can also register an
interest in the property, where the server will inform the client of
changes to the value.

Each property has a dimension; one of scalar, queue, array, hash, or object
set. The behaviour of each type of property is:

3.3.1 Scalar Properties

The property is a single atomic scalar. It is set atomically by the
server, and may be queried.

3.3.2 Queue and Array Properties

The property is a contiguous array of individual elements. Each element is
indexed by a non-negative integer. The property type gives the type of each
element in the array. These properties differ in the types of operations they
can support. Queues do not support splice or move operations, arrays do.

3.3.3 Hash Properties

The property is an association between strings and values. Each element is
uniquely indexed by a null-terminated string. The property type gives the
type of each element in the hash. The elements do not have an inherent
ordering.

3.3.4 Object Set Properties

The property is an unordered collection of Tangence objects.

Scalar properties have a single atomic value. If it changes, the client is
informed of the entire new value, even if its type indicates it to be a
list or dictionary type. For non-scalar properties, the value of each
element in the collection is set individually by the server. Elements can
be changed, added or removed. Changes to individual elements can be sent
to the clients independently of the others.

Certain properties may be deemed by the application to be important enough
for all clients to be aware of all of the time (such as a name or other
key item of information). These properties are called "smashed
properties". When the server first sends a new object to a client, the
object construction message will also contain initial values of these
properties. The client will be automatically informed of any changes to
these properties when they change, as if the client had specifically
requested to be informed.  When the object is sent to a new client, it is
said to be "smashed"; the initial values of these automatic properties are
called "smash values".

[There are issues here that need resolving to move Tangence out from
being Perl-specific into a more general-purpose layer - more on this in
a later email].


4. Wire Protocol
----------------

The wire protocol used by Tangence operates over a reliable stream. This
stream may be provided by a TCP socket, UNIX local socket, or even the
STDIN/STDOUT pipe pair of an SSH connection.

The following message descriptions all use the symbolic constant names
from the Tangence::Constants Perl module, to be more readable.

4.1. Messages

At its lowest level, the wire protocol consists of a pair of endpoints to
the stream, each sending and receiving messages to its peer. The protocol
at this level is symmetric between the client and the server. It consists
of messages that are either requests or responses.

An endpoint sends a request, which the peer must then respond to. Each
request has exactly one response. The requests and responses are paired
sequentially in a pipeline fashion.

The two endpoints are distinct from each other, in that there is no
requirement for a peer to respond to an outstanding request it has
received before sending a new request of its own. There is also no
requirement to wait on the response to a request it has sent, before
sending another.
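
Because responses arrive in the same order as the requests they answer,
an endpoint only needs a first-in, first-out queue to pair them up. The
following Python sketch illustrates that idea; the class name Endpoint
and its methods are hypothetical and are not part of the Perl
distribution.

```python
from collections import deque

class Endpoint:
    """Pairs each incoming response with the oldest outstanding request."""

    def __init__(self, send):
        self._send = send        # callable that writes one message to the stream
        self._pending = deque()  # response handlers, oldest request first

    def request(self, message, on_response):
        # Several requests may be outstanding at once; remember their order.
        self._pending.append(on_response)
        self._send(message)

    def handle_response(self, message):
        # The next response always belongs to the oldest pending request.
        on_response = self._pending.popleft()
        on_response(message)
```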

The basic message format is a binary exchange of messages in the following
format:

 Type:    1 byte  integer
 Length:  4 bytes integer, big-endian
 Payload: n bytes

The type is a single byte which defines the message type. The collection 
of message types is given below. The length is a big-endian 4 byte integer
which gives the size of the message payload, excluding this header. Thus,
the length of the entire message will always be 5 bytes more. The data
payload of the message is encoded in the data serialisation scheme given
below. Each argument to the message is encoded as a single serialisation
item. For message types with a variable number of arguments, the length of
the message itself defines the number of arguments given.
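
A minimal Python sketch of this framing, using the standard struct
module, might look like the following. The function names are
assumptions made for illustration, and the message type value used in
the usage note is an arbitrary placeholder rather than a real constant
from Tangence::Constants.

```python
import struct

# 1-byte type followed by a 4-byte big-endian length; no padding.
HEADER = struct.Struct(">BI")

def pack_message(msg_type, payload):
    """Prefix the payload with the 5-byte header."""
    return HEADER.pack(msg_type, len(payload)) + payload

def unpack_message(buffer):
    """Return (msg_type, payload, remaining), or None if the buffer
    does not yet hold a complete message."""
    if len(buffer) < HEADER.size:
        return None
    msg_type, length = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None
    return msg_type, buffer[HEADER.size:end], buffer[end:]
```

Calling pack_message(0x01, b"abc") yields eight bytes: the five-byte
header followed by the three-byte payload.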

The stream protocol is designed to be used in situations where the CPU
power of each endpoint is high, but the connection in between may have
high latency, or low bandwidth. It is therefore optimised to minimise
round trips and byte-count overhead, at the expense of processing power
needed to encode or decode it. One consequence here is that no attempt is
made to align multi-byte values.

4.2. Data Serialisation

The data serialisation format applies recursively down a data structure
tree. Each node in the structure is either a number, a string, an object
reference, or a list or dictionary of other values. The serialised bytes
encode the tree structure recursively. Other types of entry also exist in
the serialised stream, which carry metadata about the types, such as
object classes and instances.

The encoding of each node in the data structure consists of a type, a
size, and the actual data payload. The type and size of a node are encoded
in its leader byte (or bytes). The top three bits of the first byte
determine the type:

 Type           Bits                    Description

 DATA_NUMBER    0 0 0 t t t t t         numeric
        where 'ttttt' gives the number subtype

 DATA_STRING    0 0 1 s s s s s         string
 DATA_LIST      0 1 0 s s s s s         list of values
 DATA_DICT      0 1 1 s s s s s         dictionary of string->value
 DATA_OBJECT    1 0 0 s s s s s         Tangence object reference
        where 'sssss' gives the size

 DATA_META      1 1 1 n n n n n
        where 'nnnnn' gives the metadata type

For numbers, the lower five bits encode the numeric type, which defines
how many more bytes will be used:

 Subtype                Subtype bits    Extra bytes     Description

 DATANUM_BOOLFALSE      0 0 0 0 0       0               Boolean false
 DATANUM_BOOLTRUE       0 0 0 0 1       0               Boolean true
 DATANUM_UINT8          0 0 0 1 0       1               Unsigned 8bit
 DATANUM_SINT8          0 0 0 1 1       1               Signed 8bit
 DATANUM_UINT16         0 0 1 0 0       2               Unsigned 16bit
 DATANUM_SINT16         0 0 1 0 1       2               Signed 16bit
 DATANUM_UINT32         0 0 1 1 0       4               Unsigned 32bit
 DATANUM_SINT32         0 0 1 1 1       4               Signed 32bit
 DATANUM_UINT64         0 1 0 0 0       8               Unsigned 64bit
 DATANUM_SINT64         0 1 0 0 1       8               Signed 64bit

All multi-byte integers are always stored in big-endian form.
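
The number encoding can be sketched in Python as below. The subtype
numbers come from the table above; the names NUMBER_SUBTYPES,
encode_bool and encode_int are assumptions made for this illustration
and do not mirror the Perl implementation.

```python
import struct

# name -> (subtype number from the table, struct format for the extra bytes)
NUMBER_SUBTYPES = {
    "u8":  (0x02, ">B"), "s8":  (0x03, ">b"),
    "u16": (0x04, ">H"), "s16": (0x05, ">h"),
    "u32": (0x06, ">I"), "s32": (0x07, ">i"),
    "u64": (0x08, ">Q"), "s64": (0x09, ">q"),
}

def encode_bool(value):
    # Booleans carry no extra bytes; the subtype alone is the value.
    return bytes([0x01 if value else 0x00])

def encode_int(value, subtype):
    number, fmt = NUMBER_SUBTYPES[subtype]
    # DATA_NUMBER has top bits 0 0 0, so the leader byte equals the subtype.
    return bytes([number]) + struct.pack(fmt, value)
```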

For string, list, dict and object types, the lower five bits give a
number, 0 to 31, which helps encode the size. For items of size 30 or
below, this size is encoded directly. Where the size is 31 or more, the
number 31 is encoded, and the actual size follows this leading byte. For
sizes 31 to 127, the next byte encodes it. For sizes 128 or above, the
next 4 bytes encode it in big-endian format, with the top bit set. Sizes
of 2^31 or above cannot be encoded.
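
The three size forms can be sketched in Python as follows. The function
names are hypothetical. The decoder assumes that the top bit of the byte
following the leader is what distinguishes the one-byte form from the
four-byte form, which is my reading of the description above rather
than something it states outright.

```python
import struct

def encode_leader(type_bits, size):
    """type_bits is the 3-bit type, e.g. 1 for DATA_STRING."""
    if size < 0 or size >= 2**31:
        raise ValueError("size cannot be encoded")
    if size < 31:
        return bytes([(type_bits << 5) | size])
    lead = bytes([(type_bits << 5) | 31])
    if size < 128:
        return lead + bytes([size])
    # Four-byte form: big-endian with the top bit set as a marker.
    return lead + struct.pack(">I", size | 0x80000000)

def decode_leader(data):
    """Return (type_bits, size, number of bytes consumed)."""
    type_bits, size = data[0] >> 5, data[0] & 0x1F
    if size < 31:
        return type_bits, size, 1
    if data[1] < 128:
        return type_bits, data[1], 2
    (size,) = struct.unpack_from(">I", data, 1)
    return type_bits, size & 0x7FFFFFFF, 5
```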

Following the leader are bytes encoding the data. The exact meaning of the
size depends on the type of the node.

For strings, the size gives the number of bytes in the string. These
bytes then follow the leader.

For lists, the size gives the number of elements in the list. Following
the leader will be this number of data serialisations, one per list
element.

For dictionaries, this size gives the number of key/value pairs. Following
the leader will be this number of key/value pairs. Each pair consists of a
null-terminated string for the key name, then a data serialisation for the
value.

For objects, the size gives the number of bytes in the object's ID number.
The leader is followed by a big-endian encoding of that ID number.
Currently, this will always be a 4 byte number.
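
Putting the string, list, dictionary and object rules together gives a
small recursive serialiser. The sketch below is in Python and is only an
illustration: the ObjectRef class is a hypothetical stand-in for an
object reference, and encoding strings as UTF-8 is an assumption, as the
text above does not name the byte encoding.

```python
import struct

DATA_STRING, DATA_LIST, DATA_DICT, DATA_OBJECT = 1, 2, 3, 4  # top three bits

def leader(type_bits, size):
    """Encode the type and size using the short, 1-byte or 4-byte form."""
    if size < 31:
        return bytes([(type_bits << 5) | size])
    lead = bytes([(type_bits << 5) | 31])
    if size < 128:
        return lead + bytes([size])
    return lead + struct.pack(">I", size | 0x80000000)

class ObjectRef:
    """Hypothetical stand-in for a reference to a Tangence object."""
    def __init__(self, object_id):
        self.object_id = object_id

def serialise(value):
    if isinstance(value, str):
        raw = value.encode("utf-8")  # assumption: strings travel as UTF-8
        return leader(DATA_STRING, len(raw)) + raw
    if isinstance(value, list):
        # Size is the element count; each element follows, serialised in turn.
        return leader(DATA_LIST, len(value)) + b"".join(serialise(v) for v in value)
    if isinstance(value, dict):
        out = leader(DATA_DICT, len(value))
        for key, item in value.items():
            # Keys are null-terminated strings, not full serialisations.
            out += key.encode("utf-8") + b"\x00" + serialise(item)
        return out
    if isinstance(value, ObjectRef):
        return leader(DATA_OBJECT, 4) + struct.pack(">I", value.object_id)
    raise TypeError("cannot serialise %r" % (value,))
```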

Metadata items may be embedded within a data stream in order to create
the object classes and instances which it contains. These metadata items
do not count towards the overall size of a collection value.

Metadata operations encode a subtype number, rather than a size, in the
bottom five bits.

 Metadata type          Bits                    Description

 DATAMETA_CONSTRUCT     1 1 1 0 0 0 0 1         Construct an object
 DATAMETA_CLASS         1 1 1 0 0 0 1 0         Create a new object class

Following each metadata item is an encoding of its arguments.

DATAMETA_CONSTRUCT:
  Object ID:    4 byte big-endian
  Object class: null-terminated string
  Smash values: 0 or more bytes, data encoded (list)

  If the object class defines smash properties, the construct message will
  also contain the values for the smash properties. These will be sent in
  a list, one value per property, in the same order as the object class's
  schema defines the smash keys.
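
A DATAMETA_CONSTRUCT item could be assembled as in the following Python
sketch. The function name encode_construct is hypothetical, the class
name is assumed to be encoded as UTF-8, and the smash values are passed
in as bytes that have already been serialised as a list.

```python
import struct

DATAMETA_CONSTRUCT = 0xE1  # bits 1 1 1 0 0 0 0 1

def encode_construct(object_id, class_name, smash_bytes=b""):
    """Build the metadata item announcing a new object to the client.

    smash_bytes is the already-serialised list of smash values, or empty
    if the class defines no smashed properties.
    """
    return (bytes([DATAMETA_CONSTRUCT])
            + struct.pack(">I", object_id)        # 4 byte big-endian ID
            + class_name.encode("utf-8") + b"\x00"  # null-terminated class name
            + smash_bytes)
```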

DATAMETA_CLASS:
  Class name:   null-terminated string
  Schema:       data encoded (dict)
  Smash keys:   data encoded (list)

  The schema will be encoded as a dictionary, in the following layout.

  DATA_DICT:
    'methods' = DATA_DICT:
      one key per method, values are DATA_DICT:
        'args' = string description of argument types
        'ret'  = string description of return type

    'events' = DATA_DICT:
      one key per event, values are DATA_DICT:
        'args' = string description of argument types

    'properties' = DATA_DICT:
      one key per property, values are DATA_DICT:
        'dim' = dimension, one of:
          DIM_SCALAR, DIM_HASH, DIM_QUEUE, DIM_ARRAY, DIM_OBJSET
        'type' = string description of the type
        'smash' = true if this is a smashed property

    'isa' = DATA_LIST:
      class names

  The smash keys will be encoded as a possibly-empty list of strings.
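
As a concrete picture of this layout, here is a schema for an invented
"Counter" class, written as a Python dictionary. The class, its members
and the string placeholders used for the dimension constants are all
hypothetical. The helper smash_keys is likewise an assumption, including
its choice to sort the names, since the ordering rule for smash keys is
not given here.

```python
# Hypothetical schema for an invented class, before serialisation.
counter_schema = {
    "methods": {
        "add": {"args": "int", "ret": "int"},
    },
    "events": {
        "overflowed": {"args": "int"},
    },
    "properties": {
        # "DIM_SCALAR" stands in for the numeric constant of that name.
        "name":  {"dim": "DIM_SCALAR", "type": "str", "smash": True},
        "value": {"dim": "DIM_SCALAR", "type": "int", "smash": False},
    },
    "isa": [],
}

def smash_keys(schema):
    """List the names of smashed properties (sorted; ordering is an assumption)."""
    return sorted(name for name, prop in schema["properties"].items()
                  if prop["smash"])
```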

4.3. Message Types

Each of the messages defines the layout of its data payload. Some messages
pass a fixed number of items, some have a variable number of items in the
last position. For these messages, no explicit encoding of the size is
given. Instead, the data payload area is packed with as many data
encodings as are required. The receiver should use the size of the
containing message to know when all the items have been unpacked.

The following request types are defined. Any message may be responded to
by MSG_ERROR in case of an error, so this response type is not listed.
Some of these messages are sent from the client to the server (C->S),
others are sent from the server to the client (S->C).

MSG_CALL (C->S)
  INT           object ID
  STRING        method name
  data...       arguments

  Responses: MSG_RESULT

  Calls the named method on the given object.

MSG_SUBSCRIBE (C->S)
  INT           object ID
  STRING        event name

  Responses: MSG_SUBSCRIBED

  Subscribes the client to be informed of the event on the given object.

MSG_UNSUBSCRIBE (C->S)
  INT           object ID
  STRING        event name

  Responses: MSG_OK

  Cancels an event subscription.

MSG_EVENT (S->C)
  INT           object ID
  STRING        event name
  data...       arguments

  Responses: MSG_OK

  Informs the client that the event has occurred.

MSG_GETPROP (C->S)
  INT           object ID
  STRING        property name

  Responses: MSG_RESULT

  Requests the current value of the property.

MSG_SETPROP (C->S)
  INT           object ID
  STRING        property name
  data          new value

  Responses: MSG_OK

  Sets the new value of the property.

MSG_WATCH (C->S)
  INT           object ID
  STRING        property name
  BOOL          want initial?

  Responses: MSG_WATCHING

  Requests to be informed of changes to the property value. If the
  boolean 'want initial' value is true, the client will be sent an
  initial MSG_UPDATE message for the current value of the property.

MSG_UNWATCH (C->S)
  INT           object ID
  STRING        property name

  Responses: MSG_OK

  Cancels a request to watch a property.

MSG_UPDATE (S->C)
  INT           object ID
  STRING        property name
  U8            change type
  data...       change value

  Responses: MSG_OK

  Informs the client that the property value has now changed. The
  type of change is given by the change type argument, and defines the
  data layout in the value arguments. The exact meaning of the operation
  depends on the dimension of the property it acts on.

  For DIM_SCALAR:

    CHANGE_SET:
      data      new value

      Sets the new value of the property.

  For DIM_HASH:
    CHANGE_SET:
      DICT      new value

      Sets the new value of the property.

    CHANGE_ADD:
      STRING    key
      data      value

      Adds a new element to the hash.

    CHANGE_DEL:
      STRING    key

      Deletes an element from the hash.

  For DIM_QUEUE:
    CHANGE_SET:
      LIST      new value

      Sets the new value of the property.

    CHANGE_PUSH:
      data...   additional values

      Appends the additional values to the end of the queue.

    CHANGE_SHIFT:
      INT       number of elements

      Removes a number of leading elements from the beginning of the
      queue.

  For DIM_ARRAY:
    CHANGE_SET:
      LIST      new value

      Sets the new value of the property.

    CHANGE_PUSH:
      data...   additional values

      Appends the additional values to the end of the array.

    CHANGE_SHIFT:
      INT       number of elements

      Removes a number of leading elements from the beginning of the
      array.

    CHANGE_SPLICE:
      INT       start
      INT       count
      data...   new elements

      Replaces the given range of the array with the new elements given.
      The new list of values may be a different length to the replaced
      section - in this case, subsequent elements will be shifted up or
      down accordingly.

    CHANGE_MOVE:
      INT       index
      INT       delta

      Moves the item currently at the index forward a (signed) delta amount,
      such that its new index becomes index+delta. The items in between the old
      and new index will be moved up or down as appropriate.

  For DIM_OBJSET:
    CHANGE_SET:
      LIST      objects

      Sets the new value for the property. Will be given a list of
      Tangence object references.

    CHANGE_ADD:
      OBJECT    new object

      Adds the given object to the set.

    CHANGE_DEL:
      STRING    object ID

      Removes the object of the given ID from the set.
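
On the client side, the DIM_QUEUE and DIM_ARRAY changes map naturally
onto list operations. The Python sketch below shows one way a client
might apply them to a local copy of the property; the function names are
hypothetical and this is not how the Perl ObjectProxy is necessarily
written.

```python
def apply_push(array, values):
    """CHANGE_PUSH: append the additional values to the end."""
    array.extend(values)

def apply_shift(array, count):
    """CHANGE_SHIFT: remove `count` leading elements."""
    del array[:count]

def apply_splice(array, start, count, new_elements):
    """CHANGE_SPLICE: replace `count` elements from `start`; later
    elements shift up or down if the lengths differ."""
    array[start:start + count] = new_elements

def apply_move(array, index, delta):
    """CHANGE_MOVE: move one item so that its new index is index+delta."""
    item = array.pop(index)
    array.insert(index + delta, item)
```

For instance, applying a splice with start 1, count 2 and a single new
element to a five-element array leaves four elements, with the tail
shifted down by one.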

MSG_DESTROY (S->C)
  INT           object ID

  Responses: MSG_OK

  Informs the client that the object is due for destruction in
  the server. Upon receipt of this message the client should destroy
  any remaining references it has to the object. After it has sent the
  MSG_OK response, it will not be allowed to invoke any methods,
  subscribe to any events, nor interact with any properties on
  the object. Any existing event subscriptions or property
  watches will have been removed by the server before this message is
  sent.

MSG_GETROOT (C->S)
  data          identity

  Responses: MSG_RESULT

  Initial message to be sent by the client to obtain the root object. The
  identity may be used to identify this particular client, as part of its
  login procedure. The result will contain a single object reference,
  being the root object.

MSG_GETREGISTRY (C->S)
  [no arguments]

  Responses: MSG_RESULT

  Requests the registry object from the server. The result will contain a
  single object reference, being the registry object.


The following responses may be sent to a request:

MSG_OK
  [no arguments]

  A simple OK message, informing the requester that the operation was
  successful, and no error occurred.

MSG_ERROR
  STRING        error message

  An error occurred; the text of the message is included.

MSG_RESULT
  data...       values

  Contains the return value from a method call, a property value, or the
  initial root or registry object.

MSG_SUBSCRIBED
  [no arguments]

  Informs the client that a MSG_SUBSCRIBE was successful.

MSG_WATCHING
  [no arguments]

  Informs the client that a MSG_WATCH was successful.


5. Perl Distribution
--------------------

The Perl distribution is available from

  http://bazaar.leonerd.dyndns.org/perl/Tangence/

At some stage when the details become more concrete this will start
gaining inline documentation, but for now it just has some commenting.

As a rough description of the modules:

5.1. Shared by server and client

  + Tangence::Constants
    Defines various magic numbers used in the wire streaming protocol.

  + Tangence::Stream
    Implements most of the lower level wire streaming protocol, including
    the symmetric parts of data serialisation.

5.2. Used by the client

  + Tangence::Connection
    The connection to the server. Handles the higher-level client-specific
    parts of the wire protocol.

  + Tangence::ObjectProxy
    Acts as a proxy to one particular object within the server. Used for
    invoking methods, subscribing to events, and interacting with
    properties.

5.3. Used by the server

  + Tangence::Object
    A base class for implementing Tangence objects within the server.

  + Tangence::Registry
    The object registry; keeps a reference to every Tangence object in the
    server.

  + Tangence::Server
    A base class for implementing the entire server.

  + Tangence::Server::Connection
    Server end of a client connection. Handles most of the higher-level
    server-specific parts of the wire protocol.

  + Tangence::Server::Context
    An object class to represent the client calling context during the
    invocation of a server object method or property change.


-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
http://www.leonerd.org.uk/