NAME
    Bifcode - simple serialization format

VERSION
    0.001_12 (2017-11-12)

SYNOPSIS
        use boolean;
        use Bifcode qw( encode_bifcode decode_bifcode );

        my $bifcode = encode_bifcode {
            bools   => [ boolean::false, boolean::true, ],
            bytes   => \pack( 's<',       255 ),
            integer => 25,
            float   => 1.25e-5,
            undef   => undef,
            utf8    => "\x{df}",
        };

        # 7b 55 35 3a 62 6f 6f 6c 73 2c 5b 30    {U5:bools,[0
        # 31 5d 55 35 3a 62 79 74 65 73 2c 42    1]U5:bytes,B
        # 32 3a ff 00 2c 55 35 3a 66 6c 6f 61    2:..,U5:floa
        # 74 2c 46 31 2e 32 35 65 2d 35 2c 55    t,F1.25e-5,U
        # 37 3a 69 6e 74 65 67 65 72 2c 49 32    7:integer,I2
        # 35 2c 55 35 3a 75 6e 64 65 66 2c 7e    5,U5:undef,~
        # 55 34 3a 75 74 66 38 2c 55 32 3a c3    U4:utf8,U2:.
        # 9f 2c 7d                               .,}

        my $decoded = decode_bifcode $bifcode;

DESCRIPTION
    Bifcode implements the *bifcode* serialisation format, a mixed
    binary/text encoding with support for the following data types:

    *   Primitive:

        *   Undefined (null)

        *   Booleans (true/false)

        *   Integer numbers

        *   Floating point numbers

        *   UTF8 strings

        *   Binary strings

    *   Structured:

        *   Arrays (lists)

        *   Hashes (dictionaries)

    The encoding is simple to construct and relatively easy to parse. There
    is no need to escape special characters in strings. It is not considered
    human readable, but as it is mostly text it can usually be visually
    debugged.

    *bifcode* can only be constructed canonically; i.e. there is only one
    possible encoding per data structure. This property makes it suitable
    for comparing structures (using cryptographic hashes) across networks.
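
    A minimal sketch of such a comparison, using the core Digest::SHA
    module rather than Bifcode itself. The two byte strings stand in for
    encodings produced independently on two hosts; the variable names are
    illustrative only:

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256_hex );

# Stand-ins for encodings produced on two different hosts. The literal is
# the dictionary example from the SPECIFICATION section.
my $local  = '{U3:cow,U3:moo,U4:spam,U4:eggs,}';
my $remote = '{U3:cow,U3:moo,U4:spam,U4:eggs,}';

# Only the digests need to cross the network. With a single valid
# encoding per structure, differing digests mean differing structures.
my $local_digest  = sha256_hex($local);
my $remote_digest = sha256_hex($remote);

my $same = $local_digest eq $remote_digest;
print $same ? "structures match\n" : "structures differ\n";
```

    With a non-canonical format the same comparison could report a
    difference for structures that are in fact equal, for example when
    dictionary keys are emitted in a different order.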

    In terms of size the encoding is similar to minified JSON. In terms of
    speed this module compares well with other pure Perl encoding modules
    with the same features.

MOTIVATION
    *bifcode* was created for a project because none of the currently
    available serialization formats (Bencode, JSON, MsgPack, Netstrings,
    Sereal, YAML, etc.) met the requirements of:

    *   Support for undef

    *   Support for binary data

    *   Support for UTF8 strings

    *   Universally-recognized canonical form for hashing

    *   Trivial to construct on the fly from SQLite triggers

    I have no lofty goals or intentions to promote this outside of my
    specific case, but would appreciate hearing about other uses or
    implementations.

SPECIFICATION
    The encoding is defined as follows:

  BIFCODE_UNDEF
    A null or undefined value corresponds to "~".

  BIFCODE_TRUE and BIFCODE_FALSE
    Boolean values are represented by "1" and "0".

  BIFCODE_UTF8
    A UTF8 string is "U" followed by the octet length of the encoded string
    as a base ten number followed by a colon and the encoded string followed
    by ",". For example the Perl string "\x{df}" (ß) corresponds to
    "U2:\x{c3}\x{9f},".

  BIFCODE_BYTES
    Opaque data is 'B' followed by the octet length of the data as a base
    ten number followed by a colon and then the data itself followed by ",".
    For example a three-byte blob 'xyz' corresponds to 'B3:xyz,'.
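
    The two length-prefixed string types can be sketched as follows. The
    helper names "sketch_utf8" and "sketch_bytes" are hypothetical and are
    not part of this module; only the core Encode module is used:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: encode a Perl character string as BIFCODE_UTF8.
sub sketch_utf8 {
    my ($chars) = @_;
    my $octets = encode( 'UTF-8', $chars );    # characters -> UTF-8 octets
    return 'U' . length($octets) . ':' . $octets . ',';
}

# Hypothetical helper: encode opaque octets as BIFCODE_BYTES.
sub sketch_bytes {
    my ($octets) = @_;
    return 'B' . length($octets) . ':' . $octets . ',';
}

my $utf8  = sketch_utf8("\x{df}");    # "U2:\x{c3}\x{9f},"
my $bytes = sketch_bytes('xyz');      # 'B3:xyz,'
```

    Because the octet length comes first, a reader knows exactly how many
    octets to consume, which is why the payload never needs escaping.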

  BIFCODE_INTEGER
    Integers are represented by an 'I' followed by the number in base 10
    followed by a ','. For example 'I3,' corresponds to 3 and 'I-3,'
    corresponds to -3. Integers have no size limitation. 'I-0,' is invalid.
    All encodings with a leading zero, such as 'I03,', are invalid, other
    than 'I0,', which of course corresponds to 0.
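
    The canonical-form rules above fit in a single pattern. This is a
    sketch only; "is_canonical_integer" is a hypothetical name and the
    regex is not taken from the module source:

```perl
use strict;
use warnings;

# Hypothetical validator for an encoded BIFCODE_INTEGER.
sub is_canonical_integer {
    my ($encoded) = @_;

    # Either a lone zero, or an optional minus sign followed by digits
    # that do not start with zero. This rejects 'I-0,' and 'I03,'.
    return $encoded =~ /\A I (?: 0 | -? [1-9] [0-9]* ) , \z/x ? 1 : 0;
}

for my $candidate ( 'I3,', 'I-3,', 'I0,', 'I-0,', 'I03,' ) {
    printf "%-5s %s\n", $candidate,
      is_canonical_integer($candidate) ? 'valid' : 'invalid';
}
```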

  BIFCODE_FLOAT
    Floats are represented by an 'F' followed by a decimal number in base 10
    followed by an 'e' followed by an exponent followed by a ','. For example
    'F3.0e-1,' corresponds to 0.3 and 'F-0.1e0,' corresponds to -0.1. Floats
    have no size limitation. 'F-0.0e0,' is invalid. All encodings with an
    extraneous leading zero, such as 'F03.0e0,', or an extraneous trailing
    zero, such as 'F3.10e0,', are invalid.
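
    A sketch of a validator for these rules follows. It rests on
    assumptions the text above does not spell out: the fraction is always
    present, any negative zero mantissa is rejected (not only 'F-0.0e0,'),
    and the form of the exponent is not checked. "is_canonical_float" is a
    hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical validator for an encoded BIFCODE_FLOAT.
sub is_canonical_float {
    my ($encoded) = @_;

    my $matched = $encoded =~ m{
        \A F
        (-?)                    # optional sign
        ( 0 | [1-9][0-9]* )     # integer part without leading zeros
        \.
        ( 0 | [0-9]*[1-9] )     # fraction: a lone 0, or no trailing zero
        e
        -? [0-9]+               # exponent, canonical form not checked
        ,
        \z
    }x;
    return 0 unless $matched;

    my ( $sign, $int, $frac ) = ( $1, $2, $3 );

    # Assumption: a negative zero mantissa is always invalid.
    return 0 if $sign eq '-' && $int eq '0' && $frac eq '0';
    return 1;
}

for my $candidate ( 'F3.0e-1,', 'F-0.1e0,', 'F-0.0e0,', 'F03.0e0,', 'F3.10e0,' ) {
    printf "%-9s %s\n", $candidate,
      is_canonical_float($candidate) ? 'valid' : 'invalid';
}
```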

  BIFCODE_LIST
    Lists are encoded as a '[' followed by their elements (also *bifcode*
    encoded) followed by a ']'. For example '[U4:spam,U4:eggs,]' corresponds
    to ['spam', 'eggs'].

  BIFCODE_DICT
    Dictionaries are encoded as a '{' followed by a list of alternating keys
    and their corresponding values followed by a '}'. For example,
    '{U3:cow,U3:moo,U4:spam,U4:eggs,}' corresponds to {'cow': 'moo', 'spam':
    'eggs'} and '{U4:spam,[U1:a,U1:b,]}' corresponds to {'spam': ['a',
    'b']}. Keys must be BIFCODE_UTF8 or BIFCODE_BYTES and appear in sorted
    order (sorted as raw strings, not alphanumerics).
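
    The list and dictionary rules can be sketched with a small recursive
    encoder. It handles only strings, arrays and hashes, treats every
    string as BIFCODE_UTF8, and is not the module's implementation;
    "sketch_encode" is a hypothetical name:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical encoder covering only strings, lists and dictionaries.
sub sketch_encode {
    my ($data) = @_;

    if ( ref $data eq 'ARRAY' ) {
        return '[' . join( '', map { sketch_encode($_) } @$data ) . ']';
    }

    if ( ref $data eq 'HASH' ) {
        # Plain string sort on the keys, not a numeric or locale sort.
        return '{'
          . join( '',
            map { sketch_encode($_) . sketch_encode( $data->{$_} ) }
            sort keys %$data )
          . '}';
    }

    my $octets = encode( 'UTF-8', $data );
    return 'U' . length($octets) . ':' . $octets . ',';
}

# Insertion order does not matter: the keys are sorted on output.
my %first  = ( spam => 'eggs', cow  => 'moo' );
my %second = ( cow  => 'moo',  spam => 'eggs' );

my $encoded_first  = sketch_encode( \%first );
my $encoded_second = sketch_encode( \%second );
```

    Note that the keys are ordered by their string value rather than by
    their encoded form, which matches the SYNOPSIS output where
    "U7:integer," precedes "U5:undef,".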

INTERFACE
  "encode_bifcode( $datastructure )"
    Takes a single argument which may be a scalar, or may be a reference to
    either a scalar, an array or a hash. Arrays and hashes may in turn
    contain values of these same types. Returns a byte string.

    The mapping from Perl to *bifcode* is as follows:

    *   'undef' maps directly to BIFCODE_UNDEF.

    *   The "true" and "false" functions from the boolean distribution
        encode to BIFCODE_TRUE and BIFCODE_FALSE.

    *   Plain scalars are treated as BIFCODE_UTF8 unless:

        *   They look like canonically represented integers in which case
            they are mapped to BIFCODE_INTEGER; or

        *   They look like floats in which case they are mapped to
            BIFCODE_FLOAT.

    *   SCALAR references become BIFCODE_BYTES.

    *   ARRAY references become BIFCODE_LIST.

    *   HASH references become BIFCODE_DICT.
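
    The mapping above can be pictured as a classifier like the one below.
    It is a sketch, not the module's actual heuristic: both regexes are
    assumptions about what "look like" means, booleans are left out
    because they require the non-core boolean distribution, and
    "sketch_classify" is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical classifier: name the bifcode type a Perl value maps to.
sub sketch_classify {
    my ($value) = @_;

    return 'BIFCODE_UNDEF' if !defined $value;

    if ( my $type = ref $value ) {
        return 'BIFCODE_BYTES' if $type eq 'SCALAR';
        return 'BIFCODE_LIST'  if $type eq 'ARRAY';
        return 'BIFCODE_DICT'  if $type eq 'HASH';
        die "unhandled data type: $type\n";
    }

    # Assumed pattern for a canonically represented integer.
    return 'BIFCODE_INTEGER'
      if $value =~ /\A (?: 0 | -? [1-9] [0-9]* ) \z/x;

    # Assumed pattern for a float: digits with a fraction or an exponent.
    return 'BIFCODE_FLOAT'
      if $value =~ /\A -? [0-9]+
                    (?: \. [0-9]+ (?: e [+-]? [0-9]+ )? | e [+-]? [0-9]+ )
                    \z/xi;

    return 'BIFCODE_UTF8';
}

for my $value ( '25', '1.25e-5', '007', 'spam' ) {
    printf "%-8s %s\n", $value, sketch_classify($value);
}
```

    Under these assumed patterns a string such as '007' stays a
    BIFCODE_UTF8 string, because it is not a canonical integer.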

    You can force scalars to be encoded a particular way by passing a
    reference to them blessed as Bifcode::BYTES, Bifcode::INTEGER,
    Bifcode::FLOAT or Bifcode::UTF8. The "force_bifcode" function below can
    help with creating such references.
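
    Building such a reference by hand needs nothing beyond core Perl. The
    sketch below only constructs the blessed reference; the postal code
    value is illustrative, chosen because it would otherwise look like an
    integer:

```perl
use strict;
use warnings;

# A postal code that should stay a string even though it looks numeric.
my $postcode = '90210';

# Bless a reference to a copy of the scalar into one of the type classes
# named above, so the original variable is left untouched.
my $copy   = $postcode;
my $forced = bless \$copy, 'Bifcode::UTF8';

print ref($forced), " -> ", $$forced, "\n";
```

    The resulting $forced can then be placed in a data structure wherever
    the plain scalar would have gone.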

    This subroutine croaks on unhandled data types.

  "decode_bifcode( $string [, $max_depth ] )"
    Takes a byte string and returns the corresponding deserialised data
    structure.

    If you pass an integer as the second argument, "decode_bifcode" will
    croak when attempting to parse lists or dictionaries nested deeper than
    this level, to prevent DoS attacks using maliciously crafted input.
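
    The idea behind such a limit can be sketched with a standalone scanner
    that tracks nesting depth without building any data structure. It is
    not how the module does it, it performs almost no validation, and
    "sketch_check_depth" is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: scan an encoded byte string and die as soon as
# lists or dictionaries nest deeper than $max_depth. Returns the deepest
# level seen otherwise.
sub sketch_check_depth {
    my ( $bytes, $max_depth ) = @_;
    my ( $depth, $deepest ) = ( 0, 0 );

    pos($bytes) = 0;
    while ( pos($bytes) < length($bytes) ) {
        if ( $bytes =~ /\G [\[\{] /gcx ) {
            $depth++;
            die "nested deeper than $max_depth\n" if $depth > $max_depth;
            $deepest = $depth if $depth > $deepest;
        }
        elsif ( $bytes =~ /\G [\]\}] /gcx ) {
            $depth--;
        }
        elsif ( $bytes =~ /\G [UB] ([0-9]+) : /gcx ) {
            # Jump over the payload and its ',' terminator unread, so
            # brackets inside string data are never counted.
            pos($bytes) = pos($bytes) + $1 + 1;
        }
        elsif ( $bytes =~ /\G (?: [~01] | [IF] [^,]* , ) /gcx ) {
            # undef, boolean, integer or float: nothing to track.
        }
        else {
            die "malformed input\n";
        }
    }
    return $deepest;
}

my $deepest = sketch_check_depth( '{U4:spam,[U1:a,U1:b,]}', 5 );
my $ok      = eval { sketch_check_depth( '[[[]]]', 2 ); 1 };
print "deepest: $deepest, deep input ", ( $ok ? 'accepted' : 'rejected' ), "\n";
```

    Rejecting the input as soon as the limit is crossed means the cost of
    a hostile document is bounded by the limit rather than by its size.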

    *bifcode* types are mapped back to Perl in the reverse way to the
    "encode_bifcode" function, with the exception that any scalars which
    were "forced" to a particular type (using blessed references) will
    decode as unblessed scalars.

    Croaks on malformed data.

  "force_bifcode( $scalar, $type )"
    Returns a reference to $scalar blessed as Bifcode::$TYPE. The value of
    $type is not checked, but the "encode_bifcode" function will only accept
    the resulting reference where $type is one of 'bytes', 'float',
    'integer' or 'utf8'.

  "diff_bifcode( $bc1, $bc2, [$diff_args] )"
    Returns a string representing the difference between two bifcodes. The
    inputs do not need to be valid Bifcode; they are only expanded with a
    very simple regex before the diff is done. The third argument
    ($diff_args) is passed directly to Text::Diff.

    Croaks if Text::Diff is not installed.

DIAGNOSTICS
    The following exceptions may be raised by Bifcode:

    Bifcode::Error::Decode
        Your data is malformed in a non-identifiable way.

    Bifcode::Error::DecodeBytes
        Your data contains a byte string with an invalid length.

    Bifcode::Error::DecodeBytesTrunc
        Your data includes a byte string declared to be longer than the
        available data.

    Bifcode::Error::DecodeBytesTerm
        Your data includes a byte string that is missing a "," terminator.

    Bifcode::Error::DecodeDepth
        Your data contains dicts or lists that are nested deeper than the
        $max_depth passed to "decode_bifcode()".

    Bifcode::Error::DecodeTrunc
        Your data is truncated.

    Bifcode::Error::DecodeFloat
        Your data contains something that was supposed to be a float but
        didn't make sense.

    Bifcode::Error::DecodeFloatTrunc
        Your data contains a float that is truncated.

    Bifcode::Error::DecodeInteger
        Your data contains something that was supposed to be an integer but
        didn't make sense.

    Bifcode::Error::DecodeIntegerTrunc
        Your data contains an integer that is truncated.

    Bifcode::Error::DecodeKeyType
        Your data violates the *bifcode* format constraint that all dict keys
        be BIFCODE_BYTES or BIFCODE_UTF8.

    Bifcode::Error::DecodeKeyDuplicate
        Your data violates the *bifcode* format constraint that all dict keys
        must be unique.

    Bifcode::Error::DecodeKeyOrder
        Your data violates the *bifcode* format constraint that dict keys
        must appear in lexical sort order.

    Bifcode::Error::DecodeKeyValue
        Your data contains a dictionary with an odd number of elements.

    Bifcode::Error::DecodeTrailing
        Your data does not end after the first *bifcode*-serialised item.

    Bifcode::Error::DecodeUTF8
        Your data contains a UTF8 string with an invalid length.

    Bifcode::Error::DecodeUTF8Trunc
        Your data includes a string declared to be longer than the available
        data.

    Bifcode::Error::DecodeUTF8Term
        Your data includes a UTF8 string that is missing a "," terminator.

    Bifcode::Error::DecodeUsage
        You called "decode_bifcode()" with invalid arguments.

    Bifcode::Error::DiffUsage
        You called "diff_bifcode()" with invalid arguments.

    Bifcode::Error::EncodeBytesUndef
        You attempted to encode "undef" as a byte string.

    Bifcode::Error::EncodeFloat
        You attempted to encode something as a float that isn't recognised
        as one.

    Bifcode::Error::EncodeFloatUndef
        You attempted to encode "undef" as a float.

    Bifcode::Error::EncodeInteger
        You attempted to encode something as an integer that isn't
        recognised as one.

    Bifcode::Error::EncodeIntegerUndef
        You attempted to encode "undef" as an integer.

    Bifcode::Error::EncodeUTF8Undef
        You attempted to encode "undef" as a UTF8 string.

    Bifcode::Error::EncodeUnhandled
        You are trying to serialise a data structure that contains a data
        type not supported by the *bifcode* format.

    Bifcode::Error::EncodeUsage
        You called "encode_bifcode()" with invalid arguments.

    Bifcode::Error::ForceUsage
        You called "force_bifcode()" with invalid arguments.

BUGS AND LIMITATIONS
    Strings and numbers are practically indistinguishable in Perl, so
    "encode_bifcode()" has to resort to a heuristic to decide how to
    serialise a scalar. This cannot be fixed.

    At the moment all Perl hash keys are encoded as BIFCODE_UTF8 as I have
    not yet had the need for BIFCODE_BYTES keys or found a cheap, obvious
    way to distinguish the two.

SEE ALSO
    This distribution includes the diff-bifcode command-line utility for
    comparing Bifcodes in files.

AUTHOR
    Mark Lawrence <nomad@null.net>, heavily based on Bencode by Aristotle
    Pagaltzis <pagaltzis@gmx.de>

COPYRIGHT AND LICENSE
    This software is copyright (c):

    *   2015 by Aristotle Pagaltzis

    *   2017 by Mark Lawrence.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.