NAME
    `String::Tagged' - string buffers with value tags on extents

SYNOPSIS
     use String::Tagged;

     my $st = String::Tagged->new( "An important message" );

     $st->apply_tag( 3, 9, bold => 1 );

     $st->iter_substr_nooverlap(
        sub {
           my ( $substring, %tags ) = @_;

           print $tags{bold} ? "<b>$substring</b>"
                             : $substring;
        }
     );

DESCRIPTION
    This module implements an object class, instances of which store a
    (mutable) string buffer that supports tags. A tag is a name/value pair
    that applies to some non-empty extent of the underlying string.

    The types of tag names ought to be strings, or at least values that are
    well-behaved as strings, as the names will often be used as the keys in
    hashes or applied to the `eq' operator.

    The types of tag values are not restricted - any scalar will do. This
    could be a simple integer or string, ARRAY or HASH reference, or even a
    CODE reference containing an event handler of some kind.

    Tags may be arbitrarily overlapped. Any given offset within the string
    has, in effect, a set of uniquely named tags. Tags of different names
    are independent. For tags of the same name, only the latest, shortest
    tag takes effect.

    For example, consider a string with three tags represented here:

     Here is my string with tags
     [-------------------------]  foo => 1
             [-------]            foo => 2
          [---]                   bar => 3

    Every character in this string has a tag named `foo'. The value of this
    tag is 2 for the words `my' and `string' and the space in between, and 1
    elsewhere. Additionally, the words `is' and `my' and the space between
    them also have the tag `bar' with a value of 3.
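
    As a rough sketch, the diagram above could be built and queried as
    follows. The offsets are counted by hand from the example string, and
    the variable names are illustrative only.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Here is my string with tags" );

# Apply the three tags from the diagram, in the order shown
$st->apply_tag( 0, $st->length, foo => 1 );  # the whole string
$st->apply_tag( 8, 9,           foo => 2 );  # "my string"
$st->apply_tag( 5, 5,           bar => 3 );  # "is my"

# Where the two foo tags overlap, the later, shorter one takes effect
my $foo_at_start = $st->get_tag_at( 0, "foo" );  # within "Here"
my $foo_at_my    = $st->get_tag_at( 8, "foo" );  # within "my"
my $bar_at_is    = $st->get_tag_at( 5, "bar" );  # within "is"
my $bar_at_start = $st->get_tag_at( 0, "bar" );  # bar is not applied here

printf "foo: %s then %s; bar: %s then %s\n",
   $foo_at_start, $foo_at_my, $bar_at_is, $bar_at_start // "undef";
```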

    Since `String::Tagged' does not understand the significance of the tag
    values, it cannot detect whether two neighbouring tags really contain
    the same semantic idea. Consider the following string:

     A string with words
     [-------]            type => "message"
              [--------]  type => "message"

    This string contains two tags. `String::Tagged' will treat this as two
    different tag values as far as `iter_tags_nooverlap' is concerned, even
    though `get_tag_at' yields the same value for the `type' tag at any
    position in the string. The `merge_tags' method may be used to merge tag
    extents of tags that should be considered as equal.

NAMING
    I spent a lot of time considering the name for this module. It seems
    that a number of people across a number of languages have all created
    similar functionality, though named very differently. For the benefit of
    keyword-based search tools and similar, here's a list of some other
    names this sort of object might be known by:

    *   Extents

    *   Overlays

    *   Attribute or attributed strings

    *   Markup

    *   Out-of-band data

CONSTRUCTOR
  $st = String::Tagged->new( $str )
    Returns a new instance of a `String::Tagged' object. It will contain no
    tags. If the optional `$str' argument is supplied, the string buffer
    will be initialised from this value.

    If `$str' is a `String::Tagged' object then it will be cloned, as if
    calling the `clone' method on it.

  $st = String::Tagged->new_tagged( $str, %tags )
    Shortcut for creating a new `String::Tagged' object with the given tags
    applied to the entire length. The tags will not be anchored at either
    end.

  $new = String::Tagged->clone( $orig, %opts )
    Returns a new instance of `String::Tagged' made by cloning the original,
    subject to the options provided. The returned instance will be in the
    requested class, which need not match the class of the original.

    The following options are recognised:

    only_tags => ARRAY
        If present, gives an ARRAY reference containing tag names. Only
        those tags named here will be copied; others will be ignored.

    except_tags => ARRAY
        If present, gives an ARRAY reference containing tag names. All tags
        will be copied except those named here.

    convert_tags => HASH
        If present, gives a HASH reference containing tag conversion
        functions. For any tags in the original to be copied whose names
        appear in the hash, the name and value are passed into the
        corresponding function, which should return an even-sized key/value
        list giving a tag, or a list of tags, to apply to the new clone.

         my @new_tags = $convert_tags->{$orig_name}->( $orig_name, $orig_value )
         # Where @new_tags is ( $new_name, $new_value, $new_name_2, $new_value_2, ... )

        As a further convenience, if the value for a given tag name is a
        plain string instead of a code reference, it gives the new name for
        the tag, and will be applied with its existing value.
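
    The options above might be combined as in the following sketch. The
    tag names `slant' and `weight' and the value "heavy" are invented for
    illustration and carry no meaning to the module.

```perl
use strict;
use warnings;
use String::Tagged;

my $orig = String::Tagged->new( "An important message" );
$orig->apply_tag( 0, 2, italic => 1 );  # "An"
$orig->apply_tag( 3, 9, bold   => 1 );  # "important"

# Copy the string but keep only the bold tag
my $only_bold = $orig->clone( only_tags => [ "bold" ] );

# Rename italic with a plain string; rewrite bold with a function
my $converted = $orig->clone(
   convert_tags => {
      italic => "slant",
      bold   => sub {
         my ( $name, $value ) = @_;
         return ( weight => "heavy" );
      },
   },
);

printf "slant=%s weight=%s\n",
   $converted->get_tag_at( 0, "slant" ),
   $converted->get_tag_at( 3, "weight" );
```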

  $new = $orig->clone( %args )
    Called as an instance (rather than a class) method, the newly-cloned
    instance is returned in the same class as the original.

METHODS
  $str = $st->str
  "$st"
    Returns the plain string contained within the object.

    This method is also called for stringification; so the `String::Tagged'
    object can be used in a plain string interpolation such as

     my $message = String::Tagged->new( "Hello world" );
     print "My message is $message\n";

  $len = $st->length
  $len = length( $st )
    Returns the length of the plain string. Because stringification works on
    this object class, the normal core `length' function works correctly on
    it.

  $str = $st->substr( $start, $len )
    Returns a `String::Tagged' instance representing a section from within
    the given string, containing all the same tags at the same conceptual
    positions.

  $str = $st->plain_substr( $start, $len )
    Returns the substring at the given position as a plain Perl string.
    This will be the same string data as returned by `substr', only as a
    plain string without the tags.
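
    A short sketch of the difference between the two methods, reusing the
    string from the SYNOPSIS:

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );

my $tagged = $st->substr( 3, 9 );        # a String::Tagged object
my $plain  = $st->plain_substr( 3, 9 );  # a plain Perl string

# In the new object the bold tag is found from offset 0
my $bold = $tagged->get_tag_at( 0, "bold" );

print "'$plain' bold=$bold\n";
```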

  $st->apply_tag( $start, $len, $name, $value )
    Apply the named tag value to the given extent. The tag will start on the
    character at the `$start' index, and continue for the next `$len'
    characters.

    If `$start' is given as -1, the tag will be considered to start "before"
    the actual string. If `$len' is given as -1, the tag will be considered
    to end "after" the end of the actual string. These special limits are
    used by `set_substr' when deciding whether to move a tag boundary. The
    start of any tag that starts "before" the string is never moved, even if
    more text is inserted at the beginning. Similarly, a tag which ends
    "after" the end of the string will continue to the end even if more text
    is appended.

    This method returns the `$st' object.
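
    To illustrate the anchoring rules, the following sketch applies one
    ordinary tag and one tag anchored at both ends, then appends text. The
    tag names are taken from the `debug_sprintf' example in this document.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag(  0,  5, word       => 1 );  # an ordinary extent
$st->apply_tag( -1, -1, everywhere => 1 );  # anchored "before" and "after"

$st->append( "!" );

# Look at the newly appended character
my $last       = $st->length - 1;
my $everywhere = $st->get_tag_at( $last, "everywhere" );  # tag followed the end
my $word       = $st->get_tag_at( $last, "word" );        # tag stayed where it was

printf "everywhere=%s word=%s\n", $everywhere // "undef", $word // "undef";
```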

  $st->apply_tag( $e, $name, $value )
    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers. The new tag will apply at the given
    extent.

  $st->unapply_tag( $start, $len, $name )
    Unapply the named tag value from the given extent. If the tag extends
    beyond this extent, then any partial fragment of the tag will be left in
    the string.

    This method returns the `$st' object.

  $st->unapply_tag( $e, $name )
    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.

  $st->delete_tag( $start, $len, $name )
    Delete the named tag within the given extent. Entire tags are removed,
    even if they extend beyond this extent.

    This method returns the `$st' object.

  $st->delete_tag( $e, $name )
    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.
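
    The difference between `unapply_tag' and `delete_tag' can be seen by
    running both against the same starting point, as in this sketch (the
    digit string is only there to make offsets easy to read):

```perl
use strict;
use warnings;
use String::Tagged;

# Two identical strings, each with one bold tag over all ten characters
my $trimmed = String::Tagged->new( "0123456789" );
my $removed = String::Tagged->new( "0123456789" );
$_->apply_tag( 0, 10, bold => 1 ) for $trimmed, $removed;

# unapply_tag cuts a hole; the fragments either side remain
$trimmed->unapply_tag( 3, 4, "bold" );

# delete_tag removes the entire tag that touches the extent
$removed->delete_tag( 3, 4, "bold" );

for my $pos ( 0, 4, 8 ) {
   printf "pos %d: unapply=%s delete=%s\n", $pos,
      $trimmed->get_tag_at( $pos, "bold" ) // "undef",
      $removed->get_tag_at( $pos, "bold" ) // "undef";
}
```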

  $st->merge_tags( $eqsub )
    Merge neighbouring or overlapping tags of the same name and equal
    values.

    For each pair of tags of the same name that apply on neighbouring or
    overlapping extents, the `$eqsub' callback is called, as

      $equal = $eqsub->( $name, $value_a, $value_b )

    If this function returns true then the tags are merged.

    The equality test function is free to perform any comparison of the
    values that may be relevant to the application; for example it may
    deeply compare referred structures and check for equivalence in some
    application-defined manner. When two tags are merged, the first tag of
    the pair is retained and the second is deleted. This may be relevant if
    the tag value is a reference to some object.
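
    As a sketch, the two neighbouring `type' tags from the DESCRIPTION
    section could be merged with a simple string comparison. The
    `$count_tags' helper is a hypothetical convenience, not part of the
    module.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0,  9, type => "message" );  # "A string "
$st->apply_tag( 9, 10, type => "message" );  # "with words"

# Hypothetical helper: count the tags currently stored in the string
my $count_tags = sub {
   my $n = 0;
   $st->iter_tags( sub { $n++ } );
   return $n;
};

my $before = $count_tags->();

# Treat two values as equal when they compare equal as strings
$st->merge_tags( sub {
   my ( $name, $value_a, $value_b ) = @_;
   return $value_a eq $value_b;
} );

my $after = $count_tags->();

print "tags before merge: $before, after merge: $after\n";
```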

  $st->iter_extents( $callback, %opts )
    Iterate the tags stored in the string. For each tag, the CODE reference
    in `$callback' is invoked once, being passed an extent object that
    represents the extent of the tag.

     $callback->( $extent, $tagname, $tagvalue )

    Options passed in `%opts' may include:

    start => INT
        Start at the given position; defaults to 0.

    end => INT
        End after the given position; defaults to end of string. This option
        overrides `len'.

    len => INT
        End after the given length beyond the start position; defaults to
        end of string. This option only applies if `end' is not given.

    only => ARRAY
        Select only the tags named in the given ARRAY reference.

    except => ARRAY
        Select all the tags except those named in the given ARRAY reference.

  $st->iter_tags( $callback, %opts )
    Iterate the tags stored in the string. For each tag, the CODE reference
    in `$callback' is invoked once, being passed the start point and length
    of the tag.

     $callback->( $start, $length, $tagname, $tagvalue )

    Options passed in `%opts' are the same as for `iter_extents'.
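
    A small sketch of `iter_tags' with the `only' filter; the format of
    the collected strings is arbitrary.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag(  3, 9, bold   => 1 );  # "important"
$st->apply_tag( 13, 7, italic => 1 );  # "message"

# Collect a description of each bold tag, skipping all other names
my @seen;
$st->iter_tags(
   sub {
      my ( $start, $length, $name, $value ) = @_;
      push @seen, "$name\@$start+$length";
   },
   only => [ "bold" ],
);

print "$_\n" for @seen;
```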

  $st->iter_extents_nooverlap( $callback, %opts )
    Iterate non-overlapping extents of tags stored in the string. The CODE
    reference in `$callback' is invoked for each extent in the string where
    no tags change. The entire set of tags active in that extent is given to
    the callback. Because the extent covers possibly-multiple tags, it will
    not define the `anchor_before' and `anchor_after' flags.

     $callback->( $extent, %tags )

    The callback will be invoked over the entire length of the string,
    including any extents with no tags applied.

    Options may be passed in `%opts' to control the range of the string
    iterated over, in the same way as the `iter_extents' method.

    If the `only' or `except' filters are applied, then only the tags that
    survive filtering will be present in the `%tags' hash. Tags that are
    excluded by the filtering will not be present, nor will their bounds be
    used to split the string into extents.

  $st->iter_tags_nooverlap( $callback, %opts )
    Iterate extents of the string using `iter_extents_nooverlap', but
    passing the start and length of each extent to the callback instead of
    the extent object.

     $callback->( $start, $length, %tags )

    Options may be passed in `%opts' to control the range of the string
    iterated over, in the same way as the `iter_extents' method.

  $st->iter_substr_nooverlap( $callback, %opts )
    Iterate extents of the string using `iter_extents_nooverlap', but
    passing the substring of data instead of the extent object.

     $callback->( $substr, %tags )

    Options may be passed in `%opts' to control the range of the string
    iterated over, in the same way as the `iter_extents' method.

  @names = $st->tagnames
    Returns the set of tag names used in the string, in no particular order.

  $tags = $st->get_tags_at( $pos )
    Returns a HASH reference of all the tag values active at the given
    position.

  $value = $st->get_tag_at( $pos, $name )
    Returns the value of the named tag at the given position, or `undef' if
    the tag is not applied there.

  $extent = $st->get_tag_extent( $pos, $name )
    If the named tag applies to the given position, returns the extent of
    the tag at that position. If it does not, `undef' is returned. If an
    extent is returned it will define the `anchor_before' and `anchor_after'
    flags if appropriate.

  $extent = $st->get_tag_missing_extent( $pos, $name )
    If the named tag does not apply at the given position, returns the
    extent of the string around that position that does not have the tag. If
    it does exist, `undef' is returned. If an extent is returned it will not
    define the `anchor_before' and `anchor_after' flags, as these do not
    make sense for the range in which a tag is absent.
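
    The two extent queries complement each other, as this sketch tries to
    show: for a given position and tag name, one of them returns an extent
    and the other returns `undef'.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, word => 1 );  # "Hello"

my $present = $st->get_tag_extent( 2, "word" );          # position inside the tag
my $missing = $st->get_tag_missing_extent( 8, "word" );  # position outside the tag

printf "word covers [%d,%d); it is absent over [%d,%d)\n",
   $present->start, $present->end, $missing->start, $missing->end;
```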

  $st->set_substr( $start, $len, $newstr )
    Modifies an extent of the underlying plain string to that given. The
    extents of tags in the string are adjusted to cope with the modified
    region, and the adjustment in length.

    Tags entirely before the replaced extent remain unchanged.

    Tags entirely within the replaced extent are deleted.

    Tags entirely after the replaced extent are moved by the appropriate
    amount to ensure they still apply to the same characters as before.

    Tags that start before and end after the extent remain, and have their
    lengths suitably adjusted.

    Tags that span just the start or end of the extent, but not both, are
    truncated, so as to remove the part of the tag applied on the modified
    extent but preserving that applied outside.

    If `$newstr' is a `String::Tagged' object, then its tags will be applied
    to `$st' as appropriate. Edge-anchored tags in `$newstr' will not be
    extended through `$st', though they will apply as edge-anchored if they
    now sit at the edge of the new string.
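
    Three of the rules above are sketched here on a single edit. The tag
    names `first', `second' and `span' are made up for the example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag( 0, 5, first  => 1 );  # "Hello", entirely before the edit
$st->apply_tag( 7, 5, second => 1 );  # "world", entirely after the edit
$st->apply_tag( 4, 4, span   => 1 );  # "o, w", starts before and ends after

# Replace ", " (2 characters) with " and " (5 characters)
$st->set_substr( 5, 2, " and " );

my $first  = $st->get_tag_extent( 0, "first" );
my $second = $st->get_tag_extent( $st->length - 1, "second" );
my $span   = $st->get_tag_extent( 4, "span" );

printf "first='%s' second='%s' span='%s'\n",
   $first->plain_substr, $second->plain_substr, $span->plain_substr;
```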

  $st->insert( $start, $newstr )
    Insert the given string at the given position. A shortcut around
    `set_substr'.

    If `$newstr' is a `String::Tagged' object, then its tags will be applied
    to `$st' as appropriate. If `$start' is 0, any before-anchored tags in
    `$newstr' will become before-anchored in `$st'.

  $st->append( $newstr )
  $st .= $newstr
    Append to the underlying plain string. A shortcut around `set_substr'.

    If `$newstr' is a `String::Tagged' object, then its tags will be applied
    to `$st' as appropriate. Any after-anchored tags in `$newstr' will
    become after-anchored in `$st'.

  $st->append_tagged( $newstr, %tags )
    Append to the underlying plain string, and apply the given tags to the
    newly-inserted extent.

    Returns `$st' itself so that the method may be easily chained.

  $ret = $st->concat( $other )
  $ret = $st . $other
    Returns a new `String::Tagged' containing the two strings concatenated
    together, preserving any tags present. This method overloads the normal
    string concatenation operator, so expressions involving `String::Tagged'
    values retain their tags.

    This method or operator tries to respect subclassing; preferring to
    return a new object of a subclass if either argument or operand is a
    subclass of `String::Tagged'. If they are both subclasses, it will
    prefer the type of the invocant or first operand.
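
    A brief sketch of building a string by chaining `append_tagged', then
    combining it with a plain string through the overloaded operator:

```perl
use strict;
use warnings;
use String::Tagged;

# append_tagged returns the object, so calls can be chained
my $greeting = String::Tagged->new()
   ->append_tagged( "Hello", bold => 1 )
   ->append_tagged( ", " )
   ->append_tagged( "world", italic => 1 );

# The . operator yields a new String::Tagged that keeps the tags
my $combined = $greeting . "!";

printf "%s (bold at 0: %s, italic at 7: %s)\n",
   $combined,
   $combined->get_tag_at( 0, "bold" ),
   $combined->get_tag_at( 7, "italic" );
```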

  @subs = $st->matches( $regexp )
    Returns a list of substrings (as `String::Tagged' instances) for every
    non-overlapping match of the given `$regexp'.

    This could be used, for example, to build a formatted string from a
    formatted template containing variable expansions:

     my $template = ...
     my %vars = ...

     my $ret = String::Tagged->new;
     foreach my $m ( $template->matches( qr/\$\w+|[^$]+/ ) ) {
        if( $m =~ m/^\$(\w+)$/ ) {
           $ret->append_tagged( $vars{$1}, %{ $m->get_tags_at( 0 ) } );
        }
        else {
           $ret->append( $m );
        }
     }

    This iterates segments of the template containing variable expansions
    starting with a `$' symbol, and replaces them with values from the
    `%vars' hash, taking care to preserve all the formatting tags from the
    original template string.

  @parts = $st->split( $regexp, $limit )
    Returns a list of substrings by applying the regexp to the string
    content; similar to the core perl `split' function. If `$limit' is
    supplied, the method will stop at that number of elements, returning the
    entire remainder of the input string as the final element. If the
    `$regexp' contains a capture group then the content of the first one
    will be added to the return list as well.
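
    A sketch of `split' with and without a limit. It assumes that the
    returned parts are `String::Tagged' instances, as they are for
    `matches', so that tags can still be queried on them.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "one, two, three" );
$st->apply_tag( 5, 3, bold => 1 );  # "two"

# Split on every comma and following whitespace
my @parts = $st->split( qr/,\s*/ );

# Stop after two elements; the second holds the rest of the string
my @limited = $st->split( qr/,\s*/, 2 );

print join( "|", map { "$_" } @parts ), "\n";
print join( "|", map { "$_" } @limited ), "\n";
```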

  $ret = $st->debug_sprintf
    Returns a representation of the string data and all the tags, suitable
    for debug printing or other similar use. This is a format such as is
    given in the DESCRIPTION section above.

    The output will consist of a number of lines, the first containing the
    plain underlying string, then one line per tag. The line shows the
    extent of the tag given by `[---]' markers, or a `|' in the special case
    of a tag covering only a single character. Special markings of `<' and
    `>' indicate tags which are "before" or "after" anchored.

    For example:

      Hello, world
      [---]         word       => 1
     <[----------]> everywhere => 1
            |       space      => 1
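
    A string along the lines of that example might be constructed as in
    the sketch below; the exact spacing of the printed output is left to
    the module.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );
$st->apply_tag(  0,  5, word       => 1 );  # "Hello"
$st->apply_tag( -1, -1, everywhere => 1 );  # anchored at both ends
$st->apply_tag(  6,  1, space      => 1 );  # the single space character

my $dump = $st->debug_sprintf;
print $dump;
```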

Extent Objects
    These objects represent a range of characters within the containing
    `String::Tagged' object. The range they represent is fixed at the time
    of creation. If the containing string is modified by a call to
    `set_substr' then the effect on the extent object is not defined. These
    objects should be considered as relatively short-lived - used briefly
    for the purpose of querying the result of an operation, then discarded
    soon after.

  $extent->string
    Returns the containing `String::Tagged' object.

  $extent->start
    Returns the start index of the extent. This is the index of the first
    character within the extent.

  $extent->end
    Returns the end index of the extent. This is the index of the first
    character beyond the end of the extent.

  $extent->anchor_before
    True if this extent begins "before" the start of the string. Only
    certain methods return extents with this flag defined.

  $extent->anchor_after
    True if this extent ends "after" the end of the string. Only certain
    methods return extents with this flag defined.

  $extent->length
    Returns the number of characters within the extent.

  $extent->substr
    Returns the substring contained by the extent.

  $extent->plain_substr
    Returns the substring of the underlying plain string buffer contained by
    the extent.
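
    The accessors above can be combined as in this sketch, which reports
    on each tag during an `iter_extents' call and lets the extent objects
    go out of scope straight afterwards. The report format is arbitrary.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );  # "important"

my @report;
$st->iter_extents( sub {
   my ( $extent, $name, $value ) = @_;

   # Read everything needed from the extent now rather than storing it
   push @report, sprintf "%s=%s covers [%d,%d), %d characters: '%s'",
      $name, $value,
      $extent->start, $extent->end, $extent->length,
      $extent->plain_substr;
} );

print "$_\n" for @report;
```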

TODO
    *   There are likely variations on the rules for `set_substr' that could
        equally apply to some uses of tagged strings. Consider whether the
        behaviour of modification is chosen per-method, per-tag, or
        per-string.

    *   Consider how to implement a clone from one tag format to another
        which wants to merge multiple different source tags together into a
        single new one.

AUTHOR
    Paul Evans <leonerd@leonerd.org.uk>