NAME

BusyBird::Util - utility functions for BusyBird

SYNOPSIS

    use BusyBird::Util qw(sort_statuses split_with_entities future_of);
    
    future_of($timeline, "get_statuses", count => 100)->then(sub {
        my ($statuses) = @_;
        my $sorted_statuses = sort_statuses($statuses);
        my $status = $sorted_statuses->[0];
        my $segments_arrayref = split_with_entities($status->{text}, $status->{entities});
        return $segments_arrayref;
    })->catch(sub {
        my ($error, $is_normal_error) = @_;
        warn $error;
    });

DESCRIPTION

This module provides some utility functions useful in BusyBird.

EXPORTABLE FUNCTIONS

The following functions are exported only by request.

$sorted = sort_statuses($statuses)

Sorts an array of status objects in BusyBird's standard status order. Argument $statuses is an array-ref of statuses.

Return value $sorted is an array-ref of sorted statuses.

The sort refers to the $status->{created_at} and $status->{busybird}{acked_at} fields. See the "Order of Statuses" section in BusyBird::StatusStorage.

$segments_arrayref = split_with_entities($text, $entities_hashref)

Splits $text at the boundaries of the given "entities" and returns the resulting segments.

$text is the string to be split. $entities_hashref is a hash-ref with the same structure as Twitter Entities. Each entity object annotates a part of $text with information such as linked URLs, mentioned users, hashtags, etc. If $entities_hashref does not conform to this structure, it is ignored.

The return value $segments_arrayref is an array-ref of "segment" objects. A "segment" is a hash-ref containing a part of $text and the entity object (if any) attached to it. Note that $segments_arrayref also includes segments that have no entity attached. The segments are sorted by position, so concatenating their text reproduces the complete $text.

Example:

    my $text = 'aaa --- bb ---- ccaa -- ccccc';
    my $entities = {
        a => [
            {indices => [0, 3],   url => 'http://hoge.com/a/1'},
            {indices => [18, 20], url => 'http://hoge.com/a/2'},
        ],
        b => [
            {indices => [8, 10], style => "bold"},
        ],
        c => [
            {indices => [16, 18], footnote => 'first c'},
            {indices => [24, 29], some => {complex => 'structure'}},
        ],
        d => []
    };
    my $segments = split_with_entities($text, $entities);
    
    ## $segments = [
    ##     { text => 'aaa', start => 0, end => 3, type => 'a',
    ##       entity => {indices => [0, 3], url => 'http://hoge.com/a/1'} },
    ##     { text => ' --- ', start => 3, end => 8, type => undef,
    ##       entity => undef},
    ##     { text => 'bb', start => 8, end => 10, type => 'b',
    ##       entity => {indices => [8, 10], style => "bold"} },
    ##     { text => ' ---- ', start => 10, end =>  16, type => undef,
    ##       entity => undef },
    ##     { text => 'cc', start => 16, end => 18, type => 'c',
    ##       entity => {indices => [16, 18], footnote => 'first c'} },
    ##     { text => 'aa', start => 18, end => 20, type => 'a',
    ##       entity => {indices => [18, 20], url => 'http://hoge.com/a/2'} },
    ##     { text => ' -- ', start => 20, end => 24, type => undef,
    ##       entity => undef },
    ##     { text => 'ccccc', start => 24, end => 29, type => 'c',
    ##       entity => {indices => [24, 29], some => {complex => 'structure'}} }
    ## ];

Every entity object must have an indices field, which is an array-ref of the starting and ending indices of the annotated text part. The ending index must be greater than or equal to the starting index. An entity object that does not meet this condition is ignored.

Except for indices, all fields in entity objects are optional.

Text ranges annotated by entity objects must not overlap. If they do, the result is undefined.

A segment hash-ref has the following fields.

text

Substring of the $text.

start

Starting index of the segment in $text.

end

Ending index of the segment in $text.

type

Type of the entity. If the segment has no entity attached, it is undef.

entity

Attached entity object. If the segment has no entity attached, it is undef.

split_with_entities() croaks if $text is undef.
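The splitting rules above can be sketched in plain Perl. This is a hypothetical re-implementation named split_with_entities_sketch, written only to illustrate the documented behavior; it is not BusyBird::Util's actual code, and it assumes that all indices lie within $text and that entity ranges do not overlap.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch of the documented behavior; not BusyBird::Util's code.
# Assumes indices lie within $text and entity ranges do not overlap.
sub split_with_entities_sketch {
    my ($text, $entities_hashref) = @_;
    croak "text must not be undef" if !defined($text);
    # A non-hash-ref entities argument is ignored (treated as no entities).
    $entities_hashref = {} if ref($entities_hashref) ne 'HASH';

    # Collect [type, entity, start, end], skipping entities with invalid indices.
    my @annotated;
    foreach my $type (keys %$entities_hashref) {
        my $list = $entities_hashref->{$type};
        next if ref($list) ne 'ARRAY';
        foreach my $entity (@$list) {
            next if ref($entity) ne 'HASH' || ref($entity->{indices}) ne 'ARRAY';
            my ($start, $end) = @{ $entity->{indices} };
            next if !defined($start) || !defined($end) || $end < $start;
            push @annotated, [$type, $entity, $start, $end];
        }
    }
    @annotated = sort { $a->[2] <=> $b->[2] } @annotated;

    my @segments;
    my $pos = 0;    # end of the part of $text covered so far
    my $push_segment = sub {
        my ($start, $end, $type, $entity) = @_;
        push @segments, {
            text   => substr($text, $start, $end - $start),
            start  => $start,
            end    => $end,
            type   => $type,
            entity => $entity,
        };
    };
    foreach my $item (@annotated) {
        my ($type, $entity, $start, $end) = @$item;
        # Plain text between the previous entity and this one.
        $push_segment->($pos, $start, undef, undef) if $start > $pos;
        $push_segment->($start, $end, $type, $entity);
        $pos = $end;
    }
    # Trailing plain text after the last entity.
    $push_segment->($pos, length($text), undef, undef) if $pos < length($text);
    return \@segments;
}
```

Run on the example above, this sketch produces the same eight segments, and joining their text fields reproduces $text.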

$future = future_of($invocant, $method, %args)

Wraps a callback-style method call with a Future::Q object.

This function executes $invocant->$method(%args), where $method is expected to be a callback-style method. Before the call, the callback field in %args is overwritten so that the result of $method can be obtained from $future.

To use future_of(), $method must conform to the following specification (most of BusyBird::Timeline's callback-style methods follow it):

  • The $method takes named arguments as in $invocant->$method(key1 => value1, key2 => value2 ... ).

  • When the $method's operation is done, the subroutine reference stored in $args{callback} must be called exactly once.

  • $args{callback} must be called as in

        $args{callback}->($error, @results)

  • On success, $error must be a falsy scalar and the remaining arguments are the result of the operation. The arguments other than $error are used to fulfill the $future.

  • On failure, $error must be a truthy scalar that describes the error. The $error is used to reject the $future.
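For illustration, here is a minimal callback-style method that satisfies the specification above. The class ExampleSource and its fetch_items method are hypothetical and not part of BusyBird. Given such a method, future_of($source, "fetch_items", count => 2) would be expected to fulfill $future with the items array-ref, or reject it with the error string.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration; not part of BusyBird.
    package ExampleSource;

    sub new {
        my ($class, %opts) = @_;
        return bless { items => $opts{items} || [], broken => $opts{broken} }, $class;
    }

    # A callback-style method following the future_of() specification:
    # it takes named arguments and calls $args{callback} exactly once.
    sub fetch_items {
        my ($self, %args) = @_;
        my $callback = $args{callback};
        if ($self->{broken}) {
            # Failure: a truthy scalar describing the error.
            $callback->("source is unavailable");
            return;
        }
        my @items = @{ $self->{items} };
        my $count = (defined($args{count}) && $args{count} < @items) ? $args{count} : scalar(@items);
        # Success: a falsy $error, followed by the result (here one array-ref).
        $callback->(undef, [ @items[0 .. $count - 1] ]);
        return;
    }
}

my $source = ExampleSource->new(items => [qw(a b c)]);
my ($error, $items);
$source->fetch_items(count => 2, callback => sub { ($error, $items) = @_ });
# $error is undef and $items is ['a', 'b']
```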

The return value ($future) is a Future::Q object, which represents the result of the $method call. If $method throws an exception, it is caught by future_of() and $future becomes rejected.

On success, $future is fulfilled with the results passed to the callback (the @results above).

    $future->then(sub {
        my @results = @_;
        ...
    });

On failure, $future is rejected with the error and a flag.

    $future->catch(sub {
        my ($error, $is_normal_error) = @_;
        ...
    });

If $error is the error passed to the callback, $is_normal_error is true. If $error is an exception thrown by $method, $is_normal_error is not passed at all.
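The three possible outcomes can be modelled without Future::Q. The function outcome_of below is a hypothetical stand-in for future_of(), not its real implementation: instead of a Future::Q object it returns a plain hash-ref describing the outcome, and it assumes that $method calls its callback synchronously, which real asynchronous methods need not do. The Demo class is likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for future_of(): returns { state => ..., args => [...] }.
# Assumes $method calls its callback before returning.
sub outcome_of {
    my ($invocant, $method, %args) = @_;
    my $outcome;
    my $ok = eval {
        # The callback field in %args is overwritten by our own callback.
        $invocant->$method(%args, callback => sub {
            my ($error, @results) = @_;
            $outcome = $error
                ? { state => 'rejected',  args => [$error, 1] }   # normal error: flag is true
                : { state => 'fulfilled', args => [@results] };
        });
        1;
    };
    # Exception thrown by $method: rejected with the exception only, no flag.
    $outcome = { state => 'rejected', args => [$@] } if !$ok;
    return $outcome;
}

{
    # Hypothetical invocant with one method per outcome.
    package Demo;
    sub new     { return bless {}, shift }
    sub succeed { my ($self, %args) = @_; $args{callback}->(undef, 'r1', 'r2'); return }
    sub fail    { my ($self, %args) = @_; $args{callback}->('not found'); return }
    sub explode { die "boom\n" }
}
```

With this model, outcome_of(Demo->new, 'fail') yields the args ('not found', 1), while outcome_of(Demo->new, 'explode') yields only ("boom\n"), mirroring how $is_normal_error is present for callback errors and absent for exceptions.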

$tracking_timeline = make_tracking($tracking_timeline, $main_timeline)

Makes $tracking_timeline a tracking timeline for one particular source of statuses, whose new statuses are then fed into $main_timeline. $tracking_timeline and $main_timeline must be BusyBird::Timeline objects.

The return value is the given $tracking_timeline object.

This function uses BusyBird::Log to log error messages when something goes wrong.

A "tracking timeline" is a timeline dedicated to tracking status history of a single source. You might need it when you import statuses from various sources into a single "main" timeline.

For example,

    use BusyBird;
    use BusyBird::Input::Feed;
    
    my $input = BusyBird::Input::Feed->new();
    my $main_timeline = timeline("main");
    $main_timeline->add( $input->parse_url('http://example1.com/feed.rss') );
    $main_timeline->add( $input->parse_url('http://example2.com/feed.rss') );
    $main_timeline->add( $input->parse_url('http://example3.com/feed.rss') );

In the above example, statuses are imported from three different RSS feeds using BusyBird::Input::Feed. Because BusyBird::Timeline rejects duplicate statuses, the above code adds only new and unread statuses to $main_timeline.

However, if the three feeds update at different rates, old statuses may re-appear in $main_timeline as new ones. This is because BusyBird::Timeline has a limited capacity for storing statuses.

Suppose example1 and example2 update quickly whereas example3 updates very slowly. At first, $main_timeline keeps all statuses from the three feeds. After a while, $main_timeline fills up with statuses from example1 and example2, and at some point, statuses from example3 are discarded because they are too old. After that, $main_timeline->add( $input->parse_url('http://example3.com/feed.rss') ) imports the same statuses that were just discarded, and $main_timeline accepts them as new because they are no longer in it. As a result, those old statuses from example3 re-appear as unread.

To prevent that tragedy, you should create tracking timelines.

    use BusyBird;
    use BusyBird::Input::Feed;
    use BusyBird::Util qw(make_tracking);
    
    my $input = BusyBird::Input::Feed->new();
    my $main_timeline = timeline("main");
    make_tracking(timeline("example1"), $main_timeline);
    make_tracking(timeline("example2"), $main_timeline);
    make_tracking(timeline("example3"), $main_timeline);
    
    timeline("example1")->add( $input->parse_url('http://example1.com/feed.rss') );
    timeline("example2")->add( $input->parse_url('http://example2.com/feed.rss') );
    timeline("example3")->add( $input->parse_url('http://example3.com/feed.rss') );

You should add statuses into tracking timelines instead of directly into $main_timeline. Each tracking timeline keeps statuses from its source, and it forwards only new statuses to the $main_timeline.
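The scenario above can be reproduced with a toy model. ToyTimeline and make_tracking_toy are hypothetical and deliberately simplified: each timeline rejects duplicates by id and discards its oldest statuses beyond a fixed capacity, and a tracking timeline forwards only the statuses it accepted as new. This illustrates the idea only; it does not reflect how BusyBird::Timeline or make_tracking() are implemented.

```perl
use strict;
use warnings;

{
    # Toy model only; not BusyBird::Timeline.
    package ToyTimeline;

    sub new {
        my ($class, %opts) = @_;
        return bless { capacity => $opts{capacity}, order => [], seen => {}, forward_to => undef }, $class;
    }

    # Adds statuses, rejecting duplicates by id and discarding the oldest
    # statuses beyond capacity. Returns the number of statuses that were new.
    sub add {
        my ($self, @statuses) = @_;
        my @new;
        foreach my $status (@statuses) {
            next if $self->{seen}{ $status->{id} };
            $self->{seen}{ $status->{id} } = 1;
            push @{ $self->{order} }, $status->{id};
            push @new, $status;
            while (@{ $self->{order} } > $self->{capacity}) {
                my $discarded = shift @{ $self->{order} };
                delete $self->{seen}{$discarded};
            }
        }
        # A tracking timeline forwards only its new statuses.
        $self->{forward_to}->add(@new) if $self->{forward_to} && @new;
        return scalar(@new);
    }
}

sub make_tracking_toy {
    my ($tracking, $main) = @_;
    $tracking->{forward_to} = $main;
    return $tracking;
}

sub statuses { return map { +{ id => $_ } } @_ }

# Without tracking: the slow feed's status is discarded, then re-added as new.
my $main = ToyTimeline->new(capacity => 3);
$main->add(statuses('slow-1'));
$main->add(statuses('fast-1', 'fast-2', 'fast-3'));   # slow-1 is discarded
my $reappeared = $main->add(statuses('slow-1'));      # 1: slow-1 is "new" again

# With tracking timelines in front of the main timeline.
my $main2 = ToyTimeline->new(capacity => 3);
my $fast  = make_tracking_toy(ToyTimeline->new(capacity => 3), $main2);
my $slow  = make_tracking_toy(ToyTimeline->new(capacity => 3), $main2);
$slow->add(statuses('slow-1'));
$fast->add(statuses('fast-1', 'fast-2', 'fast-3'));   # slow-1 leaves $main2 only
my $forwarded_again = $slow->add(statuses('slow-1')); # 0: $slow still has slow-1
```

Because the slow source's tracking timeline holds only that source's statuses, it still remembers slow-1 and does not forward it to the main timeline a second time.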

AUTHOR

Toshio Ito <toshioito [at] cpan.org>