
NAME

XML::RSS::Timing - understanding RSS skipHours, skipDays, sy:update*

SYNOPSIS

  # ...after getting an RSS/RDF feed that contains the following:
  #    <sy:updateFrequency>3</sy:updateFrequency>
  #    <sy:updatePeriod>hourly</sy:updatePeriod>
  #    <sy:updateBase>1970-01-01T08:20+00:00</sy:updateBase>

  use XML::RSS::Timing;
  my $timing = XML::RSS::Timing->new;
  $timing->lastPolled(   time() );
  $timing->updatePeriod( 'hourly' );
  $timing->updateFrequency( 3 );
  $timing->updateBase( '1970-01-01T08:20+00:00' );
  
  # Find out the soonest I can expect new content:
  my $then = $timing->nextUpdate;
  print "I can next poll the feed after $then (",
    scalar(localtime($then)), " local time)\n";
  

Polling it before $then is unlikely to return any new content, according to the sy:update* elements' values.

DESCRIPTION

RSS/RDF feeds can use the elements skipHours, skipDays, ttl, sy:updateBase, sy:updatePeriod, and sy:updateFrequency to express what days/times they won't update, so that RSS/RDF clients can conserve network resources by not bothering to poll a feed more than once during such a period.

This Perl module is for taking in the RSS/RDF skipHours, skipDays, ttl, and sy:update* elements' values, and figuring out when they say new content might be available.

Note: This module doesn't depend on XML::RSS, nor in fact have any particular relationship with it.

OVERVIEW

There are two perspectives on this problem:

The "When To Ignore Until?" Perspective

With this perspective, you have just polled the given RSS/RDF feed (regardless of whether its content turns out to be new), and you want to see if the feed says you can skip polling it until some future time. You extract the sy:update* fields' values and/or the skipHours, skipDays, and ttl values, pass them to a new XML::RSS::Timing object, and then ask it until when you should avoid polling the feed. In the end you'll probably do this:

      my $wait_until = $timing->nextUpdate;
      $wait_until = time() + $Default_Polling_Delay
        # where $Default_Polling_Delay is some reader-defined value
       if $wait_until <= time();

...and then file away $wait_until's value in some internal table that is consulted before polling things, like so:

      foreach my $feed (@FeedObjects) {
        next if $feed->wait_until > time();
         # Don't poll it, there'll be nothing new
        
        # ...Else go ahead and poll it, there could be something new...
      }

The "Is It Time Yet?" Perspective

With this perspective, you polled the RSS feed at some time in the past, and are now considering whether its sy:update* fields' values and/or the skip* and ttl values (which you stored somewhere) say you can now poll the feed (or whether there'd be no point, if the skip*/update* fields say you shouldn't expect any new content). With this perspective, you use code like this:

      # ...after calling ->skipHours and/or ->updatePeriod, etc
      $timing->lastPolled( $when_last_polled );
      if( time() < $timing->nextUpdate ) {
        # ...Don't poll it, there'll be nothing new...
      } else {
        # ...go ahead and poll it, there could be something new...
      }

Of the two perspectives, this second one seems less efficient to me, but your mileage may vary.

METHODS

This class defines the following methods:

$timing = XML::RSS::Timing->new();

This constructor method creates a new object to be used for figuring out feed timing. You should use a new object for each feed you're considering.

$timing->skipHours( hournum, hournum... )

This adds to this $timing object the given list of hours from the given feed's skipHours element. Hours are expressed as integers from 0 to 23 inclusive.

$timing->skipDays( dayname, dayname... )

This adds to this $timing object the given list of days from the given feed's skipDays element. The day name strings have to be from the set: "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday".

$timing->updateFrequency( integer )

This sets the given $timing object's updateFrequency value from the feed's (optional) sy:updateFrequency element. This has to be a nonzero positive integer.

$timing->updateBase( iso_time )

This sets the given $timing object's updateBase value from the feed's (optional) sy:updateBase element. This has to be a date in one of these formats:

         1997
         1997-07
         1997-07-16
         1997-07-16T19:20
         1997-07-16T19:20Z
         1997-07-16T19:20+01:00
         1997-07-16T19:20:30+01:00
         1997-07-16T19:20:30.45+01:00

The default value is "1970-01-01T00:00Z".
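A reader may want to sanity-check a feed's sy:updateBase string before handing it to updateBase. The following is a rough sketch, assuming that only the shapes listed above are acceptable. The pattern and the helper name looks_like_update_base are illustrative and are not taken from this module:

```perl
use strict;
use warnings;

# Hypothetical pre-check: matches the date shapes listed above
# (year, year-month, date, and date-time with optional seconds,
# fraction, and zone). It checks shape only, not calendar validity.
sub looks_like_update_base {
  my ($s) = @_;
  return $s =~ m{
    \A
    \d{4}                                  # year
    (?: - \d{2}                            # month
      (?: - \d{2}                          # day
        (?: T \d{2} : \d{2}                # hours and minutes
          (?: : \d{2} (?: \. \d+ )? )?     # seconds, optional fraction
          (?: Z | [+-] \d{2} : \d{2} )?    # optional zone
        )?
      )?
    )?
    \z
  }x ? 1 : 0;
}

for my $candidate ('1997-07-16T19:20:30.45+01:00', '1997-07', 'July 1997') {
  printf "%-30s %s\n", $candidate,
    looks_like_update_base($candidate) ? 'ok' : 'rejected';
}
```

A string that fails this check could be dropped before it ever reaches the $timing object, leaving the default base in effect.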

$timing->updatePeriod( periodname )

This sets the given $timing object's updatePeriod value from the feed's (optional) sy:updatePeriod element. This has to be a string from the set: "hourly", "daily", "weekly", "monthly", "yearly".
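Taken together, updatePeriod, updateFrequency, and updateBase describe a repeating schedule: the period is divided into updateFrequency equal intervals, counted from the base time. The arithmetic can be sketched as follows. This is only an illustration, not this module's actual implementation: the helper names update_interval and next_slot are hypothetical, and the sketch assumes fixed-length periods, so it leaves out "monthly" and "yearly".

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Assumed fixed period lengths in seconds; "monthly" and "yearly" vary
# in length, so this sketch deliberately leaves them out.
my %PERIOD_SECONDS = (
  hourly => 60 * 60,
  daily  => 24 * 60 * 60,
  weekly => 7 * 24 * 60 * 60,
);

# Hypothetical helper: seconds between scheduled updates.
sub update_interval {
  my ($period, $frequency) = @_;
  my $seconds = $PERIOD_SECONDS{$period}
    or die "Unsupported period in this sketch: $period\n";
  return $seconds / $frequency;
}

# Hypothetical helper: first scheduled update at or after $now,
# counting in whole intervals from the base time (epoch seconds).
sub next_slot {
  my ($base, $interval, $now) = @_;
  return $base + ceil( ($now - $base) / $interval ) * $interval;
}

my $interval = update_interval('hourly', 3);   # three times an hour
my $base     = 8 * 3600 + 20 * 60;             # 1970-01-01T08:20Z as epoch seconds
my $next     = next_slot($base, $interval, $base + 3000);
print "interval=$interval next=$next\n";       # interval=1200 next=33600
```

With the values from the SYNOPSIS (hourly, 3, base 08:20), the schedule works out to one slot every 20 minutes, at :00, :20, and :40 past each hour.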

$timing->lastPolled( epoch_time )

This sets the time when you last polled this feed. If you don't set this, the current time (time()) will be used.

Note that by "polling", I mean not just requesting the feed, but requesting the feed and getting a successful response (regardless of whether it's an HTTP 200 "OK" response or an HTTP 304 "Not Modified" response). If you request a feed and get any sort of error, then don't count that as actually polling the feed.
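In client code, that rule might look like the following sketch. It assumes that only HTTP 200 and 304 count as successful responses; the helper name counts_as_poll is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: does this HTTP status count as a successful poll?
# Assumes only 200 (OK) and 304 (Not Modified) qualify.
sub counts_as_poll {
  my ($status) = @_;
  return ($status == 200 || $status == 304) ? 1 : 0;
}

my $last_polled;                     # stays undef until a poll succeeds
for my $status (500, 304) {          # a failed request, then a good one
  $last_polled = time() if counts_as_poll($status);
}
print defined $last_polled ? "poll recorded\n" : "no successful poll\n";
```

Only the recorded $last_polled value would then be passed to lastPolled.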

$timing->ttl( integer )

This sets the given $timing object's "ttl" value from the feed's (optional) ttl element. This has to be a nonzero positive integer. It represents the minimum number of minutes that a reader should wait between polls of the given feed. It is a somewhat obsolescent (but common) predecessor to the sy:update* fields.

("TTL" stands for "time to live", a term borrowed from DNS cache jargon.)

$timing->maxAge( integer )

This sets the given $timing object's "maxAge" value. This has to be a nonzero positive integer.

This value comes not from the feed, but is an (optional) attribute of your client: it denotes the maximum amount of time (in seconds) that your client will go between polling, overriding whatever this feed says.

For example, if a feed says it updates only once a year, but your maxAge is two months, then this timing object will act as if the feed really said to update every two months.

If you set this, you should probably set it only to a large value, like the number of seconds in two months (62*24*60*60). By default, this is not set, meaning no maximum is enforced. (So if a feed says to update only once a year, then that's what this timing object faithfully implements.)

$timing->minAge( integer )

This sets the given $timing object's "minAge" value. This has to be a nonzero positive integer.

This value comes not from the feed, but is an (optional) attribute of your client: it denotes the minimum amount of time (in seconds) that your client will go between polling, overriding whatever this feed says.

For example, if a feed says it can update every 5 minutes, but your minAge is a half hour, then this timing object will act as if the feed really said to update at most once every half hour.

If you set minAge, you should probably set it only to a smallish value, like the number of seconds in an hour (60*60). By default, this is not set, meaning no minimum is enforced.
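One way to picture how maxAge and minAge bound a feed's own schedule is as a clamp on the wait after the last poll. The following is a sketch under that assumption; clamp_next_update is a hypothetical helper and does not show this module's internals:

```perl
use strict;
use warnings;

# Hypothetical helper: bound a feed-suggested next-update time so the
# wait after $last_polled is no shorter than $min_age and no longer
# than $max_age (both in seconds; either may be undef for "not set").
sub clamp_next_update {
  my ($last_polled, $suggested, $min_age, $max_age) = @_;
  my $earliest = defined $min_age ? $last_polled + $min_age : $suggested;
  my $latest   = defined $max_age ? $last_polled + $max_age : $suggested;
  my $result = $suggested;
  $result = $earliest if $result < $earliest;
  $result = $latest   if $result > $latest;
  return $result;
}

my $polled = 1_000_000;

# Feed says "5 minutes", client minAge is half an hour: wait 1800 seconds.
print clamp_next_update($polled, $polled + 5 * 60, 30 * 60, undef) - $polled, "\n";

# Feed says "one year", client maxAge is 62 days: wait 5356800 seconds.
print clamp_next_update($polled, $polled + 365 * 24 * 60 * 60,
                        undef, 62 * 24 * 60 * 60) - $polled, "\n";
```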

$epochtime = $timing->nextUpdate();

This method returns the time (in seconds since the epoch) that's the soonest that this feed could return new content.

Note that this doesn't mean you have to actually poll the feed right at that second! (That's why this is called "nextUpdate", not something like "nextPoll".) Instead, I presume your RSS reader will do something like run at random intervals and just look for which feeds' nextUpdate times are less than time().

Note that nextUpdate might return the same value as this feed's lastPolled value, in the case of a feed without any ttl/skip*/sy:update* information and where you haven't specified a minAge.
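A reader that wakes up at arbitrary times could pick out the feeds that are due with something like the sketch below. The table of stored nextUpdate values and the helper name feeds_due are hypothetical; a real reader would fill the table from each feed's own $timing object.

```perl
use strict;
use warnings;

# Hypothetical table: feed name => stored nextUpdate value (epoch seconds).
my %next_update = (
  'news'    => 1_000,
  'weather' => 2_000,
  'comics'  => 3_000,
);

# Return the names of feeds whose stored time has arrived, sorted by name.
sub feeds_due {
  my ($now, $table) = @_;
  my @due = grep { $table->{$_} <= $now } keys %$table;
  return sort @due;
}

print join(', ', feeds_due(2_500, \%next_update)), "\n";   # news, weather
```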

$timing->use_exceptions( 0 )
$timing->use_exceptions( 1 )

This sets whether this object signals errors by throwing exceptions (die's) (with a 1), or instead simply muddles through and collects them in complaints (with a 0).

Basically, errors can come from passing invalid parameters to this module's methods, such as passing "friday" to skipDays (instead of "Friday"), or passing 123 to skipHours (instead of an integer in the range 0-23), etc.

By default, use_exceptions is on.

@complaints = $timing->complaints()

This returns a list of any errors that were encountered in dealing with this $timing object. Errors can result from blocking exceptions (if use_exceptions is off), or from non-fatal warnings of interest while debugging (like if skipHours was told to skip all 24 hours).

If there were no complaints, this will simply return an empty list.
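As a short illustration of the two methods together, the sketch below turns exceptions off, passes the two invalid values mentioned above, and prints whatever complaints were collected. It uses only methods documented here and assumes the module is installed; the exact wording of each complaint is not shown because it is up to the module.

```perl
use strict;
use warnings;
use XML::RSS::Timing;

my $timing = XML::RSS::Timing->new;
$timing->use_exceptions(0);       # collect errors instead of dying

$timing->skipDays('friday');      # invalid: day names must be capitalized
$timing->skipHours(123);          # invalid: outside the range 0-23

# Print each collected complaint on its own line.
print "Complaint: $_\n" for $timing->complaints;
```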

LIMITATIONS

Because of currently common limitations on the size of integers used in reckoning dates, this module cannot process dates (whether as current time, or as updateBase time) before the year 1902 or after the year 2037. This is merely an implementational limitation, not something inherent to the RSS/RDF specs.

BUGS

Although the spec places no such limit, this implementation requires the updateBase's date to be between 1902 and 2038 (noninclusive).

SEE ALSO

The Perl modules XML::RSS, XML::RSS::SimpleGen, XML::RSS::Parser, XML::RSS::Tools

http://blogs.law.harvard.edu/tech/rss

http://web.resource.org/rss/1.0/modules/syndication/

http://groups.yahoo.com/group/rss-dev/

http://feedvalidator.org/

AUTHOR

Sean M. Burke, <sburke@cpan.org>, with the helpful consultation of the RSS-DEV group.

COPYRIGHT

Copyright (c) 2004, Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify it only under the terms of version 2 of the GNU General Public License (perlgpl).

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

(But if you have any problems with this library, I ask that you let me know.)
