Catalyst::Plugin::BigSitemap

Version 0.9

Change History

0.9

0.02

0.01

Description

A Catalyst plugin that automatically generates Sitemap and Sitemap Index files and caches them to disk. This lets a site go beyond the 50,000-URL limit of a single sitemap, up to a maximum of 2.5 billion URLs.
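
The 2.5 billion figure follows from the sitemap protocol limits: a sitemap index can list up to 50,000 sitemaps, each holding up to 50,000 URLs. The following is a minimal sketch of that arithmetic; `sitemap_files_needed` is a hypothetical helper written for illustration and is not part of the plugin's API.

```perl
use strict;
use warnings;

# Limits from the sitemap protocol: 50,000 URLs per sitemap file and
# 50,000 sitemap entries per sitemap index.
my $URLS_PER_SITEMAP   = 50_000;
my $SITEMAPS_PER_INDEX = 50_000;

# Hypothetical helper (not part of the plugin): number of sitemap files
# needed to hold $url_count URLs, rounding up with integer arithmetic.
sub sitemap_files_needed {
    my ($url_count) = @_;
    return int( ( $url_count + $URLS_PER_SITEMAP - 1 ) / $URLS_PER_SITEMAP );
}

# Upper bound on URLs reachable from a single sitemap index.
my $max_urls = $URLS_PER_SITEMAP * $SITEMAPS_PER_INDEX;

print "max urls per index: $max_urls\n";
print "files for 120,001 urls: ", sitemap_files_needed(120_001), "\n";
```

A site with 120,001 URLs would therefore need three sitemap files plus one index file.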

cpan install Catalyst::Plugin::BigSitemap

Synopsis

MyCatalystApp.pm

use Catalyst qw/BigSitemap/;

MyCatalystApp.conf (Config::General flavored)

<Plugin::BigSitemap>
    cache_dir /var/www/myapp/root/sitemaps
    url_base http://mywebsite/
    sitemap_name_format sitemap%d.xml.gz
    sitemap_index_name sitemap_index.xml
</Plugin::BigSitemap>

MyApacheConf.conf

This example assumes you have mod_alias installed and want to store your sitemap files on disk and serve them straight through Apache. (Seriously: if your sitemaps are large enough to warrant using this module, you definitely don't want to build and serve them from Catalyst on each request.)

<VirtualHost *:80>
    ... your standard configuration ...
    Alias /sitemap_index.xml /var/www/mysite/root/sitemaps/sitemap_index.xml
    AliasMatch ^/sitemap(\d+)\.xml\.gz$ /var/www/mysite/root/sitemaps/sitemap$1.xml.gz
</VirtualHost>
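
To check that the file names produced by `sitemap_name_format` line up with the `AliasMatch` pattern, a quick standalone test can help. This is only a sketch: it assumes the plugin fills in the `%d` placeholder with the sitemap's number via `sprintf`, which the format string suggests but this document does not confirm, and the numbers 1 to 3 are arbitrary. The pattern used here is a stricter variant of the `AliasMatch` expression, with the literal dots escaped.

```perl
use strict;
use warnings;

# Value copied from the example Config::General configuration.
my $sitemap_name_format = 'sitemap%d.xml.gz';

# Stricter variant of the AliasMatch pattern: literal dots are escaped.
my $alias_pattern = qr{^/sitemap(\d+)\.xml\.gz$};

# Assumption: the plugin expands %d with the sitemap number via sprintf.
for my $n ( 1 .. 3 ) {
    my $path   = '/' . sprintf( $sitemap_name_format, $n );
    my $status = $path =~ $alias_pattern ? 'matched' : 'NOT matched';
    print "$path: $status\n";
}
```

If you change `sitemap_name_format`, the `AliasMatch` pattern needs to change with it, otherwise Apache will not find the generated files.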

MyController.pm

```perl
#
# Actions you want included in your sitemap. In this example, a total of
# 10 URLs will be written.
#
sub single_url_action :Local :Args(0) :Sitemap() { ... }

sub single_url_with_attrs :Local :Args(0) :Sitemap( loc => 'http://www.mysite/here', changefreq => 'daily', priority => '0.5' ) { ... }

sub multiple_url_action :Local :Args(1) :Sitemap('*') { ... }

sub multiple_url_action_sitemap {
    my ( $self, $c, $sitemap ) = @_;

    # just add 8 more arbitrary urls
    my $action = $c->controller('MyController')->action_for('multiple_url_action');
    for my $i ( 0 .. 7 ) {
        my $uri = $c->uri_for( $action, [ $i ] );
        $sitemap->add( $uri );
    }
}

#
# Action that rebuilds your sitemaps and writes them to disk.
#
# !!! You want to protect this !!!
#
# The best approach is to manually instantiate your application from a
# cron job, mark this method private, and call it. You could also go
# crazy and use WWW::Mechanize .. or hell.. leave it public and call it
# from your browser.. your call. I wouldn't do that, though ;)
#
# Your old sitemap files are automatically overwritten each time this
# is called.
#
sub rebuild_cache :Private {
    my ( $self, $c ) = @_;
    $c->write_sitemap_cache();
}

#
# Serving the sitemap files is best done directly through Apache.
# Newer versions of Catalyst have deprecated regex actions, which
# makes serving sitemap files a little more difficult (though you
# can still manually include support for regex actions).
#
# Also, if you only have a single sitemap and want to use this like
# Catalyst::Plugin::Sitemap, see sub single_sitemap below.
#
sub sitemap_index :Private {
    my ( $self, $c ) = @_;

    my $smi_xml = $c->sitemap_builder->sitemap_index->as_xml;
    $c->response->body( $smi_xml );
}

sub single_sitemap :Private {
    my ( $self, $c ) = @_;

    my $sm_xml = $c->sitemap_builder->sitemap(0)->as_xml;
    $c->response->body( $sm_xml );
}
```

Note

This is designed to be an almost drop-in replacement for the existing Catalyst::Plugin::Sitemap: the URL attributes work exactly the same way, so your controller actions are attributed identically.
The only difference is in how you serve your sitemap files.

TODO List