NAME

Catalyst::Plugin::BigSitemap - Auto-generated Sitemaps for up to 2.5 billion URLs.

VERSION

0.02

DESCRIPTION

A complete drop-in replacement for Catalyst::Plugin::Sitemap that also gives you the following features.

Additional Functionality:

1. Automatic generation of a sitemap index with multiple sitemap files.

2. Writing sitemap files to disk as XML (can be gzipped as well).

3. Configurable naming of sitemap and sitemap index files.

4. A maximum of 2,500,000,000 URLs (up to 50,000 sitemap files of up to 50,000 URLs each).

Even if your site has fewer than 50,000 URLs, this module can write (and rewrite) your sitemap files to disk, something the original Catalyst::Plugin::Sitemap module lacks. That alone may be reason enough to consider it.
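The 2,500,000,000 figure follows from the sitemaps.org limits: a sitemap index may list up to 50,000 sitemap files, and each file may hold up to 50,000 URLs. As a rough sketch of the arithmetic (the sitemap_files_needed helper is made up for illustration and is not part of this module):

```perl
use strict;
use warnings;

# Limits taken from the sitemaps.org protocol.
my $MAX_URLS_PER_SITEMAP   = 50_000;
my $MAX_SITEMAPS_PER_INDEX = 50_000;

# Hypothetical helper (not part of this module): how many sitemap
# files are needed to hold $url_count URLs.
sub sitemap_files_needed {
    my ($url_count) = @_;
    return 0 if $url_count <= 0;

    # Integer ceiling division.
    return int( ( $url_count + $MAX_URLS_PER_SITEMAP - 1 ) / $MAX_URLS_PER_SITEMAP );
}

my $max_urls = $MAX_URLS_PER_SITEMAP * $MAX_SITEMAPS_PER_INDEX;

print "Maximum URLs: $max_urls\n";
print "Files needed for 120,001 URLs: ", sitemap_files_needed(120_001), "\n";
```

A site with 120,001 URLs therefore needs three sitemap files plus the index.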

SYNOPSIS

    #
    # Actions you want included in your sitemap.  In this example, there's a total of 10 urls that will be written
    #

    # loc will be automatically resolved for this action
    sub single_url_action :Local :Args(0) :Sitemap() { ... }
    
    # see WWW::Sitemap::XML::URL for all the possible attributes 
    sub single_url_with_attrs : Local :Args(0) :Sitemap( changefreq => 'daily', priority => '0.5' ) { ... }
    
    # If multiple urls map to an action, specify the string '*' as its only attribute,
    # then define a method named after the action with '_sitemap' appended.  Within that
    # method, call the ->add method on your sitemap once for each url.
    sub multiple_url_action :Local :Args(1) :Sitemap('*') { ... }    
    sub multiple_url_action_sitemap {
        my ( $self, $c, $sitemap ) = @_;
        
        # Avoid $a as a variable name; Perl reserves it for sort blocks.
        my $action = $c->controller('MyController')->action_for('multiple_url_action');
        for my $i ( 0 .. 7 ) {
            my $uri = $c->uri_for( $action, [ $i ] );
            $sitemap->add( $uri );
        }
        
    }

    #
    # Action to rebuild your sitemap -- you want to protect this!
    # Best thing to do would be manually instantiate an instance of your
    # application from the cron job, mark this method private and call it.  
    # You could also go crazy and use WWW::Mechanize .. or hell.. leave it
    # public and call it from your browser.. your call.  I wouldn't do that, 
    # though ;) 
    # Your old sitemap files will automatically be overwritten.  
    #
    
    sub rebuild_cache :Private {
        my ( $self, $c ) = @_;
        $c->write_sitemap_cache();
    }
    
    #
    # Serving the sitemap files is best done directly through Apache.
    # Newer versions of Catalyst have deprecated regex actions, which
    # makes serving sitemap files a little more difficult (though you
    # can still manually include support for regex actions).
    # 
    # Also, if you only have a single sitemap, and want to use this like 
    # Catalyst::Plugin::Sitemap, see sub single_sitemap below. 
    #
    
    sub sitemap_index :Private {
        my ( $self, $c ) = @_;
        
        my $smi_xml = $c->sitemap_builder->sitemap_index->as_xml;
        $c->response->body( $smi_xml );
    }
    
    sub single_sitemap :Private {
        my ( $self, $c ) = @_;
        
        my $sm_xml = $c->sitemap_builder->sitemap(0)->as_xml;
        $c->response->body( $sm_xml );
    }

CONFIGURATION

There are a few configuration settings that must be set for this application to function properly.

cache_dir - required

The absolute filesystem path to the directory where your sitemap files will be written when calling $c->write_sitemap_cache().

url_base - optional: defaults to whichever base url the request is made to

This is the base url that will be used when building the urls for your application.

Note: This is especially important if your rebuild is launched by a cron job that makes a request to localhost. In that case, if you fail to specify this setting, all your urls will be resolved to http://localhost/my-action-here/ ... That probably isn't what you want.

Note: The trailing slash is important!

sitemap_name_format - optional: defaults to sitemap%d.xml.gz

A sprintf format string. Your sitemaps will be numbered beginning with 1 up through the total number of sitemaps that are necessary to hold your data. By default, this will end up being something like sitemap1.xml.gz, sitemap2.xml.gz, and so on.

Note: The file extension should either be .xml or .xml.gz. The proper type of file will be built depending on which extension you specify.
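To see what names a given format produces, the format string can simply be run through sprintf. A small sketch (the sitemap_file_names helper is hypothetical and only mimics the numbering described above; it is not this module's code):

```perl
use strict;
use warnings;

# Hypothetical helper: list the file names a format string yields
# for $count sitemaps, numbered from 1.
sub sitemap_file_names {
    my ( $format, $count ) = @_;
    return map { sprintf( $format, $_ ) } 1 .. $count;
}

# The default format.
print "$_\n" for sitemap_file_names( 'sitemap%d.xml.gz', 3 );

# Any valid sprintf conversion works, e.g. zero-padded numbers.
print "$_\n" for sitemap_file_names( 'sitemap-%03d.xml', 2 );
```

The first call prints sitemap1.xml.gz through sitemap3.xml.gz; the second prints sitemap-001.xml and sitemap-002.xml.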

sitemap_index_name - optional: defaults to sitemap_index.xml

Note: Just like with sitemap_name_format, .xml or .xml.gz should be specified as the file extension.

Config::General Example

    <Plugin::BigSitemap>
        cache_dir /var/www/myapp/root/sitemaps
        url_base http://mywebsite/
        sitemap_name_format sitemap%d.xml.gz
        sitemap_index_name sitemap_index.xml
    </Plugin::BigSitemap>
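If you configure your application in Perl rather than through a Config::General file, the same settings would normally go under the same key in your application class. The following is a sketch assuming the standard Catalyst __PACKAGE__->config mechanism; MyApp is a placeholder name and the values are copied from the example above.

```perl
package MyApp;

use strict;
use warnings;

# Loads Catalyst::Plugin::BigSitemap.
use Catalyst qw/ BigSitemap /;

__PACKAGE__->config(
    'Plugin::BigSitemap' => {
        cache_dir           => '/var/www/myapp/root/sitemaps',
        url_base            => 'http://mywebsite/',    # note the trailing slash
        sitemap_name_format => 'sitemap%d.xml.gz',
        sitemap_index_name  => 'sitemap_index.xml',
    },
);

__PACKAGE__->setup;

1;
```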

ATTRIBUTES

sitemap_builder

A lazy-loaded Catalyst::Plugin::BigSitemap::SitemapBuilder object. If you want access to the individual WWW::Sitemap::XML objects or the WWW::SitemapIndex::XML object, you'll do that through this object.

sitemap

Provided for compatibility with Catalyst::Plugin::Sitemap.

Returns a WWW::Sitemap::XML object containing up to the first 50,000 URLs resolved from your application.

sitemap_as_xml

Provided for compatibility with Catalyst::Plugin::Sitemap.

Returns an XML::LibXML::Document representation of the sitemap generated by this module's sitemap attribute.

METHODS

write_sitemap_cache()

Writes your sitemap_index and sitemap files to whichever cache_dir you've specified in your configuration.

On success, returns an array with the absolute path to the sitemap index (element 0), followed by the paths to all sitemap files.
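Since the index path comes first, the return value can be unpacked in one assignment. The sketch below assumes the method returns a flat list (as the wording above suggests, though this is not verified against the module's code). The rebuild_and_report helper and the My::FakeContext stand-in, including its made-up paths, exist only so the example runs on its own; in a real application $c would be your Catalyst context.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the cache and summarise what was written.
# $c is anything that provides write_sitemap_cache().
sub rebuild_and_report {
    my ($c) = @_;

    # Assumption: a flat list is returned, index path first.
    my ( $index_path, @sitemap_paths ) = $c->write_sitemap_cache();

    return sprintf( 'index: %s, sitemap files: %d',
        $index_path, scalar @sitemap_paths );
}

# Stand-in for a Catalyst context; the paths are invented for the demo.
package My::FakeContext {
    sub new { return bless {}, shift }

    sub write_sitemap_cache {
        return (
            '/tmp/sitemaps/sitemap_index.xml',
            '/tmp/sitemaps/sitemap1.xml.gz',
            '/tmp/sitemaps/sitemap2.xml.gz',
        );
    }
}

print rebuild_and_report( My::FakeContext->new ), "\n";
```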

INTERNAL USE METHODS

Methods you shouldn't be calling directly. They're listed here for documentation purposes.

_get_sitemap_builder()

Returns a sitemap builder object that's fully populated with all the sitemap urls registered. This can take quite some time depending on the number of urls you're registering with the sitemap and how they're being generated.

You shouldn't ever need to call this directly -- it's set as the builder method for the sitemap_builder attribute.

Note: This can take an incredibly long time especially if you have a lot of URLs! Use with care!

_get_sitemap()

Builder method for sitemap attribute.

_get_sitemap_as_xml()

Builder method for sitemap_as_xml attribute.

ACKNOWLEDGEMENTS

This module is based on the great work of two other CPAN authors:

Yanick Champoux - Author of the original Catalyst::Plugin::Sitemap module. Some of his ideas (and some of his module code) were used to make this module.

Alex J. G. Burzyński - Author of WWW::Sitemap::XML, which is the underlying module used by BigSitemap to do its work.

SEE ALSO

Catalyst::Plugin::Sitemap, WWW::Sitemap::XML, Sitemaps.org, Catalyst Framework

AUTHOR

Derek J. Curtis <djcurtis at summersetsoftware dot com>

Summerset Software, LLC

http://www.summersetsoftware.com

LICENSE AND COPYRIGHT

Copyright 2013 Derek J. Curtis.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.