
Module Version: 0.09

NAME

WWW::phpBB - phpBB2 forum scraper

SYNOPSIS

    use WWW::phpBB;

    # scrape as guest
    my $phpbb = WWW::phpBB->new(
        base_url => 'http://localhost/~stefan/forum1',
        db_host => 'localhost',
        db_user => 'stefan',
        db_passwd => 'somepass',
        db_database => 'stefan',
        db_prefix => 'phpbb2_',
    );

    $phpbb->empty_tables();
    $phpbb->get_users();
    $phpbb->scrape_forum_common();

    # scrape a German forum with a non-standard date format and a custom GET var
    # ($phpbb was already declared above, so it is reassigned here)
    $phpbb = WWW::phpBB->new(
        base_url => 'http://localhost/~stefan/index.php?mforum=de',
        db_host => 'localhost',
        db_user => 'stefan',
        db_passwd => 'somepass',
        db_database => 'stefan',
        db_prefix => 'phpbb2_',
        post_date_format => qr/(\d+)\s+(\w+),\s+(\d+)\s+(\d+):(\d+)/,
        post_date_pos => [qw(day_of_month month_name year hour minutes)],
        forum_user => 'raDical',
        forum_passwd => 'lfdiugyh',
    );

    # log in to access the private memberlist and some private forums
    $phpbb->empty_tables();
    $phpbb->forum_login();
    $phpbb->get_users();
    $phpbb->scrape_forum_common();
    $phpbb->forum_logout();

    # update an already scraped forum, maybe as a daily cron job
    # $phpbb->update_overwrite(1); # don't try to keep modified data
    $phpbb->update_users();
    $phpbb->update_forum_common();
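
The post_date_format and post_date_pos parameters above work as a pair: the regex captures the pieces of a date, and the list names what each capture means, in order. The following is a minimal sketch of that pairing, not the module's actual implementation. The helper parse_post_date and the sample date string are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::phpBB: pair each regex capture
# with the field name at the same index in the position list.
sub parse_post_date {
    my ($text, $format, $positions) = @_;
    my @captures = $text =~ $format
        or return;                      # no match: return nothing
    my %date;
    @date{@$positions} = @captures;     # hash slice: name => captured value
    return \%date;
}

# Same pattern and position list as in the SYNOPSIS.
my $format    = qr/(\d+)\s+(\w+),\s+(\d+)\s+(\d+):(\d+)/;
my @positions = qw(day_of_month month_name year hour minutes);

# Made-up date string that matches the pattern.
my $date = parse_post_date('24 Dez, 2006 18:05', $format, \@positions);
printf "%s-%s-%s %s:%s\n",
    @{$date}{qw(year month_name day_of_month hour minutes)};
# prints "2006-Dez-24 18:05"
```

If a forum shows the date pieces in a different order, only the regex and the order of names in the list need to change.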

FANCY EXAMPLE

    use WWW::phpBB;

    # custom subclass
    package WWW::phpBB::custom;
    use base 'WWW::phpBB';

    # override some methods
    sub forum_url_for_page {
        my $self = shift;
        my ($url, $forum_id, $page) = @_;

        # drop everything after the last slash, then append the custom path
        $url =~ s%[^/]*$%%;
        $url .= "forum,$forum_id,$page.html";
        return $url;
    }

    sub topic_url_for_page {
        my $self = shift;
        my ($url, $topic_id, $page) = @_;

        # same rewrite as above, for topic pages
        $url =~ s%[^/]*$%%;
        $url .= "topic,$topic_id,$page.html";
        return $url;
    }


    my $phpbb = WWW::phpBB::custom->new(
        base_url => 'http://foobar.foren-city.de',
        db_host => 'localhost',
        db_user => '****',
        db_passwd => '****',
        db_database => '****',
        db_prefix => 'phpbb_',
        verbose => 1,
        months => [qw(jan feb mär apr mai jun jul aug sep okt nov dez)],
        forum_user => '****',
        forum_passwd => '****',
        post_date_format => qr/(\d+)\s+(\w+)\s+(\d+)\s+(\d+):(\d+)/,
        post_date_pos => [qw(day_of_month month_name year hour minutes)],
        reg_date_format => qr/(\d+)\.(\d+)\.(\d+)/,
        reg_date_pos => [qw(day_of_month month year)],
        quote_string => "hat folgendes geschrieben",
        forum_link_regex => qr/forum,(\d+),/,
        topic_link_regex_p => qr/topic,.*#(\d+)/,
        topic_link_regex_t => qr/topic,(\d+),/,
        topic_link1 => "topic,%d.html",
        topic_link2 => "",
        profile_string_occupation => "beruf",
        alternative_page_number_regex_forum => qr/forum,\d+,(\d+)/,
        alternative_page_number_regex_topic => qr/topic,\d+,(\d+)/,
    );

    $phpbb->empty_tables();
    $phpbb->forum_login();
    $phpbb->get_users();
    $phpbb->scrape_forum_common();
    $phpbb->forum_logout();
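
To see what the overridden forum_url_for_page produces, the rewriting logic can be run on its own. This is a small sketch that copies the two statements from the subclass into a plain function named rewrite_forum_url (a hypothetical name), so it runs without WWW::phpBB installed. The input URL is made up, and the exact URL and page value the module passes to this method are assumptions here.

```perl
use strict;
use warnings;

# Standalone copy of the rewriting logic from forum_url_for_page,
# written as a plain function for demonstration purposes only.
sub rewrite_forum_url {
    my ($url, $forum_id, $page) = @_;
    $url =~ s%[^/]*$%%;    # drop everything after the last '/'
    return $url . "forum,$forum_id,$page.html";
}

# Hypothetical input URL in the "forum,ID,..." style used by the site.
print rewrite_forum_url('http://foobar.foren-city.de/forum,3,-news.html', 3, 50), "\n";
# prints "http://foobar.foren-city.de/forum,3,50.html"
```

Note that the substitution keeps the trailing slash, so the input URL is expected to contain a path component after the host name.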

DESCRIPTION

This module scrapes a phpBB2 installation through its web interface. It requires a local phpBB2 setup (the old 2.x versions can be downloaded from http://sourceforge.net/projects/phpbb/files/phpBB%202/ ), which will be overwritten. It can only access what is available to a web browser (i.e. no private messages or user settings). Make sure the username used during the local installation doesn't exist in the remote forum. Scraping is possible as a guest or as a logged-in member. If used with an administrator name and password, it copies all the member e-mail addresses (not just the public ones), so members can request a new random password from the new installation site and continue using the forum. The current implementation lacks search support, but this can be fixed by converting the forum to phpBB3 or SMF. The "mforum" script is supported.

REQUIRED MODULES

WWW::Mechanize

Compress::Zlib

HTML::TokeParser::Simple

DBI

DBD::mysql

EXPORT

None.

CONSTRUCTOR

new()

Creates a new WWW::phpBB object.

Required parameters:

Optional parameters:

ACCESSORS

The accessors have the same names as the constructor parameters. Called without an argument, an accessor returns the current value. Called with an argument, it sets the value.

    $phpbb->max_rows(100);
    print $phpbb->max_tries, "\n";
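
For readers unfamiliar with this style, a combined getter/setter can be sketched as follows. This illustrates the general Perl pattern only; it is not taken from the WWW::phpBB source, and the class My::Scraper is hypothetical. Whether the real accessors return the new value after setting it is an assumption here.

```perl
use strict;
use warnings;

package My::Scraper;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Generate one combined getter/setter per field name.
for my $field (qw(max_rows max_tries)) {
    no strict 'refs';
    *{"My::Scraper::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;    # called with an argument: set
        return $self->{$field};           # always return the current value
    };
}

package main;

my $obj = My::Scraper->new(max_rows => 50, max_tries => 3);
$obj->max_rows(100);
print $obj->max_rows, " ", $obj->max_tries, "\n";    # prints "100 3"
```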

PUBLIC METHODS

$phpbb->empty_tables()

Empties the tables of a local phpBB installation. It leaves the admin account untouched.

$phpbb->forum_login()

Logs in to the original forum. Useful when guest access is restricted.

$phpbb->forum_logout()

Logs out of the original forum.

$phpbb->get_users()

Scrape user data from the memberlist and profile pages.

$phpbb->scrape_forum_common()

Scrape categories, forums, topics and posts.

$phpbb->update_users()

Update the users for an already scraped forum.

$phpbb->update_forum_common()

Update categories, forums, topics and posts for an already scraped forum.

AUTHOR

Stefan Talpalaru, <stefantalpalaru@yahoo.com>

COPYRIGHT AND LICENSE

Copyright (c) 2006-2011 by Stefan Talpalaru

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.2 or, at your option, any later version of Perl 5 you may have available.
