NAME

MediaWiki::Bot - a high-level bot framework for interacting with MediaWiki wikis

VERSION

version 5.005006

SYNOPSIS

use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
    assert      => 'bot',
    host        => 'de.wikimedia.org',
    login_data  => { username => "Mike's bot account", password => "password" },
});

my $revid = $bot->get_last("User:Mike.lifeguard/sandbox", "Mike.lifeguard");
print "Reverting to $revid\n" if defined($revid);
$bot->revert('User:Mike.lifeguard', $revid, 'rvv');

DESCRIPTION

MediaWiki::Bot is a framework that can be used to write bots which interface with the MediaWiki API (http://en.wikipedia.org/w/api.php).

METHODS

new

my $bot = MediaWiki::Bot->new({
    host    => 'en.wikipedia.org',
});

Calling MediaWiki::Bot->new() will create a new MediaWiki::Bot object. The only parameter is a hashref with keys:

For example:

my $bot = MediaWiki::Bot->new({
    assert      => 'bot',
    protocol    => 'https',
    host        => 'secure.wikimedia.org',
    path        => 'wikipedia/meta/w',
    login_data  => { username => "Mike's bot account", password => "password" },
});

For backward compatibility, you can specify up to three parameters:

my $bot = MediaWiki::Bot->new('My custom useragent string', $assert, $operator);

This form is deprecated, will never do auto-login or autoconfiguration, and emits deprecation warnings.

set_wiki

Set what wiki to use. The parameter is a hashref with keys protocol, host, and path.

If you omit a key, its previous value is used. If it has never been set, the defaults are 'http', 'en.wikipedia.org' and 'w', respectively.

For example:

$bot->set_wiki({
    protocol    => 'https',
    host        => 'secure.wikimedia.org',
    path        => 'wikipedia/meta/w',
});

For backward compatibility, you can specify up to two parameters:

$bot->set_wiki($host, $path);

This form is deprecated, and will emit deprecation warnings.

login

This method takes a hashref with keys username and password at a minimum. See "Single User Login" and "Basic authentication" for additional options.

Logs the user $username in, optionally using $password. First, an attempt will be made to use cookies to log in. If this fails, an attempt will be made to use the password provided to log in, if any. If the login was successful, returns true; false otherwise.

$bot->login({
    username => $username,
    password => $password,
}) or die "Login failed";

Once logged in, attempt to do some simple auto-configuration. At present, this consists of:

You can skip this autoconfiguration by passing autoconfig => 0.

For backward compatibility, you can call this as

$bot->login($username, $password);

This form is deprecated, and will emit deprecation warnings. It will never do autoconfiguration or SUL login.

Single User Login

On WMF wikis, do_sul specifies whether to log in on all projects. The default is false. But even when false, you still get a CentralAuth cookie for, and are thus logged in on, all languages of a given domain (*.wikipedia.org, for example). When set, a login is done on each WMF domain so you are logged in on all ~800 content wikis. Since *.wikimedia.org is not possible, we explicitly include meta, commons, incubator, and wikispecies.

Basic authentication

If you need to supply basic auth credentials, pass a hashref of data as described by LWP::UserAgent:

$bot->login({
    username    => $username,
    password    => $password,
    basic_auth  => {    netloc  => "private.wiki.com:80",
                        realm   => "Authentication Realm",
                        uname   => "Basic auth username",
                        pass    => "password",
                    }
}) or die "Couldn't log in";

logout

$bot->logout();

The logout method logs the bot out of the wiki. This invalidates all login cookies.

edit

my $text = $bot->get_text('My page');
$text .= "\n\n* More text\n";
$bot->edit({
    page    => 'My page',
    text    => $text,
    summary => 'Adding new content',
    section => 'new',
});

This method edits a wiki page, and takes a hashref of data with keys:

An MD5 hash is sent to guard against data corruption while in transit.

You can also call this as:

$bot->edit($page, $text, $summary, $is_minor, $assert, $markasbot);

This form is deprecated, and will emit deprecation warnings.

move

$bot->move($from_title, $to_title, $reason, $options_hashref);

This moves a wiki page.

If you wish to specify more options (like whether to suppress creation of a redirect), use $options_hashref, which has keys:

get_history

my @hist = $bot->get_history($title, $limit, $revid, $direction);

Returns an array containing the history of the specified $title, with $limit number of revisions (default is as many as possible).

The array returned contains hashrefs with keys: revid, user, comment, minor, timestamp_date, and timestamp_time.
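As a rough sketch of working with that structure, the snippet below tallies revisions per user. The @hist contents are invented sample data in the documented shape (a real script would fill @hist from get_history), and count_by_user is a hypothetical helper, not part of MediaWiki::Bot.

```perl
use strict;
use warnings;

# Hypothetical helper: count revisions per user in a get_history-style list.
sub count_by_user {
    my @revisions = @_;
    my %count;
    $count{ $_->{user} }++ for @revisions;
    return %count;
}

# Invented sample data using the documented keys.
my @hist = (
    { revid => 103, user => 'Alice', comment => 'typo',   minor => 1,
      timestamp_date => '2013-01-03', timestamp_time => '10:00:00' },
    { revid => 102, user => 'Bob',   comment => 'expand', minor => 0,
      timestamp_date => '2013-01-02', timestamp_time => '09:30:00' },
    { revid => 101, user => 'Alice', comment => 'create', minor => 0,
      timestamp_date => '2013-01-01', timestamp_time => '08:15:00' },
);

my %count = count_by_user(@hist);
print "$_: $count{$_}\n" for sort keys %count;
```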

get_text

Returns the wikitext of the specified $page_title. The second parameter is $revid - if defined, returns the text of that revision; the third is $section_number - if defined, returns the text of that section.

A blank page will return wikitext of "" (which evaluates to false in Perl, but is defined); a nonexistent page will return undef (which also evaluates to false in Perl, but is obviously undefined). You can distinguish between blank and nonexistent pages by using defined:

my $wikitext = $bot->get_text('Page title');
print "Wikitext: $wikitext\n" if defined $wikitext;
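If you need to tell all three cases apart, a small classifier along these lines may help. page_state is a hypothetical helper, not part of the module. Note that it compares against the empty string rather than testing truthiness, because a page whose entire text is "0" is false in Perl but not blank.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a value returned by get_text.
sub page_state {
    my ($wikitext) = @_;
    return 'nonexistent' unless defined $wikitext;   # undef: no such page
    return 'blank'       if $wikitext eq '';         # defined but empty
    return 'has content';
}

# A real script would pass in $bot->get_text($title) instead of literals.
print page_state(undef),    "\n";
print page_state(''),       "\n";
print page_state('0'),      "\n";
print page_state('Hello.'), "\n";
```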

get_id

Returns the id of the specified $page_title. Returns undef if page does not exist.

my $pageid = $bot->get_id("Main Page");
die "Page doesn't exist\n" if !defined($pageid);

get_pages

Returns the text of the specified pages in a hashref. Content of undef means page does not exist. Also handles redirects or article names that use namespace aliases.

my @pages = ('Page 1', 'Page 2', 'Page 3');
my $thing = $bot->get_pages(\@pages);
foreach my $page (keys %$thing) {
    my $text = $thing->{$page};
    print "$text\n" if defined($text);
}
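Because missing pages are reported as undef values, listing them is a one-line filter. The sketch below uses invented sample data in the documented shape; a real script would get $thing from get_pages.

```perl
use strict;
use warnings;

# Invented sample data (title => text, undef when the page does not exist).
my $thing = {
    'Page 1' => 'Some text',
    'Page 2' => undef,
    'Page 3' => '',          # blank but existing page
};

# Titles whose content is undef, i.e. pages that do not exist.
my @missing = sort grep { !defined $thing->{$_} } keys %$thing;
print "Missing: @missing\n";
```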

get_image

$buffer = $bot->get_image('File:Foo.jpg', {width=>256, height=>256});

Download an image from a wiki. This is derived from a similar function in MediaWiki::API. This one allows the image to be scaled down by passing a hashref with height & width parameters.

It returns raw data in the original format. You may simply spew it to a file, or process it directly with a library such as Imager.

use File::Slurp qw(write_file);
my $img_data = $bot->get_image('File:Foo.jpg');
write_file( 'Foo.jpg', {binmode => ':raw'}, \$img_data );

Images are scaled proportionally. (height/width) will remain constant, except for rounding errors.

Height and width parameters describe the maximum dimensions. A 400x200 image will never be scaled to greater dimensions. You can scale it yourself; having the wiki do it is just lazy & selfish.
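To estimate what size to expect, the arithmetic can be sketched as below. fit_within is a hypothetical helper, and rounding to the nearest pixel is an assumption; the wiki's own rounding may differ by a pixel.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: dimensions after scaling an image proportionally so it
# fits inside a bounding box. The scale factor is capped at 1, so the image
# is never enlarged.
sub fit_within {
    my ($width, $height, $max_width, $max_height) = @_;
    my $scale = min(1, $max_width / $width, $max_height / $height);
    # Round to the nearest pixel (assumed rounding rule).
    return (int($width * $scale + 0.5), int($height * $scale + 0.5));
}

my ($w, $h) = fit_within(400, 200, 256, 256);
print "Scaled down: ${w}x${h}\n";

($w, $h) = fit_within(400, 200, 1000, 1000);
print "Not scaled up: ${w}x${h}\n";
```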

revert

Reverts the specified $page_title to $revid, with an edit summary of $summary. A default edit summary will be used if $summary is omitted.

my $revid = $bot->get_last("User:Mike.lifeguard/sandbox", "Mike.lifeguard");
print "Reverting to $revid\n" if defined($revid);
$bot->revert('User:Mike.lifeguard', $revid, 'rvv');

undo

$bot->undo($title, $revid, $summary, $after);

Reverts the specified $revid, with an edit summary of $summary, using the undo function. To undo all revisions from $revid up to but not including this one, set $after to another revid. If not set, just undo the one revision ($revid).

See http://www.mediawiki.org/wiki/API:Edit#Parameters.

get_last

Returns the revid of the last revision to $page not made by $user. undef is returned if no result was found, as would be the case if the page is deleted.

my $revid = $bot->get_last('User:Mike.lifeguard/sandbox', 'Mike.lifeguard');
if (defined($revid)) {
    print "Reverting to $revid\n";
    $bot->revert('User:Mike.lifeguard', $revid, 'rvv');
}

update_rc

This method is deprecated, and will emit deprecation warnings. Replace calls to update_rc() with calls to the newer recentchanges(), which returns all available data, including rcid.

Returns an array containing the $limit most recent changes to the wiki's main namespace. The array contains hashrefs with keys title, revid, old_revid, and timestamp.

my @rc = $bot->update_rc(5);
foreach my $hashref (@rc) {
    my $title = $hashref->{'title'};
    print "$title\n";
}

The "Options hashref" is also available:

# Use a callback for incremental processing:
my $options = { hook => \&mysub, };
$bot->update_rc($options);
sub mysub {
    my ($res) = @_;
    foreach my $hashref (@$res) {
        my $page = $hashref->{'title'};
        print "$page\n";
    }
}

recentchanges($wiki_hashref, $options_hashref)

Returns an array of hashrefs containing recentchanges data.

The first parameter is a hashref with the following keys:

An "Options hashref" can be used as the second parameter:

my @rc = $bot->recentchanges({ ns => 4, limit => 100 });
foreach my $hashref (@rc) {
    print $hashref->{title} . "\n";
}

# Or, use a callback for incremental processing:
$bot->recentchanges({ ns => [0,1], limit => 500 }, { hook => \&mysub });
sub mysub {
    my ($res) = @_;
    foreach my $hashref (@$res) {
        my $page = $hashref->{title};
        print "$page\n";
    }
}

The hashref returned might contain the following keys:

For backwards compatibility, the previous method signature is still supported:

$bot->recentchanges($ns, $limit, $options_hashref);

what_links_here

Returns an array containing a list of all pages linking to $page.

Additional optional parameters are:

A typical query:

my @links = $bot->what_links_here("Meta:Sandbox",
    undef, 1,
    { hook=>\&mysub }
);
sub mysub{
    my ($res) = @_;
    foreach my $hash (@$res) {
        my $title = $hash->{'title'};
        my $is_redir = $hash->{'redirect'};
        print "Redirect: $title\n" if $is_redir;
        print "Page: $title\n" unless $is_redir;
    }
}

Transclusions are no longer handled by what_links_here() - use "list_transclusions" instead.

list_transclusions

Returns an array containing a list of all pages transcluding $page.

Other parameters are:

A typical query:

$bot->list_transclusions("Template:Tlx", undef, 4, {hook => \&mysub});
sub mysub{
    my ($res) = @_;
    foreach my $hash (@$res) {
        my $title = $hash->{'title'};
        my $is_redir = $hash->{'redirect'};
        print "Redirect: $title\n" if $is_redir;
        print "Page: $title\n" unless $is_redir;
    }
}

get_pages_in_category

Returns an array containing the names of all pages in the specified category (include the Category: prefix). Does not recurse into sub-categories.

my @pages = $bot->get_pages_in_category('Category:People on stamps of Gabon');
print "The pages in Category:People on stamps of Gabon are:\n@pages\n";

The options hashref is as described in "Options hashref". Use { max => 0 } to get all results.

get_all_pages_in_category

my @pages = $bot->get_all_pages_in_category($category, $options_hashref);

Returns an array containing the names of all pages in the specified category (include the Category: prefix), including sub-categories. The $options_hashref is described fully in "Options hashref".

linksearch

Runs a linksearch on the specified $link and returns an array containing anonymous hashes with keys 'url' for the outbound URL, and 'title' for the page the link is on.

Additional parameters are:

purge_page

Purges the server cache of the specified $page. Returns true on success; false on failure. Pass an array reference to purge multiple pages.

If you really care, a true return value is the number of pages successfully purged. You could check that it is the same as the number you wanted to purge - maybe some pages don't exist, or you passed invalid titles, or you aren't allowed to purge the cache:

my @to_purge = ('Main Page', 'A', 'B', 'C', 'Very unlikely to exist');
my $size = scalar @to_purge;

print "all-at-once:\n";
my $success = $bot->purge_page(\@to_purge);

if ($success == $size) {
    print "@to_purge: OK ($success/$size)\n";
}
else {
    my $missed = @to_purge - $success;
    print "We couldn't purge $missed pages (list was: "
        . join(', ', @to_purge)
        . ")\n";
}

# OR
print "\n\none-at-a-time:\n";
foreach my $page (@to_purge) {
    my $ok = $bot->purge_page($page);
    print "$page: $ok\n";
}

get_namespace_names

my %namespace_names = $bot->get_namespace_names();

Returns a hash linking the namespace id, such as 1, to its named equivalent, such as "Talk".
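Methods such as get_pages_in_namespace expect a numeric id, so it can be handy to invert this hash and look ids up by name. A minimal sketch, using invented sample data in place of a live call:

```perl
use strict;
use warnings;

# Invented sample data in the documented shape (id => name); a real script
# would use: my %namespace_names = $bot->get_namespace_names();
my %namespace_names = (1 => 'Talk', 2 => 'User', 6 => 'File');

# Invert the hash to map names back to ids. This is safe only while the
# names are unique, which is assumed here.
my %namespace_ids = reverse %namespace_names;

print "File is namespace $namespace_ids{File}\n";
```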

image_usage

Gets a list of pages which include a certain $image. Include the File: namespace prefix to avoid incurring an extra round-trip (which will also emit a deprecation warning).

Additional parameters are:

Or, make use of the "Options hashref" to do incremental processing:

$bot->image_usage("File:Albert Einstein Head.jpg",
    undef, undef,
    { hook=>\&mysub, max=>5 }
);
sub mysub {
    my $res = shift;
    foreach my $page (@$res) {
        my $title = $page->{'title'};
        print "$title\n";
    }
}

global_image_usage($image, $results, $filterlocal)

Returns an array of hashrefs of data about pages which use the given image.

my @data = $bot->global_image_usage('File:Albert Einstein Head.jpg');

The keys in each hashref are title, url, and wiki. $results is the maximum number of results that will be returned (not the maximum number of requests that will be sent, like max in the "Options hashref"); the default is to attempt to fetch 500 (set to 0 to get all results). $filterlocal will filter out local uses of the image.

links_to_image

A backward-compatible call to "image_usage". You can provide only the image title.

This method is deprecated, and will emit deprecation warnings.

is_blocked

my $blocked = $bot->is_blocked('User:Mike.lifeguard');

Checks if a user is currently blocked.

test_blocked

Retained for backwards compatibility. Use "is_blocked" for clarity.

This method is deprecated, and will emit deprecation warnings.

test_image_exists

Checks if an image exists at $page.

If you pass in an arrayref of images, you'll get out an arrayref of results.

my $exists = $bot->test_image_exists('File:Albert Einstein Head.jpg');
if ($exists == 0) {
    print "Doesn't exist\n";
}
elsif ($exists == 1) {
    print "Exists locally\n";
}
elsif ($exists == 2) {
    print "Exists on Commons\n";
}
elsif ($exists == 3) {
    print "Page exists, but no image\n";
}
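For the arrayref form, a lookup table avoids repeating the if/elsif chain for every result. In this sketch describe_results is a hypothetical helper, the data is invented, and the results are assumed to come back in the same order as the titles passed in.

```perl
use strict;
use warnings;

# Meanings of the return codes, taken from the example above.
my @meaning = (
    "Doesn't exist",
    'Exists locally',
    'Exists on Commons',
    'Page exists, but no image',
);

# Hypothetical helper: pair each title with a description of its result code.
sub describe_results {
    my ($titles, $results) = @_;
    return map { "$titles->[$_]: $meaning[ $results->[$_] ]" } 0 .. $#$titles;
}

# Invented sample data; a real script would use
# my $results = $bot->test_image_exists(\@titles);
my @titles  = ('File:A.jpg', 'File:B.jpg');
my $results = [1, 0];
print "$_\n" for describe_results(\@titles, $results);
```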

get_pages_in_namespace

$bot->get_pages_in_namespace($namespace, $limit, $options_hashref);

Returns an array containing the names of all pages in the specified namespace. The $namespace must be a number, not a namespace name.

Setting $limit is optional, and specifies how many items to retrieve at once. Setting this to 'max' is recommended, and this is the default if omitted. If $limit is over 500, it will be rounded up to the next multiple of 500. If $limit is set higher than you are allowed to use, it will silently be reduced. Consider setting key 'max' in the "Options hashref" to retrieve multiple sets of results:

# Gotta get 'em all!
my @pages = $bot->get_pages_in_namespace(6, 'max', { max => 0 });

count_contributions

my $count = $bot->count_contributions($user);

Uses the API to count $user's contributions.

timed_count_contributions

($timed_edits_count, $total_count) = $bot->timed_count_contributions($user, $days);

Uses the API to count $user's contributions in the last $days days and, if needed, the user's total number of contributions.

For example, to get a user's contributions for the last 30 and 365 days, plus the total number of edits, you could write something like this:

my ($last30days, $total) = $bot->timed_count_contributions($user, 30);
my $last365days = $bot->timed_count_contributions($user, 365);

You could get total number of edits also by separately calling count_contributions like this:

my $total = $bot->count_contributions($user);

and use timed_count_contributions only in scalar context, but that would cost one more request to the server (meaning more server load). The extra request is unnecessary, because timed_count_contributions returns both values when called in list context.
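The difference between the two calling styles can be illustrated without contacting a wiki. fake_timed_count below is a hypothetical stand-in with fixed numbers, not the real method; it assumes the real method distinguishes list and scalar context in the way wantarray does, which the text above implies but does not spell out.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT the real method: returns fixed numbers so the
# calling convention can be shown offline.
sub fake_timed_count {
    my ($timed, $total) = (12, 3456);
    return wantarray ? ($timed, $total) : $timed;
}

my ($last30days, $total) = fake_timed_count();   # list context: both values
my $only_timed           = fake_timed_count();   # scalar context: timed count

print "timed=$last30days total=$total scalar=$only_timed\n";
```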

last_active

my $latest_timestamp = $bot->last_active($user);

Returns the last active time of $user in YYYY-MM-DDTHH:MM:SSZ.
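A timestamp in that format can be parsed with the core Time::Piece module. The following is a minimal sketch; timestamp_to_epoch is a hypothetical helper and the sample timestamp is invented.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: convert a YYYY-MM-DDTHH:MM:SSZ timestamp (UTC) into
# seconds since the epoch.
sub timestamp_to_epoch {
    my ($timestamp) = @_;
    return Time::Piece->strptime($timestamp, '%Y-%m-%dT%H:%M:%SZ')->epoch;
}

# Invented sample value; a real script would use $bot->last_active($user).
my $latest_timestamp = '2013-01-02T03:04:05Z';
my $age_days = (time - timestamp_to_epoch($latest_timestamp)) / 86_400;
printf "Last active %.1f days ago\n", $age_days;
```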

recent_edit_to_page

my ($timestamp, $user) = $bot->recent_edit_to_page($title);

Returns timestamp and username for the most recent (top) edit to $title.

get_users

my @recent_editors = $bot->get_users($title, $limit, $revid, $direction);

Gets the most recent editors to $title, up to $limit, starting from $revid and going in $direction.

was_blocked

for ("Mike.lifeguard", "Jimbo Wales") {
    print "$_ was blocked\n" if $bot->was_blocked($_);
}

Returns whether $user has ever been blocked.

test_block_hist

Retained for backwards compatibility. Use "was_blocked" for clarity.

This method is deprecated, and will emit deprecation warnings.

expandtemplates

my $expanded = $bot->expandtemplates($title, $wikitext);

Expands templates on $title, using $wikitext if provided, otherwise loading the page text automatically.

get_allusers

my @users = $bot->get_allusers($limit, $user_group, $options_hashref);

Returns an array of all users. Default $limit is 500. Optionally specify a $user_group (like 'sysop') to list that group only. The last optional parameter is an "Options hashref".

db_to_domain

Converts a wiki/database name (enwiki) to the domain name (en.wikipedia.org).

my @wikis = ("enwiki", "kowiki", "bat-smgwiki", "nonexistent");
foreach my $wiki (@wikis) {
    my $domain = $bot->db_to_domain($wiki);
    next if !defined($domain);
    print "$wiki: $domain\n";
}

You can pass an arrayref to do bulk lookup:

my @wikis = ("enwiki", "kowiki", "bat-smgwiki", "nonexistent");
my $domains = $bot->db_to_domain(\@wikis);
foreach my $domain (@$domains) {
    next if !defined($domain);
    print "$domain\n";
}
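The bulk form returns only the domains, so you may want to pair them with the names you asked about. This sketch assumes the results come back in the same order as the input, with undef for unknown names (as the loop above suggests); the data is invented.

```perl
use strict;
use warnings;

# Invented sample data; a real script would use
# my $domains = $bot->db_to_domain(\@wikis);
my @wikis   = ('enwiki', 'kowiki', 'nonexistent');
my $domains = ['en.wikipedia.org', 'ko.wikipedia.org', undef];

# Pair each database name with its domain by position (assumes same order).
my %domain_for;
@domain_for{@wikis} = @$domains;

for my $wiki (@wikis) {
    my $domain = $domain_for{$wiki};
    print defined $domain ? "$wiki: $domain\n" : "$wiki: no domain found\n";
}
```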

domain_to_db

my $db = $bot->domain_to_db($domain_name);

As you might expect, does the opposite of "db_to_domain": Converts a domain name (meta.wikimedia.org) into a database/wiki name (metawiki).

diff

This allows retrieval of a diff from the API. The return is a scalar containing the HTML table of the diff. Options are passed as a hashref with keys:

prefixindex

This returns an array of hashrefs containing page titles that start with the given $prefix. The hashref has keys 'title' and 'redirect' (present if the page is a redirect, not present otherwise).

Additional parameters are:

search

This is a simple search for your $search_term in page text. It returns an array of matching page titles.

Additional optional parameters are:

Or, use a callback for incremental processing:

my @pages = $bot->search("Mike.lifeguard", 2, { hook => \&mysub });
sub mysub {
    my ($res) = @_;
    foreach my $hashref (@$res) {
        my $page = $hashref->{'title'};
        print "$page\n";
    }
}

get_log

This fetches log entries, and returns results as an array of hashes. The first parameter is a hashref with keys:

The second is the familiar "Options hashref".

my $log = $bot->get_log({
        type => 'block',
        user => 'User:Mike.lifeguard',
    });
foreach my $entry (@$log) {
    my $user = $entry->{'title'};
    print "$user\n";
}

$bot->get_log({
        type => 'block',
        user => 'User:Mike.lifeguard',
    },
    { hook => \&mysub, max => 10 }
);
sub mysub {
    my ($res) = @_;
    foreach my $hashref (@$res) {
        my $title = $hashref->{'title'};
        print "$title\n";
    }
}

is_g_blocked

my $is_globally_blocked = $bot->is_g_blocked('127.0.0.1');

Returns what IP/range block currently in place affects the IP/range. The return is a scalar of an IP/range if found (evaluates to true in boolean context); undef otherwise (evaluates false in boolean context). Pass in a single IP or CIDR range.

was_g_blocked

print "127.0.0.1 was globally blocked\n" if $bot->was_g_blocked('127.0.0.1');

Returns whether an IP/range was ever globally blocked. You should probably call this method only when your bot is operating on Meta - this method will warn if not.

was_locked

my $was_locked = $bot->was_locked('Mike.lifeguard');

Returns whether a user was ever locked. You should probably call this method only when your bot is operating on Meta - this method will warn if not.

get_protection

Returns data on page protection as an array of up to two hashrefs. Each hashref has a type, level, and expiry. Levels are 'sysop' and 'autoconfirmed'; types are 'move' and 'edit'; expiry is a timestamp. Additionally, the key 'cascade' will exist if cascading protection is used.

my $page = 'Main Page';
$bot->edit({
    page    => $page,
    text    => rand(),
    summary => 'test',
}) unless $bot->get_protection($page);

You can also pass an arrayref of page titles to do bulk queries:

my @pages = ('Main Page', 'User:Mike.lifeguard', 'Project:Sandbox');
my $answer = $bot->get_protection(\@pages);
foreach my $title (keys %$answer) {
    my $protected = $answer->{$title};
    print "$title is protected\n" if $protected;
    print "$title is unprotected\n" unless $protected;
}

is_protected

This is a synonym for "get_protection", which should be used in preference.

This method is deprecated, and will emit deprecation warnings.

patrol

$bot->patrol($rcid);

Marks a page or revision identified by the $rcid as patrolled. To mark several RCIDs as patrolled, you may pass an arrayref of them. Returns false and sets $bot->{error} if the account cannot patrol.

email

$bot->email($user, $subject, $body);

This allows you to send emails through the wiki. All 3 of $user (without the User: prefix), $subject and $body are required. If $user is an arrayref, this will send the same email (subject and body) to all users.

top_edits

Returns an array of the page titles where the $user is the latest editor. The second parameter is the familiar $options_hashref.

my @pages = $bot->top_edits("Mike.lifeguard", {max => 5});
foreach my $page (@pages) {
    $bot->rollback($page, "Mike.lifeguard");
}

Note that accessing the data with a callback happens before filtering the top edits is done. For that reason, you should use "contributions" if you need to use a callback. If you use a callback with top_edits(), you will not necessarily get top edits returned. It is only safe to use a callback if you check that it is a top edit:

$bot->top_edits("Mike.lifeguard", { hook => \&rv });
sub rv {
    my $data = shift;
    foreach my $page (@$data) {
        if (exists($page->{'top'})) {
            $bot->rollback($page->{'title'}, "Mike.lifeguard");
        }
    }
}

contributions

my @contribs = $bot->contributions($user, $namespace, $options);

Returns an array of hashrefs of data for the user's contributions. $namespace can be an arrayref of namespace numbers. $options can be specified as in "linksearch".

Specify an arrayref of users to get results for multiple users.

upload

$bot->upload({ data => $file_contents, summary => 'uploading file' });
$bot->upload({ file => $file_name,     title   => 'Target filename.png' });

Upload a file to the wiki. Specify the file by either giving the filename, which will be read in, or by giving the data directly.

upload_from_url

Uploads a file directly from a URL to the wiki. Specify the URL, the new filename, and a summary. The summary and new filename are optional.

$bot->upload_from_url({ url => 'http://some.domain.ext/pic.png', title => 'Target_filename.png', summary => 'uploading new pic' });

If uploading from URL is enabled on your target wiki (that is, $wgAllowCopyUploads is set to true in LocalSettings.php) and you have the appropriate user rights, you can use this function to upload files to your wiki directly from a remote server.

usergroups

Returns a list of the usergroups a user is in:

my @usergroups = $bot->usergroups('Mike.lifeguard');

Options hashref

This is passed through to the lower-level interface MediaWiki::API, and is fully documented there.

The hashref can have 3 keys:

ERROR HANDLING

All functions will return undef in any handled error situation. Further error data is stored in $bot->{error}->{code} and $bot->{error}->{details}.
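A small helper can turn that error data into a readable message. The sketch below is only illustrative: describe_error is a hypothetical helper, and the $bot here is a plain hashref with invented values standing in for a real MediaWiki::Bot object.

```perl
use strict;
use warnings;

# Hypothetical helper: format the error data stored on a bot object.
sub describe_error {
    my ($bot) = @_;
    my $error = $bot->{error} or return 'no error recorded';
    my $code    = defined $error->{code}    ? $error->{code}    : 'unknown';
    my $details = defined $error->{details} ? $error->{details} : 'no details';
    return "error $code: $details";
}

# Stand-in object with invented values; in a real script $bot would be the
# object returned by MediaWiki::Bot->new.
my $bot = { error => { code => 3, details => 'invalid title' } };
print describe_error($bot), "\n";
```

In a real script you might call it after a failed operation, for example by appending "or warn describe_error($bot)" to a call such as $bot->edit({...}).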

AVAILABILITY

The project homepage is https://metacpan.org/module/MediaWiki::Bot.

The latest version of this module is available from the Comprehensive Perl Archive Network (CPAN). Visit http://www.perl.com/CPAN/ to find a CPAN site near you, or see https://metacpan.org/module/MediaWiki::Bot/.

SOURCE

The development version is on github at http://github.com/MediaWiki-Bot/MediaWiki-Bot and may be cloned from git://github.com/MediaWiki-Bot/MediaWiki-Bot.git

BUGS AND LIMITATIONS

You can make new bug reports, and view existing ones, through the web interface at https://github.com/MediaWiki-Bot/MediaWiki-Bot/issues.

AUTHORS

COPYRIGHT AND LICENSE

This software is Copyright (c) 2013 by the MediaWiki::Bot team <perlwikibot@googlegroups.com>.

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007