NAME

MediaWiki - OOP MediaWiki engine client

SYNOPSIS

 use MediaWiki;

 $c = MediaWiki->new;
 $is_ok = $c->setup("config.ini");
 $is_ok = $c->setup({
         'bot' => { 'user' => 'Vasya', 'pass' => '123456' },
         'wiki' => {
                 'host' => 'en.wikipedia.org',
                 'path' => 'w'
         }});
 $is_ok = $c->switch('starwars.wikia.com');
 $is_ok = $c->switch('en.wikipedia.org', 'w', { 'has_query' => 1, 'has_filepath' => 1 });
 $whoami = $c->user();

 $text = $c->text("page_name_here");
 $is_ok = $c->text("page_name_here", "some new text");

 $c->refresh_messages();
 $msg = $c->message("MediaWiki_message_name");

 die unless $c->exists("page_name");

 my($articles_p, $subcats_p) = $c->readcat("category_name");

 $is_ok = $c->upload("image_name", `cat myfoto.jpg`, "some notes", $force);

 $is_ok = $c->block("VasyaPupkin", "2 days");
 $is_ok = $c->unblock("VasyaPupkin");

 $c->{summary} = "Automatic auto-replacements 1.2";
 $c->{minor} = 1;
 $c->{watch} = 1;

 if(!$is_ok)
 {
    $err = $c->{error};
    # do something
 }

 $pg = $c->random();
 $pg = $c->get("page_name");
 $pg = $c->get("page_name", "");
 $pg = $c->get("page_name", "rw");

 $is_ok = $pg->load();
 $is_ok = $pg->save();
 $text = $pg->oldid($old_version_id);
 $text = $pg->content();
 $title = $pg->title();

 $is_ok = $pg->delete();
 $is_ok = $pg->restore();
 $is_ok = $pg->protect();
 $is_ok = $pg->protect($edit_protection);
 $is_ok = $pg->protect($edit_protection, $move_protection);

 $is_ok = $pg->move("new_name");
 $is_ok = $pg->watch();
 $is_ok = $pg->unwatch();

 $is_ok = $pg->upload(`cat myfoto.jpg`, "some notes", $force);

 $is_ok = $pg->block("2 days");
 $is_ok = $pg->unblock();

 $pg->history(sub { my $edit_p = shift; } );
 $pg->history_clear();
 my $edit_p = $pg->last_edit;
 my $edit_p = $pg->find_diff(qr/some_regex/);
 $is_ok = $pg->markpatrolled();
 $is_ok = $pg->revert();

 $pg->{history_step} = 10;

 $is_ok = $pg->replace(sub { my $text_p = shift; } );
 $is_ok = $pg->remove("some_regex_here");
 $is_ok = $pg->remove_template("template_name");

 $pg->{content} = "new text";
 $pg->{summary} = "do something strange";
 $pg->{minor} = 0;
 $pg->{watch} = 1;

Functions and options

Client object (MediaWiki) functions

MediaWiki->new()

Performs basic initialization of the client structure. Returns client object.

$c->setup([ $ini_file_name | $config_hash_pointer ])

Reads a configuration file in INI format; also performs login if username and password are specified. If the file name is omitted, "~/.bot.ini" is used.

The configuration file can use [bot], [wiki] and [tmp] sections. The keys 'user' and 'pass' in the 'bot' section specify login information; additionally, the key 'realm' triggers basic HTTP authentication instead of a wiki login. The 'wiki' section must have 'host' and 'path' keys (for example, host may be 'en.wikipedia.org' and path may be 'w'), which specify the path to the index.php script. The 'wiki' section may also specify the 'ssl' key (boolean 0/1) if the server uses an SSL connection. The key 'msgcache' in the 'tmp' section specifies the path to the MediaWiki messages cache.

The options 'has_query' and 'has_filepath' in the 'wiki' section enable experimental optimized interfaces. Set has_query to 1 if the wiki has the query.php extension (this should reduce traffic and server load). Set has_filepath to 1 if the target wiki has the Special:Filepath page (affects only the filepath() and download() functions).

You may also specify the configuration as a hash (pass a reference to it instead of a string with the file name). It should contain something like { 'wiki' => { 'host' => ..., 'path' => ... }, 'bot' => { 'user' => ..., 'pass' => ... } } (each key of the outer hash is a section name, and the keys of the inner hashes are that section's keys).
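As an illustration, a configuration file using the sections and keys described above might look like the following. All values are placeholders. Writing the boolean options as 0/1 follows the description of 'ssl'; whether 'msgcache' should name a file or a directory is not stated here, so the path shown is only a guess.

```ini
[bot]
user = ExampleBot
pass = secret

[wiki]
host = en.wikipedia.org
path = w
ssl = 0
has_query = 1
has_filepath = 1

[tmp]
msgcache = /tmp/mediawiki-messages.cache
```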

$c->login([$user [, $password [, $realm]]])

Performs login if no login information was specified in configuration. Called automatically from setup().

$c->logout([$host])

Removes all HTTP cookies. All following edits will be anonymous until the next login() call. If the $host parameter is specified, only cookies for the selected server (as in the 'wiki'->'host' configuration key) are cleared.

$c->switch(($wiki_hash_pointer | $wiki_host [, $wiki_path] [, $wiki_hash_pointer]))

Reconfigures the client with the specified configuration (a reference to a hash describing _only_ the 'wiki' section). Tries to log in with the same username and password if auth info was specified. If you have already switched to this wiki (or this is the initial wiki, set with $c->setup()), the login attempt is omitted.

The first parameter is either a hash reference ({ 'host' => ..., 'path' => ... }) or the host, optionally followed by the path as the second parameter. You may add a hash reference as the second or third parameter to set other keys, such as 'has_query'. A call to switch() preserves keys not specified in the parameters.

Primary use of this function should be in interwiki bots.

$c->user()

Returns username from configuration file or makes a dummy edit in wiki sandbox to get client IP from page history. Note: no result caching is done.

$c->text( $page_name [, $new_text ])

If $new_text is specified, replaces the content of the $page_name article with $new_text. Otherwise returns the text of the current revision. Errors: ERR_NOT_FOUND (article does not exist).

$c->refresh_messages()

Downloads all MediaWiki messages and saves them to the messages cache.

$c->message($message_name)

Returns the message from the cache, or undef if the cache entry does not exist. When no cache is present at all, this function downloads only the one requested message.

$c->exists($page_name)

Returns true value if page exists.

$c->readcat($category_name);

Returns two array references, first with names of all articles, second with names of all subcategories (without 'Category:' namespace prefix).

$c->upload($image_name, $content [, $description [, $force]]);

Uploads an image with the name 'Image:$image_name' and the content $content. If the description is not specified, an empty string is used. The force flag may be set to 1 to disable warnings. Currently warnings are not handled properly (see "LIMITATIONS"), so force=1 is recommended. It is not the default because every overwrite of an image creates a new version, whether or not the content differs. If you never overwrite images, feel free to set $force to 1.

$c->filepath($image_name)

Returns the direct URL for downloading the raw image $image_name, or undef if the image does not exist.

$c->download($image_name)

Returns the content of the $image_name image, or undef if it does not exist.

$c->block($user_name, $block_time)

Blocks specified user from editing. Block time should be in format [0-9]+ (seconds|minutes|hours|days|months|years) or in ctime format.

Note: this operation requires sysop rights.
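A bot may want to check a block time before sending it to the wiki. The following is a minimal sketch of such a check; is_relative_block_time() is a hypothetical helper, not part of this module. It only covers the relative "number unit" form described above and does not attempt to validate ctime strings. Accepting a singular unit is an assumption, made because the EXAMPLE section uses "1 hour".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the MediaWiki module: checks only the
# relative "<number> <unit>" form. ctime-style strings are not validated
# here and are simply rejected.
sub is_relative_block_time {
    my ($block_time) = @_;
    return 0 unless defined $block_time;
    # Assumption: a singular unit ("1 hour") is accepted as well as the
    # plural forms listed in the documentation.
    return $block_time =~ /\A[0-9]+ (?:second|minute|hour|day|month|year)s?\z/ ? 1 : 0;
}

print is_relative_block_time("2 days")  ? "ok\n" : "rejected\n";
print is_relative_block_time("forever") ? "ok\n" : "rejected\n";
```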

$c->unblock($user_name)

Unblocks specified user.

Note: this operation requires sysop rights.

$c->random()

Returns page handle for random article (page in namespace 0).

$c->get($page [, $mode])

Returns page handle for specified article. Mode parameter may be "", "r", "w" or "rw" (default "r"). If there is no 'r' in mode, no page content will be fetched.

If the 'w' flag is present, the page is loaded in Prepared Load Mode. Saving a page requires some values from the edit form. With prepared loading, the text is fetched from the edit form (not from the raw page) together with these values, which reduces traffic. For normal editing, the edit form is loaded just before saving.

Note: prepared mode is toggled off after first saving.

Client object (MediaWiki) options

$c->{minor}

If not set, the default value for the account is used. If set to 0, major edits are made; if set to 1, minor edits.

$c->{watch}

If set to 1, edited pages are added to the watch list. If not set, the account default is used; 0 disables adding to the list.

$c->{summary}

Short description used by default for all edits.

$c->{error}

Contains the advanced error code, or 0 if no error or an unknown error occurred. See also "ERRORS HANDLING".

$c->{on_error}

Callback invoked each time an error occurs.

Page object (MediaWiki::page) functions

$pg->load()

Loads page content.

$pg->save()

Saves changes to this page.

$pg->prepare()

Performs prepared load (do not use this function directly).

$pg->content()

Returns page content.

$pg->oldid($id)

Returns content of an old revision.

$pg->title()

Returns page title.

$pg->delete()

Deletes this page.

Note: this operation requires sysop rights.

$pg->restore()

Restores a recently deleted page.

Note: this operation requires sysop rights.

$pg->protect([$edit_mode [, $move_mode]])

Protects the page from edits and/or moves. Protection modes: 2 - sysops only, 1 - registered users only, 0 - default, meaning no protection. If no parameters are specified, protects against anonymous edits. If only the first parameter is specified, the move mode is set to the same value.

To unprotect a page, use $pg->protect(0).

$pg->move($new_name)

Renames the page, setting the new title to $new_name and creating a redirect in place of the old article. This is only possible if the target article does not exist or is a redirect with no non-redirect versions in its history.

$pg->watch([$unwatch])

Adds the page to the watch list. If $unwatch is set, removes the page from the watch list.

$pg->unwatch()

Synonym for $pg->watch(1).

$pg->upload($content [, $description [, $force]])

See $c->upload

$pg->filepath()

See $c->filepath

$pg->download()

See $c->download()

$pg->block($block_time)

See $c->block

$pg->unblock()

See $c->unblock

$pg->history(&cb)

Calls the callback for each entry in the page history. One parameter is passed: the edit info (a hash reference). The callback should return undef to continue listing or a true value to stop it. history() returns that true value, or undef if all edits were listed without interruption.

The hash reference has the following keys:

 page     - pointer to page handle ($pg)
 oldid    - revision identifier (may be used in a call to $pg->oldid())
 user     - username or IP address
 anon     - is 1 if 'user' contains an IP address
 minor    - is 1 if this is a minor edit
 comment  - contains the short comment
 section  - contains the section name (so-called autocomment)
 time     - edit time (in format 'HH:MM')
 date     - edit date (in format 'D MONTH YYYY')
 datetime - contains time and date separated by ', '

Note: this function uses the same history cache as last_edit(), revert() etc.
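To make the callback contract concrete, here is a minimal sketch of a history callback that stops at the first anonymous, non-minor edit. The sub name and the sample records are hypothetical; the records are hand-built hashes that use only the keys listed above, so the sketch runs without a wiki connection.

```perl
use strict;
use warnings;

# Hypothetical callback: stop at the first anonymous, non-minor edit.
# Returning a true value stops the iteration; undef continues it.
sub first_anon_major_edit {
    my ($edit_p) = @_;
    return undef if !$edit_p->{anon};    # skip edits by registered users
    return undef if $edit_p->{minor};    # skip minor edits
    return $edit_p;                      # true value: stop here
}

# Hand-built sample records standing in for real history entries.
my @sample = (
    { user => 'ExampleUser', anon => 0, minor => 0, oldid => 101 },
    { user => '192.0.2.7',   anon => 1, minor => 1, oldid => 100 },
    { user => '192.0.2.9',   anon => 1, minor => 0, oldid => 99 },
);

my $hit;
for my $edit_p (@sample) {
    $hit = first_anon_major_edit($edit_p);
    last if $hit;
}
print "stopped at oldid $hit->{oldid}\n" if $hit;
```

With a real page handle, the same sub would be passed as $pg->history(\&first_anon_major_edit), and the returned hash reference could then be inspected in the same way.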

$pg->history_clear()

Clears the history cache. This is done automatically when the page is modified.

$pg->last_edit()

Returns the structure describing the last edit.

$pg->find_diff($regex)

Finds the latest edit in which text matching $regex was added and returns its structure.

$pg->markpatrolled()

Marks the latest revision of this page as checked by an administrator. This is an experimental option and may not be present in many MediaWiki installations.

Note: this operation requires sysop rights.

$pg->revert([$user])

Reverts all changes made by the last user who edited this page. This function does not use the admin quick-revert interface and can be run by anybody. If the $user parameter is specified, revert() does nothing if this user's edits have already been reverted (that is, someone else reverted them first). Using this optional parameter is recommended.

Note: MediaWiki message 'Revertpage' will be used as summary.

$pg->replace(&cb)

This is the most common implementation of a replacements bot. It splits the wiki code into parts that may be changed and parts that must not be (for example, text inside pre/nowiki/math tags) and runs the callback for each allowed part. The callback gets a reference to the text as its parameter and may or may not change it. If the text is unchanged after all callbacks have run, the page is not saved (this is checked on the client side, which reduces traffic).

Note: If the page has the '{{NO_BOT_TEXT_PROCESSING}}' template, no changes are made.
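The callback contract above can be sketched as follows. The sub my_replacements (the name used in the replacements bot example under EXAMPLE) is a hypothetical implementation; the two substitutions are arbitrary illustrations. It is assumed here that the callback's return value is ignored and only the in-place change to the referenced text matters.

```perl
use strict;
use warnings;

# Hypothetical replacement callback. It receives a reference to one
# editable chunk of wiki text and modifies that text in place.
sub my_replacements {
    my ($text_p) = @_;
    $$text_p =~ s/\bteh\b/the/g;    # fix a common typo
    $$text_p =~ s/[ \t]+$//mg;      # strip trailing whitespace on each line
    return;
}

# Standalone demonstration on a plain string.
my $chunk = "This is teh first line.  \nAnd teh second.";
my_replacements(\$chunk);
print $chunk, "\n";
```

With a page handle, the sub would be passed as $pg->replace(\&my_replacements).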

$pg->remove($regex)

Removes all matches of the specified regex.

$pg->remove_template($template_name)

A wrapper for remove(). It removes all occurrences of the specified template.

Page object (MediaWiki::page) options

$pg->{content}

Raw page content. Assign to this field to set new content for the article.

$pg->{minor}

See $c->{minor} - local setting (only for this page handle).

$pg->{watch}

See $c->{watch} - local setting (only for this page handle).

$pg->{summary}

See $c->{summary} - local setting (only for this page handle).

$pg->{history_step}

Number of edits fetched per request. This field can be used for task-related optimization (increasing it decreases traffic and server load). Default 50.

ERRORS HANDLING

Currently all methods whose return value isn't documented return 1 on success and undef on failure. You may check the advanced error code in $c->{error}, but not all errors are properly handled yet (0 in $c->{error} can mean either success or an unknown error).

A callback may also be specified: if $c->{on_error} holds a subroutine reference when an error occurs, it is called (with no parameters; the error code is in $c->{error}).
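Because the callback is called with no parameters, it has to reach the client object some other way, typically by closing over it. The following sketch illustrates that with a plain hash reference standing in for the client; the stand-in object and the error code 5 are made up for the demonstration and do not correspond to a real ERR_* value.

```perl
use strict;
use warnings;

# Stand-in for a MediaWiki client object (assumption: a plain hash
# reference is enough to illustrate the callback mechanics).
my $c = { error => 0 };

my @seen_errors;
# The callback receives no arguments, so it closes over $c to read the code.
$c->{on_error} = sub {
    push @seen_errors, $c->{error};
    warn "MediaWiki error code: $c->{error}\n";
};

# Simulate what the module is described as doing on failure; 5 is an
# arbitrary made-up code.
$c->{error} = 5;
$c->{on_error}->() if ref $c->{on_error} eq 'CODE';

print "errors seen: @seen_errors\n";
```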

ERR_NO_ERROR = 0

No error or unknown error.

ERR_NO_INIHASH

$c->setup() was called with a configuration file name, but the module Config::IniHash was not found.

ERR_PARSE_INI

The Config::IniHash parser found a fatal error in the configuration file.

ERR_NO_AUTHINFO

$c->login() was called but no auth info is known (the bot's username and password).

ERR_NO_MSGCACHE

$c->refresh_messages() was called but no cache path is specified in the configuration (or the Data::Dumper module was not found).

ERR_LOGIN_FAILED

Login returned something unexpected (maybe password is incorrect).

ERR_LOOP

Endless loop in one of the modules (internal module error or an error in the wiki engine).

ERR_NOT_FOUND

$c->text() was called but the page does not exist.

EXAMPLE

All examples start with

 use MediaWiki;
 my $c = MediaWiki->new();
 $c->setup();

Very easy example: creating prepared articles

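 my $dir = "articles";
 opendir(my $dh, $dir) or die "Cannot open $dir: $!";
 while(defined(my $file = readdir($dh)))
 {
    # Only *.txt files; the page name is the file name without extension
    next unless $file =~ s/\.txt$//;

    # Read the whole file from inside the articles directory
    open(my $fh, '<', "$dir/$file.txt") or die "Cannot read $file.txt: $!";
    my $text = do { local $/; <$fh> };
    close($fh);

    $c->text($file, $text);
 }
 closedir($dh);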

Easy example: replacements bot

 for(my $i = 0; $i < 10000; $i ++)
 {
    my $pg = $c->random();
    $pg->replace(\&my_replacements);
 }

More complex example: anti-vandalism bot

 $c->{summary} = "Vandalism: blanking more than 5 times";

 my %users = (); my %articles = ();
 while(1)
 {
    my $pg = $c->random();
    if($pg->content() eq '')
    {
      my $e = $pg->last_edit;
      my $blanker = $e->{user};

      $pg->revert();
      $e = $pg->last_edit;

      if($e->{user} eq $blanker) # Only author
      {
         $pg->{content} .= "{{db-author}}"; # Delete note for admins
         $pg->{summary} = "+ {{db-author}}";
         $pg->save();
      }
      else
      {
        $users{$blanker} = 1 + (exists $users{$blanker} ? $users{$blanker} : 0);
        if($users{$blanker} > 5)
        {
          $c->block($blanker, "1 hour");
          delete $users{$blanker};
        }
      }
    }
 }

AUTHOR

Edward Chernenko <edwardspec@gmail.com>

COPYRIGHT

Copyright (C) 2006 Edward Chernenko. This program is protected by the Artistic License and can be used and/or distributed under the same rules as Perl. All rights reserved.

SEE ALSO

CMS::MediaWiki, WWW::Wikipedia, WWW::Mediawiki::Client