WWW::GoKGS - KGS Go Server (http://www.gokgs.com/) Scraper
    use WWW::GoKGS;

    my $gokgs = WWW::GoKGS->new( from => 'user@example.com' );

    # Game archives
    my $game_archives_1 = $gokgs->scrape( '/gameArchives.jsp?user=foo' );
    my $game_archives_2 = $gokgs->game_archives->query( user => 'foo' );

    # Top 100 players
    my $top_100_1 = $gokgs->scrape( '/top100.jsp' );
    my $top_100_2 = $gokgs->top_100->query;

    # List of tournaments
    my $tourn_list_1 = $gokgs->scrape( '/tournList.jsp?year=2014' );
    my $tourn_list_2 = $gokgs->tourn_list->query( year => 2014 );

    # Information for the tournament
    my $tourn_info_1 = $gokgs->scrape( '/tournInfo.jsp?id=123' );
    my $tourn_info_2 = $gokgs->tourn_info->query( id => 123 );

    # The tournament entrants
    my $tourn_entrants_1 = $gokgs->scrape( '/tournEntrants.jsp?id=123&sort=n' );
    my $tourn_entrants_2 = $gokgs->tourn_entrants->query( id => 123, sort => 'n' );

    # The tournament games
    my $tourn_games_1 = $gokgs->scrape( '/tournGames.jsp?id=123&round=1' );
    my $tourn_games_2 = $gokgs->tourn_games->query( id => 123, round => 1 );

    # List of time zones
    my $tz_list_1 = $gokgs->scrape( '/tzList.jsp' );
    my $tz_list_2 = $gokgs->tz_list->query;
This module is a scraper for the KGS Go Server (http://www.gokgs.com/). KGS lets users play the board game go, also known as baduk (Korean) or weiqi (Chinese). The web server generates resources such as Game Archives dynamically, but serves them only as HTML. This module provides another representation of those resources: Perl data structures.
This class maps a URI under http://www.gokgs.com/ to the appropriate scraper. The supported resources on KGS are as follows:
Handled by WWW::GoKGS::Scraper::GameArchives.
Handled by WWW::GoKGS::Scraper::Top100.
Handled by WWW::GoKGS::Scraper::TournList, WWW::GoKGS::Scraper::TournInfo, WWW::GoKGS::Scraper::TournEntrants and WWW::GoKGS::Scraper::TournGames.
Handled by WWW::GoKGS::Scraper::TzList.
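The mapping from resources to scraper classes can be pictured as a lookup table keyed by path. The following is only a sketch: the paths and class names come from this documentation, while the hash and the scraper_class_for helper are hypothetical and do not show how the module is implemented internally.

```perl
use strict;
use warnings;

# Illustrative lookup table; not the module's actual internals.
my %SCRAPER_CLASS_FOR = (
    '/gameArchives.jsp'  => 'WWW::GoKGS::Scraper::GameArchives',
    '/top100.jsp'        => 'WWW::GoKGS::Scraper::Top100',
    '/tournList.jsp'     => 'WWW::GoKGS::Scraper::TournList',
    '/tournInfo.jsp'     => 'WWW::GoKGS::Scraper::TournInfo',
    '/tournEntrants.jsp' => 'WWW::GoKGS::Scraper::TournEntrants',
    '/tournGames.jsp'    => 'WWW::GoKGS::Scraper::TournGames',
    '/tzList.jsp'        => 'WWW::GoKGS::Scraper::TzList',
);

# Hypothetical helper: returns the class name for a path (with or without
# a query string), or undef when the resource is not supported.
sub scraper_class_for {
    my ($path_and_query) = @_;
    ( my $path = $path_and_query ) =~ s/\?.*\z//s;    # drop the query string
    return $SCRAPER_CLASS_FOR{$path};
}
```

In real code, the can_scrape and get_scraper methods described below answer the same question without a hand-written table.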
Can be used to get or set the user agent object that is used to GET the requested resource. Defaults to an LWP::RobotUA object, which consults http://www.gokgs.com/robots.txt before sending HTTP requests and also enforces a delay between requests.
NOTE: LWP::RobotUA fails to read /robots.txt because the KGS web server does not return the Content-Type response header as of June 23rd, 2014. This module cannot solve this problem.
You can also set your own user agent object which inherits from LWP::UserAgent as follows:
    use LWP::UserAgent;

    $gokgs->user_agent(
        LWP::UserAgent->new(
            agent => 'MyAgent/1.00'
        )
    );
NOTE: You should set a delay between requests to avoid overloading the KGS server.
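One way to add such a delay when using a plain LWP::UserAgent is to wrap the call that triggers the request. The sketch below uses only core Perl; make_throttled is a hypothetical helper, not part of WWW::GoKGS, and the 60-second interval in the usage comment is an arbitrary assumption.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: wraps a code reference so that successive calls
# are spaced at least $min_interval seconds apart.
sub make_throttled {
    my ( $min_interval, $code ) = @_;
    my $last_call;    # undef until the first call

    return sub {
        if ( defined $last_call ) {
            my $wait = $min_interval - ( time() - $last_call );
            sleep($wait) if $wait > 0;
        }
        $last_call = time();
        return $code->(@_);
    };
}

# Usage sketch, assuming $gokgs is a WWW::GoKGS object:
#   my $polite_scrape = make_throttled( 60, sub { $gokgs->scrape(@_) } );
#   my $top_100       = $polite_scrape->('/top100.jsp');
```

The first call goes through immediately; only later calls wait, so a single request pays no penalty.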
Returns a WWW::GoKGS::Scraper::GameArchives object.
Returns a WWW::GoKGS::Scraper::Top100 object.
Returns a WWW::GoKGS::Scraper::TournList object.
Returns a WWW::GoKGS::Scraper::TournInfo object.
Returns a WWW::GoKGS::Scraper::TournEntrants object.
Returns a WWW::GoKGS::Scraper::TournGames object.
Returns a WWW::GoKGS::Scraper::TzList object.
Can be used to get or set your email address which is used to send the From request header that indicates who is making the request.
Can be used to get or set the product token that is used to send the User-Agent request header.
A shortcut for:
my $response = $gokgs->user_agent->get( URI->new(...) );
This method is used by the scrape method to GET the requested resource. You can override this method by subclassing.
Can be used to get or set a cookie jar object to use.
Returns a scraper object which can scrape the resource specified by the given URL. If the scraper object does not exist, then undef is returned. This method can be used to check whether $gokgs can scrape the resource.
    my $uri = URI->new( 'http://www.gokgs.com/gameArchives.jsp?user=foo' );
    my $game_archives = $gokgs->game_archives->scrape( $uri );
See WWW::GoKGS::Scraper::GameArchives for details.
    my $uri = URI->new( 'http://www.gokgs.com/top100.jsp' );
    my $top_100 = $gokgs->top_100->scrape( $uri );
See WWW::GoKGS::Scraper::Top100 for details.
    my $uri = URI->new( 'http://www.gokgs.com/tournList.jsp?year=2014' );
    my $tourn_list = $gokgs->tourn_list->scrape( $uri );
See WWW::GoKGS::Scraper::TournList for details.
    my $uri = URI->new( 'http://www.gokgs.com/tournInfo.jsp?id=123' );
    my $tourn_info = $gokgs->tourn_info->scrape( $uri );
See WWW::GoKGS::Scraper::TournInfo for details.
    my $uri = URI->new( 'http://www.gokgs.com/tournEntrants.jsp?id=123&sort=n' );
    my $tourn_entrants = $gokgs->tourn_entrants->scrape( $uri );
See WWW::GoKGS::Scraper::TournEntrants for details.
    my $uri = URI->new( 'http://www.gokgs.com/tournGames.jsp?id=123&round=1' );
    my $tourn_games = $gokgs->tourn_games->scrape( $uri );
See WWW::GoKGS::Scraper::TournGames for details.
    my $uri = URI->new( 'http://www.gokgs.com/tzList.jsp' );
    my $tz_list = $gokgs->tz_list->scrape( $uri );
See WWW::GoKGS::Scraper::TzList for details.
Returns a scraper object which can scrape a resource located at $path on KGS. If the scraper object does not exist, then undef is returned.
    my $game_archives = $gokgs->get_scraper( '/gameArchives.jsp' );
    # => WWW::GoKGS::Scraper::GameArchives object
Given a subref, applies the subroutine to each scraper object in turn. The callback is called with two parameters: the path to the resource on KGS and the scraper object that can scrape the resource.

    $gokgs->each_scraper(sub {
        my $path = shift;    # => "/gameArchives.jsp"
        my $scraper = shift; # isa WWW::GoKGS::Scraper::GameArchives

        # overwrite "user_agent" attributes of all the scraper objects
        $scraper->user_agent( $gokgs->user_agent );
    });
This module throws the following exceptions:
This message is printed by the constructor of LWP::RobotUA. You must provide your email address when you use the module.
my $gokgs = WWW::GoKGS->new( from => 'user@example.com' );
You tried to scrape a resource which $gokgs can't handle. Use can_scrape before invoking the scrape method.

    # scrape safely
    if ( $gokgs->can_scrape('/fooBar.jsp') ) {
        my $result = $gokgs->scrape('/fooBar.jsp');
    }
$gokgs failed to GET the requested resource. The reason phrase is added to the end of the message.
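Because these failures are thrown as exceptions, a caller that wants to keep running can trap them with eval. The following is a minimal sketch: try_scrape is a hypothetical helper, not part of WWW::GoKGS, and it relies only on the can_scrape and scrape methods documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( $result, undef ) on success or
# ( undef, $error ) on failure. $gokgs may be any object that provides
# can_scrape() and scrape(), such as a WWW::GoKGS instance.
sub try_scrape {
    my ( $gokgs, $path ) = @_;

    return ( undef, "no scraper for $path" )
        unless $gokgs->can_scrape($path);

    my $result;
    my $ok = eval { $result = $gokgs->scrape($path); 1 };
    return ( undef, $@ || 'unknown error' ) unless $ok;

    return ( $result, undef );
}

# Usage sketch, assuming $gokgs is a WWW::GoKGS object:
#   my ( $top_100, $error ) = try_scrape( $gokgs, '/top100.jsp' );
#   warn "scrape failed: $error" if defined $error;
```

Returning the error as a second value keeps the reason phrase available to the caller instead of discarding it.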
Although the KGS website lets you set a locale and time zone with an HTTP cookie, this module ignores those settings. The scrapers assume that the locale is en_US and the time zone is GMT.

    # not supported
    $gokgs->user_agent->cookie_jar(...);
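Since the scrapers assume GMT, any conversion to local time has to happen on the caller's side. The sketch below uses only core modules and starts from date components passed in by hand; how a scraper actually reports dates is described in the individual scraper documentation, so the gmt_to_epoch and format_local helpers and their argument names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: turns GMT date components into epoch seconds.
# month is 1-12 here; timegm() expects 0-11.
sub gmt_to_epoch {
    my (%t) = @_;
    return timegm(
        $t{second} // 0, $t{minute}, $t{hour},
        $t{day}, $t{month} - 1, $t{year},
    );
}

# Hypothetical helper: formats epoch seconds in the local time zone.
sub format_local {
    my ($epoch) = @_;
    return strftime( '%Y-%m-%d %H:%M:%S %Z', localtime $epoch );
}

# Usage sketch with made-up values:
#   my $epoch = gmt_to_epoch(
#       year => 2014, month => 6, day => 23, hour => 12, minute => 0,
#   );
#   print format_local($epoch), "\n";
```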
Some tests for scrapers send HTTP requests to GET resources on KGS. When you run ./Build test, they are skipped by default to avoid overloading the KGS server. To run those tests, you have to set AUTHOR_TESTING to true explicitly:
    $ perl Build.PL
    $ env AUTHOR_TESTING=1 ./Build test
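A test file can apply the same rule by checking the environment before it contacts KGS. This is a sketch of the general pattern, not a copy of the distribution's test code; network_tests_enabled is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when AUTHOR_TESTING is set to a true
# value in the given environment hash (defaults to %ENV).
sub network_tests_enabled {
    my ($env) = @_;
    $env ||= \%ENV;
    return $env->{AUTHOR_TESTING} ? 1 : 0;
}

# Usage sketch inside a test script that uses Test::More:
#   plan skip_all => 'AUTHOR_TESTING is required'
#       unless network_tests_enabled();
```

Passing the environment as an argument keeps the check easy to exercise without touching the real %ENV.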
Author tests are run by Travis CI once a day. You can visit the website to check whether the tests passed or not.
Thanks to wms, the author of KGS Go Server, we can enjoy playing go online for free.
KGS Go Server, Web::Scraper
Ryo Anazawa (anazawa@cpan.org)
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
To install WWW::GoKGS, copy and paste the appropriate command into your terminal.
cpanm
cpanm WWW::GoKGS
CPAN shell
    perl -MCPAN -e shell
    install WWW::GoKGS
For more information on module installation, please visit the detailed CPAN module installation guide.