Module Version: 1.01

NAME

WWW::Scraper::Wikipedia::ISO3166::Database::Import - Part of the interface to www.scraper.wikipedia.iso3166.sqlite

Synopsis

See "Synopsis" in WWW::Scraper::Wikipedia::ISO3166.

Description

Documents the methods used to populate the SQLite database, www.scraper.wikipedia.iso3166.sqlite, which ships with this distro.

See "Description" in WWW::Scraper::Wikipedia::ISO3166 for a long description.

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.

Constructor and initialization

new(...) returns an object of type WWW::Scraper::Wikipedia::ISO3166::Database::Import.

This is the class's constructor.

Usage: WWW::Scraper::Wikipedia::ISO3166::Database::Import -> new().

This method takes a hash of options.

Call new() as new(option_1 => value_1, option_2 => value_2, ...).

Available options (these are also methods):

o code2 => $2_letter_code

Specifies the code2 of the country whose subcountry page is to be downloaded.

Methods

This module is a sub-class of WWW::Scraper::Wikipedia::ISO3166::Database and consequently inherits its methods.

code2($code)

Get or set the 2-letter country code of the country or subcountry being processed.

Also, code2 is an option to "new()".
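The get-or-set behaviour described above can be sketched in core Perl. This is a minimal illustration using a hypothetical stand-in class, Sketch::Import, and is not the module's actual implementation; only the option name code2 comes from this documentation.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not the module's real code.
package Sketch::Import;

sub new {
    my ($class, %options) = @_;

    # Keep only the one option this sketch knows about.
    my $self = bless { code2 => $options{code2} }, $class;

    return $self;
}

# Get-or-set accessor: with an argument it stores the value;
# in both cases it returns the current value.
sub code2 {
    my ($self, $code) = @_;

    $self->{code2} = $code if defined $code;

    return $self->{code2};
}

package main;

my $importer = Sketch::Import->new(code2 => 'AU');

print $importer->code2, "\n";    # the value passed to new()

$importer->code2('NZ');

print $importer->code2, "\n";    # the value set through the accessor
```

The same name serves as both a constructor option and a method, which matches the "these are also methods" note under "Constructor and initialization".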

get_content($element)

Extract, recursively if necessary, the content of the HTML element, as returned from HTML::TreeBuilder's look_down() method.
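As a rough sketch of what recursive extraction means here, the code below walks a nested structure and concatenates its text. The node shape (a hashref with tag and content keys) and the function name get_content_sketch are assumptions made for illustration; real code would walk the element objects produced by HTML::TreeBuilder instead.

```perl
use strict;
use warnings;

# Hypothetical node shape: { tag => $name, content => [ $text_or_node, ... ] }.
# A child is either a plain text string or another node.
sub get_content_sketch {
    my ($element) = @_;
    my @parts;

    for my $child (@{ $element->{content} || [] }) {
        # Recurse into child nodes; keep plain text as-is.
        push @parts, ref($child) ? get_content_sketch($child) : $child;
    }

    return join '', @parts;
}

# Illustrative data only: a table cell holding a link with nested markup.
my $cell = {
    tag     => 'td',
    content => [
        { tag => 'a', content => [ 'New South ', { tag => 'i', content => ['Wales'] } ] },
        ' (state)',
    ],
};

print get_content_sketch($cell), "\n";
```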

get_table($node, $column_type, $country_code)

Get the country or subcountry details from the HTML table ($node), as returned from HTML::TreeBuilder's look_down() method.

Use the arrayref $column_type of HTML element names ('a', 'tt', or '-' for none) to determine exactly how to extract the data from the enclosing 'td'.

Use $country_code to handle some special cases, specifically:

o ET => Ethiopia
o KM => Comoros
o LB => Lebanon
o TD => Chad

Returns an arrayref of hashrefs, where the (key => value) pairs of each hashref are:

o code => $string

The country or subcountry code.

o detail => $arrayref

An indicator as to whether or not the country has subcountries.

o name => $string

The name of the country or subcountry.
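To make the shape of the return value concrete, here is a small sketch of an arrayref of hashrefs with the three keys listed above, plus one way a caller might walk it. The rows and the contents of detail are placeholders invented for illustration; the real values are whatever the parser extracts.

```perl
use strict;
use warnings;

# Illustrative rows only; the detail values are placeholders.
my $table = [
    { code => 'AU', name => 'Australia',   detail => ['placeholder'] },
    { code => 'NZ', name => 'New Zealand', detail => ['placeholder'] },
];

# Index the rows by code for quick lookup.
my %name_for = map { ($_->{code} => $_->{name}) } @$table;

for my $row (@$table) {
    printf "%s => %s (%d detail item(s))\n",
        $row->{code}, $row->{name}, scalar @{ $row->{detail} };
}
```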

new()

See "Constructor and initialization".

parse_country_code_page()

Parse the HTML page of 3-letter country codes, which has 3 tables side-by-side.

Returns an arrayref of 3-letter codes.

Special cases are documented in "What is the database schema?" in WWW::Scraper::Wikipedia::ISO3166.

parse_country_page()

Parse the HTML page of country names.

Returns the result of calling "get_table($node, $column_type, $country_code)".

parse_subcountry_page()

Parse the HTML page of a subcountry.

Warning. The 2-letter code of the subcountry must be set with $self -> code2('XX') before calling this method.

Returns the result of calling "get_table($node, $column_type, $country_code)".

populate_countries()

Populate the countries table.

populate_subcountry($count)

Populate the subcountries table, for 1 subcountry.

Warning. The 2-letter code of the subcountry must be set with $self -> code2('XX') before calling this method.

populate_subcountries()

Populate the subcountries table, for all subcountries.

process_countries($table)

Clean up the detail key of the arrayref of hashrefs for the countries.

process_subcountries($table)

Delete the detail key of the arrayref of hashrefs for the subcountry.
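Deleting a key from every hashref in such a table takes one line in Perl. The sketch below assumes the table shape returned by get_table(); the helper name drop_detail and the sample row are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the 'detail' key from every row, in place.
sub drop_detail {
    my ($table) = @_;

    delete $_->{detail} for @$table;

    return $table;
}

# Illustrative row only.
my $table = [ { code => 'AU-NSW', name => 'New South Wales', detail => [] } ];

drop_detail($table);

print join(', ', sort keys %{ $table->[0] }), "\n";
```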

save_countries($code3, $table)

Save the countries table, by combining the output of parse_country_code_page() with the output of "process_countries($table)".

save_subcountries($count, $table)

Save the subcountries table, for the given subcountry, using the output of "process_subcountries($table)".

$count is just used in the log for progress messages.

trim($s)

Remove leading and trailing spaces from $s, and return it.
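A common way to write such a helper is with two substitutions. This is a sketch, not the module's source; it assumes that "spaces" can be read as whitespace in general, so tabs and newlines are stripped too. The name trim_sketch is hypothetical.

```perl
use strict;
use warnings;

sub trim_sketch {
    my ($s) = @_;

    $s =~ s/^\s+//;    # strip leading whitespace
    $s =~ s/\s+$//;    # strip trailing whitespace

    return $s;
}

print '[', trim_sketch('  New South Wales  '), "]\n";
```

Interior spaces are left alone; only the two ends of the string change.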

FAQ

For the database schema, etc, see "FAQ" in WWW::Scraper::Wikipedia::ISO3166.

References

See "References" in WWW::Scraper::Wikipedia::ISO3166.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=WWW::Scraper::Wikipedia::ISO3166.

Author

WWW::Scraper::Wikipedia::ISO3166 was written by Ron Savage <ron@savage.net.au> in 2012.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2012 Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html