NAME

WWW::FetchStory::Fetcher - fetching module for WWW::FetchStory

VERSION

version 0.1815

DESCRIPTION

This is the base class for story-fetching plugins for WWW::FetchStory.

METHODS

new

    $obj = WWW::FetchStory::Fetcher->new();

init

Initialize the object.

    $obj->init(%args);

name

The name of the fetcher; this is basically the last component of the module name. This works as either a class function or a method.

    $name = $self->name();

    $name = WWW::FetchStory::Fetcher::name($class);
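As a rough illustration of what "the last component of the module name" means, here is a minimal sketch. The helper `last_component` and the class name `WWW::FetchStory::Fetcher::Example` are hypothetical and are not part of the module; the sketch only shows how one function can accept either a class name or an object.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::FetchStory: returns the last
# "::"-separated component of a class name or of an object's class.
sub last_component {
    my $class_or_obj = shift;
    my $class = ref($class_or_obj) || $class_or_obj;
    my @parts = split /::/, $class;
    return $parts[-1];
}

# Called with a class name (class-function style).
my $from_class = last_component('WWW::FetchStory::Fetcher::Example');

# Called with an object (method style); the class name is made up.
my $from_obj = last_component(bless({}, 'WWW::FetchStory::Fetcher::Example'));

print "$from_class $from_obj\n";
```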

info

Information about the fetcher. By default this just returns the formatted name.

    $info = $self->info();

priority

The priority of this fetcher. Fetchers with higher priority are tried first. This is useful when a site has both a generic fetcher and a more specialized fetcher for particular sections of that site. For example, there may be a generic LiveJournal fetcher, and then refinements for a particular LiveJournal community, such as the sshg_exchange community. This works as either a class function or a method.

This must be overridden by the specific fetcher class.

    $priority = $self->priority();

    $priority = WWW::FetchStory::Fetcher::priority($class);
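The ordering described above can be sketched as follows. The two classes and their priority values are stand-ins invented for this example; how WWW::FetchStory itself collects and orders its plugins is not shown here.

```perl
use strict;
use warnings;

# Stand-in classes for illustration only; the names and priority values
# are assumptions, not taken from WWW::FetchStory.
package My::Fetcher::GenericSite   { sub priority { return 1 } }
package My::Fetcher::SiteCommunity { sub priority { return 2 } }

package main;

my @candidates = ('My::Fetcher::GenericSite', 'My::Fetcher::SiteCommunity');

# Sort in descending order so the more specialized (higher-priority)
# fetcher is tried before the generic one.
my @ordered = sort { $b->priority() <=> $a->priority() } @candidates;

print join(', ', @ordered), "\n";
```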

allow

If this fetcher can be used for the given URL, then this returns true. This must be overridden by the specific fetcher class.

    if ($obj->allow($url))
    {
        ....
    }
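A subclass override might look something like the sketch below. The class `My::Fetcher::ExampleSite`, its constructor, and the example.com host are all hypothetical; a real plugin would inherit from WWW::FetchStory::Fetcher rather than define its own `new`.

```perl
use strict;
use warnings;

# Hypothetical plugin class for illustration; it does not inherit from
# WWW::FetchStory::Fetcher so that the sketch stays self-contained.
package My::Fetcher::ExampleSite {
    sub new { my $class = shift; return bless {}, $class }

    # Return true only for URLs on the (made-up) host this plugin handles.
    sub allow {
        my ($self, $url) = @_;
        return ($url =~ m{^https?://(?:www\.)?example\.com/}) ? 1 : 0;
    }
}

package main;

my $obj = My::Fetcher::ExampleSite->new();
my $yes = $obj->allow('https://www.example.com/story/1');
my $no  = $obj->allow('https://other.example.org/story/1');

print "yes=$yes no=$no\n";
```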

fetch

Fetch the story, with the given options.

    %story_info = $obj->fetch(
        urls=>\@urls,
        basename=>$basename,
        toc=>0,
        yaml=>0);

basename

Optional basename used to construct the filenames. If this is not given, the basename is derived from the title of the story.

epub

Create an EPUB file, deleting the HTML files which have been downloaded.

toc

Build a table-of-contents file if this is true.

yaml

Build a YAML file with meta-data about this story if this is true.

urls

The URLs of the story. The first page is scraped for meta-information about the story, including the title and author. Site-specific Fetcher plugins can find additional information, including the URLs of all the chapters in a multi-chapter story.

Private Methods

get_story_basename

Figure out the file basename for a story by using its title.

    $basename = $self->get_story_basename($title);
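The exact rules this method applies are not documented here. Purely as an assumption, one plausible way to turn a title into a filesystem-friendly basename is sketched below; the helper `basename_from_title` is hypothetical and the module's real behaviour may differ.

```perl
use strict;
use warnings;

# Hypothetical helper; the rules here are an assumption for illustration,
# not the module's actual algorithm.
sub basename_from_title {
    my $title = shift;
    my $base = lc $title;
    $base =~ s/[^a-z0-9]+/_/g;   # collapse runs of other characters to "_"
    $base =~ s/^_+|_+$//g;       # trim leading and trailing underscores
    return $base;
}

my $basename = basename_from_title('The Long Road: Part 2!');
print "$basename\n";
```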

extract_story

Extract the story-content from the fetched content.

    my ($story, $title) = $self->extract_story(content=>$content,
        title=>$title);

make_css

Create site-specific CSS styling.

    $css = $self->make_css();

tidy

Make a tidy, compliant XHTML page from the given story-content.

    $content = $self->tidy(story=>$story,
                           title=>$title);

get_toc

Get a table-of-contents page.

get_page

Get the contents of a URL.

parse_toc

Parse the table-of-contents file.

This must be overridden by the specific fetcher class.

    %info = $self->parse_toc(content=>$content,
                             url=>$url,
                             urls=>\@urls);

This should return a hash containing:

chapters

An array of URLs for the chapters of the story. In the case where the story only takes one page, that will be the chapter. In the case where multiple URLs have been passed in, it will be those URLs.

title

The title of the story.

It may also return additional information, such as Summary.
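The return shape described above can be sketched with a stand-in class. `My::Fetcher::TocExample`, its title-matching regex, and the choice to store the chapters as an array reference are assumptions made for this example, not details confirmed by the module.

```perl
use strict;
use warnings;

# Hypothetical plugin class; a real one would inherit from
# WWW::FetchStory::Fetcher instead of defining its own constructor.
package My::Fetcher::TocExample {
    sub new { my $class = shift; return bless {}, $class }

    sub parse_toc {
        my ($self, %args) = @_;
        my $content = $args{content};

        # Naive title extraction, good enough for a sketch.
        my ($title) = $content =~ m{<title>([^<]*)</title>}i;
        $title = '' unless defined $title;

        # If several URLs were passed in, they are the chapters;
        # otherwise the single page is the only chapter.
        my @chapters = ($args{urls} && @{ $args{urls} } > 1)
            ? @{ $args{urls} }
            : ($args{url});

        return (title => $title, chapters => \@chapters);
    }
}

package main;

my $fetcher = My::Fetcher::TocExample->new();
my %info = $fetcher->parse_toc(
    content => '<html><head><title>A Story</title></head></html>',
    url     => 'https://www.example.com/story/1',
    urls    => ['https://www.example.com/story/1'],
);

print "$info{title}: @{ $info{chapters} }\n";
```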

parse_chapter_urls

Figure out the URLs for the chapters of this story.

parse_epub_url

Figure out the URL for the EPUB version of this story, if there is one.

parse_title

Get the title from the content.

parse_ch_title

Get the chapter title from the content.

parse_author

Get the author from the content.

parse_summary

Get the summary from the content.

parse_characters

Get the characters from the content.

parse_universe

Get the universe/fandom from the content.

parse_recipient

Get the recipient from the content.

parse_category

Get the categories from the content.

parse_rating

Get the rating from the content.

derive_values

Calculate additional meta values, such as the current date.

get_chapter

Get an individual chapter of the story, tidy it, and save it to a file.

    $filename = $obj->get_chapter(base=>$basename,
                                  count=>$count,
                                  url=>$url,
                                  title=>$title);

get_epub

Get the EPUB version of the story, tidy it, and save it to a file.

    $filename = $obj->get_epub(base=>$basename,
                               url=>$url);

epub_replace_description

Replace or add the description to an EPUB file.

epub_add_meta

Add the given meta-data to an EPUB file.

epub_parse_one_node

Parse a node of meta-information from an EPUB file.

wordcount

Figure out the word-count.
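How the module counts words is not specified here. As an assumption-laden sketch, a simple approach is to strip markup and count whitespace-separated tokens; the helper `count_words` below is hypothetical and uses deliberately crude tag stripping.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation: strips anything
# that looks like a tag, then counts whitespace-separated tokens.
sub count_words {
    my $html = shift;
    (my $text = $html) =~ s/<[^>]*>/ /g;   # replace each tag with a space
    my @words = split ' ', $text;          # ' ' also skips leading whitespace
    return scalar @words;
}

my $count = count_words('<p>Once upon a <em>time</em> there was a story.</p>');
print "$count\n";
```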

build_toc

Build a local table-of-contents file from the meta-info about the story.

    $self->build_toc(info=>\%info);

build_epub

Create an EPUB file from the story files and meta information.

    $self->build_epub();

tidy_chars

Remove nasty encodings.

    $content = $self->tidy_chars($content);
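Which characters this method treats as "nasty" is not spelled out here. One common kind of cleanup, shown purely as an assumption, is replacing typographic punctuation with ASCII equivalents; the helper `ascii_punctuation` is hypothetical and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: the set of replacements is an assumption chosen
# for illustration, not the list the module actually uses.
sub ascii_punctuation {
    my $text = shift;
    $text =~ s/[\x{2018}\x{2019}]/'/g;    # curly single quotes
    $text =~ s/[\x{201C}\x{201D}]/"/g;    # curly double quotes
    $text =~ s/\x{2026}/.../g;            # horizontal ellipsis
    $text =~ s/[\x{2013}\x{2014}]/-/g;    # en and em dashes
    return $text;
}

my $clean = ascii_punctuation("\x{201C}Don\x{2019}t\x{201D}\x{2026}");
print "$clean\n";
```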