NAME

Pod::Extract::URI - Extract URIs from POD

SYNOPSIS

  use Pod::Extract::URI;

  # Get a list of URIs from a file
  my @uris = Pod::Extract::URI->uris_from_file( $file );

  # Or filehandle
  my @uris = Pod::Extract::URI->uris_from_filehandle( $filehandle );

  # Or the full OO
  my $parser = Pod::Extract::URI->new();
  $parser->parse_from_file( $file );
  my @uris = $parser->uris();
  my %uri_details = $parser->uri_details();

DESCRIPTION

This module parses POD and uses URI::Find or URI::Find::Schemeless to extract any URIs it can.

METHODS

new()

Create a new Pod::Extract::URI object.

new() takes an optional hash of options, whose names correspond to object methods described in more detail below.

schemeless (boolean, default 0)

Should the parser try to extract schemeless URIs (using URI::Find::Schemeless)?

L_only (boolean, default 0)

Should the parser only look for URIs in L<> sequences?

textblock (boolean, default 1)
verbatim (boolean, default 1)
command (boolean, default 1)

Should the parser look in POD text paragraphs, verbatim blocks, or commands?

schemes (arrayref)

Restrict URIs to the schemes in the arrayref.

exclude_schemes (arrayref)

Exclude URIs with the schemes in the arrayref.

stop_uris (arrayref)

An arrayref of patterns to ignore.

stop_sub (coderef)

A reference to a subroutine to run for each URI to see if the URI should be ignored.

use_canonical (boolean, default 0)

Convert the URIs found to their canonical form.

strip_brackets (boolean, default 1)

Strip extra brackets which may appear around the URL returned by URI::Find. See the strip_brackets() method below for more details.

L_only()

Get/set the L_only flag. Takes one optional true/false argument to set the L_only flag. Defaults to false.

If true, Pod::Extract::URI will look for URIs only in L<> sequences, otherwise it will look anywhere in the POD.

want_command()

Get/set the want_command flag. Takes one optional true/false argument to set the want_command flag. Defaults to true.

If true, Pod::Extract::URI will look for URIs in command blocks (e.g. =head1).

want_textblock()

Get/set the want_textblock flag. Takes one optional true/false argument to set the want_textblock flag. Defaults to true.

If true, Pod::Extract::URI will look for URIs in textblocks (i.e. paragraphs).

want_verbatim()

Get/set the want_verbatim flag. Takes one optional true/false argument to set the want_verbatim flag. Defaults to true.

If true, Pod::Extract::URI will look for URIs in verbatim blocks (e.g. code examples).

schemes()

    $peu->schemes( [ 'http', 'ftp' ] );

Get/set the list of schemes to search for. Takes an optional arrayref of schemes to set.

If there are no schemes, Pod::Extract::URI will look for all schemes.

exclude_schemes()

    $peu->exclude_schemes( [ 'mailto', 'https' ] );

Get/set the list of schemes to ignore. Takes an optional arrayref of schemes to set.

stop_uris()

    $peu->stop_uris( [
                       qr/example\.com/,
                       'foobar.com'
                     ] );  

Get/set a list of patterns to apply to each URI to see if it should be ignored. Takes an optional arrayref of patterns to set. Strings in the list will be automatically converted to patterns (using qr//).

The URIs will be checked against the canonical URI form if use_canonical has been specified. Otherwise, they will be checked against the URI as it appears in the POD. If strip_brackets is specified, the brackets (and "URL:" prefix, if present) will be removed before testing.

Any URI that matches a pattern will be ignored.
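The matching rule described above can be sketched in plain Perl. This is an illustration only, not the module's actual code: the helper name is_stopped() and the sample URIs are hypothetical, and it assumes the patterns are applied unanchored, as a bare qr// would be.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $uri matches any stop pattern.
# Plain strings are compiled with qr//, as the text above describes,
# so the "." in 'foobar.com' acts as a regex wildcard, not a literal dot.
sub is_stopped {
    my ( $uri, @patterns ) = @_;
    for my $pattern (@patterns) {
        my $re = ref($pattern) eq 'Regexp' ? $pattern : qr/$pattern/;
        return 1 if $uri =~ $re;
    }
    return 0;
}

my @stop = ( qr/example\.com/, 'foobar.com' );
for my $uri ( 'http://www.example.com/', 'http://foobar.com/x', 'http://perl.org/' ) {
    printf "%-26s %s\n", $uri, is_stopped( $uri, @stop ) ? 'ignored' : 'kept';
}
```

Because strings become patterns verbatim, it may be safer to pass qr// objects with escaped dots when an exact host name is intended.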

stop_sub()

    sub exclude {
        my $uri = shift;
        # Not every scheme has a host (e.g. mailto:), so check first
        return 0 unless $uri->can( 'host' );
        return ( $uri->host =~ /example\.com/ ) ? 1 : 0;
    }
    $peu->stop_sub( \&exclude );

Get/set a subroutine to check each URI found to see if it should be ignored. Takes an optional coderef to set.

The subroutine will be passed a reference to the URI object, the text found by URI::Find, and a reference to the Pod::Extract::URI object. If it returns true, the URI will be ignored.

use_canonical()

Get/set the use_canonical flag. Takes one optional true/false argument to set the use_canonical flag. Defaults to false.

If true, Pod::Extract::URI will store the URIs it finds in the canonical form (as returned by URI->canonical()). The original URI and text will still be available via uri_details().

strip_brackets()

Get/set the strip_brackets flag. Takes one optional true/false argument to set the strip_brackets flag. Defaults to true.

RFC 2396 Appendix E suggests the form <http://www.example.com/> or <URL:http://www.example.com/> when embedding URLs in plain text. URI::Find includes these in the URLs it returns. If strip_brackets is true, the angle brackets and any "URL:" prefix will be removed and won't appear in the URIs returned by Pod::Extract::URI.
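As a rough illustration of the stripping described above, the following plain-Perl sketch removes the two RFC 2396 Appendix E forms. It is not the module's own implementation: the helper name strip_uri_brackets() is hypothetical, and it assumes an upper-case "URL:" prefix and a single enclosing pair of angle brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a surrounding <...> pair and an optional
# "URL:" prefix. Text without enclosing brackets is returned unchanged.
sub strip_uri_brackets {
    my ($text) = @_;
    $text =~ s/\A<(?:URL:)?(.*)>\z/$1/s;
    return $text;
}

print strip_uri_brackets('<http://www.example.com/>'),     "\n";
print strip_uri_brackets('<URL:http://www.example.com/>'), "\n";
print strip_uri_brackets('http://www.example.com/'),       "\n";
```

All three calls print http://www.example.com/.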

parse_from_file()

    $peu->parse_from_file( $filename );

Parses the POD from the specified file and stores the URIs it finds for later retrieval.

parse_from_filehandle()

    $peu->parse_from_filehandle( $filehandle );

Parses the POD from the filehandle and stores the URIs it finds for later retrieval.

uris_from_file()

    my @uris = $peu->uris_from_file( $filename );

A shortcut for calling parse_from_file() and then uris().

uris_from_filehandle()

    my @uris = $peu->uris_from_filehandle( $filehandle );

A shortcut for calling parse_from_filehandle() and then uris().

uris()

    my @uris = $peu->uris();

Returns a list of the URIs found from parsing.

uri_details()

    my %details = $peu->uri_details();

Returns a hash of data about the URIs found.

The keys of the hash are the URIs (which match those returned by uris()).

The values of the hash are arrayrefs of hashrefs, one hashref per occurrence of the URI. Each hashref contains the following keys:

uri

The URI object returned by URI::Find.

text

The text returned by URI::Find, which will have the brackets stripped from it if strip_brackets has been specified.

original_text

The original text returned by URI::Find.

line

The initial line number of the paragraph in which the URI was found.

para

The Pod::Paragraph object corresponding to the paragraph where the URI was found.
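The structure described above can be walked as in the sketch below. The helper name occurrences() is hypothetical, and the sample hash is hand-built to mimic the documented shape rather than real parser output (the uri and para objects are left out of it). With the module installed, the same helper could presumably be called as occurrences( $peu->uri_details() ).

```perl
use strict;
use warnings;

# Hypothetical helper: flatten the documented structure
#   URI => [ { uri, text, original_text, line, para }, ... ]
# into one "URI at line N" string per occurrence.
sub occurrences {
    my (%details) = @_;
    my @out;
    for my $uri ( sort keys %details ) {
        for my $hit ( @{ $details{$uri} } ) {
            push @out, "$uri at line $hit->{line}";
        }
    }
    return @out;
}

# Hand-built sample data in the documented shape (not real output).
my %sample = (
    'http://www.example.com/' => [
        {
            text          => 'http://www.example.com/',
            original_text => '<http://www.example.com/>',
            line          => 12,
        },
        {
            text          => 'http://www.example.com/',
            original_text => 'http://www.example.com/',
            line          => 40,
        },
    ],
);

print "$_\n" for occurrences(%sample);
```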

STOP URIS

You can specify URIs to ignore in your POD, using a =for stop_uris command, e.g.

    =for stop_uris www.foobar.com

These will be converted to patterns as if they had been passed in via stop_uris() directly, and will apply from the point of the command onwards.

AUTHOR

Ian Malpass (ian-cpan@indecorous.com)

COPYRIGHT

Copyright 2007, Ian Malpass

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

URI::Find, URI::Find::Schemeless, URI.