NAME

grepurl - print links in HTML

SYNOPSIS

        grepurl [-bdv] [-e extension[,extension]] [-E extension[,extension]]
                [-h host[,host]] [-H host[,host]] [-p regex] [-P regex]
                [-s scheme[,scheme]] [-S scheme[,scheme]] [-u URL]

DESCRIPTION

The grepurl program reads the document at the URL given with the -u switch and prints the links in it that satisfy the given set of options. It applies the options roughly in the order of the URL part each one affects: scheme, host, path, then extension.
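
The filtering order described above can be pictured with a short Python sketch. This is not grepurl's actual implementation (grepurl is a Perl program); the function name keep_url, its parameters, and the way the extension is taken from the path are assumptions made only for illustration.

```python
import re
from posixpath import splitext
from urllib.parse import urlsplit


def keep_url(url, schemes=None, no_schemes=None, hosts=None, no_hosts=None,
             path_re=None, no_path_re=None, exts=None, no_exts=None):
    """Return True if url passes every filter that was supplied.

    Hypothetical helper: each pair of arguments mirrors one select/exclude
    switch pair (-s/-S, -h/-H, -p/-P, -e/-E). None means "switch not given".
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    path = parts.path
    # Assumption: the extension is whatever follows the last dot in the path.
    ext = splitext(path)[1].lstrip(".").lower()

    # 1. scheme
    if schemes is not None and scheme not in schemes:
        return False
    if no_schemes is not None and scheme in no_schemes:
        return False
    # 2. host
    if hosts is not None and host not in hosts:
        return False
    if no_hosts is not None and host in no_hosts:
        return False
    # 3. path (unanchored regex match, an assumption)
    if path_re is not None and not re.search(path_re, path):
        return False
    if no_path_re is not None and re.search(no_path_re, path):
        return False
    # 4. extension
    if exts is not None and ext not in exts:
        return False
    if no_exts is not None and ext in no_exts:
        return False
    return True


links = [
    "http://www.example.com/images/logo.jpg",
    "ftp://ftp.example.com/pub/file.tar.gz",
    "http://www.example.com/index.asp",
]
# Roughly what "-s http -E asp" would select under these assumptions.
kept = [u for u in links if keep_url(u, schemes={"http"}, no_exts={"asp"})]
```

In this sketch a relative link has an empty scheme and host, so a scheme or host selection would drop it unless it is first made absolute.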

So far, grepurl expects HTML input, although I want to add other content types, especially plain text and RSS feeds.

OPTIONS

-b

turn relative URLs into absolute ones

-d

turn on debugging output

-e EXTENSION

select links with these extensions (comma separated)

-E EXTENSION

exclude links with these extensions (comma separated)

-h HOST

select links with these hosts (comma separated)

-H HOST

exclude links with these hosts (comma separated)

-p PATH

select only paths that match this Perl regex

-P PATH

exclude paths that match this Perl regex

-s SCHEME

select only these schemes (comma separated)

-S SCHEME

exclude these schemes (comma separated)

-t FILE

extract URLs from plain text file (not implemented)

-u URL

extract links from the document at URL (may be a file:// URL); expects HTML

-v

turn on verbose output
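
To make the effect of -b concrete, here is a minimal Python sketch of pulling links out of HTML and optionally resolving them against a base URL. It is an illustration only, not how grepurl works internally: the names LinkCollector and extract_links are hypothetical, and treating href and src attributes as "links" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolving them if a base is set."""

    def __init__(self, base=None):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: only href and src attributes count as links.
            if name in ("href", "src") and value:
                if self.base:
                    value = urljoin(self.base, value)
                self.links.append(value)


def extract_links(html, base=None):
    """Return the links in html; pass base to get absolute URLs."""
    parser = LinkCollector(base)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="/docs/index.html">docs</a> <img src="logo.jpg">'
as_written = extract_links(page)
absolute = extract_links(page, base="http://www.example.com/a/")
```

Without a base the links come back exactly as written in the page; with a base, relative links are resolved the way a browser would resolve them.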

Examples

Print all the links

grepurl -u http://www.example.com/

Print all the links, and resolve relative URLs

grepurl -b -u http://www.example.com/

Print links with the extension .jpg

grepurl -e jpg -u http://www.example.com/

Print links with the extension .jpg or .jpeg

grepurl -e jpg,jpeg -u http://www.example.com/

Do not print links with the extension .cfm or .asp

grepurl -E cfm,asp -u http://www.example.com/

Print only links to www.panix.com

grepurl -h www.panix.com -u http://www.example.com/

Print only links to www.panix.com or www.perl.com

grepurl -h www.panix.com,www.perl.com -u http://www.example.com/

Do not print links to www.microsoft.com

grepurl -H www.microsoft.com -u http://www.example.com/

Print links with "perl" in the path

grepurl -p perl -u http://www.example.com

Print links with "perl" or "pearl" in the path

grepurl -p "pea?rl" -u http://www.example.com

Print links with "fred" or "barney" in the path

grepurl -p "fred|barney" -u http://www.example.com

Do not print links with "SCO" in the path

grepurl -P SCO -u http://www.example.com

Do not print links whose path matches "Micro.*"

grepurl -P "Micro.*" -u http://www.example.com

Print only http links

grepurl -s http -u http://www.example.com/

Print ftp and gopher links

grepurl -s ftp,gopher -u http://www.example.com/
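
The path patterns used with -p and -P in the examples above can be tried out in isolation. The sketch below uses Python's re module rather than Perl, which behaves the same for these simple patterns; the sample paths and the helper name matching are made up for illustration, and it assumes an unanchored, case-sensitive match.

```python
import re

# Hypothetical sample paths, not taken from any real site.
paths = [
    "/perl/index.html",
    "/pearl/jam.html",
    "/fred/",
    "/barney.txt",
    "/Microsoft/",
    "/python/",
]


def matching(pattern, candidates):
    """Return the candidates in which pattern matches anywhere."""
    return [p for p in candidates if re.search(pattern, p)]


perl_or_pearl = matching("pea?rl", paths)        # the "a" is optional
fred_or_barney = matching("fred|barney", paths)  # alternation
micro = matching("Micro.*", paths)
```

With an unanchored match, "Micro.*" selects the same paths as plain "Micro", since the trailing ".*" can match nothing.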

SOURCE AVAILABILITY

This source is part of a SourceForge project which always has the latest sources in CVS, as well as all of the previous releases.

        http://sourceforge.net/projects/brian-d-foy/

If, for some reason, I disappear from the world, one of the other members of the project can shepherd this module appropriately.

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT

Copyright 2004, brian d foy, All rights reserved.

You may use this program under the same terms as Perl itself.
