The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
NAME
    urifind - find URIs in a document and dump them to STDOUT.

SYNOPSIS
        $ urifind file

DESCRIPTION
    urifind is a simple script that finds URIs in one or more files (using
    "URI::Find"), and outputs them to to STDOUT. That's it.

    To find all the URIs in file1, use:

        $ urifind file1

    To find the URIs in multiple files, simply list them as arguments:

        $ urifind file1 file2 file3

    urifind will read from "STDIN" if no files are given or if a filename of
    "-" is specified:

        $ wget http://www.boston.com/ -O - | urifind

    When multiple files are listed, urifind prefixes each found URI with the
    file from which it came:

        $ urifind file1 file2
        file1: http://www.boston.com/index.html
        file2: http://use.perl.org/

    This can be turned on for single files with the "-p" ("prefix") switch:

        $urifind -p file3
        file1: http://fsck.com/rt/

    It can also be turned off for multiple files with the "-n" ("no prefix")
    switch:

        $ urifind file1 file2
        http://www.boston.com/index.html
        http://use.perl.org/

    By default, URIs will be displayed in the order found; to sort them
    ascii-betically, use the "-s" ("sort") option. To reverse sort them, use
    the "-r" ("reverse") flag ("-r" implies "-s").

        $ urifind -s file1 file2
        http://use.perl.org/
        http://www.boston.com/index.html
        mailto:webmaster@boston.com

        $ urifind -r file1 file2
        mailto:webmaster@boston.com
        http://www.boston.com/index.html
        http://use.perl.org/

    Finally, urifind supports limiting the returned URIs by scheme or by
    arbitrary pattern, using the "-S" option (for schemes) and the "-P"
    option. Both "-S" and "-P" can be specified multiple times:

        $ urifind -S mailto file1
        mailto:webmaster@boston.com

        $ urifind -S mailto -S http
        mailto:webmaster@boston.com
        http://www.boston.com/index.html

    "-P" takes an arbitrary Perl regex. It might need to be protected from
    the shell:

        $ urifind -P 's?html?' file1
        http://www.boston.com/index.html

        $ urifind -P '\.org\b' -S http file4
        http://www.gnu.org/software/wget/wget.html

    Add a "-d" to have urifind dump the refexen generated from "-S" and "-P"
    to "STDERR". "-D" does the same but exits immediately:

        $ urifind -P '\.org\b' -S http -D 
        $scheme = '^(\bhttp\b):'
        @pats = ('^(\bhttp\b):', '\.org\b')

    To remove duplicates from the results, use the "-u" ("unique") switch.

OPTION SUMMARY
    -s  Sort results.

    -r  Reverse sort results (implies -s).

    -u  Return unique results only.

    -n  Don't include filename in output.

    -p  Include filename in output (0 by default, but 1 if multiple files
        are included on the command line).

    -P $re
        Print only lines matching regex '$re' (may be specified multiple
        times).

    -S $scheme
        Only this scheme (may be specified multiple times).

    -h  Help summary.

    -v  Display version and exit.

    -d  Dump compiled regexes for "-S" and "-P" to "STDERR".

    -D  Same as "-d", but exit after dumping.

VERSION
    This is urifind, revision $Revision: 1.2 $.

AUTHOR
    darren chamberlain <darren@cpan.org>

COPYRIGHT
    (C) 2003 darren chamberlain

    This library is free software; you may distribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    the Perl manpage