scrape.pl - simple HTML scraping from the command line
This is a simple program to extract data from HTML by specifying CSS3 or XPath selectors.
    scrape.pl URL selector selector ...

    # Print page title
    scrape.pl http://perl.org title
    # The Perl Programming Language - www.perl.org

    # Print links with titles, make links absolute
    scrape.pl http://perl.org a //a/@href --uri=2

    # Print all links to JPG images, make links absolute
    scrape.pl http://perl.org 'a[href$="jpg"]'
This program fetches an HTML page and extracts nodes matched by XPath or CSS selectors from it.
If URL is -, input will be read from STDIN.
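As a minimal sketch of the STDIN mode, the function below feeds a locally saved page to the program. The name scrape_file is a hypothetical helper and not part of scrape.pl; the sketch assumes scrape.pl is on your PATH and relies only on the documented behavior that a URL of - reads from STDIN.

```shell
# scrape_file is a hypothetical helper name, not part of scrape.pl.
# Usage: scrape_file FILE selector [selector ...]
scrape_file() {
    file=$1
    shift
    # "-" in the URL position makes scrape.pl read the HTML from STDIN.
    scrape.pl - "$@" < "$file"
}
```

For example, scrape_file saved-page.html title would be expected to print the title of the saved page, in the same way the title example does for a fetched URL.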
Separator character to use for columns. Default is tab.
Numbers of the columns to convert into absolute URIs (as with --uri=2 in the usage examples), for cases where the known attributes do not cover everything you want.
Switches off the automatic translation to absolute URIs for known attributes like href and src.
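Because columns are separated by tabs by default, the output can be passed to standard text tools. The sketch below lists the distinct link targets of a page. The name unique_links is a hypothetical helper, and the sketch assumes scrape.pl is on your PATH and that each selector produces one output column in the order given, so that the second column holds the href values.

```shell
# unique_links is a hypothetical helper name, not part of scrape.pl.
# Usage: unique_links URL
unique_links() {
    # Column 1: link text (selector "a"); column 2: link target (//a/@href).
    # --uri=2 asks scrape.pl to make column 2 absolute.
    # cut splits on tabs by default, matching the default separator.
    scrape.pl "$1" a //a/@href --uri=2 | cut -f2 | sort -u
}
```

If a different separator is chosen, the delimiter passed to cut (its -d option) would need to match it.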
The public repository of this module is http://github.com/Corion/App-scrape.
The public support forum of this program is http://perlmonks.org/.
Max Maischein corion@cpan.org
Copyright 2011-2011 by Max Maischein corion@cpan.org.
This module is released under the same terms as Perl itself.