WWW::Mechanize::Chrome - automate the Chrome browser
use Log::Log4perl qw(:easy);
use WWW::Mechanize::Chrome;

Log::Log4perl->easy_init($ERROR);  # Set priority of root logger to ERROR
my $mech = WWW::Mechanize::Chrome->new();
$mech->get('https://google.com');
$mech->eval_in_page('alert("Hello Chrome")');
my $png = $mech->content_as_png();
A collection of other Examples is available to help you get started.
Like WWW::Mechanize, this module automates web browsing with a Perl object. Fetching and rendering of web pages is delegated to the Chrome (or Chromium) browser by starting an instance of the browser and controlling it with Chrome DevTools.
The Chrome browser provides advanced abilities useful for automating modern web applications that are not (yet) possible with WWW::Mechanize alone:
Page content can be created or modified with JavaScript. You can also execute custom JavaScript code on the page content.
Page content can be selected with CSS selectors.
Screenshots of the rendered page as an image or PDF file.
Installation of a Chrome-compatible browser is required. There are some quirks, including sporadic but harmless error messages issued by the browser when run with DevTools.
WWW::Mechanize::Chrome (WMC) leverages developer tools built into Chrome and Chrome-like browsers to control a browser instance programmatically. You can use WMC to automate tedious tasks, test web applications, and perform web scraping operations.
Typically, WMC is used both to launch a host instance of the browser and to provide a client that controls it. The host instance of the browser is visible to you on your desktop (unless the browser is running in "headless" mode, in which case it will not open in a window). The client is the Perl program you write with the WMC module to issue commands to control the host instance. As your program navigates and "clicks" on various nodes, you can watch the host browser respond to these actions as if by magic.
This magic happens as a result of commands that are issued from your client to the host using Chrome's DevTools Protocol, which exchanges JSON data structures over a websocket (or, optionally, a pipe) connection. The host also responds to the client with JSON to describe the web pages it has loaded. WMC conveniently hides the complexity of the lower level communications between the client and the host browser and wraps them in a Perl object to provide the easy-to-use methods documented here.
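To make the JSON exchange a little more concrete, here is a minimal sketch of what one command and one reply could look like. The helper build_command is hypothetical and not part of WWW::Mechanize::Chrome, and the reply payload is made up for illustration. The id/method/params layout and the Page.navigate method name come from the Chrome DevTools Protocol.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not part of WWW::Mechanize::Chrome): build one
# DevTools-style command message with a unique, increasing id.
my $next_id = 0;
sub build_command {
    my ($method, %params) = @_;
    return {
        id     => ++$next_id,
        method => $method,
        params => \%params,
    };
}

# canonical(1) sorts the keys so the output is stable
my $json = JSON::PP->new->canonical(1);

# What a client sends: a command as one JSON document
my $wire = $json->encode(
    build_command( 'Page.navigate', url => 'https://example.com/' )
);
print "$wire\n";

# What the host might answer: a reply carrying the same id.
# The payload below is invented for this example.
my $reply = $json->decode('{"id":1,"result":{"frameId":"ABC"}}');
print "Reply to command $reply->{id}\n";
```

In normal use you never build these messages yourself; the methods documented here do that for you.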
WWW::Mechanize::Chrome->new( %options )
my $mech = WWW::Mechanize::Chrome->new( headless => 0, );
autodie => 0 # make HTTP errors non-fatal
By default, autodie is set to true. If an HTTP error is encountered, the program dies along with its associated browser instances. This frees you from having to write error checks after every request. Setting this value to false makes HTTP errors non-fatal, allowing the program to continue running if there is an error.
headless: Don't display a browser window. The default is to display a browser window.
Set the host the browser listens on:
host => '192.168.1.2'
host => 'localhost'
Defaults to 127.0.0.1. The browser will listen for commands on the specified host. The host address should be inaccessible from the internet.
port => 9223 # set port the launched browser will use for remote operation
Defaults to 9222. Commands to the browser will be issued through this port.
Specify the browser tab the Chrome browser will use:
tab => 'current'
tab => qr/PerlMonks/
By default, a web page is opened in a new browser tab. Setting tab to current will use the current, active tab instead. Alternatively, to use an existing inactive tab, you can pass a regular expression to match against the existing tab's title. A false value implements the default behavior and a new tab will be created.
autoclose => 0 # keep tab open after program end
By default, autoclose is set to true, closing the tab opened when running your code. If autoclose is set to a false value, the tab will remain open even after the program has finished.
Set the name and/or path to the browser's executable program:
launch_exe => 'name-of-chrome-executable'   # for non-standard executable names
launch_exe => '/path/to/executable'         # for non-standard paths
launch_exe => '/path/to/executable/chrome'  # full path
By default, WWW::Mechanize::Chrome will search the appropriate paths for Chrome's executable file based on the operating system. Use this option to set the path to your executable if it is in a non-standard location or if the executable has a non-standard name.
The default paths searched are those found in $ENV{PATH}. For OS X, the user and system Application directories are also searched. The default values for the executable file's name are chrome on Windows, Google Chrome on OS X, and google-chrome elsewhere.
If you want to use Chromium, you must specify that explicitly with something like:
launch_exe => 'chromium-browser', # if Chromium is named chromium-browser on your OS
Results may vary for your operating system. Use the full path to the browser's executable if you are having issues. You can also set the name of the executable file with the $ENV{CHROME_BIN} environment variable.
cleanup_signal => 'SIGKILL'
The signal that is sent to Chrome to shut it down. On Linuxish OSes, this will be TERM, on OSX and Windows it will be KILL.
start_url => 'http://perlmonks.org' # Immediately navigate to a given URL
By default, the browser will open with a blank tab. Use the start_url option to open the browser to the specified URL. More typically, the ->get method is used to navigate to URLs.
Pass additional switches and parameters to the browser's executable:
launch_arg => [ "--some-new-parameter=foo", "--another-option" ]
Examples of other useful parameters include:
'--start-maximized',
'--window-size=1280x1696',
'--ignore-certificate-errors',
'--disable-web-security',
'--allow-running-insecure-content',
'--load-extension',
'--no-sandbox',
'--password-store=basic',
separate_session => 1 # create a new, empty session
This creates an empty, fresh Chrome session without any cookies. Setting this will disregard any data_directory setting.
incognito => 1 # open the browser in incognito mode
Defaults to false. Set to true to launch the browser in incognito mode.
Most likely, you want to use separate_session instead.
data_directory => '/path/to/data/directory' # set the data directory
By default, an empty data directory is used. Use this setting to change the base data directory for the browsing session.
use File::Temp 'tempdir';
# create a fresh Chrome every time
my $mech = WWW::Mechanize::Chrome->new(
    data_directory => tempdir(CLEANUP => 1),
);
Using the "main" Chrome cookies:
my $mech = WWW::Mechanize::Chrome->new( data_directory => '/home/corion/.config/chromium', );
profile => 'ProfileDirectory' # set the profile directory
By default, your current user profile directory is used. Use this setting to change the profile directory for the browsing session.
You will need to set the data_directory as well, so that Chrome finds the profile within the data directory. The profile directory/name itself needs to be a single directory name, not the full path. That single directory name will be relative to the data directory.
wait_file => "$tempdir/CrashpadMetrics-active.pma"
When shutting down, wait until this file no longer exists or can be deleted. This can help make sure that the Chrome process has really shut down.
startup_timeout => 5 # set the startup timeout value
Defaults to 20, the maximum number of seconds to wait for the browser to launch. Higher or lower values can be set based on the speed of the machine. The process attempts to connect to the browser once each second over the duration of this setting.
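The "one connection attempt per second until the timeout expires" behaviour can be pictured with a small retry loop. This is only a sketch under assumptions: wait_until_ready, its arguments and the stand-in check are hypothetical and are not how the module itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Chrome): call $try up to
# $timeout times, pausing between attempts, and stop at the first success.
sub wait_until_ready {
    my ($try, $timeout, $pause) = @_;
    $timeout //= 20;                # same default as startup_timeout
    $pause   //= sub { sleep 1 };   # wait one second between attempts
    for my $attempt (1 .. $timeout) {
        return $attempt if $try->();          # success: report attempts used
        $pause->() if $attempt < $timeout;    # no pause after the last try
    }
    return 0;   # never became ready within the timeout
}

# Demo with a stand-in check that succeeds on the third attempt.
# The empty pause callback keeps the demo from actually sleeping.
my $calls  = 0;
my $needed = wait_until_ready( sub { ++$calls >= 3 }, 5, sub { } );
print "ready after $needed attempts\n";
```

A lower startup_timeout simply means fewer attempts before giving up, which is why slow machines may need a higher value.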
driver => $driver_object # specify the driver object
Use a Chrome::DevToolsProtocol::Target object that has been manually constructed.
report_js_errors => 1 # turn javascript error reporting on
Defaults to false. If true, JavaScript errors are checked for after each request and reported as warnings. This is useful for testing with use warnings FATAL => 'all'.
mute_audio => 0 # turn sounds on
Defaults to true (sound off). A false value turns the sound on.
background_networking => 1 # turn background networking on
Defaults to false (off). A true value enables background networking.
client_side_phishing_detection => 1 # turn client side phishing detection on
Defaults to false (off). A true value enables client side phishing detection.
component_update => 1 # turn component updates on
Defaults to false (off). A true value enables component updates.
default_apps => 1 # turn default apps on
Defaults to false (off). A true value enables default apps.
hang_monitor => 1 # turn the hang monitor on
Defaults to false (off). A true value enables the hang monitor.
hide_scrollbars => 1 # hide the scrollbars
Defaults to false (off). A true value will hide the scrollbars.
infobars => 1 # turn infobars on
Defaults to false (off). A true value will turn infobars on.
popup_blocking => 1 # block popups
Defaults to false (off). A true value will block popups.
prompt_on_repost => 1 # allow prompts when reposting
Defaults to false (off). A true value will allow prompts when reposting.
save_password_bubble => 1 # allow the display of the save password bubble
Defaults to false (off). A true value allows the save password bubble to be displayed.
sync => 1 # turn syncing on
Defaults to false (off). A true value turns syncing on.
web_resources => 1 # turn web resources on
Defaults to false (off). A true value turns web resources on.
json_log_file: Filename to log all JSON communications to, one line per message/event/reply
json_log_fh: Filehandle to log all JSON communications to, one line per message/event/reply
Open this filehandle via
open my $fh, '>:utf8', $logfilename or die "Couldn't create '$logfilename': $!";
The $ENV{WWW_MECHANIZE_CHROME_TRANSPORT} variable can be set to a different transport class to override the default transport class. This is primarily used for testing but can also help rule out bugs introduced by the underlying websocket implementation(s).
The $ENV{WWW_MECHANIZE_CHROME_CONNECTION_STYLE} variable can be set to either websocket or pipe to specify the kind of transport that you want to use.
The pipe transport is only available on unixish OSes and only with Chrome v72 onwards.
WWW::Mechanize::Chrome->find_executable
my $chrome = WWW::Mechanize::Chrome->find_executable();

my $chrome = WWW::Mechanize::Chrome->find_executable(
    'chromium.exe',
    '.\\my-chrome-66\\',
);

my( $chrome, $diagnosis ) = WWW::Mechanize::Chrome->find_executable(
    ['chromium-browser','google-chrome'],
    './my-chrome-66/',
);
die $diagnosis if ! $chrome;
Finds the first Chrome executable in the path ($ENV{PATH}). For Windows, it also looks in $ENV{ProgramFiles}, $ENV{ProgramFiles(x86)} and $ENV{"ProgramFilesW6432"}. For OSX it also looks in the user home directory as given through $ENV{HOME}.
This is used to find the default Chrome executable if none was given through the launch_exe option or if the executable is given and does not exist and does not contain a directory separator.
$mech->chrome_version
print $mech->chrome_version;
Synonym for ->browser_version
$mech->browser_version
print $mech->browser_version;
Returns the version of the browser executable being used. Obtaining this information requires launching the browser and asking it for the version over the network.
$mech->chrome_version_info
print $mech->chrome_version_info->{product};
Returns the version information of the Chrome executable and various other APIs of Chrome that the object is connected to.
$mech->driver
deprecated - use ->target instead
my $driver = $mech->driver
Access the Chrome::DevToolsProtocol instance connecting to Chrome.
Deprecated, don't use this anymore. Most likely you want to use ->target to talk to the Chrome tab or ->transport to talk to the Chrome instance.
$mech->target
my $target = $mech->target
Access the Chrome::DevToolsProtocol::Target instance connecting to the Chrome tab we use.
$mech->transport
my $transport = $mech->transport
Access the Chrome::DevToolsProtocol::Transport instance connecting to the Chrome instance.
$mech->tab
my $tab = $mech->tab
Access the tab hash of the Chrome::DevToolsProtocol::Target instance. This represents the tab we control.
$mech->new_tab
$mech->new_tab_future
my $tab2 = $mech->new_tab_future(
    start_url => 'https://google.com',
)->get;
Creates a new tab (basically, a new WWW::Mechanize::Chrome object) connected to the same Chrome session.
# Use a targetInfo structure from Chrome
my $tab2 = $mech->new_tab_future(
    tab => {
        'targetId' => '1F42BDF32A30700805DDC21EDB5D8C4A',
    },
)->get;
It returns a Future because most event loops do not like recursing within themselves, which happens if you want to access a fresh new tab within another callback.
popup
my $opened;
$mech->on( 'popup' => sub( $mech, $tab_f ) {
    # This is a bit heavyweight, but ...
    $tab_f->on_done(sub($tab) {
        say "New window/tab was popped up:";
        $tab->uri_future->then(sub($uri) {
            say $uri;
        });
        $opened = $tab;
    })->retain;
});

$mech->click({ selector => '#popup_window' });
if( $opened ) {
    say $opened->title;
} else {
    say "Did not find new tab?";
};
This event is sent whenever a new tab/window gets popped up or created. The callback is handed the current WWW::Mechanize::Chrome instance and a Future that resolves to a second WWW::Mechanize::Chrome instance for the new tab. Note that, depending on your event loop, you are quite restricted in what synchronous methods you can call from within the callback.
$mech->allow( %options )
$mech->allow( javascript => 1 );
Allow or disallow execution of Javascript
$mech->emulateNetworkConditions( %options )
# Go offline
$mech->emulateNetworkConditions(
    offline            => JSON::true,
    latency            => 10, # ms ping
    downloadThroughput => 0,  # bytes/s
    uploadThroughput   => 0,  # bytes/s
    connectionType     => 'offline', # cellular2g, cellular3g, cellular4g, bluetooth, ethernet, wifi, wimax, other.
);
$mech->setRequestInterception( @patterns )
$mech->setRequestInterception(
    { urlPattern => '*', resourceType => 'Document', interceptionStage => 'Request'},
    { urlPattern => '*', resourceType => 'Media',    interceptionStage => 'Response'},
);
Sets the list of request patterns and resource types for which the interception callback will be invoked.
$mech->continueInterceptedRequest( %options )
$mech->continueInterceptedRequest_future( interceptionId => ... );
Continues an intercepted request
$mech->add_listener
use Data::Dumper;

my $url_loaded = $mech->add_listener('Network.responseReceived', sub {
    my( $info ) = @_;
    warn "Loaded URL "
         . $info->{params}->{response}->{url}
         . ": "
         . $info->{params}->{response}->{status};
    warn "Resource timing: " . Dumper $info->{params}->{response}->{timing};
});
Returns a listener object. If that object is discarded, the listener callback will be removed.
Calling this method in void context croaks.
To see the browser console live from your Perl script, use the following:
my $console = $mech->add_listener('Runtime.consoleAPICalled', sub {
    warn join ", ",
        map { $_->{value} // $_->{description} }
        @{ $_[0]->{params}->{args} };
});
If you want to explicitly remove the listener, either set it to undef:
undef $console;
Alternatively, call
$console->unregister;
or call
$mech->remove_listener( $console );
$mech->on_request_intercepted( $cb )
$mech->on_request_intercepted( sub {
    my( $mech, $info ) = @_;
    warn $info->{request}->{url};
    $mech->continueInterceptedRequest_future(
        interceptionId => $info->{interceptionId}
    )
});
A callback for intercepted requests that match the patterns set up via setRequestInterception.
If you return a future from this callback, it will not be discarded but kept in a safe place.
$mech->searchInResponseBody( $id, %options )
my $request_id = ...;
my @matches = $mech->searchInResponseBody(
    requestId     => $request_id,
    query         => 'rumpelstiltskin',
    caseSensitive => JSON::true,
    isRegex       => JSON::false,
);
for( @matches ) {
    print $_->{lineNumber}, ":", $_->{lineContent}, "\n";
};
Returns the matches (if any) for a string or regular expression within a response.
$mech->on_dialog( $cb )
$mech->on_dialog( sub {
    my( $mech, $dialog ) = @_;
    warn $dialog->{message};
    $mech->handle_dialog( 1 ); # click "OK" / "yes" instead of "cancel"
});
A callback for Javascript dialogs (alert(), prompt(), ... )
$mech->handle_dialog( $accept, $prompt = undef )
$mech->on_dialog( sub {
    my( $mech, $dialog ) = @_;
    warn "[Javascript $dialog->{type}]: $dialog->{message}";
    $mech->handle_dialog( 1 ); # click "OK" / "yes" instead of "cancel"
});
Closes the current Javascript dialog.
$mech->js_console_entries()
print $_->{type}, " ", $_->{message}, "\n" for $mech->js_console_entries();
An interface to the Javascript Error Console (JEC)
Returns the list of entries in the JEC
$mech->js_errors()
print "JS error: ", $_->{message}, "\n" for $mech->js_errors();
Returns the list of errors in the JEC
$mech->clear_js_errors()
$mech->clear_js_errors();
Clears all Javascript messages from the console
$mech->eval_in_page( $str, %options )
$mech->eval( $str, %options )
my ($value, $type) = $mech->eval( '2+2' );
Evaluates the given Javascript fragment in the context of the web page. Returns a pair of value and Javascript type.
This allows access to variables and functions declared "globally" on the web page.
If you want to create an object in Chrome and only want to keep a handle to that remote object, use JSON::false for the returnByValue option:
my ($dummyObj,$type) = $mech->eval( 'new Object', returnByValue => JSON::false );
This is also helpful if the object in Chrome cannot be serialized as JSON. For example, window is such an object. The return value is a hash, whose objectId is the most interesting part.
This method is special to WWW::Mechanize::Chrome.
$mech->eval_in_chrome $code, @args
$mech->eval_in_chrome(<<'JS', "Foobar/1.0");
    this.settings.userAgent= arguments[0]
JS
Evaluates Javascript code in the context of Chrome.
This allows you to modify properties of Chrome.
This is currently not implemented.
$mech->callFunctionOn( $function, @arguments )
my ($value, $type) = $mech->callFunctionOn(
    'function(greeting) { window.alert(greeting)}',
    objectId => $someObjectId,
    arguments => [{ value => 'Hello World' }]
);
Runs the given function with the specified arguments. This is the only way to pass arguments to a function call without doing risky string interpolation. The Javascript this object will be set to the object referenced from the objectId.
The arguments option expects an arrayref of hashrefs. Each hash describes one function argument.
The objectId parameter is optional. Leaving out the objectId parameter will create a dummy object on which the function then is called.
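Because the arguments option wants an arrayref of hashrefs, a tiny wrapper can save some typing when all arguments are plain values. The following is a sketch only: as_call_arguments is a hypothetical helper that is not part of WWW::Mechanize::Chrome, and it only covers the { value => ... } form shown in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Chrome): wrap plain Perl
# values in the "arrayref of hashrefs" shape, one hash per function argument.
sub as_call_arguments {
    my @values = @_;
    return [ map { +{ value => $_ } } @values ];
}

my $args = as_call_arguments( 'Hello World', 42 );
printf "%d argument(s) prepared\n", scalar @$args;

# With a connected $mech object, the result could then be passed on, e.g.:
#   $mech->callFunctionOn(
#       'function(greeting, count) { console.log(greeting, count) }',
#       arguments => $args,
#   );
```

Passing values this way keeps them out of the Javascript source string, which is the point of callFunctionOn over string interpolation.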
->autoclose_tab
Set the autoclose option
->close
$mech->close()
Tear down all connections and shut down Chrome.
$mech->list_tabs
my @open_tabs = $mech->list_tabs()->get;
say $open_tabs[0]->{title};
Returns the open tabs as a list of hashrefs.
$mech->highlight_node( @nodes )
my @links = $mech->selector('a');
$mech->highlight_node(@links);
print $mech->content_as_png();
Convenience method that marks all nodes in the arguments with a red frame.
This is convenient if you need visual verification that you've got the right nodes.
$mech->get( $url, %options )
my $response = $mech->get( $url );
Retrieves the URL $url.
It returns an HTTP::Response object for interface compatibility with WWW::Mechanize.
Note that the response body of the returned HTTP::Response object is filled in lazily, so you might have to wait a moment before it is available. This is a premature optimization; later releases of WWW::Mechanize::Chrome are planned to fetch the response body immediately when it is accessed.
Note that Chrome does not support download of files through the API.
intrapage - Override the detection of whether to wait for a HTTP response or not. Setting this will never wait for an HTTP response.
$mech->_collectEvents
my $events = $mech->_collectEvents(
    sub { $_[0]->{method} eq 'Page.loadEventFired' }
);
my( $e,$r) = Future->wait_all( $events, $self->target->send_message(...));
Internal method to create a Future that waits for an event that is sent by Chrome.
The subroutine is the predicate to check to see if the current event is the event we have been waiting for.
The result is a Future that will return all captured events.
$mech->get_local( $filename , %options )
$mech->get_local('test.html');
Shorthand method to construct the appropriate file:// URI and load it into Chrome. Relative paths will be interpreted as relative to $0 or the basedir option.
This method accepts the same options as ->get().
This method is special to WWW::Mechanize::Chrome but could also exist in WWW::Mechanize through a plugin.
Warning: Chrome does not handle local files well. Especially subframes do not get loaded properly.
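To illustrate how a relative filename can be turned into a file:// URI, here is a rough sketch. It is not the module's implementation: local_file_uri is a hypothetical helper, it assumes Unix-style paths, and it performs no URI escaping of special characters.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper, not the module's implementation.
# Assumes Unix-style paths and does no URI escaping.
sub local_file_uri {
    my ($filename, %options) = @_;
    # Relative names are resolved against the basedir option, falling back
    # to the directory of the running script ($0)
    my $basedir = $options{basedir} // dirname( File::Spec->rel2abs($0) );
    my $path    = File::Spec->rel2abs( $filename, $basedir );
    return "file://$path";
}

print local_file_uri( 'test.html', basedir => '/srv/site' ), "\n";
print local_file_uri( '/tmp/report.html' ), "\n";
```

An absolute filename is left alone, so only relative names depend on $0 or basedir.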
$mech->getRequestPostData
if( $info->{params}->{response}->{requestHeaders}->{":method"} eq 'POST' ) {
    $req->{postBody} = $m->getRequestPostData( $id );
};
Retrieves the data sent with a POST request
$mech->post( $url, %options )
not implemented
$mech->post( 'http://example.com',
    params => { param => "Hello World" },
    headers => {
        "Content-Type" => 'application/x-www-form-urlencoded',
    },
    charset => 'utf-8',
);
Sends a POST request to $url.
A Content-Length header will be automatically calculated if it is not given.
The following options are recognized:
headers - a hash of HTTP headers to send. If not given, the content type will be generated automatically.
data - the raw data to send, if you've encoded it already.
$mech->reload( %options )
$mech->reload( ignoreCache => 1 )
Acts like the reload button in a browser: repeats the current request. The history (as per the "back" method) is not altered.
Returns the HTTP::Response object from the reload, or undef if there's no current request.
$mech->set_download_directory( $dir )
use File::Temp 'tempdir';

my $downloads = tempdir();
$mech->set_download_directory( $downloads );
Enables automatic file downloads and sets the directory where the files will be downloaded to. Setting this to undef will disable downloads again.
The directory in $dir must be an absolute path, since Chrome does not know about the current directory of your Perl script.
$mech->cookie_jar
my $cookies = $mech->cookie_jar
Returns all the Chrome cookies in a HTTP::Cookies::ChromeDevTools instance. Setting a cookie in there will also set the cookie in Chrome. Note that the ->cookie_jar does not automatically refresh when a new page is loaded. To manually refresh the state of the cookie jar, use:
$mech->get('https://example.com/some_page');
$mech->cookie_jar->load;
$mech->add_header( $name => $value, ... )
$mech->add_header(
    'X-WWW-Mechanize-Chrome' => "I'm using it",
    Encoding => 'text/klingon',
);
This method sets up custom headers that will be sent with every HTTP(S) request that Chrome makes.
Note that currently, we only support one value per header.
Since version 63, Chrome no longer allows setting and sending the Referer header. The bug report is at https://bugs.chromium.org/p/chromium/issues/detail?id=849972.
$mech->delete_header( $name , $name2... )
$mech->delete_header( 'User-Agent' );
Removes HTTP headers from the agent's list of special headers. Note that Chrome may still send a header with its default value.
$mech->reset_headers
$mech->reset_headers();
Removes all custom headers and makes Chrome send its defaults again.
$mech->block_urls()
$mech->block_urls( '//facebook.com/js/conversions/tracking.js' );
Sets the list of blocked URLs. These URLs will not be retrieved by Chrome when loading a page. This is useful to eliminate tracking images or to test resilience in the face of bad network conditions.
$mech->res()
$mech->response(%options)
my $response = $mech->response(headers => 0);
Returns the current response as a HTTP::Response object.
$mech->success()
$mech->get('https://google.com'); print "Yay" if $mech->success();
Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false.
This is a convenience function that wraps $mech->res->is_success.
$mech->status()
$mech->get('https://google.com'); print $mech->status(); # 200
Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on.
$mech->back()
$mech->back();
Goes one page back in the page history.
Returns the (new) response.
$mech->forward()
$mech->forward();
Goes one page forward in the page history.
$mech->stop()
$mech->stop();
Stops all loading in Chrome, as if you pressed ESC.
This function is mostly of use in callbacks or in a timer callback from your event loop.
$mech->uri()
$mech->uri_future()
print "We are at " . $mech->uri; print "We are at " . $mech->uri_future->get;
Returns the current document URI.
$mech->infinite_scroll( [$wait_time_in_seconds] )
$new_content_found = $mech->infinite_scroll(3);
Loads content into pages that have "infinite scroll" capabilities by scrolling to the bottom of the web page and waiting up to the number of seconds, as set by the optional $wait_time_in_seconds argument, for the browser to load more content. The default is to wait up to 20 seconds. For reasonably fast sites, the wait time can be set much lower.
The method returns a boolean true if new content is loaded, false otherwise. You can scroll to the end (if there is one) of an infinitely scrolling page like so:
my $count = 0;
while( $mech->infinite_scroll ) {
    # Tests for exiting the loop earlier
    last if $count++ >= 10;
}
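If you scroll in several places, the loop above can be wrapped in a small helper that caps the number of rounds. This is a sketch under assumptions: scroll_up_to is a hypothetical name, and $mech is assumed to be any object with an infinite_scroll method that returns true while new content keeps loading, such as a WWW::Mechanize::Chrome instance.

```perl
use strict;
use warnings;

# Hypothetical helper: scroll until no new content arrives or until
# $max_rounds scrolls have loaded new content, whichever comes first.
sub scroll_up_to {
    my ($mech, $max_rounds, $wait_time_in_seconds) = @_;
    $wait_time_in_seconds //= 20;   # the documented default wait time
    my $rounds = 0;
    while ( $rounds < $max_rounds ) {
        # Stop as soon as a scroll brings in no new content
        last unless $mech->infinite_scroll($wait_time_in_seconds);
        $rounds++;
    }
    return $rounds;   # number of scrolls that loaded new content
}

# Possible usage with a connected object:
#   my $loaded = scroll_up_to( $mech, 10, 3 );
```

The cap matters on pages that never run out of content, where an unbounded loop would not terminate.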
$mech->document_future()
$mech->document()
print $mech->document->{nodeId};
Returns the document node.
This is WWW::Mechanize::Chrome specific.
$mech->content( %options )
print $mech->content;
print $mech->content( format => 'html' );  # default
print $mech->content( format => 'text' );  # identical to ->text
print $mech->content( format => 'mhtml' ); # identical to ->captureSnapshot
This always returns the content as a Unicode string. It tries to decode the raw content according to its input encoding. This currently only works for HTML pages, not for images etc.
Recognized options:
format - the format in which to return the content
The allowed values are html, text and mhtml, as used in the examples above. The default is html.
$mech->text()
print $mech->text();
Returns the text of the current HTML content. If the content isn't HTML, $mech will die.
$mech->captureSnapshot_future()
$mech->captureSnapshot()
print $mech->captureSnapshot( format => 'mhtml' )->{data};
Returns the current page as MHTML.
$mech->content_encoding()
print "The content is encoded as ", $mech->content_encoding;
Returns the encoding that the content is in. This can be used to convert the content from UTF-8 back to its native encoding.
$mech->update_html( $html )
$mech->update_html($html);
Writes $html into the current document. This is mostly implemented as a convenience method for HTML::Display::MozRepl.
The value passed in as $html will be stringified.
$mech->base()
print $mech->base;
Returns the URL base for the current page.
The base is either specified through a base tag or is the current URL.
This method is specific to WWW::Mechanize::Chrome.
$mech->content_type()
$mech->ct()
print $mech->content_type;
Returns the content type of the currently loaded document
$mech->is_html()
print $mech->is_html();
Returns true/false on whether our content is HTML, according to the HTTP headers.
$mech->title()
print "We are on page " . $mech->title;
Returns the current document title.
$mech->links()
print $_->text . " -> " . $_->url . "\n" for $mech->links;
Returns all links in the document as WWW::Mechanize::Link objects.
Currently accepts no parameters. See ->xpath or ->selector when you want more control.
$mech->selector( $css_selector, %options )
my @text = $mech->selector('p.content');
Returns all nodes matching the given CSS selector. If $css_selector is an array reference, it returns all nodes matched by any of the CSS selectors in the array.
This takes the same options that ->xpath does.
This method is implemented via WWW::Mechanize::Plugin::Selector.
$mech->find_link_dom( %options )
print $_->{innerHTML} . "\n" for $mech->find_link_dom( text_contains => 'CPAN' );
A method to find links, like WWW::Mechanize's ->find_links method. This method returns DOM objects from Chrome instead of WWW::Mechanize::Link objects.
Note that Chrome might have reordered the links or frame links in the document so the absolute numbers passed via n might not be the same between WWW::Mechanize and WWW::Mechanize::Chrome.
The supported options are:
text and text_contains and text_regex
Match the text of the link as a complete string, substring or regular expression.
Matching as a complete string or substring is a bit faster, as it is done in the XPath engine of Chrome.
id and id_contains and id_regex
Matches the id attribute of the link completely, in part, or against a regular expression.
name and name_contains and name_regex
Matches the name attribute of the link.
url and url_regex
Matches the URL attribute of the link (href, src or content).
class - the class attribute of the link
n - the (1-based) index. Defaults to returning the first link.
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
The method croaks if no link is found. If the single option is true, it also croaks when more than one link is found.
$mech->find_link( %options )
print $_->text . "\n" for $mech->find_link( text_contains => 'CPAN' );
A method quite similar to WWW::Mechanize's method. The options are documented in ->find_link_dom.
Returns a WWW::Mechanize::Link object.
By default, this does not look through child frames.
$mech->find_all_links( %options )
print $_->text . "\n" for $mech->find_all_links( text_regex => qr/google/i );
Finds all links in the document. The options are documented in ->find_link_dom.
Returns them as a list or an array reference, depending on context.
$mech->find_all_links_dom %options
print $_->{innerHTML} . "\n" for $mech->find_all_links_dom( text_regex => qr/google/i );
Finds all matching link-like DOM nodes in the document. The options are documented in ->find_link_dom.
$mech->follow_link( $link )
$mech->follow_link( %options )
$mech->follow_link( xpath => '//a[text() = "Click here!"]' );
Follows the given link. Takes the same parameters that find_link_dom uses.
Note that ->follow_link will only try to follow link-like things like A tags.
$mech->xpath( $query, %options )
my $link = $mech->xpath('//a[@id="clickme"]', one => 1);
# croaks if there is no link or more than one link found

my @para = $mech->xpath('//p');
# Collects all paragraphs

my @para_text = $mech->xpath('//p/text()', type => $mech->xpathResult('STRING_TYPE'));
# Collects all paragraphs as text
Runs an XPath query in Chrome against the current document.
If you need more information about the returned results, use the ->xpathEx() function.
Note that Chrome sometimes returns a node with node id 0. This node then cannot be found again using the Chrome API. This is bad luck and results in a warning.
The options allow the following keys:
document - document in which the query is to be executed. Use this to search a node within a specific subframe of $mech->document.
frames - if true, search all documents in all frames and iframes. This may or may not conflict with node. This will default to the frames setting of the WWW::Mechanize::Chrome object.
node - node relative to which the query is to be executed. Note that you will have to use a relative XPath expression as well. Use
.//foo
instead of
//foo
Querying relative to a node only works for restricting to children of the node, not for anything else. This is because we need to do the ancestor filtering ourselves instead of having a Chrome API for it.
maybe - If true, ensure that at most one element is found. Otherwise croak or carp, depending on the autodie parameter.
all - If true, return all elements found. This is the default. You can use this option if you want to use ->xpath in scalar context to count the number of matched elements, as it will otherwise emit a warning for each usage in scalar context without any of the above restricting options.
any - no error is raised, no matter if an item is found or not.
Returns the matched results as WWW::Mechanize::Chrome::Node objects.
You can pass in a list of queries as an array reference for the first parameter. The result will then be the list of all elements matching any of the queries.
This is a method that is not implemented in WWW::Mechanize.
In the long run, this should go into a general plugin for WWW::Mechanize.
$mech->by_id( $id, %options )
my @text = $mech->by_id('_foo:bar');
Returns all nodes matching the given ids. If $id is an array reference, it returns all nodes matched by any of the ids in the array.
This method is equivalent to calling ->xpath :
$self->xpath(qq{//*[\@id="$_"]}, %options)
It is convenient when your element ids get mistaken for CSS selectors.
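As a rough illustration of the equivalence above, the following sketch builds the XPath expressions for one or more ids in plain Perl. The helper name id_queries is hypothetical and not part of WWW::Mechanize::Chrome. It only shows why an id such as _foo:bar is unproblematic in an XPath attribute test, while the colon would be read as a pseudo-class in a CSS selector. Ids containing double quotes are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Chrome:
# turn one id or an array reference of ids into XPath queries.
sub id_queries {
    my ($id) = @_;
    my @ids = ref $id eq 'ARRAY' ? @$id : ($id);
    return map { qq{//*[\@id="$_"]} } @ids;
}

print "$_\n" for id_queries( [ '_foo:bar', 'main.content' ] );
```

Each generated query could then be handed to ->xpath, which accepts an array reference of queries as its first parameter.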
$mech->click( $name [,$x ,$y] )
# If the element is within a <form> element
$mech->click( 'go' );

# If the element is anywhere on the page
$mech->click({ xpath => '//button[@name="go"]' });
Has the effect of clicking a button (or other element) on the current form. The first argument is the name of the button to be clicked. The second and third arguments (optional) allow you to specify the (x,y) coordinates of the click.
If there is only one button on the form, $mech->click() with no arguments simply clicks that one button.
If you pass in a hash reference instead of a name, the following keys are recognized:
text - Find the element to click by its contained text
selector - Find the element to click by the CSS selector
xpath - Find the element to click by the XPath query
dom - Click on the passed DOM element
You can use this to click on arbitrary page elements. There is no convenient way to pass x/y co-ordinates when using the dom option.
id - Click on the element with the given id
This is useful if your document ids contain characters that do look like CSS selectors. It is equivalent to
xpath => qq{//*[\@id="$id"]}
Returns a HTTP::Response object.
As a deviation from the WWW::Mechanize API, you can also pass a hash reference as the first parameter. In it, you can specify the parameters to search much like for the find_link calls.
$mech->click_button( ... )
$mech->click_button( name => 'go' );
$mech->click_button( input => $mybutton );
Has the effect of clicking a button on the current form by specifying its name, value, or index. Its arguments are a list of key/value pairs. Exactly one of name, number, input or value must be specified in the keys.
name - name of the button
value - value of the button
input - DOM node
id - id of the button
number - number of the button
If you find yourself wanting to specify a button through its selector or xpath, consider using ->click instead.
$mech->current_form()
print $mech->current_form->{name};
Returns the current form.
This method is incompatible with WWW::Mechanize. It returns the DOM <form> object and not a HTML::Form instance.
The current form will be reset by WWW::Mechanize::Chrome on calls to ->get() and ->get_local(), and on calls to ->submit() and ->submit_with_fields.
$mech->dump_forms( [$fh] )
open my $fh, '>', 'form-log.txt'
    or die "Couldn't open logfile 'form-log.txt': $!";
$mech->dump_forms( $fh );
Prints a dump of the forms on the current page to the filehandle $fh. If $fh is not specified or is undef, it dumps to STDOUT.
$mech->form_name( $name [, %options] )
$mech->form_name( 'search' );
Selects the current form by its name. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_id( $id [, %options] )
$mech->form_id( 'login' );
Selects the current form by its id attribute. The options are identical to those accepted by the "$mech->xpath" method.
This is equivalent to calling
$mech->by_id($id,single => 1,%options)
$mech->form_number( $number [, %options] )
$mech->form_number( 2 );
Selects the form at position $number among the forms of the page. The options are identical to those accepted by the "$mech->xpath" method.
$mech->form_with_fields( [$options], @fields )
$mech->form_with_fields( 'user', 'password' );
Find the form which has the listed fields.
If the first argument is a hash reference, it's taken as options to ->xpath.
See also "$mech->submit_form".
$mech->forms( %options )
my @forms = $mech->forms();
When called in a list context, returns a list of the forms found in the last fetched page. In a scalar context, returns a reference to an array with those forms.
The options are identical to those accepted by the "$mech->selector" method.
The returned elements are the DOM <form> elements.
$mech->field( $selector, $value [, $index [, \@pre_events [, \@post_events]]] )
$mech->field( user => 'joe' );
$mech->field( not_empty => '', 0, [], [] ); # bypass JS validation
$mech->field( date => '2020-04-01', 2 ); # set second field named "date"
Sets the field with the name given in $selector to the given value. Returns the value.
The method understands very basic CSS selectors in the value for $selector, like the HTML::Form find_input() method.
A selector prefixed with '#' must match the id attribute of the input. A selector prefixed with '.' matches the class attribute. A selector prefixed with '^' or with no prefix matches the name attribute.
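The prefix rules above can be sketched in a few lines of plain Perl. The helper selector_attribute is hypothetical and is not the module's own code. It only shows which attribute each kind of selector is compared against.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's own code: map a field
# selector to the attribute it is matched against.
sub selector_attribute {
    my ($selector) = @_;
    return ( id    => $1 ) if $selector =~ /^#(.*)/s;
    return ( class => $1 ) if $selector =~ /^\.(.*)/s;
    return ( name  => $1 ) if $selector =~ /^\^(.*)/s;
    return ( name  => $selector );    # no prefix: match the name
}

for my $selector ( '#login', '.required', '^user', 'user' ) {
    my ( $attribute, $value ) = selector_attribute($selector);
    print "$selector matches $attribute '$value'\n";
}
```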
By passing the array reference @pre_events, you can indicate which Javascript events you want to be triggered before setting the value. @post_events contains the events you want to be triggered after setting the value.
By default, the events set in the constructor for pre_events and post_events are triggered.
$mech->sendkeys( %options )
$mech->sendkeys( string => "Hello World" );
Sends a series of keystrokes. The keystrokes can be either a string or a reference to an array containing the detailed data as hashes.
$mech->upload( $selector, $value )
$mech->upload( user_picture => 'C:/Users/Joe/face.png' );
Sets the file upload field with the name given in $selector to the given file. The filename must be an absolute path and filename in the local filesystem.
The method understands very basic CSS selectors in the value for $selector, like the ->field method.
$mech->value( $selector_or_element, [ $index | %options] )
print $mech->value( 'user' );
Returns the value of the field given by $selector_or_element or of the DOM element passed in.
If you have multiple fields with the same name, you can use the index to specify the index directly:
print $mech->value( 'date', 2 ); # get the second field named "date"
The legacy form of
$mech->value( name => value );
is not supported anymore.
For fields that can have multiple values, like a select field, the method is context sensitive and returns the first selected value in scalar context and all values in list context.
Note that this method does not support file uploads. See the ->upload method for that.
$mech->get_set_value( %options )
Allows fine-grained access to getting/setting a value with a different API. Supported keys are:
name
value
pre
post
in addition to all keys that $mech->xpath supports.
$mech->set_field( %options )
$mech->set_field(
    field => $field_node,
    value => 'foo',
);
Low level value setting method. Use this if you have an input element outside of a <form> tag.
$mech->select( $name, $value )
$mech->select( $name, \@values )
$mech->select( 'items', 'banana' );
Given the name of a select field, set its value to the value specified. If the field is not <select multiple> and the $value is an array, only the first value will be set. Passing $value as a hash with an n key selects an item by number (e.g. {n => 3} or {n => [2,4]}). The numbering starts at 1. This applies to the current form.
If you have a field with <select multiple> and you pass a single $value, then $value will be added to the list of fields selected, without clearing the others. However, if you pass an array reference, then all previously selected values will be cleared.
Returns true on successfully setting the value. On failure, returns false and calls $self->warn() with an error message.
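To make the accepted forms of $value more concrete, here is a small sketch in plain Perl that resolves a value argument against a list of option values. The helper resolve_select_values and the option list are invented for illustration. The real method works on the DOM of the current form, and this sketch only mirrors the numbering rule described above (items selected through n are counted from 1).

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a $value argument against the option
# values of a <select> field. Numbering via {n => ...} starts at 1.
sub resolve_select_values {
    my ( $options, $value ) = @_;
    if ( ref $value eq 'HASH' ) {
        my $n = $value->{n};
        my @numbers = ref $n eq 'ARRAY' ? @$n : ($n);
        return map { $options->[ $_ - 1 ] } @numbers;
    }
    return ref $value eq 'ARRAY' ? @$value : ($value);
}

my @options = qw( apple banana cherry date );
print join( ',', resolve_select_values( \@options, { n => [ 2, 4 ] } ) ), "\n";
print join( ',', resolve_select_values( \@options, 'banana' ) ), "\n";
```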
$mech->tick( $name, $value [, $set ] )
$mech->tick("confirmation_box", 'yes');
"Ticks" the first checkbox that has both the name and value associated with it on the current form. Dies if there is no named check box for that value. Passing in a false value as the third optional argument will cause the checkbox to be unticked.
(Un)ticking the checkbox is done by sending a click event to it if needed. If $value is undef, the first checkbox matching $name will be (un)ticked.
If $name is a reference to a hash, that hash will be used as the options to ->find_link_dom to find the element.
$mech->untick( $name, $value )
$mech->untick('spam_confirm','yes',undef)
Causes the checkbox to be unticked. Shorthand for
$mech->tick($name,$value,undef)
$mech->submit( $form )
$mech->submit;
Submits the form. Note that this does not fire the onClick event and therefore does not run any JavaScript handlers attached to it. You may want to use $mech->click instead.
The default is to submit the current form as returned by $mech->current_form.
$mech->submit_form( %options )
$mech->submit_form(
    with_fields => {
        user => 'me',
        pass => 'secret',
    }
);
This method lets you select a form from the previously fetched page, fill in its fields, and submit it. It combines the form_number/form_name, ->set_fields and ->click methods into one higher level call. Its arguments are a list of key/value pairs, all of which are optional.
form => $mech->current_form()
Specifies the form to be filled and submitted. Defaults to the current form.
fields => \%fields
Specifies the fields to be filled in the current form
with_fields => \%fields
Probably all you need for the common case. It combines a smart form selector and data setting in one operation. It selects the first form that contains all fields mentioned in \%fields. This is nice because you don't need to know the name or number of the form to do this.
(calls "$mech->form_with_fields()" and "$mech->set_fields()").
If you choose this, the form_number, form_name, form_id and fields options will be ignored.
$mech->set_fields( $name => $value, ... )
$mech->set_fields(
    user => 'me',
    pass => 'secret',
);
This method sets multiple fields of the current form. It takes a list of field name and value pairs. If there is more than one field with the same name, the first one found is set. To select which of the duplicate fields to set, pass an anonymous array containing the field value and its number as the two elements.
$mech->set_fields(
    user => 'me',
    pass => 'secret',
    pass => [ 'secret', 2 ], # repeated password field
);
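The argument convention can be illustrated without a browser. The sketch below expands such a list into name, value and number triples. The helper expand_field_specs is hypothetical, and treating a plain value as field number 1 is an assumption made for this illustration. It also shows why the arguments are a list rather than a hash: a hash would silently drop the repeated pass key.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a set_fields-style argument list into
# [ name, value, number ] triples. A plain value is assumed to target
# field number 1; [ $value, $number ] targets a specific field.
sub expand_field_specs {
    my (@pairs) = @_;
    my @specs;
    while ( my ( $name, $value ) = splice @pairs, 0, 2 ) {
        my ( $v, $number ) = ref $value eq 'ARRAY' ? @$value : ( $value, 1 );
        push @specs, [ $name, $v, $number ];
    }
    return @specs;
}

my @specs = expand_field_specs(
    user => 'me',
    pass => 'secret',
    pass => [ 'secret', 2 ],    # repeated password field
);
printf "%s #%d = %s\n", @{$_}[ 0, 2, 1 ] for @specs;
```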
$mech->is_visible( $element )
$mech->is_visible( %options )
if ($mech->is_visible( selector => '#login' )) {
    print "You can log in now.";
};
Returns true if the element is visible, that is, it is a member of the DOM and neither it nor its ancestors have a CSS visibility attribute of hidden or a display attribute of none.
You can either pass in a DOM element or a set of key/value pairs to search the document for the element you want.
xpath - the XPath query
selector - the CSS selector
dom - a DOM node
The remaining options are passed through to either the $mech->xpath or the $mech->selector method.
$mech->wait_until_invisible( $element )
$mech->wait_until_invisible( %options )
$mech->wait_until_invisible( $please_wait );
Waits until an element is not visible anymore.
Takes the same options as $mech->is_visible.
In addition, the following options are accepted:
timeout - the timeout after which the function will croak. To catch the condition and handle it in your calling program, use an eval block. A timeout of 0 means to never time out.
See also max_wait if you want to wait a limited time for an element to appear.
max_wait - the maximum time to wait until the function will return. A max_wait of 0 means to never time out. If the element is still visible, the function will return a false value.
sleep - the interval in seconds used to sleep. Subsecond intervals are possible.
Note that when passing in a selector, that selector is requeried on every poll instance. So the following query will work as expected:
xpath => '//*[contains(text(),"stand by")]'
This also means that if your selector query relies on finding a changing text, you need to pass the node explicitly instead of passing the selector.
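The polling behaviour described above can be sketched as a plain Perl loop. This is not the module's implementation: wait_until_false, the default interval of 0.3 seconds and the simulated element are all assumptions for illustration. The sketch shows the three ideas from the text: the condition is re-evaluated on every poll, a timeout of 0 never times out, and a timeout surfaces as an exception that the caller can catch with an eval block.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Time::HiRes qw(sleep time);

# Hypothetical polling loop, not the module's implementation.
# $is_visible is re-evaluated on every poll, like a selector would be.
sub wait_until_false {
    my ( $is_visible, %options ) = @_;
    my $sleep    = $options{sleep} || 0.3;    # assumed default interval
    my $deadline = $options{timeout} ? time() + $options{timeout} : undef;
    while ( $is_visible->() ) {
        croak "Timed out while waiting"
            if defined $deadline and time() > $deadline;
        sleep($sleep);
    }
    return 1;
}

# Simulated element that disappears on the third poll
my $polls = 0;
my $ok = eval {
    wait_until_false( sub { ++$polls < 3 }, timeout => 5, sleep => 0.01 );
    1;
};
print $ok ? "gone after $polls polls\n" : "still visible: $@";
```

With the real method, the same eval pattern would wrap the call to $mech->wait_until_invisible.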
$mech->wait_until_visible( %options )
$mech->wait_until_visible( selector => 'a.download' );
Waits until a query returns a visible element.
xpath => '//*[contains(text(),"click here for download")]'
$mech->content_as_png()
my $png_data = $mech->content_as_png();

# Create scaled-down 480px wide preview
my $png_data = $mech->content_as_png(undef, { width => 480 });
Returns the given tab or the current page rendered as a PNG image.
All parameters are optional.
$mech->saveResources_future
my $file_map = $mech->saveResources_future(
    target_file => 'this_page.html',
    target_dir  => 'this_page_files/',
    wanted      => sub { $_[0]->{url} =~ m!^https?:!i },
)->get();
A rough prototype of a "Save Complete Page" feature.
$mech->viewport_size
print Dumper $mech->viewport_size;
$mech->viewport_size({ width => 1388, height => 792 });
Returns (or sets) the new size of the viewport (the "window").
The recognized keys are:
width
height
deviceScaleFactor
mobile
screenWidth
screenHeight
positionX
positionY
$mech->element_as_png( $element )
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this = $mech->element_as_png($shiny);
Returns PNG image data for a single element
$mech->render_element( %options )
my $shiny = $mech->selector('#shiny', single => 1);
my $i_want_this = $mech->render_element(
    element => $shiny,
    format  => 'png',
);
Returns the data for a single element or writes it to a file. It accepts all options of ->render_content.
Note that while the image will have the node in the upper left corner, the width and height of the resulting image will still be the size of the browser window. Cut the image using element_coordinates if you need exactly the element.
$mech->element_coordinates( $element )
my $shiny = $mech->selector('#shiny', single => 1);
my ($pos) = $mech->element_coordinates($shiny);
print $pos->{left}, ',', $pos->{top};
Returns the page-coordinates of the $element in pixels as a hash with four entries, left, top, width and height.
This function might get moved into another module more geared towards rendering HTML.
$mech->render_content(%options)
my $pdf_data = $mech->render_content( format => 'pdf' );
Returns the current page rendered as PDF or PNG as a bytestring.
Note that the PDF format will only be successful with headless Chrome. At least on Windows, when launching Chrome with a UI, printing to PDF will be unavailable.
$mech->content_as_pdf(%options)
my $pdf_data = $mech->content_as_pdf();
my $pdf_data = $mech->content_as_pdf( format => 'A4' );
my $pdf_data = $mech->content_as_pdf( paperWidth => 8, paperHeight => 11 );
Returns the current page rendered in PDF format as a bytestring. The page format can be specified through the format option.
Note that this method will only be successful with headless Chrome. At least on Windows, when launching Chrome with a UI, printing to PDF will be unavailable. See the html-to-pdf.pl script in the examples/ directory of this distribution.
These are methods that are available but exist mostly as internal helper methods. Use of these is discouraged.
$mech->element_query( \@elements, \%attributes )
my $query = $mech->element_query(['input', 'select', 'textarea'], { name => 'foo' });
Returns the XPath query that searches for all elements with tagNames in @elements having the attributes %attributes. The @elements will form an or condition, while the attributes will form an and condition.
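The or/and structure of such a query can be sketched as follows. This is a hypothetical reimplementation for illustration only. The exact query text that ->element_query produces may differ, and build_element_query is not a method of the module. Attribute values containing double quotes are not handled.

```perl
use strict;
use warnings;

# Hypothetical reimplementation for illustration; the query text
# produced by the real ->element_query may differ.
sub build_element_query {
    my ( $elements, $attributes ) = @_;
    # Tag names are alternatives, so they are joined with "or"
    my $tags = join ' or ',
        map { sprintf( 'local-name(.)="%s"', lc $_ ) } @$elements;
    # Every attribute must match, so conditions are joined with "and"
    my @conditions = (
        "($tags)",
        map { sprintf( '@%s="%s"', $_, $attributes->{$_} ) }
            sort keys %$attributes,
    );
    return './/*[' . join( ' and ', @conditions ) . ']';
}

print build_element_query( [ 'input', 'select', 'textarea' ], { name => 'foo' } ), "\n";
```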
This module can collect the screencasts that Chrome can produce. The screencasts are sent to your callback which either feeds them to ffmpeg to create a video out of them or dumps them to disk as sequential images.
sub saveFrame {
    my( $mech, $framePNG ) = @_;
    print $framePNG->{data};
}

$mech->setScreenFrameCallback( \&saveFrame );
# ... do stuff ...
$mech->setScreenFrameCallback( undef ); # stop recording
If you want a premade screencast receiver for debugging headless Chrome sessions, see Mojolicious::Plugin::PNGCast.
$mech->sleep
$mech->sleep( 2 ); # wait for things to settle down
Suspends the progress of the program while still handling messages from Chrome.
The main use of this method is to give Chrome enough time to send all its screencast frames and to catch up before shutting down the connection.
As this module is in a very early stage of development, there are many incompatibilities. The main thing is that only the most needed WWW::Mechanize methods have been implemented by me so far.
At least the following methods are unsupported:
->find_all_inputs
This function is likely best implemented through $mech->selector.
->find_all_submits
->images
->find_image
->find_all_images
These functions are unlikely to be implemented because they make little sense in the context of Chrome.
->clone
->credentials( $username, $password )
->get_basic_credentials( $realm, $uri, $isproxy )
->clear_credentials()
->put
I have no use for it
->post
This module does not yet support POST requests
See WWW::Mechanize::Chrome::Install
https://developer.chrome.com/devtools/docs/debugging-clients - the Chrome DevTools homepage
https://github.com/GoogleChrome/lighthouse - Google Lighthouse, the main client of the Chrome API
WWW::Mechanize - the module whose API grandfathered this module
WWW::Mechanize::Chrome::Node - objects representing HTML in Chrome
WWW::Mechanize::Firefox - a similar module with a visible application automating Firefox, currently on hiatus, since Mozilla does not yet implement the Chrome DevTools Protocol properly
WWW::Mechanize::PhantomJS - a similar module without a visible application automating PhantomJS, now discontinued since PhantomJS is discontinued
Some articles about what you need to change to appear as a different browser
https://multilogin.com/why-mimicking-a-device-is-almost-impossible/
https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth
The public repository of this module is https://github.com/Corion/www-mechanize-chrome.
The public support forum of this module is https://perlmonks.org/.
I've given a German talk at GPW 2017, see http://act.yapc.eu/gpw2017/talk/7027 and https://corion.net/talks for the slides.
At The Perl Conference 2017 in Amsterdam, I also presented a talk, see http://act.perlconference.org/tpc-2017-amsterdam/talk/7022. The slides for the English presentation at TPCiA 2017 are at https://corion.net/talks/WWW-Mechanize-Chrome/www-mechanize-chrome.en.html.
At the London Perl Workshop 2017 in London, I also presented a talk, see YouTube. The slides for that talk are here.
Please report bugs in this module via the Github bug queue at https://github.com/Corion/WWW-Mechanize-Chrome/issues
Please see WWW::Mechanize::Chrome::Contributing.
Please see WWW::Mechanize::Chrome::Troubleshooting.
Max Maischein corion@cpan.org
Andreas König andk@cpan.org
Tobias Leich froggs@cpan.org
Steven Dondley s@dondley.org
Joshua Pollack
Copyright 2010-2024 by Max Maischein corion@cpan.org.
This module is released under the same terms as Perl itself.