Mojo::DOM - Minimalistic HTML5/XML DOM parser with CSS3 selectors
use Mojo::DOM;

# Parse
my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');

# Find
my $b = $dom->at('#b');
say $b->text;

# Walk
say $dom->div->p->[0]->text;
say $dom->div->children('p')->first->{id};

# Iterate
$dom->find('p[id]')->each(sub { say shift->{id} });

# Loop
for my $e ($dom->find('p[id]')->each) {
  say $e->text;
}

# Modify
$dom->div->p->[1]->append('<p id="c">C</p>');

# Render
say $dom;
Mojo::DOM is a minimalistic and relaxed HTML5/XML DOM parser with CSS3 selector support. It will even try to interpret broken XML, so you should not use it for validation.
Mojo::DOM defaults to HTML5 semantics, which means all tags and attributes are lowercased and selectors need to be lowercase as well.
my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
say $dom->at('p')->text;
say $dom->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
say $dom->at('P')->text;
say $dom->P->{ID};
XML detection can also be disabled with the xml method.
# XML semantics
$dom->xml(1);

# HTML5 semantics
$dom->xml(0);
Mojo::DOM implements the following methods.
new
my $dom = Mojo::DOM->new;
my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>');
Construct a new Mojo::DOM object.
all_text
my $trimmed = $dom->all_text;
my $untrimmed = $dom->all_text(0);
Extract all text content from the DOM structure. Smart whitespace trimming is enabled by default. Note that the trim argument of this method is EXPERIMENTAL and might change without warning!
# "foo bar baz"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text;

# "foo\nbarbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text(0);
append
$dom = $dom->append('<p>Hi!</p>');
Append to element.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>');
append_content
$dom = $dom->append_content('<p>Hi!</p>');
Append to element content.
# "<div><h1>AB</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B');
at
my $result = $dom->at('html title');
Find a single element with CSS3 selectors. All selectors from Mojo::DOM::CSS are supported.
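The difference between at and find can be sketched as follows. The markup is hypothetical, and the sketch assumes that at returns undef when no element matches, which the text above does not state explicitly.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup, used only for illustration
my $dom = Mojo::DOM->new('<ul><li id="x">One</li><li id="y">Two</li></ul>');

# "at" yields a single element: the first one matching the selector
my $first = $dom->at('li');
say $first->{id};

# "find" yields every match; "each" in list context returns them all
my @all = $dom->find('li')->each;
say scalar @all;

# Assumption: "at" returns undef when nothing matches, so check the
# result before chaining further method calls onto it
my $missing = $dom->at('table');
say defined $missing ? 'found' : 'not found';
```

Checking the result of at before calling methods on it avoids a runtime error on documents that lack the expected element.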
attrs
my $attrs = $dom->attrs;
my $foo = $dom->attrs('foo');
$dom = $dom->attrs({foo => 'bar'});
$dom = $dom->attrs(foo => 'bar');
Element attributes.
charset
my $charset = $dom->charset;
$dom = $dom->charset('UTF-8');
Alias for "charset" in Mojo::DOM::HTML.
children
my $collection = $dom->children;
my $collection = $dom->children('div');
Return a Mojo::Collection object containing the children of this element, similar to find.
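As a rough illustration of how children relates to find: children only considers direct child elements, while find also descends into nested elements. The markup and the variable names below are made up for this sketch.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup: one direct <p> child and one nested <p>
my $dom = Mojo::DOM->new(
  '<div id="top"><p id="a">A</p><div><p id="b">B</p></div></div>');
my $top = $dom->at('#top');

# Direct children of #top that are <p> elements
my @direct = map { $_->{id} } $top->children('p')->each;

# All <p> descendants of #top, in document order
my @nested = map { $_->{id} } $top->find('p')->each;

say "children: @direct";
say "find: @nested";
```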
content_xml
my $xml = $dom->content_xml;
Render content of this element to XML.
find
my $collection = $dom->find('html title');
Find elements with CSS3 selectors and return a Mojo::Collection object. All selectors from Mojo::DOM::CSS are supported.
# Find a specific element and extract information
my $id = $dom->find('div')->[23]->{id};

# Extract information from multiple elements
my @headers = $dom->find('h1, h2, h3')->map(sub { shift->text })->each;
namespace
my $namespace = $dom->namespace;
Find element namespace.
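A minimal sketch of a namespace lookup, assuming the namespace is declared with a default xmlns attribute on an ancestor element. The document and the URI http://example.com/ns are invented for illustration.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical XML document; the processing instruction switches the
# parser into XML mode, so element names are case sensitive here
my $dom = Mojo::DOM->new(
  '<?xml version="1.0"?><feed xmlns="http://example.com/ns"><entry>1</entry></feed>');

# The <entry> element carries no xmlns attribute of its own, so the
# namespace is looked up on its ancestors
my $ns = $dom->at('entry')->namespace;
say $ns;
```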
parent
my $parent = $dom->parent;
Parent of element.
parse
$dom = $dom->parse('<foo bar="baz">test</foo>');
Alias for "parse" in Mojo::DOM::HTML.
prepend
$dom = $dom->prepend('<p>Hi!</p>');
Prepend to element.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>');
prepend_content
$dom = $dom->prepend_content('<p>Hi!</p>');
Prepend to element content.
# "<div><h2>AB</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A');
replace
$dom = $dom->replace('<div>test</div>');
Replace elements.
# "<div><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');
replace_content
$dom = $dom->replace_content('test');
Replace element content.
# "<div><h1>B</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B');
root
my $root = $dom->root;
Find root node.
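Walking upwards with parent and root might look like the sketch below. The markup is hypothetical, and the final comparison relies on the object stringifying to its markup, as in the "say $dom" line of the synopsis.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup with three levels of nesting
my $dom  = Mojo::DOM->new('<div id="outer"><p id="inner"><b>Hi</b></p></div>');
my $bold = $dom->at('b');

# Each call to "parent" moves one level up the tree
my $inner = $bold->parent;
my $outer = $inner->parent;
say $inner->{id};
say $outer->{id};

# "root" jumps straight to the top, whatever the current depth
my $root = $bold->root;
say "$root" eq "$dom" ? 'root renders the whole document' : 'root differs';
```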
text
my $trimmed = $dom->text;
my $untrimmed = $dom->text(0);
Extract text content from this element only (not including child elements). Smart whitespace trimming is enabled by default. Note that the trim argument of this method is EXPERIMENTAL and might change without warning!
# "foo baz"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text;

# "foo\nbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text(0);
text_after
my $trimmed = $dom->text_after;
my $untrimmed = $dom->text_after(0);
Extract text content immediately following this element. Smart whitespace trimming is enabled by default. Note that this method is EXPERIMENTAL and might change without warning!
# "baz"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after;

# "baz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after(0);
text_before
my $trimmed = $dom->text_before;
my $untrimmed = $dom->text_before(0);
Extract text content immediately preceding this element. Smart whitespace trimming is enabled by default. Note that this method is EXPERIMENTAL and might change without warning!
# "foo"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before;

# "foo\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before(0);
to_xml
my $xml = $dom->to_xml;
Render DOM to XML.
tree
my $tree = $dom->tree;
$dom = $dom->tree(['root', ['text', 'lalala']]);
Alias for "tree" in Mojo::DOM::HTML.
type
my $type = $dom->type;
$dom = $dom->type('div');
Element type.
$dom->children->each(sub { say $_->type });
xml
my $xml = $dom->xml;
$dom = $dom->xml(1);
Alias for "xml" in Mojo::DOM::HTML. Note that this method is EXPERIMENTAL and might change without warning!
In addition to the methods above, many child elements are also automatically available as object methods, which return a Mojo::DOM or Mojo::Collection object, depending on number of children.
say $dom->div->text;
say $dom->div->[23]->text;
$dom->div->each(sub { say $_->text });
Direct hash access to element attributes is also possible.
say $dom->{foo};
say $dom->div->{id};
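Hash access combines naturally with find when some elements lack an attribute. The following is a small sketch with made-up markup, in which a missing attribute simply reads as undef and is replaced by a placeholder.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup: only the first link has a title attribute
my $dom = Mojo::DOM->new(
  '<a href="/home" title="Home">Home</a><a href="/about">About</a>');

my @lines;
for my $link ($dom->find('a')->each) {

  # A missing attribute reads as undef, so fall back to a placeholder
  my $title = $link->{title} // '(no title)';
  push @lines, "$link->{href}: $title";
}
say for @lines;
```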
Mojolicious, Mojolicious::Guides, http://mojolicio.us.