Module Version: 0.006

NAME

HTML::HTML5::Outline - implementation of the HTML5 Outline algorithm

SYNOPSIS

        use JSON;
        use HTML::HTML5::Outline;
        
        my $html = <<'HTML';
        <!doctype html>
        <h1>Hello</h1>
        <h2>World</h2>
        <h1>Good Morning</h1>
        <h2>Vietnam</h2>
        HTML
        
        my $outline = HTML::HTML5::Outline->new($html);
        print to_json($outline->to_hashref, {pretty=>1,canonical=>1});

DESCRIPTION

This is an implementation of the HTML5 Outline algorithm, as described at <http://www.w3.org/TR/html5/sections.html#outlines>.

The module can output a JSON-friendly hashref, or an RDF model.
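
To give a feel for what an outline is, the following is a minimal, simplified sketch of how heading ranks alone can be turned into nested sections. It is not this module's implementation: it ignores sectioning elements such as section and article, and the helper names build_outline and outline_as_text are hypothetical, chosen for illustration only. It takes a flat list of [rank, text] pairs rather than parsing HTML.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::HTML5::Outline):
# build nested sections from a flat list of [rank, text] headings.
sub build_outline {
    my @headings = @_;
    my $root  = { rank => 0, children => [] };   # sentinel above all headings
    my @stack = ($root);
    for my $h (@headings) {
        my ($rank, $text) = @$h;
        # Close every open section whose heading is of equal or higher rank.
        pop @stack while $stack[-1]{rank} >= $rank;
        my $section = { rank => $rank, heading => $text, children => [] };
        push @{ $stack[-1]{children} }, $section;
        push @stack, $section;
    }
    return $root->{children};
}

# Hypothetical helper: render the nested sections as indented text.
sub outline_as_text {
    my ($sections, $depth) = @_;
    $depth //= 0;
    my $out = '';
    for my $s (@$sections) {
        $out .= ('  ' x $depth) . $s->{heading} . "\n";
        $out .= outline_as_text($s->{children}, $depth + 1);
    }
    return $out;
}

# The headings from the SYNOPSIS document, as [rank, text] pairs.
my $outline = build_outline(
    [1, 'Hello'], [2, 'World'], [1, 'Good Morning'], [2, 'Vietnam'],
);
print outline_as_text($outline);
```

For the SYNOPSIS headings this sketch nests "World" under "Hello" and "Vietnam" under "Good Morning". The real algorithm also accounts for sectioning content and sectioning roots, which is why the module works on a parsed DOM rather than a list of headings.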

Constructor

Object Methods

Class Methods

USE WITH RDF::RDFA::PARSER

This module produces RDF data in which many of the resources described are HTML elements. Plain RDFa data typically does not describe elements, but RDF::RDFa::Parser supports some extensions to RDFa that do (e.g. support for the cite and role attributes). It is useful to combine the RDF data from both sources, and RDF::RDFa::Parser 1.093 and upwards contains a few shims to make this possible.

Without further ado...

        use HTML::HTML5::Outline;
        use RDF::RDFa::Parser 1.093;
        use RDF::TrineShortcuts;

        my $rdfa = RDF::RDFa::Parser->new(
                $html_source,
                $base_url,
                RDF::RDFa::Parser::Config->new(
                        'html5', '1.1',
                        role_attr     => 1,
                        cite_attr     => 1,
                        longdesc_attr => 1,
                        ),
                )->consume;
        
        my $outline = HTML::HTML5::Outline->new(
                $rdfa->dom,
                uri              => $rdfa->uri,
                element_subjects => $rdfa->element_subjects,
                );
        
        # Merging two graphs is pretty complicated in RDF::Trine
        # but a little easier with RDF::TrineShortcuts...
        my $combined = rdf_parse();
        rdf_parse($rdfa->graph,     model => $combined);
        rdf_parse($outline->to_rdf, model => $combined);
        
        my $NS = {
                dc    => 'http://purl.org/dc/terms/',
                o     => 'http://ontologi.es/outline#',
                type  => 'http://purl.org/dc/dcmitype/',
                xs    => 'http://www.w3.org/2001/XMLSchema#',
                xhv   => 'http://www.w3.org/1999/xhtml/vocab#',
                };
        
        print rdf_string($combined => 'Turtle', namespaces => $NS);

SEE ALSO

HTML::HTML5::Outline::RDF, HTML::HTML5::Outline::Outlinee, HTML::HTML5::Outline::Section.

HTML::HTML5::Parser, HTML::HTML5::Sanity.

AUTHOR

Toby Inkster, <tobyink@cpan.org>

ACKNOWLEDGEMENTS

This module is a fork of the document structure parser from Swignition <http://buzzword.org.uk/swignition/>.

That in turn includes the following credits: thanks to Ryan King and Geoffrey Sneddon for pointing me towards [the HTML5] algorithm. I also used Geoffrey's Python implementation as a crib sheet to help me figure out what was supposed to happen when the HTML5 spec was ambiguous.

COPYRIGHT AND LICENCE

Copyright (C) 2008-2011 by Toby Inkster

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
