HTML::Microformats - parse microformats in HTML
use HTML::Microformats; my $doc = HTML::Microformats ->new_document($html, $uri) ->assume_profile(qw(hCard hCalendar)); print $doc->json(pretty => 1); use RDF::TrineShortcuts qw(rdf_query); my $results = rdf_query($sparql, $doc->model);
The HTML::Microformats module is a wrapper for parser and handler modules of various individual microformats (each of those modules has a name like HTML::Microformats::Format::Foo).
The general pattern of usage is to create an HTML::Microformats object (which corresponds to an HTML document) using the new_document method; then ask for the data, as a Perl hashref, a JSON string, or an RDF::Trine model.
new_document
$doc = HTML::Microformats->new_document($html, $uri, %opts)
Constructs a document object.
$html is the HTML or XHTML source (string) or an XML::LibXML::Document.
$uri is the document URI, important for resolving relative URL references.
%opts are additional parameters; currently only one option is defined: $opts{'type'} is set to 'text/html' or 'application/xhtml+xml', to control how $html is parsed.
HTML::Microformats uses HTML profiles (i.e. the profile attribute on the HTML <head> element) to detect which Microformats are used on a page. Any microformats which do not have a profile URI declared will not be parsed.
Because many pages fail to properly declare which profiles they use, there are various profile management methods to tell HTML::Microformats to assume the presence of particular profile URIs, even if they're actually missing.
$doc->profiles
This method returns a list of profile URIs declared by the document.
$doc->has_profile(@profiles)
This method returns true if and only if one or more of the profile URIs in @profiles is declared by the document.
$doc->add_profile(@profiles)
Using add_profile you can add one or more profile URIs, and they are treated as if they were found on the document.
add_profile
For example:
$doc->add_profile('http://microformats.org/profile/rel-tag')
This is useful for adding profile URIs declared outside the document itself (e.g. in HTTP headers).
Returns a reference to the document.
$doc->assume_profile(@microformats)
$doc->assume_profile(qw(hCard adr geo))
This method acts similarly to add_profile but allows you to use names of microformats rather than URIs.
Microformat names are case sensitive, and must match HTML::Microformats::Format::Foo module names.
$doc->assume_all_profiles
This method is equivalent to calling assume_profile for all known microformats.
assume_profile
Generally speaking, you can skip this. The data, json and model methods will automatically do this for you.
data
json
model
$doc->parse_microformats
Scans through the document, finding microformat objects.
On subsequent calls, does nothing (as everything is already parsed).
$doc->clear_microformats
Forgets information gleaned by parse_microformats and thus allows parse_microformats to be run again. This is useful if you've modified added some profiles between runs of parse_microformats.
parse_microformats
These methods allow you to retrieve the document's data, and do things with it.
$doc->objects($format);
$format is, for example, 'hCard', 'adr' or 'RelTag'.
Returns a list of objects of that type. (If called in scalar context, returns an arrayref.)
Each object is, for example, an HTML::Microformat::hCard object, or an HTML::Microformat::RelTag object, etc. See the relevent documentation for details.
$doc->all_objects
Returns a hashref of data. Each hashref key is the name of a microformat (e.g. 'hCard', 'RelTag', etc), and the values are arrayrefs of objects.
$doc->json(%opts)
Returns data roughly equivalent to the all_objects method, but as a JSON string.
all_objects
%opts is a hash of options, suitable for passing to the JSON module's to_json function. The 'convert_blessed' and 'utf8' options are enabled by default, but can be disabled by explicitly setting them to 0, e.g.
print $doc->json( pretty=>1, canonical=>1, utf8=>0 );
$doc->model
Returns data as an RDF::Trine::Model, suitable for serialising as RDF or running SPARQL queries.
$object->serialise_model(as => $format)
As model but returns a string.
$doc->add_to_model($model)
Adds data to an existing RDF::Trine::Model.
HTML::Microformats->modules
Returns a list of Perl modules, each of which implements a specific microformat.
HTML::Microformats->formats
As per modules, but strips 'HTML::Microformats::Format::' off the module name, and sorts alphabetically.
modules
There already exist two microformats packages on CPAN (see Text::Microformat and Data::Microformat), so why create another?
Firstly, HTML::Microformats isn't being created from scratch. It's actually a fork/clean-up of a non-CPAN application (Swignition), and in that sense predates Text::Microformat (though not Data::Microformat).
It has a number of other features that distinguish it from the existing packages:
It supports more formats.
HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr, rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, XFN, hAtom, hNews and more.
It supports more patterns.
HTML::Microformats supports the include pattern, abbr pattern, table cell header pattern, value excerpting and other intricacies of microformat parsing better than the other modules on CPAN.
It offers RDF support.
One of the key features of HTML::Microformats is that it makes data available as RDF::Trine models. This allows your application to benefit from a rich, feature-laden Semantic Web toolkit. Data gleaned from microformats can be stored in a triple store; output in RDF/XML or Turtle; queried using the SPARQL or RDQL query languages; and more.
If you're not comfortable using RDF, HTML::Microformats also makes all its data available as native Perl objects.
Please report any bugs to http://rt.cpan.org/.
HTML::Microformats::Documentation::Notes.
Individual format modules:
HTML::Microformats::Format::adr
HTML::Microformats::Format::figure
HTML::Microformats::Format::geo
HTML::Microformats::Format::hAtom
HTML::Microformats::Format::hAudio
HTML::Microformats::Format::hCalendar
HTML::Microformats::Format::hCard
HTML::Microformats::Format::hListing
HTML::Microformats::Format::hMeasure
HTML::Microformats::Format::hNews
HTML::Microformats::Format::hProduct
HTML::Microformats::Format::hRecipe
HTML::Microformats::Format::hResume
HTML::Microformats::Format::hReview
HTML::Microformats::Format::hReviewAggregate
HTML::Microformats::Format::OpenURL_COinS
HTML::Microformats::Format::RelEnclosure
HTML::Microformats::Format::RelLicense
HTML::Microformats::Format::RelTag
HTML::Microformats::Format::species
HTML::Microformats::Format::VoteLinks
HTML::Microformats::Format::XFN
HTML::Microformats::Format::XMDP
HTML::Microformats::Format::XOXO
Similar modules: RDF::RDFa::Parser, HTML::HTML5::Microdata::Parser, XML::Atom::Microformats, Text::Microformat, Data::Microformats.
Related web sites: http://microformats.org/, http://www.perlrdf.org/.
Toby Inkster <tobyink@cpan.org>.
Copyright 2008-2012 Toby Inkster
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
To install HTML::Microformats, copy and paste the appropriate command in to your terminal.
cpanm
cpanm HTML::Microformats
CPAN shell
perl -MCPAN -e shell install HTML::Microformats
For more information on module installation, please visit the detailed CPAN module installation guide.