Module Version: 0.11

NAME

XML::Fast - Simple and very fast XML to hash conversion

SYNOPSIS

  use XML::Fast;
  
  my $hash = xml2hash $xml;
  my $hash2 = xml2hash $xml, attr => '.', text => '~';

DESCRIPTION

This module implements a simple, state-machine-based XML parser written in C.

It can parse, and recover from, some kinds of broken XML. If you need an XML validator, use XML::LibXML.

RATIONALE

Another similar module is XML::Bare. I used it for some time, but it has some failures.

So, after a number of attempts to fix XML::Bare, I decided to write a parser from scratch.

It is about 40% faster than XML::Bare and about 125% faster than XML::LibXML.

I got these results using the following test on a 35 KB XML document:

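    use Benchmark qw(cmpthese timethese);
    use XML::LibXML; use XML::Bare; use XML::Fast;

    # $doc holds the XML document under test, as a string
    cmpthese timethese -10, {
        libxml  => sub { XML::LibXML->new->parse_string($doc) },
        xmlfast => sub { XML::Fast::xml2hash($doc) },
        xmlbare => sub { XML::Bare->new(text => $doc)->parse },
    };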

              Rate  libxml xmlbare xmlfast
    libxml  1107/s      --    -38%    -56%
    xmlbare 1782/s     61%      --    -28%
    xmlfast 2490/s    125%     40%      --

Of course, the results may differ for other XML files. With non-UTF encodings and with many entities, parsing may be slower. This test was run on a sample RSS feed in UTF-8 mode with a small number of XML entities.

Here are some features and principles:

EXPORT

xml2hash $xml, [ %options ]
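
As a minimal sketch of how the returned structure can be read, the following parses a small document and looks up an attribute and a child node. The sample XML is made up for illustration, and the key names assume the default attr prefix ('-') described under OPTIONS.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical sample document, used only for illustration.
my $xml = '<node attr="test"><sub>value</sub></node>';

# Default options: attribute keys get the '-' prefix.
my $hash = xml2hash $xml;
print "attr: ", $hash->{node}{-attr}, "\n";
print "sub:  ", $hash->{node}{sub},   "\n";

# Same document with the custom prefixes shown in the SYNOPSIS.
my $hash2 = xml2hash $xml, attr => '.', text => '~';
print "attr: ", $hash2->{node}{'.attr'}, "\n";
```

Options are passed as a flat list of key/value pairs after the XML string, so they can be mixed freely in one call.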

OPTIONS

order [ = 0 ]

Not implemented yet. Strictly keeps the output order. When enabled, the structures become more complex, but the original XML can be completely restored from them.

attr [ = '-' ]

Attribute prefix

    <node attr="test" />  =>  { node => { -attr => "test" } }

text [ = '#text' ]

Key name for storing text

When undef, text nodes will be ignored

    <node>text<sub /></node>  =>  { node => { sub => '', '#text' => "text" } }

join [ = '' ]

Join separator for text nodes split by subnodes

Ignored when order is in effect

    # default:
    xml2hash( '<item>Test1<sub />Test2</item>' )
    : { item => { sub => '', '#text' => 'Test1Test2' } };
    
    xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
    : { item => { sub => '', '#text' => 'Test1+Test2' } };

trim [ = 1 ]

Trim leading and trailing whitespace from text nodes
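
A small sketch of the trim option, assuming that trim => 0 leaves the surrounding whitespace in place, as the description above implies. The sample document is hypothetical.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical text node with padding on both sides.
my $xml = '<node>  padded text  </node>';

# trim defaults to 1, so the padding is stripped.
my $trimmed = xml2hash $xml;

# With trim disabled, the text is expected to keep its padding.
my $raw = xml2hash $xml, trim => 0;

print "trimmed: [", $trimmed->{node}, "]\n";
print "raw:     [", $raw->{node},     "]\n";
```

Disabling trim is mainly useful when whitespace inside elements is significant, for example in preformatted text.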

cdata [ = undef ]

When defined, CDATA sections will be stored under this key

    # cdata = undef
    <node><![CDATA[ test ]]></node>  =>  { node => 'test' }

    # cdata = '#'
    <node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }

comm [ = undef ]

When defined, comments will be stored under this key

When undef, comments will be ignored

    # comm = undef
    <node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

    # comm = '/'
    <node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }

array => 1

Force all nodes to be kept as arrays.

    # no array
    <node><sub/></node>  =>  { node => { sub => '' } }

    # array = 1
    <node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }

array => [ 'node', 'names' ]

Force nodes with the listed names to be stored as arrays

    # no array
    <node><sub/></node>  =>  { node => { sub => '' } }

    # array => ['sub']
    <node><sub/></node>  =>  { node => { sub => [ '' ] } }
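
One reason to force arrays is so that calling code can iterate over a node the same way whether it occurs once or many times. The sketch below illustrates this with a hypothetical item element; it assumes that repeated sibling nodes are collected into an array reference, which is not stated in the text above.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical documents: one with a single item, one with two.
my $one  = xml2hash '<list><item>a</item></list>',
                    array => ['item'];
my $many = xml2hash '<list><item>a</item><item>b</item></list>',
                    array => ['item'];

# With array => ['item'], both results can be dereferenced as arrays,
# so no "is this a scalar or an array ref?" check is needed.
for my $doc ($one, $many) {
    my @items = @{ $doc->{list}{item} };
    print scalar(@items), " item(s): @items\n";
}
```

Listing only the names that may repeat keeps the rest of the structure as simple as in the default mode.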

SEE ALSO

TODO

Patches, proposals and bug reports are welcome ;)

AUTHOR

Mons Anderson, <mons@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010 Mons Anderson

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
