
NAME

XML::Filter::Glossary - SAX2 filter for keyword lookup and replacement

SYNOPSIS

 use XML::SAX::Writer;
 use XML::Filter::Glossary;
 use XML::SAX::ParserFactory;

 my $writer   = XML::SAX::Writer->new();
 my $glossary = XML::Filter::Glossary->new(Handler=>$writer);
 my $parser   = XML::SAX::ParserFactory->parser(Handler=>$glossary);

 $glossary->set_glossary("/usr/home/asc/bookmarks.xbel");
 $parser->parse_string("<?xml version = '1.0' ?><root>This is \"aaronland\"</root>");

 # prints :

 <?xml version = "1.0" ?>
 <root>
  This is <a href='http://www.aaronland.net'>aaronland</a>
 </root>

DESCRIPTION

This package is modelled after the UserLand glossary system, in which words or phrases wrapped in double quotes are compared against a lookup table and replaced by their corresponding entries.

Currently only one type of lookup table is supported: a well-formed XBEL bookmarks file. Support for other kinds of lookup tables may be added at a later date.
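A minimal Perl sketch of turning an XBEL file into a lookup table follows. It assumes, hypothetically, that a quoted keyword is matched against each bookmark's title and replaced by a link to its href. That is consistent with the SYNOPSIS output, but the SYNOPSIS does not confirm it. The xbel_lookup_table and unescape helpers are illustrative names, not part of this module. The regular expressions only handle simple, well-formed files, so a real implementation would read the file with a SAX parser.

```perl
use strict;
use warnings;

# Decode the five predefined XML entities (enough for this sketch).
sub unescape {
    my ($s) = @_;
    my %ent = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");
    $s =~ s/&(amp|lt|gt|quot|apos);/$ent{$1}/g;
    return $s;
}

# Hypothetical helper: map each bookmark <title> to its href.
# ASSUMPTION: glossary keywords are matched against bookmark titles.
# Naive regexes, suitable only for simple, well-formed XBEL.
sub xbel_lookup_table {
    my ($xbel) = @_;
    my %table;

    # Each non-empty <bookmark ...>...</bookmark>. The (?<!/) lookbehind
    # skips self-closing <bookmark ... /> tags, which have no <title>.
    while ($xbel =~ m{<bookmark\b([^>]*?)(?<!/)>(.*?)</bookmark>}gs) {
        my ($attrs, $body) = ($1, $2);
        my (undef, $href) = $attrs =~ m{\bhref\s*=\s*(["'])(.*?)\1}s;
        my ($title)       = $body  =~ m{<title>\s*(.*?)\s*</title>}s;
        next unless defined $href && defined $title;
        $table{ unescape($title) } = unescape($href);
    }
    return \%table;
}

my $xbel = <<'XBEL';
<?xml version="1.0"?>
<xbel version="1.0">
  <folder>
    <title>Sites</title>
    <bookmark href="http://www.aaronland.net">
      <title>aaronland</title>
    </bookmark>
  </folder>
</xbel>
XBEL

my $table = xbel_lookup_table($xbel);
print "$_ => $table->{$_}\n" for sort keys %$table;
# prints: aaronland => http://www.aaronland.net
```

The folder's own title is ignored because only titles inside a bookmark element are collected.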

KEYWORDS

A keyword is any word, or phrase, enclosed in double quotes; each one is looked up in the glossary. Alternatively, you may mark keyword phrases with empty elements belonging to a user-defined namespace (see register_namespace below).

If no match is found, the text is left unaltered.
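The double-quote convention can be illustrated with a short Perl sketch that operates on a single run of character data. expand_keywords is a hypothetical helper. It assumes case-sensitive keys and link markup shaped like the SYNOPSIS output. Quotes are paired from left to right, so a stray quote will shift the pairing.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each "quoted" word or phrase that has an
# entry in %$table with a link; leave every other quoted phrase untouched.
# ASSUMPTIONS: case-sensitive lookup; link markup shaped like the SYNOPSIS
# output. Quotes are paired left to right, so a stray quote shifts pairing.
sub expand_keywords {
    my ($text, $table) = @_;
    $text =~ s{"([^"]+)"}{
        exists $table->{$1}
            ? "<a href='$table->{$1}'>$1</a>"
            : '"' . $1 . '"'
    }ge;
    return $text;
}

my %table = (aaronland => 'http://www.aaronland.net');
print expand_keywords(q{This is "aaronland", not "nobody"}, \%table), "\n";
# prints: This is <a href='http://www.aaronland.net'>aaronland</a>, not "nobody"
```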

If a match is found, the corresponding glossary entry is parsed with Robert Cameron's REX shallow-parsing regular expressions. Chunks of balanced markup are re-inserted into the SAX stream via XML::Filter::Merger; anything else, including markup that is not well-formed, is added as character data.
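The decision between "balanced markup" and "character data" can be sketched as follows. This is a deliberately simplified tokenizer standing in for REX, not Cameron's actual expressions. It recognizes only start, end, and empty-element tags and treats the string as character data on anything else, such as comments or attribute values containing '>'. is_balanced_markup is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified stand-in for REX shallow parsing (NOT Cameron's expressions):
# returns 1 if every tag in $str is a start, end or empty-element tag and
# the tags nest correctly, 0 otherwise. Plain text with no tags counts as
# balanced. Comments, PIs, and '>' inside attribute values are not handled.
sub is_balanced_markup {
    my ($str) = @_;
    my $name = qr/[A-Za-z_][\w.:-]*/;
    my @stack;

    # Tokenize into tags, text runs, and stray '<' characters.
    for my $token ($str =~ m{(<[^>]*>|[^<]+|<)}g) {
        next if $token !~ /^</;                      # text is always fine
        if ($token =~ m{^<$name(?:\s[^>]*)?/>$}) {  # empty-element tag
            next;
        }
        elsif ($token =~ m{^</($name)\s*>$}) {      # end tag must match
            return 0 unless @stack && pop(@stack) eq $1;
        }
        elsif ($token =~ m{^<($name)(?:\s[^>]*)?>$}) {  # start tag
            push @stack, $1;
        }
        else {                                       # anything else: give up
            return 0;
        }
    }
    return @stack ? 0 : 1;                           # unclosed tags left over?
}

for my $s (q{<a href='http://www.aaronland.net'>aaronland</a>},
           q{<b>unclosed}, q{<b><i>x</b></i>}, q{1 < 2}) {
    print "$s: ", (is_balanced_markup($s) ? 'markup' : 'character data'), "\n";
}
# prints:
# <a href='http://www.aaronland.net'>aaronland</a>: markup
# <b>unclosed: character data
# <b><i>x</b></i>: character data
# 1 < 2: character data
```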

PACKAGE METHODS

__PACKAGE__->new()

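Constructor, inherited from XML::SAX::Base. Like any SAX filter, it takes a Handler argument naming the next handler in the chain (see SYNOPSIS).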

OBJECT METHODS

$pkg->set_glossary($path)

Set the path to your glossary file (currently, an XBEL bookmarks file).

$pkg->register_namespace()

Tell the filter to treat elements in a given namespace, rather than double-quoted text, as keywords to be looked up.

Valid arguments are:

  • hash reference

    • Prefix

      String.

      The prefix for your glossary namespace.

    • NamespaceURI

      String.

      The URI for your glossary namespace.

    • KeywordAttr

      String.

      The name of the attribute that holds the keyword phrase, useful for phrases that contain spaces. Default value is "id".

     # Use <g:keyword /> syntax
     $glossary->register_namespace({
                                    Prefix       => "g",
                                    NamespaceURI => "http://www.aaronland.net/glossary"
                                   });
    
     # Use <g:keyword phrase = "keyword with spaces" /> syntax
     $glossary->register_namespace({
                                    Prefix       => "g",
                                    NamespaceURI => "http://www.aaronland.net/glossary",
                                    KeywordAttr  => "phrase",
                                   });

  • zero

     # Toggle back to default double-quote syntax
     $glossary->register_namespace(0);
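With the namespace syntax, a filter has to pick the keyword out of each element it sees. The Perl sketch below does this for the element hash that a Perl SAX2 start_element() handler receives, in which unprefixed attributes are keyed as "{}name". keyword_for_element is hypothetical. Falling back to the element's local name when the KeywordAttr attribute is absent is an assumption drawn from the <g:keyword /> example above, not documented behaviour. The sketch compares namespace URIs rather than prefixes, because a prefix is only a local alias.

```perl
use strict;
use warnings;

# Hypothetical helper: given the options passed to register_namespace()
# and the element hash a Perl SAX2 start_element() handler receives,
# return the keyword, or undef if the element is not in that namespace.
# Namespaces are compared by URI; the prefix is only a local alias.
sub keyword_for_element {
    my ($ns, $el) = @_;
    return unless ($el->{NamespaceURI} // '') eq $ns->{NamespaceURI};

    my $attr = $ns->{KeywordAttr} // 'id';     # documented default is "id"
    my $att  = $el->{Attributes}{"{}$attr"};   # unprefixed attrs: "{}name"
    # ASSUMPTION: with no such attribute, the element's local name is the
    # keyword, as in <g:aaronland />.
    return defined $att ? $att->{Value} : $el->{LocalName};
}

my $ns = {
    Prefix       => 'g',
    NamespaceURI => 'http://www.aaronland.net/glossary',
    KeywordAttr  => 'phrase',
};

# Roughly what start_element() sees for
#   <g:keyword phrase="keyword with spaces" />
my $el = {
    Name         => 'g:keyword',
    LocalName    => 'keyword',
    Prefix       => 'g',
    NamespaceURI => 'http://www.aaronland.net/glossary',
    Attributes   => {
        '{}phrase' => { Name => 'phrase', LocalName => 'phrase', Prefix => '',
                        NamespaceURI => '', Value => 'keyword with spaces' },
    },
};
print keyword_for_element($ns, $el), "\n";   # prints: keyword with spaces
```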

VERSION

0.2

DATE

September 12, 2002

AUTHOR

Aaron Straup Cope

TO DO

  • Support for Netscape bookmarks

  • Support for IE Favorites (via XML::Directory::SAX)

  • Support for UserLand glossaries (serialized)

BACKGROUND

http://www.la-grange.net/2002/09/04.html

http://aaronland.info/weblog/archive/4586

SEE ALSO

http://glossary.userland.com/

http://pyxml.sourceforge.net/topics/xbel/

http://www.cs.sfu.ca/~cameron/REX.html

XML::SAX

XML::Filter::Merger

BUGS

  • Certainly, not outside the realm of possibility.

LICENSE

Copyright (c) 2002, Aaron Straup Cope. All Rights Reserved.

This is free software; you may use it and distribute it under the same terms as Perl itself.