NAME

SWISH::Filter::Document - object model for result of SWISH::Filter

DESCRIPTION

A SWISH::Filter::Document object is returned by the SWISH::Filter convert() method. This class is intended to be used privately, but you might subclass it in order to extend or modify its behaviour.

METHODS

These methods are available to filter authors. They also give end users of SWISH::Filter access to the document after calling the convert() method.

End users of SWISH::Filter will use a subset of these methods. See "User Methods".

Filter authors will also be interested in the "Author Methods" section. The filter() method in each Filter module is passed a SWISH::Filter::Document object. Method calls may be made on this object to check the document's current content type, or to fetch the document as either a file name or a reference to a scalar containing the document content.

User Methods

These methods are intended primarily for those folks using SWISH::Filter. If you are writing a filter, see also "Author Methods".

fetch_doc_reference

Returns a scalar reference to the document. This can be used when the filter can operate on the document in memory (or if an external program expects the input to be from standard input).

If the document is currently on disk, it will be read into memory. If it was stored in a temporary file on disk, that file will be deleted once it has been read into memory. The file will be read in binmode if $doc->is_binary is true.

Note that fetch_doc() is an alias.
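
A minimal sketch of working with the returned reference. So that it runs without SWISH::Filter installed, My::StubDoc is a hypothetical stand-in that only mimics fetch_doc_reference(); with the real module the object would come from convert().

```perl
use strict;
use warnings;

# Hypothetical stand-in for SWISH::Filter::Document, so the sketch runs alone.
package My::StubDoc {
    sub new {
        my ( $class, $content ) = @_;
        my $self = { content => $content };
        return bless $self, $class;
    }

    # Mimics the documented behaviour: hand back a reference, not a copy.
    sub fetch_doc_reference {
        my $self = shift;
        return \$self->{content};
    }
}

my $doc = My::StubDoc->new("first line\nsecond line\n");
my $ref = $doc->fetch_doc_reference;

# Dereference with $$ref to read the content without copying it.
my $length = length $$ref;
my @lines  = split /\n/, $$ref;

print "bytes: $length, lines: ", scalar(@lines), "\n";
```

Working through the reference avoids duplicating a large document in memory each time it is passed around.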

was_filtered

Returns true if some filter processed the document.

content_type

Fetches the current content type for the document.

Example:

    return unless $doc->content_type =~ m!application/pdf!;

swish_parser_type

Returns a parser type based on the content type. Returns undef if no parser type is mapped.

is_binary

Returns true if the document's content-type does not match "text/".
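
The rule above can be sketched as a standalone function. looks_binary() is a hypothetical helper, not the module's own code, and anchoring the match at the start of the string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rule; not the module's own code.
sub looks_binary {
    my ($content_type) = @_;

    # Assumption: "text/" must appear at the start of the content type.
    return $content_type !~ m{^text/} ? 1 : 0;
}

for my $type ( 'text/html', 'text/plain', 'application/pdf' ) {
    printf "%-16s %s\n", $type, looks_binary($type) ? 'binary' : 'text';
}
```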

Author Methods

These methods are intended for those folks writing filters.

fetch_filename

Returns a path to the document as stored on disk. This name can be passed to external programs (e.g. catdoc) that expect input as a file name.

If the document is currently in memory then a temporary file will be created. Do not expect the file name returned to be the real path of the document.

The file will be written in binmode if is_binary() returns true.

This method is not normally used by end-users of SWISH::Filter.
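
As an illustration of the behaviour described above, here is a rough sketch of writing in-memory content to a temporary file, using binmode for binary documents. content_to_tempfile() is a hypothetical helper built on the core File::Temp module; it is not how SWISH::Filter::Document is necessarily implemented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write in-memory content to a temp file, return its path.
sub content_to_tempfile {
    my ( $content_ref, $is_binary ) = @_;

    # UNLINK => 1 asks File::Temp to remove the file when the program exits.
    my ( $fh, $path ) = tempfile( UNLINK => 1 );

    # Binary documents must be written without newline translation.
    binmode $fh if $is_binary;

    print {$fh} $$content_ref or die "write to $path failed: $!";
    close $fh                 or die "close of $path failed: $!";
    return $path;
}

my $content = "%PDF-1.4\r\nbinary-ish payload\r\n";
my $path    = content_to_tempfile( \$content, 1 );

# The path is a temporary name, not the document's original location.
my $size = -s $path;
print "wrote $size bytes to $path\n";
```

The resulting path is the kind of value that can be handed to an external program expecting a file name.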

set_continue

Processing will continue to the next filter if this is set to a true value. This should be set for filters that change encodings or uncompress documents.

set_content_type( type )

Sets the content type for a document.
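
A sketch of how a decompressing filter might combine set_content_type() and set_continue(). Everything here is illustrative: My::StubDoc is a hypothetical stand-in for the document object (continue_flag() exists only in the stub), gunzip_filter() is a hypothetical filter body, the "application/x-gzip" and "text/plain" types are chosen for the example, and returning a scalar reference is an assumption of this sketch.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical stand-in for SWISH::Filter::Document; only the methods used here.
package My::StubDoc {
    sub new {
        my ( $class, %args ) = @_;
        my $self = { %args, continue => 0 };
        return bless $self, $class;
    }
    sub content_type        { return $_[0]{content_type} }
    sub set_content_type    { $_[0]{content_type} = $_[1]; return }
    sub set_continue        { $_[0]{continue} = $_[1]; return }
    sub continue_flag       { return $_[0]{continue} }    # stub-only accessor
    sub fetch_doc_reference { return \$_[0]{content} }
}

# Hypothetical filter body: uncompress, relabel, and let later filters run.
sub gunzip_filter {
    my ($doc) = @_;
    return unless $doc->content_type =~ m!application/x-gzip!;

    my $in = $doc->fetch_doc_reference;
    gunzip( $in => \my $out ) or die "gunzip failed: $GunzipError";

    $doc->set_content_type('text/plain');    # content is no longer gzip
    $doc->set_continue(1);                   # allow the next filter to run
    return \$out;
}

# Build a compressed sample document in memory.
my $plain = "hello from a compressed document\n";
gzip( \$plain => \my $gz ) or die "gzip failed: $GzipError";

my $doc    = My::StubDoc->new( content => $gz, content_type => 'application/x-gzip' );
my $result = gunzip_filter($doc);

print $$result;
print "type: ", $doc->content_type, ", continue: ", $doc->continue_flag, "\n";
```

Setting the new content type before continuing matters because later filters decide whether to act by inspecting content_type().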

name

Fetches the name of the current file. This is useful for printing out the name of the file in an error message. This is the name passed in to the SWISH::Filter convert() method. It is optional and thus may not always be set.

    my $name = $doc_object->name || 'Unknown name';
    warn "File '$name': failed to convert -- file may be corrupt\n";

user_data

Fetches the user_data passed in to the filter. This can be any data or data structure passed into SWISH::Filter new().

This is an easy way to pass special parameters into your filters.

Example:

    my $data = $doc_object->user_data;
    # see if a choice for the <title> was passed in
    if ( ref $data eq 'HASH' && $data->{pdf2html}{title_field} ) {
       ...
       ...
    }

meta_data

Similar to user_data() but specifically intended for name/value pairs in the meta tags in HTML or XML documents.

If set, either via new() or explicitly via the meta_data() method, the value of meta_data() can be used in a filter to set meta headers.

The value of meta_data() should be a hash ref so that it is easy to pass to SWISH::Filters::Base->format_meta_headers().

After a document is filtered, the meta_data() method can be used to retrieve the values that the filter inserted into the filtered document. This value is (again) a hash ref, and is set by the SWISH::Filter module if the filter() method returns a second value. Because the filter module might also extract meta data from the document itself, and might insert some of its own, it is up to the individual filter to determine how it handles meta data and which values it keeps. See SWISH::Filters::Pdf2HTML for an example.

See the filter() method description in SWISH::Filter, the section on WRITING FILTERS.

Example:

    my $doc = $filter->convert( meta_data => {foo => 'bar'} );
    my $meta = $doc->meta_data;
    # $meta *probably* is {foo => 'bar'} but that's up to how the filter handled
    # the value passed in convert(). Could also be { foo => 'bar', title => 'some title' }
    # for example.

TESTING

Filters can be tested with the swish-filter-test program in the example/ directory. Run:

   swish-filter-test -man

for documentation.

SUPPORT

Please contact the Swish-e discussion list. http://swish-e.org

AUTHOR

Bill Moseley

Currently maintained by Peter Karman perl@peknet.com

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.