Module Version: 0.190

NAME

SWISH::Filters::Doc2txt - Perl extension for filtering MSWord documents with Swish-e

DESCRIPTION

This is a plug-in module that uses the "catdoc" program to convert MS Word documents to text for indexing by Swish-e. "catdoc" can be downloaded from:

    http://www.ice.ru/~vitus/catdoc/ver-0.9.html

The program "catdoc" must be installed and in your PATH before running Swish-e.
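
As a quick way to confirm the PATH requirement above, the following is a minimal sketch that searches the directories in PATH for an executable named "catdoc". The helper name find_in_path is hypothetical and is not part of SWISH::Filter; on Windows the executable would likely carry an extension such as ".exe", which this sketch does not handle.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filter): return the full path of
# the first executable file called $name found in PATH, or undef if none.
sub find_in_path {
    my ($name) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $name );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $catdoc = find_in_path('catdoc');
print defined $catdoc
    ? "catdoc found at $catdoc\n"
    : "catdoc not found in PATH\n";
```

If the check reports that catdoc is missing, adding its installation directory to PATH before starting Swish-e should be enough for the filter to locate it.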

BUGS

This filter does not specify input or output character encodings. This will change in the future to allow use of the user_data to set the encoding.

A minor optimization during spidering (i.e. when docs are in memory instead of on disk) would be to use an open2() call to let catdoc read from stdin instead of from a file.
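
The open2() idea above could look roughly like the following sketch, which uses the core IPC::Open2 module to write an in-memory document to a command's stdin and read the converted text from its stdout. This is not how the filter currently works; the helper name filter_via_stdin is hypothetical, and the commented usage line assumes catdoc reads stdin when given no file name.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: send $data to the stdin of @cmd and return
# everything the command writes to its stdout.
sub filter_via_stdin {
    my ( $data, @cmd ) = @_;

    # open2 takes the child's output handle first, then its input handle.
    my $pid = open2( my $from_child, my $to_child, @cmd );

    binmode $to_child;               # Word documents are binary data
    print {$to_child} $data;
    close $to_child;                 # send EOF so the child can finish

    my $text = do { local $/; <$from_child> };   # slurp all output
    close $from_child;
    waitpid $pid, 0;                 # reap the child process
    return $text;
}

# Assumed usage, with $doc holding the document bytes already in memory:
# my $text = filter_via_stdin( $doc, 'catdoc' );
```

One caveat with this simple form: the whole document is written before any output is read, so a very large document could block if the child fills its output pipe first. A more careful version would interleave reads and writes, or fall back to a temporary file.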

AUTHOR

Bill Moseley

SEE ALSO

SWISH::Filter
