NAME

Text::Extract::Word - Extract text from Word files

SYNOPSIS

    use Text::Extract::Word qw(get_all_text);
    
    my $text = get_all_text("test1.doc");

DESCRIPTION

This simple module extracts the textual contents of a Word file. The code was ported from Java code that was originally part of the Apache POI project, but extensive changes were made internally.

FUNCTIONS

get_all_text($filename)

get_all_text is the only function exported by this module. Given a file name, it returns the text contents of the Word file as UTF-8 encoded text.
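
The following is a minimal sketch of defensive use, not part of the module itself. The helper name safe_extract is hypothetical, and the sketch assumes that get_all_text may die on a file it cannot parse, which the documentation does not state. It checks that the file exists, traps any exception, and reports a rough word count.

```perl
use strict;
use warnings;
use Text::Extract::Word qw(get_all_text);

# Hypothetical helper (not provided by Text::Extract::Word): returns the
# extracted text, or undef if the file is missing or extraction fails.
sub safe_extract {
    my ($filename) = @_;
    return unless defined $filename && -f $filename;

    # Assumption: get_all_text may die on unparseable input, so trap it.
    my $text = eval { get_all_text($filename) };
    if (my $err = $@) {
        warn "Could not extract text from $filename: $err";
        return;
    }
    return $text;
}

# Usage: perl extract.pl test1.doc
if (@ARGV) {
    my $text = safe_extract($ARGV[0]);
    if (defined $text) {
        # Splitting on whitespace gives a rough word count.
        my @words = split ' ', $text;
        printf "%d words extracted\n", scalar @words;
    }
}
```

Checking for the file before calling get_all_text keeps a missing-file error separate from a parsing failure, so the caller sees a plain undef in the first case and a warning in the second.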

BUGS

  • Legacy Word formats are not supported: the module does not extract text from Word version 6 or earlier.

SEE ALSO

OLE::Storage also includes a script, lhalw (Let's Have a Look at Word), which extracts text from Word files. Text::Extract::Word is a much smaller module with lighter dependencies; it uses OLE::Storage_Lite for its storage management.

AUTHOR

Stuart Watt, stuart@morungos.com

COPYRIGHT

Copyright (c) 2010 Stuart Watt. All rights reserved.