
CAM::PDF::PageText - Extract text from PDF page tree

my $pdf = CAM::PDF->new($filename);
my $pageone_tree = $pdf->getPageContentTree(1);
print CAM::PDF::PageText->render($pageone_tree);

This module attempts to extract sequential text from a PDF page. This is not a robust process, because PDF text is laid out graphically and may appear in the file in arbitrary order. The module uses a few heuristics to guess which pieces of text belong next to each other, but it is easily fooled by, for example, subscripts, non-horizontal text, changes in font, and form fields.
All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.

Copyright 2006 Clotho Advanced Media, Inc., <cpan@clotho.com>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Turn a page content tree into a string. This is a class method that should be called like:
CAM::PDF::PageText->render($pagetree);
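
As a minimal sketch of the "quick dump" use case, the class method can be applied to every page of a document in turn. The helper name pdf_page_texts is hypothetical and not part of this module. The sketch assumes that CAM::PDF's numPages() reports the page count, that getPageContentTree() returns undef for a page whose content cannot be parsed, and that $CAM::PDF::errstr holds the reason when new() fails.

```perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# Hypothetical helper (not part of CAM::PDF): returns one string per page.
sub pdf_page_texts {
    my ($filename) = @_;

    # Assumption: new() returns undef on failure and sets $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: "
            . ($CAM::PDF::errstr || "unknown error\n");

    my @texts;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);

        # Assumption: an unparseable page yields no tree; keep an empty
        # string so that list positions still correspond to page numbers.
        push @texts, defined $tree ? CAM::PDF::PageText->render($tree) : '';
    }
    return @texts;
}

# Print every page when a filename is given on the command line.
if (@ARGV) {
    print join("\n", pdf_page_texts($ARGV[0])), "\n";
}
```

Because the layout heuristics work one page at a time, text that flows across a page break is not joined; each returned string stands alone.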

Clotho Advanced Media Inc., cpan@clotho.com
Primary developer: Chris Dolan