- paragraph-merging heuristics that join unfinished sentences together
- better approach for finding word boundaries, based on
  a unigram language model (LM) and dynamic programming
- better dehyphenation in pdfxtk-mode
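The word-boundary entry above (a unigram LM plus dynamic programming) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the module's own Perl code. The word counts in `UNIGRAM_COUNTS`, the length-based penalty for unknown words, and the `max_word_len` cap are all hypothetical choices. For every prefix of the input, the dynamic program keeps the highest-scoring split and a back-pointer to the start of its last word, then follows the back-pointers to recover the segmentation.

```python
import math

# Hypothetical unigram counts; a real tool would load these from a word list or corpus.
UNIGRAM_COUNTS = {
    "the": 500, "word": 120, "words": 80, "boundary": 30, "boundaries": 25,
    "find": 60, "finding": 40, "in": 400, "a": 450,
}
TOTAL = sum(UNIGRAM_COUNTS.values())


def word_logprob(word):
    """Log-probability of a single word under the unigram model."""
    count = UNIGRAM_COUNTS.get(word)
    if count:
        return math.log(count / TOTAL)
    # Hypothetical penalty for unknown words: it gets harsher with length,
    # so long unknown chunks are strongly discouraged.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text, max_word_len=20):
    """Split text without spaces into the most probable word sequence."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in that split
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + word_logprob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the string to rebuild the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return list(reversed(words))
```

For example, `segment("findingwordboundaries")` should yield `["finding", "word", "boundaries"]`, because that split scores far higher than alternatives containing unknown fragments such as `"g"` or `"ing"`.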

* v0.2.4 Fri Mar 15 10:34:42 CET 2013

- pdfxtk is now the default converter
- heuristics to handle ligatures
- dehyphenation and other heuristics in pdfxtk-mode (-X)
- now also splits strings into characters to find known words
  (solves a problem with pdfxtk conversions)
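The ligature and dehyphenation heuristics listed above can be approximated by a minimal sketch. It is written in Python for illustration and is not the module's actual implementation. `LEXICON` is a hypothetical word list. Ligatures are expanded with Unicode NFKC normalization, which maps compatibility characters such as U+FB01 (fi) and U+FB03 (ffi) to plain letters. Note that NFKC also folds other compatibility characters, such as full-width forms, so a real tool might prefer an explicit mapping table. A word split across a line break is joined only when the joined form is a known word. Otherwise the hyphen is kept on the assumption that it belongs to a genuine compound.

```python
import unicodedata

# Hypothetical lexicon; the real tool would rely on its own word list.
LEXICON = {"example", "efficient", "office", "flow"}


def expand_ligatures(text):
    """Replace typographic ligatures (e.g. U+FB01, U+FB03) with plain letters."""
    # NFKC decomposes compatibility ligatures into their component letters.
    return unicodedata.normalize("NFKC", text)


def dehyphenate(first, second):
    """Join 'first-' (end of one line) with 'second' (start of the next)."""
    stem = first[:-1] if first.endswith("-") else first
    joined = stem + second
    if joined.lower() in LEXICON:
        return joined
    # Unknown joined form: keep the hyphen, assuming a real compound word.
    return stem + "-" + second
```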

* v0.2.3 Wed Mar  6 23:12:22 CET 2013

- fixed test suite

* v0.2.2 Wed Feb 27 20:33:09 CET 2013

- fixed problem with wrong shared-dir settings
- made word merging somewhat more efficient

* v0.2.1 Fri Feb 15 16:29:41 CET 2013

- added pdfXtk as an alternative option for converting PDF files
  (see http://sourceforge.net/projects/pdfxtk/)

* v0.2 - Thu Feb  7 10:51:16 CET 2013

- running without pdftotext is now possible
- added lowercasing (can be switched off)

* v0.1 - Tue Jan 29 20:31:42 CET 2013

- initial release