OCR::Naive - convert images into text in an extremely naive fashion
The module implements a very simple and unsophisticated OCR by finding all known images in a larger image. The known images are mapped to text using the preexisting dictionary, and the text lines are returned.
The interesting part here is the image finding itself: it is done by a regexp! For all practical purposes, images can be treated as byte strings, so regexps apply to them as well. For example, suppose one needs to locate a 2x2 image in a larger 7x7 image. The regexp should consist of the first scanline of the smaller image (2 bytes, verbatim), then 7 - 2 = 5 arbitrary characters, and finally the second scanline (2 bytes again). Of course there are some quirks, but these are explained in the API section.
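The 2x2-in-7x7 case above can be sketched in plain Perl. This is only an illustration, not the module's own code: find_subimage is a hypothetical helper, and it assumes one byte per pixel with no scanline padding, which real image data may not satisfy.

```perl
use strict;
use warnings;

# Hypothetical helper: find a $sw x $sh subimage inside a $w-wide image.
# Assumption: both are byte strings, one byte per pixel, no row padding.
sub find_subimage {
    my ($image, $w, $sub, $sw, $sh) = @_;
    my $gap   = $w - $sw;    # bytes to skip between two scanlines
    my @lines = map { quotemeta substr($sub, $_ * $sw, $sw) } 0 .. $sh - 1;
    my $rx    = join ".{$gap}", @lines;
    while ($image =~ /$rx/gs) {    # /s lets "." match any byte, including "\n"
        my $pos = $-[0];
        my ($x, $y) = ($pos % $w, int($pos / $w));
        # one of the quirks: a match may wrap around the right edge
        return ($x, $y) if $x + $sw <= $w;
        pos($image) = $pos + 1;    # retry from the next byte
    }
    return;
}

# 7x7 background of "." with a 2x2 image "ab" / "cd" placed at x=3, y=2
my @rows = ('.......') x 7;
substr($rows[2], 3, 2) = 'ab';
substr($rows[3], 3, 2) = 'cd';
my $image = join '', @rows;

my ($x, $y) = find_subimage($image, 7, 'abcd', 2, 2);
print "found at ($x, $y)\n";    # found at (3, 2)
```

The generated regexp here is ab.{5}cd. The edge check is needed because a byte string has no notion of rows, so a pattern can match across the end of one scanline and the start of the next.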
Dictionaries for different fonts can be created interactively by bin/makedict; the non-interactive recognition is performed by bin/ocr which is a mere wrapper to this module.
    use Prima::noX11; # Prima imaging required
    use OCR::Naive;

    # load a dictionary created by bin/makedict
    $db = load_dictionary( 'my.dict');

    # load image to recognize
    my $i = Prima::Image-> load( 'screenshot.png' );
    $i = enhance_image( $i );

    # ocr!
    print "$_\n" for recognize( $i, $db);
Loads a glyph dictionary from $FILE, returns a dictionary hash table. If not loaded, returns undef and $! contains the error.
Saves a glyph dictionary from $DB into $FILE, returns success flag. If failed, $! contains the error.
The dictionary is intended to be a simple hash, where the key is the image pixel data and the value is a hash of image attributes: width, height, text, and possibly something more in the future. The key is currently the image data verbatim, and image2db_key returns the data of $IMAGE.
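As a rough picture of that layout, a one-entry dictionary might look like the sketch below. The attribute names width, height and text follow the description above, but their exact spelling in the module, and the pixel format of the key, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical 2x2 glyph, one byte per pixel; the real key is whatever
# pixel data the image object holds.
my $pixels = "\x00\xff\xff\x00";

my %db = (
    $pixels => { width => 2, height => 2, text => 'x' },
);

# Lookup goes straight through the raw pixel data
my $glyph = $db{$pixels};
print "$glyph->{text}: $glyph->{width}x$glyph->{height}\n";    # x: 2x2
```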
Locates a $SUBIMAGE in $IMAGE and returns one or many matches, depending on $MULTIPLE. If a single match is requested, stops at the first match and returns a pair of (X,Y) coordinates. If $MULTIPLE is 1, returns an array of (X,Y) pairs. In both modes, returns an empty list if nothing was found.
When more than one subimage is to be found in a larger image, it is important that parts of larger glyphs are not mistakenly attributed to smaller ones. For example, the letter ('i') might be detected as a combination of ('dot') and ('dotlessi'). To avoid this, suggest_glyph_order sorts all dictionary entries by their occupied area, larger first, and returns the sorted set of keys.
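The ordering idea can be sketched as follows. glyph_order_by_area is a hypothetical stand-in, not the module's suggest_glyph_order, and the dictionary keys are placeholder strings rather than real pixel data.

```perl
use strict;
use warnings;

# Placeholder keys stand in for pixel data; sizes are made up.
my %db = (
    dot_pixels      => { width => 1, height => 1, text => 'dot' },
    dotlessi_pixels => { width => 1, height => 4, text => 'dotlessi' },
    i_pixels        => { width => 1, height => 6, text => 'i' },
);

# Hypothetical helper: keys sorted by width * height, larger first,
# with a key comparison as tie-breaker to keep the order deterministic.
sub glyph_order_by_area {
    my ($db) = @_;
    my @keys = sort {
        $db->{$b}{width} * $db->{$b}{height}
            <=> $db->{$a}{width} * $db->{$a}{height}
            or $a cmp $b
    } keys %$db;
    return @keys;
}

my @order = glyph_order_by_area(\%db);
print join(', ', map { $db{$_}{text} } @order), "\n";    # i, dotlessi, dot
```

Matching in this order means the full ('i') is tried before its parts, so its pixels can be claimed before ('dot') or ('dotlessi') get a chance to match inside it.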
Glyphs in the dictionary are black-and-white images, and ideally detection should also happen on 2-color images. enhance_image tries to enhance the contrast of the image, find histogram peaks, detect what is foreground and what is background, and finally convert the image to black-and-white.
This procedure is of course nowhere near any decent pre-OCR image processing, so don't expect much. OTOH it might serve as a good-enough quick hack for screen dumps.
If $OPTIONS{verbose} is set, prints details as it goes.
Given a dictionary $DB, recognizes all text it can find on $IMAGE. Returns array of text lines.
Spaces are a problem with this approach, and even though recognize tries to deduce a minimal width in pixels below which a gap should not be treated as a ('space') character, it will inevitably fail. Set $OPTION{minspace} to the space width if you happen to know what font you're detecting.
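To illustrate why a known space width helps, here is a minimal sketch of threshold-based space insertion. It is not the module's algorithm: join_line and the glyph records (x, width, text) are hypothetical, and it assumes that any horizontal gap of at least the given number of pixels counts as a space.

```perl
use strict;
use warnings;

# Hypothetical helper: @glyphs is one text line of matches sorted by x,
# each { x => left edge, width => pixels, text => string }.
sub join_line {
    my ($minspace, @glyphs) = @_;
    my ($text, $right) = ('', undef);
    for my $g (@glyphs) {
        # a gap at least $minspace wide is assumed to be a space
        $text .= ' ' if defined $right && $g->{x} - $right >= $minspace;
        $text .= $g->{text};
        $right = $g->{x} + $g->{width};
    }
    return $text;
}

my @line = (
    { x => 0,  width => 5, text => 'h' },
    { x => 6,  width => 2, text => 'i' },
    { x => 13, width => 5, text => 'y' },    # 5-pixel gap before this glyph
    { x => 19, width => 5, text => 'o' },
);
print join_line(4, @line), "\n";    # hi yo
```

With a threshold that is too large the words run together, and with one that is too small ordinary letter spacing turns into spaces, which is why a guessed value is fragile.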
Prima, IPA
OCR::PerfectCR, PDF::OCR
Copyright (c) 2007 capmon ApS. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Dmitry Karasik, <dmitry@karasik.eu.org>.