Module Version: 0.03


OCR::PerfectCR - Perfect OCR (if you have perfect input).


    use OCR::PerfectCR;
    use GD;
    my $recognizer = OCR::PerfectCR->new;
    my $image = GD::Image->new("example.png") or die "Can't open example.png: $!";
    my $string = $recognizer->recognize($image);


OCR::PerfectCR is a fast, highly accurate "optical" character recognition engine that requires minimal training. How does it manage this, despite being written in pure Perl? By ignoring most of the problems. OCR::PerfectCR requires that your input be in perfect shape: that it hasn't gone out into the real world and been scanned, that each image represents exactly one line of text and nothing else, and, hardest to satisfy, that the font has fairly wide spacing. This makes it very useful for converting image-based subtitle formats to text, and probably not much else. However, it is very good at doing that.

OCR::PerfectCR's knowledge about a particular font is encapsulated in a "charmap" file. Each line of the file maps the MD5 sum of the canonical representation of a character (the first 32 characters of the line) to a string (the 34th character onwards, up to the newline).
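As a rough illustration of that line layout, here is a minimal sketch that splits one charmap line into its key and its string. The helper name parse_charmap_line is hypothetical and not part of OCR::PerfectCR, and it assumes the MD5 sum is written as 32 hex digits and that the 33rd character is a single separator of any kind.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of OCR::PerfectCR: split one charmap line
# into (md5, string) following the layout described above.
sub parse_charmap_line {
    my ($line) = @_;
    chomp $line;
    # Characters 1-32: the MD5 sum (assumed to be hex digits).
    # Character 33: a separator (assumed; any single character is accepted).
    # Characters 34 onwards: the string for this glyph.
    my ($md5, $str) = $line =~ /\A([0-9a-fA-F]{32}).(.*)\z/s
        or die "Malformed charmap line: $line\n";
    return (lc $md5, $str);
}

# The sample MD5 value below is made up for the example.
my ($md5, $str) = parse_charmap_line("d41d8cd98f00b204e9800998ecf8427e A\n");
print "$md5 => $str\n";
```

Because the string part runs to the end of the line, a single glyph image can map to more than one character, which is handy for ligatures.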

Most methods will die on error, rather than trying to recover and return undef.
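Callers that would rather not abort can trap the exception with eval. The wrapper below is only a sketch; call_or_warn is a hypothetical name, and nothing in it is specific to OCR::PerfectCR.

```perl
use strict;
use warnings;

# Hypothetical wrapper: call a method and turn a die into a warning plus
# an undef return value, instead of letting the exception propagate.
sub call_or_warn {
    my ($object, $method, @args) = @_;
    my $result;
    # The trailing 1 makes eval return true only if the call did not die.
    my $ok = eval { $result = $object->$method(@args); 1 };
    unless ($ok) {
        warn "$method failed: $@";
        return;
    }
    return $result;
}
```

With a recognizer and image set up as in the synopsis, this could be used as my $text = call_or_warn($recognizer, 'recognize', $image); note that the wrapper always calls the method in scalar context.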


Loads a charmap file into memory.


Saves the charmap to a file. Charmap files are always saved and loaded in utf8.

$recognizer->recognize($image) (recognise is an alias for this)

Takes the image (a GD::Image object) and tries to convert it into text. In list context, returns a list of hashrefs, each having a str key whose value is the string in the charmap for that character's image. There may also be a color (note the spelling) key, whose value is either a number between 0 and 360, representing the color of the text in degrees on the color wheel, or undef, meaning grey. If the color key is missing, there is nothing there but background; that is, it is whitespace. For non-whitespace characters, there is an md5 key, which gives the MD5 sum of the character in canonical form; that is, its charmap entry. Other keys are deliberately not documented. If you find them useful, please let me know by filing an RT request.
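The distinction between a missing color key and a color key set to undef is easy to get wrong, so here is a small sketch of how the list-context result might be walked. describe_char is a hypothetical helper, and the @chars data is mock data in the documented shape rather than real output; the str value for whitespace in particular is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: describe one hashref of the kind recognize()
# is documented to return in list context.
sub describe_char {
    my ($char) = @_;
    # No color key at all: background only, i.e. whitespace.
    return "whitespace" unless exists $char->{color};
    # color key present but undef: grey text.
    my $color = defined $char->{color} ? "hue $char->{color}" : "grey";
    return "'$char->{str}' ($color, md5 $char->{md5})";
}

# Mock data; real data would come from
#   my @chars = $recognizer->recognize($image);
my @chars = (
    { str => 'H', color => 120,   md5 => '0' x 32 },
    { str => ' ' },
    { str => 'i', color => undef, md5 => 'f' x 32 },
);
print describe_char($_), "\n" for @chars;
```

Using exists for the whitespace check and defined for the grey check keeps the two cases apart.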

Characters not in the charmap will have their str set to "\x{FFFD}" eq "\N{REPLACEMENT CHARACTER}", and will be added to the charmap. They will also be saved as PNG files named md5.png in the current directory, so that a human can look at them and identify them.
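To find out which glyphs still need a human to identify them, the result list can be filtered for the replacement character. This is a sketch only; unknown_md5s is a hypothetical helper working on hashrefs of the documented shape.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct MD5 sums of characters that
# were not in the charmap, in order of first appearance.
sub unknown_md5s {
    my @chars = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->{md5} }
           grep { defined $_->{str}
                  && $_->{str} eq "\x{FFFD}"
                  && defined $_->{md5} } @chars;
}
```

Assuming "md5.png" above means a file named after the MD5 sum itself, print "$_.png\n" for unknown_md5s(@chars); would list the images to inspect.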


Just a boring constructor. No parameters.


Please report bugs on RT. If the bug /might possibly/ be because of your input file, please include it with the bug report.


Copyright 2005 James Mastros, JMASTROS, theorbtwo. (Those are all the same person.)

May be used and copied under the same terms as perl itself.

Thanks, castaway, for being you, and diotalevi for a detailed review.
