Statistics::Contingency - Calculate precision, recall, F1, accuracy, etc.


version 0.09


 use Statistics::Contingency;
 my $s = Statistics::Contingency->new(categories => \@all_categories);
 while (...something...) {
   $s->add_result($assigned_categories, $correct_categories);
 }
 print "Micro F1: ", $s->micro_F1, "\n"; # Access a single statistic
 print $s->stats_table; # Show several stats in table form


The Statistics::Contingency class helps you calculate several useful statistical measures based on 2x2 "contingency tables". I use these measures to help judge the results of automatic text categorization experiments, but they are useful in other situations as well.

The general usage flow is to tally a whole bunch of results in the Statistics::Contingency object, then query that object to obtain the measures you are interested in. When all results have been collected, you can get a report on accuracy, precision, recall, F1, and so on, with both macro-averaging and micro-averaging over categories.

Macro vs. Micro Statistics

All of the statistics offered by this module can be calculated for each category and then averaged, or can be calculated over all decisions and then averaged. The former is called macro-averaging (specifically, macro-averaging with respect to category), and the latter is called micro-averaging. The two procedures bias the results differently - micro-averaging tends to over-emphasize the performance on the largest categories, while macro-averaging over-emphasizes the performance on the smallest. It's often best to look at both of them to get a good idea of how your data distributes across categories.
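To make the bias concrete, here is a minimal pure-Perl sketch (independent of this module, with made-up counts) computing micro- and macro-averaged precision over two categories of very different sizes:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Per-category contingency counts: a = true positives, b = false positives.
# Illustrative numbers only: one large category done well, one small done badly.
my %table = (
    large => { a => 90, b => 10 },
    small => { a => 1,  b => 9  },
);

# Micro-average: pool the counts across categories, then compute once.
my $A = sum map { $_->{a} } values %table;   # 91
my $B = sum map { $_->{b} } values %table;   # 19
my $micro_precision = $A / ($A + $B);        # 91/110, about 0.827

# Macro-average: compute precision per category, then average the results.
my @p = map { $_->{a} / ($_->{a} + $_->{b}) } values %table;  # 0.9 and 0.1
my $macro_precision = sum(@p) / @p;                           # 0.500

printf "micro=%.3f macro=%.3f\n", $micro_precision, $macro_precision;
```

The large category dominates the micro-average (0.827), while the poor showing on the small category drags the macro-average down to 0.500, which is why comparing the two is informative.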

Statistics available

All of the statistics are calculated based on a so-called "contingency table", which looks like this:

              Correct=Y   Correct=N
 Assigned=Y |     a     |     b     |
 Assigned=N |     c     |     d     |

a, b, c, and d are counts that reflect how the assigned categories matched the correct categories. Depending on whether a macro-statistic or a micro-statistic is being calculated, these numbers will be tallied per-category or for the entire result set.
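As a sketch of the per-category case, the following pure-Perl helper (hypothetical, not the module's internals) tallies a, b, c, and d for one category over a list of results, where each result pairs the assigned categories with the correct ones:

```perl
use strict;
use warnings;

# Tally a 2x2 contingency table for one category across a result set.
# Each result is a pair of array refs: [assigned categories, correct categories].
sub tally_category {
    my ($category, @results) = @_;
    my %t = (a => 0, b => 0, c => 0, d => 0);
    for my $r (@results) {
        my ($assigned, $correct) = @$r;
        my $in_assigned = grep { $_ eq $category } @$assigned;
        my $in_correct  = grep { $_ eq $category } @$correct;
        if    ($in_assigned and $in_correct) { $t{a}++ }  # assigned, correct
        elsif ($in_assigned)                 { $t{b}++ }  # assigned, incorrect
        elsif ($in_correct)                  { $t{c}++ }  # missed
        else                                 { $t{d}++ }  # correctly not assigned
    }
    return \%t;
}

my @results = (
    [ ['sports'],         ['sports'] ],   # a for 'sports'
    [ ['sports', 'news'], ['news']   ],   # b for 'sports'
    [ [],                 ['sports'] ],   # c for 'sports'
);
my $t = tally_category('sports', @results);
printf "a=%d b=%d c=%d d=%d\n", @$t{qw(a b c d)};  # a=1 b=1 c=1 d=0
```

For a micro-statistic the same four counters would simply be accumulated over every category at once rather than per category.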

The following statistics are available, expressed in terms of the table above:

 accuracy:  (a+d)/(a+b+c+d)
 error:     (b+c)/(a+b+c+d)
 precision: a/(a+b)
 recall:    a/(a+c)
 F1:        2a/(2a+b+c)

The F1 measure is often the only simple measure that is worth trying to maximize on its own - consider the fact that you can get a perfect precision score by always assigning zero categories, or a perfect recall score by always assigning every category. A truly smart system will assign the correct categories and only the correct categories, maximizing precision and recall at the same time, and therefore maximizing the F1 score.
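The point can be illustrated numerically. F1 is the harmonic mean of precision and recall (equivalently 2a/(2a+b+c)), so a degenerate strategy that maximizes one at the expense of the other scores poorly; a small standalone sketch:

```perl
use strict;
use warnings;

# F1 as the harmonic mean of precision and recall.
sub f1 {
    my ($p, $r) = @_;
    return ($p + $r) ? 2 * $p * $r / ($p + $r) : 0;
}

# Perfect precision but terrible recall (assigning almost nothing)...
printf "p=1.00 r=0.05 -> F1=%.3f\n", f1(1.00, 0.05);  # 0.095
# ...scores far worse than a merely decent balanced system.
printf "p=0.80 r=0.80 -> F1=%.3f\n", f1(0.80, 0.80);  # 0.800
```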

Sometimes it's worth trying to maximize the accuracy score, but accuracy and its counterpart, error, are considered fairly crude scores that don't reveal much about the performance of a categorizer.


The general execution flow when using this class is to create a Statistics::Contingency object, add a bunch of results to it, and then report on the results.


Ken Williams


Copyright 2002-2008 Ken Williams. All rights reserved.

This distribution is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
