NAME

reprec - calculate recall precision curves for TREC style retrieval results

SYNOPSIS

reprec -numdocs numdocs -collection collection -searchresult searchresult -maxdocs maxdocs [-output output] [-(no)single] [-(no)average] [-recall recall-points] [-(no)gnuplot]

reprec [-help] [-version]

DESCRIPTION

With reprec one can calculate recall-precision curves from TREC-style result and relevance judgement files. The judgements file (option -collection) must be in the following format: each line represents the relevance judgement for a single document with respect to a single query. Column 1 holds the query id, column 3 the document id, and column 4 the relevance judgement (1 if relevant, 0 otherwise). Column 2 is not used. Columns are separated by blanks or tabs.
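The judgements format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not part of reprec; the function name parse_judgements and the sample data are hypothetical, and skipping lines with fewer than four columns is an assumption of this sketch.

```python
from collections import defaultdict


def parse_judgements(lines):
    """Map each query id to the set of document ids judged relevant."""
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()  # splits on blanks and tabs alike
        if len(fields) < 4:
            continue  # assumption: ignore blank or malformed lines
        query_id, _unused, doc_id, judgement = fields[:4]
        if judgement == "1":
            relevant[query_id].add(doc_id)
    return dict(relevant)


# Hypothetical sample data: query id, unused column, document id, judgement.
sample = [
    "q1 0 docA 1",
    "q1 0 docB 0",
    "q2\t0\tdocC\t1",
]
judgements = parse_judgements(sample)
```

Queries for which no document is judged relevant do not appear in the resulting mapping.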

In the search result file, each line likewise represents the ranking of a single document with respect to a single query. Column 1 holds the query id, column 2 is unused, column 3 holds the document id, column 4 the rank (unused), column 5 the retrieval status value (RSV), and column 6 the run identifier (used in the output files if present). The file must be sorted by query id, i.e. the lines representing the results for a given query must be blocked together. For each query, the results must be sorted by decreasing RSV.
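Since the ordering requirements are easy to violate, it may help to check a search result file before running reprec on it. The Python sketch below is not part of reprec; the function name check_search_result and the sample lines are hypothetical. It assumes that equal RSVs on consecutive lines are acceptable and that lines with fewer than five columns can be skipped.

```python
def check_search_result(lines):
    """Return a list of ordering problems; an empty list means none were found."""
    problems = []
    seen_queries = set()
    current_query = None
    previous_rsv = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 5:
            continue  # assumption: ignore blank or malformed lines
        query_id, rsv = fields[0], float(fields[4])
        if query_id != current_query:
            # A query id that reappears later means its lines are not contiguous.
            if query_id in seen_queries:
                problems.append(f"line {number}: query {query_id} is not blocked together")
            seen_queries.add(query_id)
            current_query, previous_rsv = query_id, None
        if previous_rsv is not None and rsv > previous_rsv:
            problems.append(f"line {number}: RSV increases within query {query_id}")
        previous_rsv = rsv
    return problems


# Hypothetical sample data: query id, unused, document id, rank, RSV, run id.
good = [
    "q1 Q0 docA 1 0.9 run1",
    "q1 Q0 docB 2 0.5 run1",
    "q2 Q0 docC 1 0.7 run1",
]
bad = [
    "q1 Q0 docA 1 0.5 run1",
    "q1 Q0 docB 2 0.9 run1",
    "q2 Q0 docC 1 0.7 run1",
    "q1 Q0 docD 3 0.1 run1",
]
good_problems = check_search_result(good)
bad_problems = check_search_result(bad)
```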

OPTIONS

Option names may be abbreviated to uniqueness.

-numdocs numdocs

Give number of documents in collection. Needed to compute the very last rank.

-collection collection

Specify file with collection relevance judgements.

-searchresult searchresult

Specify file with search results.

-maxdocs maxdocs

Consider only the top maxdocs result documents for each query when deriving recall-precision curves.
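To illustrate the effect of such a cut-off, the following Python sketch computes recall and precision after each rank using the standard set-based definitions, looking only at the top maxdocs documents. It is not taken from reprec, whose internal computation may differ; the function name precision_recall_at_ranks and the sample data are hypothetical.

```python
def precision_recall_at_ranks(ranked_doc_ids, relevant_doc_ids, maxdocs):
    """Return (recall, precision) after each of the top `maxdocs` ranks."""
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_doc_ids[:maxdocs], start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
        # Recall: fraction of all relevant documents found so far.
        recall = hits / len(relevant_doc_ids) if relevant_doc_ids else 0.0
        # Precision: fraction of the documents seen so far that are relevant.
        points.append((recall, hits / rank))
    return points


# Hypothetical ranking for one query, already sorted by decreasing RSV.
ranking = ["docA", "docB", "docC", "docD"]
relevant = {"docA", "docC"}
points = precision_recall_at_ranks(ranking, relevant, maxdocs=3)
```

With maxdocs set to 3, docD is never inspected, so the list holds three points.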

-output output

Specify prefix for output files. Defaults to /tmp/RP.

-single

Compute recall-precision graphs for individual results (default is not to do that, equivalent to -nosingle).

-gnuplot

Tells reprec to show the calculated RP graph(s) with gnuplot (default). This may not be desirable when e.g. the computation is done remotely. Use -nognuplot to turn this off and only write the gnuplot data files.

-average

Compute recall-precision graph by averaging individual results. This is the default; use -noaverage in order to avoid averaging.

-recall recall-points

Specify number of recall points for which precision is to be computed. Default is 100.
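This manual does not state how precision is derived at the recall points. As a rough illustration only, the Python sketch below uses the conventional interpolation rule from the retrieval evaluation literature (the highest precision observed at any recall at or above the given level) on evenly spaced recall levels. Both the rule and the spacing i/n for i = 1..n are assumptions of this sketch and not necessarily what reprec does; the function name interpolated_precision and the sample points are hypothetical.

```python
def interpolated_precision(points, num_recall_points):
    """Interpolate precision at evenly spaced recall levels.

    `points` is a list of (recall, precision) pairs, one per rank.
    """
    curve = []
    for i in range(1, num_recall_points + 1):
        level = i / num_recall_points
        # Assumed rule: best precision among ranks reaching at least this recall.
        candidates = [p for r, p in points if r >= level]
        curve.append((level, max(candidates) if candidates else 0.0))
    return curve


# Hypothetical per-rank (recall, precision) pairs for a single query.
points = [(0.5, 1.0), (0.5, 0.5), (1.0, 2 / 3)]
curve = interpolated_precision(points, num_recall_points=4)
```

Recall levels that the ranking never reaches get precision 0.0 in this sketch.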

-help

Show this manual.

-version

Show program version.

EXAMPLES

        % reprec -collection t/data/collection_girt \
            -searchresult t/data/searchresult_girt \
            -numdocs 76128

This computes the recall-precision curve averaged over the individual results and writes the output to files with the default prefix /tmp/RP.

BUGS

Yes. Please let me know!

SEE ALSO

RePrec(3), RePrec::PRR(3), RePrec::Searchresult(3), RePrec::Collection(3), RePrec::Average(3), perl(1).

AUTHOR

Norbert Gövert <goevert@ls6.cs.uni-dortmund.de>

COPYRIGHT

Copyright (c) 2003 Norbert Gövert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.