deob_index.pl - extracts BioPerl documentation and indexes it in a database for easy retrieval
This document describes deob_index.pl version 0.0.3
deob_index.pl <path to BioPerl lib> <output path>
a directory path pointing to the root of the BioPerl lib tree. e.g. /export/share/lib/perl5/site_perl/5.8.7/Bio/
where you would like deob_index.pl to put its output files.
deob_index.pl goes through the entire BioPerl library tree looking for .pm and .pl files. For each one it finds, it tries to extract module-level POD documentation (e.g. SYNOPSIS, DESCRIPTION) and store it in a BerkeleyDB. It also tries to extract documentation for each method in the module and store that in a separate BerkeleyDB.
Specific parts of the documentation for a module or method may be retrieved individually using the functions available in Deobfuscator.pm. See that module for details.
While going through and trying to parse each module, deob_index.pl also reports what pieces of the documentation it can't find. For example, if a method's documentation doesn't describe the data type it returns, this script logs that information to a file. This type of automated documentation- checking could be used to standardize and improve the documentation in BioPerl.
deob_index.pl creates four files:
A plaintext file listing each package found in the BioPerl directory that was searched. Packages are listed by their module names, such as 'Bio::SeqIO'. This file is used by deob_interface.cgi.
A Berkeley DB, which stores package-level documentation, such as the synopsis and the description. Each key is a package name, e.g. "Bio::SeqIO", and each value string is composed of the individual pieces of the documentation kept separate by unique string record separators. The individual pieces of documentation are pulled out of the string using the get_pkg_docs function in Deobfuscator.pm. See that package for details.
Like packages.db, methods.db is also a Berkeley DB, except it stores various pieces of information about individual methods available to a class. Each method might have documentation about its usage, its arguments, its return values, an example, and a description of its function.
Each key is the fully-qualified method name, e.g. "Bio::SeqIO::next_seq". Each value is a string containing all of the pieces of documentation concatenated together and separated by unique strings serving as record separators. The extraction of the actual documentation in these strings is handled by the get_method_docs subroutine in the Deobfuscator.pm module. See that package for details.
Not all methods will have all of these types of documentation, and some methods will not have the different pieces of information clearly labeled and separated. For the latter type, deob_index.pl will try to store whatever free-form documentation that does exist, and the get_method_docs function in Deobfuscator.pm, if called without arguments, will return that documentation.
This file contains detailed information about errors encountered while trying to extract documentation during the indexing process.
Each line in deob_index.log is a key-value pair describing a single parsing error.
These are the parsing error codes reported in 'deob_index.log'.
couldn't find the name of the package
couldn't find the synopsis
couldn't find the description
couldn't find any methods
This package name occurs more than once
couldn't find the function description
couldn't find the example
couldn't find the method's arguments
couldn't find the usage statement
couldn't find the return values
This method's documentation doesn't conform to the BioPerl standard of having clearly-labeled fields for title, function, example, args, usage, and returns.
This method name occurs more than once
This software requires:
The Berkeley DB comes standard with most UNIX distributions, so you may already have it installed. See http://www.sleepycat.com for more information.
deob_index.pl recursively navigates a directory of BioPerl modules. Note that the BioPerl module directory need not be "installed"; any old location will do. See http://www.bioperl.org for the latest version.
No bugs have been reported.
deob_index.pl currently expects the sections of POD in a BioPerl module to be in a particular order, namely: NAME, SYNOPSIS, DESCRIPTION, CONSTRUCTORS, ... , APPENDIX. Those sections are expected to be marked with =head1 POD tags, and the documentation for each method is expected to be in =head2 sections in the APPENDIX. The order of SYNOPSIS and DESCRIPTION can be flipped, but this behavior should not be taken as encouragement to do so.
Most, but not all BioPerl modules conform to this standard. Those that do not will cause deob_index.pl to report them as errors. Although the consistency of this standard is desirable for end-users of the documentation, this code probably needs to be a little bit more flexible (patches welcome!).
This software has only been tested in a UNIX environment.
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
firstname.lastname@example.org - General discussion http://www.bioperl.org/wiki/Mailing_lists - About the mailing lists
Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web:
This software was developed originally at the Cold Spring Harbor Laboratory's Advanced Bioinformatics Course between Oct 12-25, 2005. Many thanks to David Curiel, who provided much-needed guidance and assistance on this project.
Copyright (C) 2005-6 Laura Kavanaugh and Dave Messina. All Rights Reserved.
This module is free software; you may redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This software is provided "as is" without warranty of any kind.