NAME

compare-graphs.pl

SYNOPSIS

 compare-graphs.pl --file_1 go/ontology/old_gene_ontology.obo --file_2 go/ontology/gene_ontology.obo -s goslim_generic -o results.txt

DESCRIPTION

 # must supply these arguments... or else!
 # INPUT
 -f1 || --file_1 /path/to/<file_name>    "old" ontology file
 -f2 || --file_2 /path/to/<file_2_name>  "new" ontology file

 # OUTPUT
 -o || --output /path/to/<file_name>     output file for results

 # SUBSET
 -s || --subset <subset_name>            subset to use for graph-based comparisons

 # optional args
 -v || --verbose                         prints various messages

Compares two OBO files and records the differences between them. These differences include:

* new terms
* term merges
* term obsoletions
* changes to term content, such as addition, removal or editing of features like synonyms, xrefs, comments, def, etc.
* term movements into or out of the subset designated by the subset option

At present, only term differences are recorded in detail, although this could presumably be extended to other stanza types in an ontology file. The comparison is based on creating hashes of term stanza data, mainly because hashes are more tractable than objects.

block_to_sorted_array

 input:  a multi-line block of text (preferably an OBO format stanza!)
 output: ref to an array with the following removed
         - empty lines
         - lines starting with "id: ", "[", and "...]"
         - trailing whitespace

         the array is sorted
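
The behaviour described above can be illustrated with a minimal sketch. The name block_to_sorted_array_sketch and the sample stanza are hypothetical, written only from the description here, so the script's real code may differ. The "...]" case is left out because the description does not make clear which lines it matches.

```perl
use strict;
use warnings;

# Hypothetical re-implementation based only on the description above.
sub block_to_sorted_array_sketch {
    my ($block) = @_;
    my @lines;
    for my $line (split /\n/, $block) {
        $line =~ s/\s+$//;          # strip trailing whitespace
        next if $line eq '';        # drop empty lines
        next if $line =~ /^id: /;   # drop the id line
        next if $line =~ /^\[/;     # drop stanza headers such as [Term]
        push @lines, $line;
    }
    return [ sort @lines ];         # plain string sort
}

# Made-up stanza, for illustration only.
my $stanza = join "\n",
    '[Term]',
    'id: GO:0000001',
    'name: mitochondrion inheritance   ',
    '',
    'namespace: biological_process',
    'is_a: GO:0048308';

my $sorted = block_to_sorted_array_sketch($stanza);
print "$_\n" for @$sorted;
```

Sorting the cleaned lines means that two stanzas with the same tags in a different order produce identical arrays, which keeps the later comparison independent of line order.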

tag_val_arr_to_hash

 input:  array ref containing ": " separated tag-value pairs
 output: lines in the array split up by ": " and put into a hash of the form key-[array of values]
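
As a rough illustration of the key-[array of values] shape, the following sketch builds such a hash. The function name tag_val_arr_to_hash_sketch is hypothetical, and splitting on the first ": " only is an assumption; the description does not say how values that themselves contain ": " are handled.

```perl
use strict;
use warnings;

# Hypothetical sketch; not taken from the script itself.
sub tag_val_arr_to_hash_sketch {
    my ($lines) = @_;
    my %hash;
    for my $line (@$lines) {
        # Assumption: split on the first ": " only, so a value that
        # contains ": " stays in one piece.
        my ($tag, $value) = split /: /, $line, 2;
        next unless defined $value;     # skip lines without a ": " separator
        push @{ $hash{$tag} }, $value;  # repeated tags accumulate values
    }
    return \%hash;
}

my $h = tag_val_arr_to_hash_sketch([
    'is_a: GO:0048308',
    'is_a: GO:0048311',
    'name: mitochondrion inheritance',
]);

# is_a holds two values, name holds one
printf "%s has %d value(s)\n", $_, scalar @{ $h->{$_} } for sort keys %$h;
```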

compare_hashes

 input:  two hashes of arrays
         regex - a regular expression for hash keys to ignore

 output: hash of differences in the form
         {hash key}{ f1 => number of values unique to f1,
                     f2 => number of values unique to f2 }
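
One way such a comparison might be written is sketched below. The name compare_hashes_sketch and the sample data are hypothetical. Recording a key only when at least one side has unique values is an assumption of this sketch, not something the description confirms.

```perl
use strict;
use warnings;

# Hypothetical sketch of a hash-of-arrays comparison.
sub compare_hashes_sketch {
    my ($f1, $f2, $ignore) = @_;
    my %diff;
    my %all_keys = map { $_ => 1 } keys %$f1, keys %$f2;
    for my $key (sort keys %all_keys) {
        next if defined $ignore && $key =~ $ignore;   # skip ignored keys
        my %in_f1 = map { $_ => 1 } @{ $f1->{$key} || [] };
        my %in_f2 = map { $_ => 1 } @{ $f2->{$key} || [] };
        # grep in scalar context returns the number of matches
        my $only_f1 = grep { !$in_f2{$_} } keys %in_f1;
        my $only_f2 = grep { !$in_f1{$_} } keys %in_f2;
        # Assumption: store a count only when it is non-zero
        $diff{$key}{f1} = $only_f1 if $only_f1;
        $diff{$key}{f2} = $only_f2 if $only_f2;
    }
    return \%diff;
}

# Made-up data, for illustration only.
my $old = { synonym => [ 'a', 'b' ],      comment => ['x'], name => ['foo'] };
my $new = { synonym => [ 'b', 'c', 'd' ], comment => ['y'], name => ['foo'] };

my $diff = compare_hashes_sketch($old, $new, qr/^comment$/);
print "synonym: f1=$diff->{synonym}{f1} f2=$diff->{synonym}{f2}\n";
```

In this example "comment" is skipped because it matches the ignore pattern, and "name" does not appear in the result because both files hold the same value for it.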
