NAME

analyze.pl - batch processor to find terms for lists of genes in various files

SYNOPSIS

This program takes a list of files, each of which contains a list of genes, one gene per line. For each file, it finds terms for the listed genes in each of the three GO aspects (P, C and F) and writes the results to a file named after the original file, but with a .terms extension. Only terms with a corrected P-value of <= 0.05 are output.

The first argument is the annotation file, the second is the expected number of genes within the organism, the third is the name of the obo file, and all subsequent arguments are files containing lists of genes.
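As an illustration of the input format described above (one gene per line), a gene list could be read roughly as follows. This is a minimal sketch, not the code analyze.pl itself uses: the helper name read_gene_list is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of analyze.pl): read one gene per line
# from an open filehandle and return the gene names as a list.
sub read_gene_list {
    my ($fh) = @_;
    my @genes;
    while (my $line = <$fh>) {
        # Assumption: strip leading/trailing whitespace, including any
        # trailing "\n" or "\r\n" line ending.
        $line =~ s/^\s+|\s+$//g;
        # Assumption: blank lines carry no gene and are skipped.
        next if $line eq '';
        push @genes, $line;
    }
    return @genes;
}
```

A caller would open each gene file for reading and pass the filehandle to the helper.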

Usage:

    analyze.pl <annotation_file> <numGenes> <obofile> <file1> <file2> <file3> ... <fileN>

e.g.

    analyze.pl ../t/gene_association.sgd 7200 ../t/gene_ontology_edit.obo genes.txt genes2.txt
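The argument order and output naming described above can be sketched as below. This is an illustrative sketch under stated assumptions, not the script's actual source: parse_args and terms_file_for are hypothetical names, the validation of numGenes is an assumption, and so is the choice to replace an existing extension (genes.txt becomes genes.terms) rather than append to it.

```perl
use strict;
use warnings;

# Hypothetical helper: split the argument list in the documented order:
# annotation file, number of genes, obo file, then one or more gene files.
sub parse_args {
    my @args = @_;
    die "Usage: analyze.pl <annotation_file> <numGenes> <obofile> <file1> ... <fileN>\n"
        if @args < 4;
    my ($annotation_file, $num_genes, $obo_file, @gene_files) = @args;
    # Assumption: numGenes must be a positive integer.
    die "numGenes must be a positive integer\n"
        unless $num_genes =~ /^\d+$/ && $num_genes > 0;
    return {
        annotation_file => $annotation_file,
        num_genes       => $num_genes,
        obo_file        => $obo_file,
        gene_files      => \@gene_files,
    };
}

# Hypothetical helper: derive the output file name for a gene file.
# Assumption: a trailing extension, if any, is replaced by ".terms".
sub terms_file_for {
    my ($file) = @_;
    (my $out = $file) =~ s/\.[^.\/]*$//;
    return "$out.terms";
}
```

With the example invocation above, parse_args(@ARGV) would yield two gene files, and terms_file_for would map them to genes.terms and genes2.terms under the stated naming assumption.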

An example output file might look like this:

    The following gene(s) will be considered:
    
    YDL235C YPD1
    YDL224C WHI4
    YDL225W SHS1
    YDL226C GCS1
    YDL227C HO
    YDL228C YDL228C
    YDL229W SSB1
    YDL230W PTP1
    YDL231C BRE4
    YDL232W OST4
    YDL233W YDL233W
    YDL234C GYP7
    
    Finding terms for P
    
    
    Finding terms for C
    
    
    Finding terms for F
    
    -- 1 of 15--
    GOID    GO:0005096
    TERM    GTPase activator activity
    CORRECTED P-VALUE       0.0113038452336839
    UNCORRECTED P-VALUE     0.00113038452336839
    NUM_ANNOTATIONS 2 of 12 in the list, vs 31 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C
    -- 2 of 15--
    GOID    GO:0008047
    TERM    enzyme activator activity
    CORRECTED P-VALUE       0.0316194107645226
    UNCORRECTED P-VALUE     0.00316194107645226
    NUM_ANNOTATIONS 2 of 12 in the list, vs 52 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C
    -- 3 of 15--
    GOID    GO:0005083
    TERM    small GTPase regulatory/interacting protein activity
    CORRECTED P-VALUE       0.0340606972468798
    UNCORRECTED P-VALUE     0.00340606972468798
    NUM_ANNOTATIONS 2 of 12 in the list, vs 54 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C
    -- 4 of 15--
    GOID    GO:0030695
    TERM    GTPase regulator activity
    CORRECTED P-VALUE       0.0475469908576535
    UNCORRECTED P-VALUE     0.00475469908576535
    NUM_ANNOTATIONS 2 of 12 in the list, vs 64 of 7272 in the genome
    The genes annotated to this node are:
    YDL234C, YDL226C
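For downstream processing, a report in the layout shown above can be parsed back into records. The following is a minimal sketch that assumes the layout is exactly as in the example (a "-- N of M--" header, then labelled fields, then a comma-separated gene line). The helper name parse_terms_report is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse report text into a list of hash references,
# one per "-- N of M--" record. Text before the first record is ignored.
sub parse_terms_report {
    my ($text) = @_;
    my (@records, $current, $expect_genes);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*--\s*(\d+)\s+of\s+(\d+)\s*--\s*$/) {
            # Start of a new record.
            $current = { index => $1, total => $2 };
            push @records, $current;
            $expect_genes = 0;
        }
        elsif (!$current) {
            next;    # still in the preamble (gene list, "Finding terms for ...")
        }
        elsif ($expect_genes) {
            # The line after "The genes annotated ..." holds the gene names.
            $line =~ s/^\s+|\s+$//g;
            $current->{genes} = [ split /\s*,\s*/, $line ];
            $expect_genes = 0;
        }
        elsif ($line =~ /^\s*(GOID|TERM|CORRECTED P-VALUE|UNCORRECTED P-VALUE|NUM_ANNOTATIONS)\s+(.*\S)\s*$/) {
            # Labelled field: store the value under its label.
            $current->{$1} = $2;
        }
        elsif ($line =~ /^\s*The genes annotated to this node are:/) {
            $expect_genes = 1;
        }
    }
    return @records;
}
```

The parsed records can then be filtered further, for example with grep { $_->{'CORRECTED P-VALUE'} <= 0.02 } to apply a cutoff stricter than the 0.05 the program uses.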

AUTHORS

Gavin Sherlock, sherlock@genome.stanford.edu