load-go.pl
load-go.pl -d go -h mydbserver -datatype go_ont *.ontology
Loads GO data (ontology files, def files, xref files, assoc files) into a GO database. Also performs additional housekeeping tasks on the database if required.
You will need the 'xsltproc' executable, which is part of libxslt.
(You will have this if you have already installed XML::LibXSLT.)
You need to have both go-perl and go-db-perl installed.
http://www.godatabase.org/dev contains further details on these two modules.
This site also has details on the GO database.
-d DBNAME

-h DBSERVER

-datatype FORMAT
    (see below)

-schema SCHEMA
    By default: godb. Other values: chado.
    Support for the chado schema is in beta. See http://www.gmod.org/chado

-dbms DRIVER
    By default: mysql. Other values: Pg.
    Support for PostgreSQL is in beta.

-append
    By default this script assumes you are loading a dataset for the FIRST time. In certain cases it performs only SQL INSERTs rather than checking with SELECT whether it needs to update.
    If you are loading the same file for the second time, use this option. The loading will be slightly slower, but it will append to existing data.
    You should use this option if you are loading multiple ontology files in one go!
    You must also use this option if you wish to append to data of the same type in an already loaded database; it switches off the bulkloading option.

-no_optimize
    By default, loading is optimized: certain primary keys in the db are cached, and certain tables are INSERTED straight into without doing an initial SELECT (the presumption is that these datatypes would only be loaded once). See GO::Handlers::godb for details.
    If this is turned off, all data will follow the SELECT followed by UPDATE or INSERT pattern. This will be slower, but will use less memory as no cache is required.

-no_clear_cache
    By default the in-memory cache (which reduces SQL lookups) is cleared after every single file is loaded. This is to prevent massive caches when all association files are loaded in a single command line.
    If you have plenty of memory, or aren't loading too many assoc files, you may wish to use this option.

-fill_path
    (TRUE by default, IF an ontology file is parsed)
    Populates the graph_path transitive closure table on completion.
    This option can be used without any files as arguments to fill the path table in an already term-populated db.

-no_fill_path
    Prevents the graph_path table being populated after the ontologies have been loaded.

-fill_count
    Populates the gene_product_count table after all files have been loaded.

-add_root
    Adds an explicit root term.
    This may be necessary for loading from gene_ontology.obo, which has 3 ontologies; it can be useful to make a fake root term covering these.
    NOT FUNCTIONAL - CURRENTLY DONE AUTOMATICALLY

-replace
    Removes all data of the same datatype before loading.

-ev
    Filters based on an evidence type.
    To filter out IEAs, use the not '!' prefix:
    -ev '!IEA'
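The options above can be combined in a few common ways. The following is a minimal dry-run sketch, not a tested recipe: the `run` helper, the input file names, and the `go_assoc` datatype name are assumptions made for illustration, while the database and server names are taken from the synopsis. It only prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it.
# Set RUN=1 to execute for real (needs go-perl and go-db-perl installed).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

DB=go             # database name, as in the synopsis
HOST=mydbserver   # server name, as in the synopsis

# Several ontology files in one go: -append is advised for this case.
# The three file names are illustrative.
run load-go.pl -d "$DB" -h "$HOST" -datatype go_ont -append \
    function.ontology process.ontology component.ontology

# No input files: only fill graph_path in an already term-populated db.
run load-go.pl -d "$DB" -h "$HOST" -fill_path

# Association file with IEA evidence filtered out and counts populated.
# 'go_assoc' is an assumed datatype name; the file name is illustrative.
run load-go.pl -d "$DB" -h "$HOST" -datatype go_assoc -ev '!IEA' \
    -fill_count gene_association.example
```

Keeping the '!IEA' value in single quotes avoids history expansion in interactive shells.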
specify these with the -datatype option
A GO ontology file.
After loading is completed, the path/closure table will be built
A GO.defs definitions file
A GO xrefs file; eg ec2go
A gene_associations file
If you also specify the -fill_count option the gene_product_count table will also get populated (this is done at
You can also specify the -ev option to filter out specific evidence codes; for example
load-go.pl -d go -h mydbserver -datatype go_assoc -ev '!IEA' gene_association.*
An obo formatted file
First, the input file is converted into its native XML format (eg OBO-XML). That native XML is then transformed, using an XSLT stylesheet, into an XML format isomorphic to the GO relational database. Finally, the transformed XML is loaded using DBIx::DBStag.
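The three stages can be pictured as separate commands. This is a rough sketch under stated assumptions, not what load-go.pl runs internally: `go2fmt.pl` (from go-perl) and `stag-storenode.pl` (from DBIx::DBStag) are assumed stand-ins for the parse and store stages, and the `step` helper and intermediate file names are hypothetical. The stylesheet path is the one listed in this document. Commands are printed only, unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print a pipeline stage, or run it when RUN=1.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

IN=gene_ontology.obo                             # input file named in this document
XSL=go-dev/xml/xsl/oboxml_to_godb_prestore.xsl   # stylesheet listed in this document
DB=go                                            # database name, as in the synopsis

# 1. Parse the input into its native XML (OBO-XML).
#    Assumption: go2fmt.pl with an obo_xml writer does this step.
step "go2fmt.pl -w obo_xml $IN > native.xml"

# 2. Transform the native XML into XML isomorphic to the relational schema.
step "xsltproc $XSL native.xml > godb.xml"

# 3. Store the transformed XML in the database through DBStag.
#    Assumption: stag-storenode.pl is the command-line entry point.
step "stag-storenode.pl -d $DB godb.xml"
```

Running the stages by hand like this can help when debugging a failed load, since each intermediate XML file can be inspected.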
go-dev/xml/xsl/oboxml_to_godb_prestore.xsl
DBIx::DBStag
GO::Parser
When loading gene_association files, the script splits large files into multiple smaller files and loads these.
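The idea of chunked loading can be illustrated with the standard `split` utility. This is only a sketch of the general technique: the chunk size, file names, and stand-in data are hypothetical, and the script's own splitting logic may work differently.

```shell
#!/bin/sh
# Split a large tab-delimited association file into fixed-size chunks.
set -e
WORK=$(mktemp -d)
INPUT="$WORK/gene_association.example"   # hypothetical file name

# Build a small stand-in file so the sketch runs anywhere.
i=1
while [ "$i" -le 25 ]; do
    printf 'DB\tID%s\tsym%s\n' "$i" "$i" >> "$INPUT"
    i=$((i + 1))
done

CHUNK_LINES=10                           # hypothetical chunk size
split -l "$CHUNK_LINES" "$INPUT" "$WORK/chunk."

# Each chunk would then be handed to the loader in turn.
for part in "$WORK"/chunk.*; do
    printf 'would load %s (%s lines)\n' "$part" "$(wc -l < "$part" | tr -d ' ')"
done
```

Smaller chunks keep the per-file cache described under -no_clear_cache from growing too large, at the cost of more loader invocations.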