Bio::DB::GFF::Adaptor::dbi::pg_fts -- Database adaptor for a specific postgres schema with a TSearch2 implementation
#create new GFF database connection my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg_fts', -dsn => 'dbi:Pg:dbname=worm'); #add full text indexing 'stuff' #assumes that TSearch2 is available to PostgreSQL #this will take a VERY long time for a reasonably large database $db->install_TSearch2(); ...some time later... #we don't like full text searching... $db->remove_TSearch2();
This adaptor is based on Bio::DB::GFF::Adaptor::dbi::pg but it implements the TSearch2 PostgreSQL contrib module for fast full text searching. To use this module with your PostgreSQL GFF database, you need to make TSearch2 available in the database.
To use this adaptor, follow these steps:
Can be as easy as `sudo yum install postgresql-contrib`, or you may need to recompile PostgreSQL to include it. See http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html for more details
% cat tsearch2.sql | psql <your database>
% bp_pg_bulk_load_gff.pl -c -d yeast saccharomyces_cerevisiae.gff
% bp_load_gff.pl -c -d yeast -a dbi::pg saccharomyces_cerevisiae.gff
Execute a perl script like this one:
#!/usr/bin/perl -w use strict; use Bio::DB::GFF; my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg_fts', -dsn => 'dbi:Pg:dbname=yeast', -user => 'scott', ); print "Installing TSearch2 columns...\n"; $db->install_TSearch2(); print "Done\n";
Note that this last step will take a long time. For a S. cerevisiae database with 15K rows, it took over an hour on my laptop, and with a C. elegans database (~10 million rows) it took well over a day.
If at some point you add more data you your database, you need to run a similar script to the one above, only executing the update_TSearch2() method. Finally, if you want to remove the TSearch2 columns from your database and go back to using the pg adaptor, you can execute a script like the one above, only executing the remove_TSearch2() method.
You should know a few things about how searching with TSearch2 works in the GBrowse environment:
Special thanks to Russell Smithies and Paul Smale at AgResearch in New Zealand for giving me their recipe for doing full text indexing in a GFF database.
Scott Cain, firstname.lastname@example.org
Title : search_notes Usage : @search_results = $db->search_notes("full text string",$limit) Function: Search the notes for a text string, using PostgreSQL TSearch2 Returns : array of results Args : full text search string, and an optional row limit Status : public
This is based on the mysql-specific method that makes use of the TSearch2 functionality in PosgreSQL's contrib directory. Given a search string, it performs a full-text search of the notes table and returns an array of results. Each row of the returned array is a arrayref containing the following fields:
column 1 A Bio::DB::GFF::Featname object, for passing to segment() column 2 The text of the note column 3 A relevance score.
Title : make_features_by_name_where_part Function: constructs a TSearch2-compliant WHERE clause for a name search Status : protected
Title : install_TSearch2 Function: installs schema modifications for use with TSearch2 Usage : $db->install_TSearch2 Status : public
Title : update_TSearch2 Function: Updates TSearch2 columns Usage : $db->update_TSearch2 Status : public
Title : remove_TSearch2 Function: Removes TSearch2 columns Usage : $db->remove_TSearch2 Status : public