The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
SmotifCS
Version 0.04

SmotifCS Hybrid Modeling Method


PRE-REQUISITES: 

The hybrid modeling algorithm requires a BMRB formatted chemical shift file as input. 
Additionally, if the structure of the protein is known from any alternate resource,
then a PDB-formatted structure file is required. This pdb-file can be present in a
centralized local directory or a user-designated separate directory. 

	Third-party Software:

	1. MySQL  
	2. Phylip 
	3. Modeller 
	4. NMRPipe/TALOS 

	Other requirements:

	1. MySQL Smotif database (http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz)
	2. Smotif chemical shift library and related files (http://fiserlab.org/SmotifCS/chemical_shift.tar.gz)
	3. Local PDB directory (central or user-designated) - updated (http://www.rcsb.org). 

	The path to all the pre-requisites should be provided in smotifcs.ini configuration file. 

Smotics has been tested on 
    1. CentOS release 6.6 (Final) 
    2. Centos release 7


DOWNLOAD AND INSTALL Third-party Software:
1. MySQL
   Can be downloaded from (http://dev.mysql.com/downloads/mysql/) 

    1.a INSTALL MySQL AND perl DBI SUPPORT

        $ yum install mysql mysql-devel mysql-server  perl-DBD-MySQL

    1.b START MYSQL FOR FIRST TIME
        # /sbin/service mysqld start
        
        Initializing MySQL database:  Installing MySQL system tables...
        OK
        Filling help tables...
        OK

        To start mysqld at boot time you have to copy
        support-files/mysql.server to the right place for your system

        PLEASE REMEMBER TO SET A PASSWORD FOR THE MySQL root USER !
        To do so, start the server, then issue the following commands:

        /usr/bin/mysqladmin -u root password 'new-password'
        /usr/bin/mysqladmin -u root -h denali.aecom.yu.edu password 'new-password'

        Alternatively you can run:
        /usr/bin/mysql_secure_installation

        which will also give you the option of removing the test
        databases and anonymous user created by default.  This is
        strongly recommended for production servers.

        See the manual for more instructions.

        You can start the MySQL daemon with:
        cd /usr ; /usr/bin/mysqld_safe &

        You can test the MySQL daemon with mysql-test-run.pl
        cd /usr/mysql-test ; perl mysql-test-run.pl

        Please report any problems with the /usr/bin/mysqlbug script!

        [  OK  ]
        Starting mysqld:                                           [  OK  ]

   1.c  CHANGE ROOT PASSWORD
        # /usr/bin/mysqladmin -u root password 'new-password'

   1.d  CONNECT TO MYSQL TO VERIFY THAT NEW PASSWORD WORKS

        [root@denali tmp]# mysql -u root -h localhost -p 
        Enter password: 
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 4
        Server version: 5.1.73 Source distribution

        Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql> 
   
   1.e  QUIT MYSQL CLIENT
        mysql> quit
        Bye

2. Phylip   (version 3.69) 
       PHYLIP is freely available from:
       http://evolution.genetics.washington.edu/phylip.html
       
       PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies 
       (evolutionary trees). It is available free over the Internet, and written to work on as many 
       different kinds of computer systems as possible. The source code is distributed (in C), 
       and executables are also distributed. 
       
       INSTALLATION
       http://evolution.genetics.washington.edu/phylip/getme.html


       if you’re using a Windows machine, 
       installation is easy. Download the three zip-files (phylip.exe,phylipwx.exe,phylipwy.exe ), 
       and extract them to a preferred folder. The subfolder exe contains all the programs. 
       Manual can be found from the subfolderdoc.

       For Macintosh OS X
       you may download the packaged disk image (Phylip3.66.dmg). It is compressed, so you need to 
       expand it, and copy the resulting folder to a desired location. Alternatively, you may compile 
       the programs from their sources as outlined in the UNIX installation below. There are source 
       codes and ready made compilations available for older Macintosh systems, Mac OS 8 or 9, also.

       Installation for UNIX systems
       is also quite straight-forward. These instruction apply for RedHat-based Linux systems. 
       Download the source code and documentation package (phylip-3.66.tar.gz) into a suitable folder. 
       Unzip the package with gzip utility (gzip –d phylip-3.66.tar.gz) and expand the tar ball 
       (tar xvf phylip-3.66.tar). Move to the newly formed folder containing the source codes (cd phylip3.6/src). 
       The folder contains a file called Makefile. Installation of the PHYLIP programs is done simply by typing
       make install


       INSTALL PHYLIP ON LINUX and UNIX
       http://evolution.gs.washington.edu/phylip/download/phylip-3.696.tar.gz

       You can easily install PHYLIP and compile it yourself on a Linux or Unix system, 
       provided that you have a C compiler on your system. 

       tar -zxvf phylip-3.696.tar.gz 

       This uncompresses the archive and a phylip3.696 folder is created that contains within 
       it three folders, doc, exe, and src.

       To make executables, use your C compiler. It is probably as simple as going into the src directory, 
       copying Makefile.unx and calling the copy Makefile, and then typing the command

       $ cp Makefile.unx Makefile
       $ make install

       With luck this will work. After the compilation the executables and their font files will 
       be in folder exe. 

       INSTALLATION SUMMARY

       $ wget  http://evolution.gs.washington.edu/phylip/download/phylip-3.696.tar.gz
       $ tar -zxvf phylip-3.696.tar.gz
       $ cd phylip-3.696
       $ cd src/
       $ cp Makefile.unx Makefile
       $ make install

3. Modeller (version 9.14 )
       https://salilab.org/modeller/
       
       MODELLER is used for homology or comparative modeling of protein three-dimensional structures.
       The user provides an alignment of a sequence to be modeled with known related structures and 
       MODELLER automatically calculates a model containing all non-hydrogen atoms. MODELLER 
       is available for download for most Unix/Linux systems, Windows, and Mac.
       
       Libraries needed for Modeller:
            yum install libz.so.1
            yum install libutil.so.1
            yum install librt.so.1
            yum install libpthread.so.0
            yum install libm.so.6
            yum install libdl.so.2
            yum install libglib-2.0.so.0
            yum install libc.so.6
       
       Install MODELLER:
       rpm -ivh modeller-9.14-1.***.rpm


4. NMRPipe/TALOS 
       http://spin.niddk.nih.gov/NMRPipe/

       NMRPipe is an extensive software system for processing, analyzing, and exploiting NMR spectroscopic 
       data. 
       
       Libraries needed for NMRPIPE:
            yum install xorg-x11-fonts-100dpi
            yum install xorg-x11-fonts-75dpi
            yum install xorg-x11-fonts-misc
            yum install libX11.so.6
            yum install libXext.so.6
            yum install libstdc++.so.6


       wget http://spin.niddk.nih.gov/NMRPipe/install/download/install.com
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/binval.com
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/NMRPipeX.tZ
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/talos.tZ
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/dyn.tZ

       chmod a+r  *.tZ 
       chmod u+rx *.com
       ./install.com +dest /home/user/software/talos



DOWNLOAD AND INSTALL INSTRUCTIONS FOR OTHER REQUIREMENTS: 
	
    1. MySQL Smotif database 
       MySQL Smotif database is freely available from: 
       http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz
	
	
    *** How to install a local copy of the Smotif Database ***
    
    - Log into the server running MySQL
    
    - Download Smotif database from http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz
        and save it to /tmp directory
        
    - Uncompress vilas_loop_pred.sql.gz  
        cd /tmp/
        tar -zxvf vilas_loop_pred.sql.gz 
    
    - Connect to the MySQL server and 
        create a database named vilas_loop_pred
        
        $ mysql -u root -h localhost  -p 'your_mysql_root_password'
        
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 2844
        Server version: 5.1.73 Source distribution

        Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql> 
        mysql> create database vilas_loop_pred;
        mysql> quit

    - Load vilas_loop_pred database (This process might take some time).
        $ mysql -u root -h localhost -p vilas_loop_pred < vilas_loop_pred.sql 
        Enter password: 
        
      

    - Connect to the MySQL server as root and 
        create a user with read access to vilas_loop_pred database
        
        [root@denali tmp]# mysql -u root -h localhost -p 
        Enter password: 
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 14
        Server version: 5.1.73 Source distribution

        Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql> use mysql
        Reading table information for completion of table and column names
        You can turn off this feature to get a quicker startup with -A

        Database changed

        Create New user:
        
        mysql> CREATE USER 'my_user'@'client_computer_where_you_will_run_Smotifcs' IDENTIFIED BY 'my_pass';
        # Query OK, 0 rows affected (0.00 sec)

        mysql> GRANT SELECT ON vilas_loop_pred.* TO  'my_user'@'client_computer_where_you_will_run_Smotifcs' ;
        Query OK, 0 rows affected (0.00 sec)


	2. Smotif chemical shift library and related files 
	   Smotif chemical shift library and related files is freely available from:
       http://fiserlab.org/SmotifCS/chemical_shift.tar.gz

    *** How to install a local copy of Smotif chemical shift library and related files ***

		- Download chemical shift database from http://fiserlab.org/SmotifCS/chemical_shift.tar.gz
          and save it to /tmp directory

        - Uncompress chemical_shift.tar.gz and move it to /usr/local/databases


INSTALLATION (SmotifCS )

    To install this SmotifCS-0.03, run the following commands:
       
       
      1. Manually:
        Install where standard Perl modules are stored
        
        tar -zxvf SmotifCS-0.03.tar.gz
        cd SmotifCS-0.03/

        perl Makefile.PL
        make
        make test
        make install


      2. Install in a custom location (/home/user/MyPerlLib)
        
        tar -zxvf SmotifCS-0.03.tar.gz
        cd SmotifCS-0.03/
        
        perl Makefile.PL PREFIX=/home/user/MyPerlLib/
        make
        make test
        make install
       
        
        Please, do not forget to add the following line:
        
        use lib "$ENV{HOME}/MyLib/share/perl5/" 
        
        in ./smotifcs.pl;

        
      3. Using a CPAN client:
        as root type:

        perl -MCPAN -e shell
        > install SmotifCS 
      
      4. Using a CPAN client and installing in a custom location (/home/user/MyPerlLib)
        
        perl -MCPAN -e shell
        > conf makepl_arg PREFIX=/home/user/MyPerlLib/
        > install SmotifCS 

        Please, do not forget to add the following line:
        
        use lib "$ENV{HOME}/MyLib/share/perl5/" 
        
        in ./smotifcs.pl;


HOW TO RUN THE SmotifCS HYBRID MODELING ALGORITHM: 
    
    INITIALIZE THE CONFIGURATION FILE: 
    
    1.Set up the configuration file:

    The configuration file, smotifcs_config.ini has all the information
    regarding the required library files and other pre-requisite software. 

    Set all the paths and executables in this file correctly.

    Set environment varible in .bashrc file:

    export SMOTIFCS_CONFIG_FILE=/home/user/SmotifCS-0.01/smotifcs_config.ini

    in $HOME/smotifcs_config.ini file.

    2. Set environment varible in .bashrc file:
    export SMOTIFCS_CONFIG_FILE=/home/user/smotifcs_config.ini
    
    3. Create a subdirectory with a dummy pdb file name (eg: 1abc or 1zzz). 

    4. Put the chemical shift input file (in BMRB format) in this directory.
    Use the filename 1abc/pdb1abcshifts.dat or 1zzz/pdb1zzzshifts.dat for
    the BMRB formatted chemical shift input file.
  
       For testing purpose you can download an input file (in BMBR format) 
       from:
	   http://fiserlab.org/SmotifCS/pdb1aabshifts.dat

    5. Optional: If structure is known, include a pdb format structure file
    in the same directory. 1abc/pdb1abc.ent or 1zzz/pdb1zzz.ent

    6. Run steps 1 to 6 as given above sequentially. Output from previous
    steps are often required in subsequent steps. Wait for each step to
    be completed without errors before going to the next step. 

    7. To run all steps together use: 
    perl smotifcs.pl --step=all --pdb=1zzz --chain=A --havestructure=0

    8. Use multiple-cores or clusters as available, for steps 2, 3 & 4.  
    These are slow and require a lot of computational resources. 

    8. If structure is known, use --havestructure=1.
    Else, use --havestructure=0 in all the steps. 

    Results: 

    Top 5 models are stored in the subdirectory (1abc or 1zzz) as:
    Model.1.pdb, Model.2.pdb, Model.3.pdb, Model.4.pdb & Model.5.pdb	


MODELING PROTEINS USING A SUPER-SECONDARY STRUCTURE LIBRARY AND NMR CHEMICAL SHIFT INFORMATION

    SMOTIFCS implement a hybrid protein modeling algorithms that relies on a library of protein 
    super-secondary structure motifs (Smotifs) and easily obtainable NMR experimental data.


	MODELING ALGORITHM STEPS: 

    Step 1:				             
    Run Talos+			             
    Get SS, Phi/PSi, Smotif Information (Single-core task)		     	             

        Usage: 
        perl smotifcs.pl --step=1 --pdb=1zzz  --chain=A --havestructure=0	     


    Step 2:                                             
    Compare experimental CS of Query SmotifS to theoretical CS of library Smotifs         
    (Multi-core task/ cluster job)		     

        Usage: 
        perl smotifcs.pl --step=2 --pdb=1zzz  --chain=A --havestructure=0           


    Step 3:                                             
    Cluster and rank chosen SmotifS (Multi-core task/ cluster job)                       

        Usage: 
        perl smotifcs.pl --step=3 --pdb=1zzz  --chain=A --havestructure=0           


    Step 4:                                             
    Enumerate all possible combinations of  Smotifs	(about a million models)	     
    (Multi-core task/ cluster job)                       

        Usage: 
        perl smotifcs.pl --step=4 --pdb=1zzz --chain=A --havestructure=0           


    Step 5:                                             
    Rank enumerated structures using a composite energy function  (Single-core task)                   

        Usage: 
        perl smotifcs.pl --step=5 --pdb=1zzz  --chain=A --havestructure=0           

        
    Step 6:                                             
    Run Modeller to generate top 5 complete models  (Single-core task)                     	       
        
        Usage: 
        perl smotifcs.pl --step=6 --pdb=1zzz --chain=A --havestructure=0           


    Reference: 

    Menon V, Vallat BK, Dybas JM, Fiser A.
    Modeling proteins using a super-secondary structure library and NMR chemical
    shift information.
    Structure, 2013, 21(6):891-9.



Authors:

Vilas Menon, Brinda Vallat, Joe Dybas, Carlos Madrid and Andras Fiser. 



SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc SmotifCS

You can also look for information at:

    RT, CPAN's request tracker (report bugs here)
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=SmotifCS

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/SmotifCS

    CPAN Ratings
        http://cpanratings.perl.org/d/SmotifCS

    Search CPAN
        http://search.cpan.org/dist/SmotifCS/


LICENSE AND COPYRIGHT

Copyright (C) 2015 Fiserlab Members 

This program is free software; you can redistribute it and/or modify it
under the terms of the the Artistic License (2.0). You may obtain a
copy of the full license at:

L<http://www.perlfoundation.org/artistic_license_2_0>

Any use, modification, and distribution of the Standard or Modified
Versions is governed by this Artistic License. By using, modifying or
distributing the Package, you accept this license. Do not use, modify,
or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made
by someone other than you, you are nevertheless required to ensure that
your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service
mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge
patent license to make, have made, use, offer to sell, sell, import and
otherwise transfer the Package with respect to any patent claims
licensable by the Copyright Holder that are necessarily infringed by the
Package. If you institute patent litigation (including a cross-claim or
counterclaim) against any party alleging that the Package constitutes
direct or contributory patent infringement, then this Artistic License
to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.