Notes by Ziheng Yang
Last modified: 22 July 2003

(I) Data files for NSsites models used by Yang, Swanson & Vacquier (2000):

    README.txt
    lysin.trees (tree file)
    lysin.nuc  (sequence data file, with 135 codons)
    codeml.ctl  (control file)
    lysinResult.txt (results under M0 and M3)
    lysinPosteriorP.txt (posterior probabilities under M3)
    SiteNumbering.txt (site numbering according to the structure file)
    1LIS.pdb            (structure file for red abalone sperm lysin)


(II) Data files for fixed-sites models of Yang & Swanson (2002).  Note
    that the tree file is shared as above, but the sequence data file is 
    different, with one site with gaps in the red abalone deleted.  
    Yang & Swanson (2002 table 5) also fitted two random-sites (NSsites) 
    models, using the following data:

    codemlYangSwanson2002.ctl (control file)
    lysinYangSwanson2002.nuc  (sequence data file, with 134 codons)
    lysin.trees


More details follow.

(Ia) 
This folder contains the control file, the sequence data file and
the tree file for demonstrating codon models that assign different
dN/dS ratios among sites in the sequence (Nielsen & Yang 1998; Yang,
Nielsen, Goldman & Pedersen 2000).  The included data set is the sperm
lysin genes from 25 abalone species used in Yang, Swanson & Vacquier
(2000).  The default control file (with NSsites = 3) lets you
duplicate the results in table 1 of that paper.  To run the program,
try

	codeml

The file lysinPosteriorP.txt includes part of the output from the file
rst for model M3 (NSsites=3).  The first 3 columns are the three
probabilities for the three site classes; you can use them to make
figure 1 of Yang, Swanson & Vacquier (2000).  In parentheses are the
most likely class numbers.  The last two columns are the posterior
average w for the site and the probability for the most likely class
(redundant).
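
As a small illustration of how the last two columns relate to the
first three, here is a Python sketch.  It is not part of codeml; the
function name is made up, and the w values for the three site classes
must be taken from your own M3 output (the ones in the usage note
below are invented).

```python
def posterior_summary(probs, omegas):
    """Summarise one site from its posterior class probabilities.

    probs  -- posterior probabilities of the site classes (sum to 1)
    omegas -- the w (dN/dS) value of each class, in the same order
    Returns (posterior mean w, most likely class numbered from 1,
    probability of that class).
    """
    if len(probs) != len(omegas):
        raise ValueError("need one w value per site class")
    # Posterior mean w is the probability-weighted average of the class w's.
    mean_w = sum(p * w for p, w in zip(probs, omegas))
    # Most likely class = the class with the largest posterior probability.
    best = max(range(len(probs)), key=lambda k: probs[k])
    return mean_w, best + 1, probs[best]
```

For example, posterior_summary([0.1, 0.2, 0.7], [0.1, 1.0, 3.0]) uses
invented numbers and returns the mean w, class 3, and 0.7.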

(Ib) Colouring the Crystal Structure

If you choose verbose = 1 and provide a file named SiteNumbering.txt
with numbering of sites in the alignment, codeml will generate a file
named RasMol.txt, which collects RasMol (RasWin) scripts for coloring
the amino acid residues in the structure according to the approximate
posterior mean w.  Look at SiteNumbering.txt.  The sequence data
file lysin.nuc has 135 amino acid (codon) sites in the alignment, but
one site is a gap, represented by the ? in SiteNumbering.txt, which is
not in the pdb file.  Compare this with Figures 4 and 5 in Yang,
Swanson, and Vacquier (2000).  

Here are the rules codeml uses right now.  The program copies your
site labels in SiteNumbering.txt verbatim as "text" (not as number)
when it prints to RasMol.txt.  If the label has a question mark in it,
codeml won't print that site, but all other sites with no ? in the
labels are printed (using the format "select ###", "color ....").  So
if you change the ? in the included SiteNumbering.txt for the lysin
into 133a, you will get the following output in RasMol.txt for that
site:

	select 133a
        color [250, 35, 35]
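
The rules above can be mimicked in a few lines of Python.  This is
only a sketch of the behaviour described here, not the code in
codeml.c; the function name is made up, and the colours are supplied
by the caller rather than computed from the posterior mean w.

```python
def rasmol_script(labels, colors):
    """Build RasMol 'select'/'color' commands for a list of sites.

    labels -- site labels as text, e.g. "132", "133a" or "?"
    colors -- one (r, g, b) tuple per site, in the same order
    Sites whose label contains '?' are skipped, as described above.
    """
    lines = []
    for label, (r, g, b) in zip(labels, colors):
        if "?" in label:
            continue  # no such residue in the structure, print nothing
        lines.append(f"select {label}")  # label copied verbatim as text
        lines.append(f"color [{r}, {g}, {b}]")
    return "\n".join(lines) + "\n"
```

Calling it with the label "133a" and the colour (250, 35, 35) gives
the two lines shown above (without the indentation).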

After codeml has generated RasMol.txt, you read the structure file
1LIS.pdb into RasMol.  Choose "Display-Cartoon".  Then in the
command-line window, type the following command to color the amino
acids.

       script RasMol.txt

My version of RasMol (RasWin2.7.2.1) does not seem to be properly
installed, and I can't tell it to look for the file from the right
folder.  So I copied RasMol.txt into the same folder as
raswin2.7.1.1.exe and it reads the script fine.  I got a warning
message from RasWin: "Unable to allocate shade".  I don't know what it
means, but it does not seem to do any harm.

Both filenames SiteNumbering.txt and RasMol.txt are hard-coded in
codeml.c.  I implemented three colour schemes, hard-coded as well,
with the colour-coded temperature matching the posterior mean w.  If
you want to change the source code, go to the routine
lfunNSsites_rate() and change continuous, ncolors, colorvalues (RGB
values).

The red abalone lysin structure file 1LIS.pdb can be downloaded from
http://www.rcsb.org/pdb/ (choose download - text format).  The RasMol
site is at http://www.umass.edu/microbio/rasmol/.


(II)

The lysin gene data used by Yang & Swanson (2002) to demonstrate the
fixed-sites models are included here as well.  The sequence data file
lysinYangSwanson2002.nuc has one fewer codon than lysin.nuc.  Look at
the beginning of the sequence data file, copied below, which says
there are 25 sequences in the file, each with 402 nucleotides (134
codons).  The 134 codons are partitioned into two "genes", which are
marked by 1 or 2, for buried and exposed residues, respectively.  

  25   402  G
G2
22222222221222112111221121122212211222122122222211211112211221221122122
212222222222112211221122221222122122222112122212211222222121212
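
If you edit the partition marks by hand, it is easy to lose or add a
character.  The following Python sketch checks the bookkeeping.  It is
not part of PAML; the function name is made up, and it assumes one
mark per codon, with the header line and the mark lines passed in
separately (the "G2" line is left out).

```python
def summarize_partition(header, mark_lines):
    """Check that the partition marks match the header line.

    header     -- first line of the data file, e.g. "25   402  G"
    mark_lines -- the lines of 1/2 marks that follow the "G2" line
    """
    fields = header.split()
    nseq, nsites = int(fields[0]), int(fields[1])
    ncodons = nsites // 3  # three nucleotides per codon
    marks = "".join(line.strip() for line in mark_lines)
    counts = {}
    for m in marks:  # number of codons assigned to each "gene"
        counts[m] = counts.get(m, 0) + 1
    return {
        "nseq": nseq,
        "ncodons": ncodons,
        "nmarks": len(marks),
        "counts": counts,
        # consistent if the length is a whole number of codons
        # and there is exactly one mark per codon
        "consistent": nsites % 3 == 0 and len(marks) == ncodons,
    }
```

For the header shown above, "ncodons" is 134; "consistent" is true
only if the mark lines you pass in add up to 134 characters.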

In the control file, note the variable Mgene, which is used to run the
models described in Yang & Swanson (2002, table 1), with results shown
in table 6 of the same paper.  To run the program, you type
    codeml codemlYangSwanson2002.ctl

If you are using an old Mac with OS 9 or earlier, make a backup copy of
codeml.ctl, copy codemlYangSwanson2002.ctl over codeml.ctl, and
then run 
   codeml
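
The same swap of control files can be scripted.  This Python sketch
only does the file copying and does not run codeml; the function name
and the ".bak" backup name are made up for the example.

```python
import shutil
from pathlib import Path


def install_control_file(workdir, alt_ctl, default_ctl="codeml.ctl"):
    """Back up the default control file and replace it with alt_ctl.

    Returns the path of the backup file (hypothetical name:
    default_ctl + ".bak"), which is written only if the default
    control file already exists.
    """
    workdir = Path(workdir)
    default = workdir / default_ctl
    backup = workdir / (default_ctl + ".bak")
    if default.exists():
        shutil.copy2(default, backup)  # keep the original control file
    shutil.copy2(workdir / alt_ctl, default)  # codeml now reads alt_ctl's settings
    return backup
```

After install_control_file(".", "codemlYangSwanson2002.ctl"), running
codeml with no argument uses the Yang & Swanson (2002) settings; copy
the backup over codeml.ctl to restore the original.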

method = 0 is probably faster for those Mgene models than method = 1.


References

Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting
positively selected amino acid sites and applications to the HIV-1
envelope gene. Genetics 148:929-936.

Yang, Z., R. Nielsen, N. Goldman and A.-M. K. Pedersen. 2000. 
Codon-substitution models for heterogeneous selection pressure at amino 
acid sites. Genetics 155:431-449.

Yang, Z., W. J. Swanson and V. D. Vacquier. 2000. Maximum likelihood
analysis of molecular adaptation in abalone sperm lysin reveals
variable selective pressures among lineages and
sites. Mol. Biol. Evol. 17:1446-1455.

Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect
adaptive evolution that account for heterogeneous selective pressures
among site classes. Mol. Biol. Evol. 19:49-57.