The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
readme.txt
May 2002

Files included in the folder:

      mtCDNApri.nuc:  codon sequences from 7 apes (small data set in Yang et al. 1998)
      mtCDNApri.trees: tree file for the data
      OmegaAA.dat: file specifying different types of amino acid substitutions
      codeml.ctl: control file 

Also the following two files are incldued (large data set in Yang et al. 1998)
      mtCDNApri.nuc
      mtCDNApri.trees

This directory contains example files for estimating
nonsynonymous/synonymous substitution rate ratios for different types
of amino acid changes.  The data were used in Yang et al. (1998), so
you can use the data to duplicate results in that paper.

(1) Table 5 and the section titled "Different types of amino acid
substitutions".  This model assumes different dn/ds (w) ratios for
different types of amino acid substitutions.  You specify how many
amino acid substitution types you would like to have and which amino
acid pairs (changes) should be in each type.  The data should be codon
sequences (seqtype = 1), and the model is specified by aaDist = 7
(AAClasses).  The details of amino acid substitution types are
specified in a file called OmegaAA.dat.  See that file for details.
The model can be used to fit different rates for "radical" and
"conserved" amino acid substitutions.

Yang et al. (1998) implemented the model for codon sequences only
(seqtype = 1).  In theory the model should be applicable to amino acid
sequences as well (seqtype = 2), with one fewer parameter required.
However, such models (for amino acid sequences) are either not tested
properly or never made to work.  If you need to apply such models to 
amino acid sequences, you can let me know and I'll try to check.


(2) Table 4.  Mechanistic models of codon substitution.  You should have
seqtype = 1, model = 0, NSsites = 0.  Then the models are specified
using aaDist as follows.  You need to copy files with names like
g1974c.dat, g1974p.dat, g1974v.dat, g1974a.dat, grantham.dat, or
miyata.dat into the current folder to run those models.

       aaDist = 1  * geometric relationship using Grantham
       aaDist = 2  * geometric relationship using Miyata & Yasunaga
       aaDist = 3  * geometric relationship using c (composition)
       aaDist = 4  * geometric relationship using p (polarity)
       aaDist = 5  * geometric relationship using v (volume)
       aaDist = 6  * geometric relationship using a (aromaticity)

       aaDist = -1  * linear relationship using Grantham
       aaDist = -2  * linear relationship using Miyata & Yasunaga
       aaDist = -3  * linear relationship using c (composition)
       aaDist = -4  * linear relationship using p (polarity)
       aaDist = -5  * linear relationship using v (volume)
       aaDist = -6  * linear relationship using a (aromaticity)


(3) Table 3.  Mechanistic models of amino acid substitution.  As above
for table 4, but you should have seqtype = 2, model = 6 (FromCodon),
NSsites = 0.  Note that with amino acid sequences only, we cannot
estimate synonymous rate, and the models have one fewer parameter than
when codon sequences are used.

The models were implemented several years ago and not carefully
maintained since then.  So let me know you notice anything strange.
You should always run the example data set to duplicate the results in
the paper.

Reference:

Yang, Z., R. Nielsen, and M. Hasegawa, 1998.  Models of amino acid
substitution and applications to mitochondrial protein
evolution. Mol. Biol. Evol. 15: 1600-1611.

Ziheng Yang