
alnpi

alnpi - calculate molecular population genetic statistics from DNA alignments

SYNOPSIS

alnpi [OPTIONS] [MULTIFASTA-FILE...]

DESCRIPTION

alnpi takes alignment data in multifasta format as input and outputs molecular population genetic statistics. Options modulate the type of statistics calculated, the mode of calculation, and the style of output. By default, per-site statistics are calculated over the entire alignment after gap-containing sites are removed. Optionally, statistics may be calculated pairwise across sequences, or in sliding windows of specified length. The statistics calculated by default include:

1. Number of sequences (n)
2. Number of alleles/distinct sequences (k)
3. Heterozygosity (H)
4. Expected number of alleles given Watterson estimator (Ewens, 1972)
5. Probability of allelic configuration (H, Karlin and McGregor 1972)
6. Total alignment length (L)
7. Number of gap-free sites (L_gf)
8. Number of gap-free segregating sites (S)
9. Fraction of gap-free segregating sites (s)
10. Watterson estimator per gap-free site (Th_w_ps, Watterson 1975)
11. Standard Error of Th_w_ps assuming no recombination (SE_Thwps_LD)
12. Standard Error of Th_w_ps assuming free recombination (SE_Thwps_LE)
13. Nucleotide Diversity in gap-free sites (pi, Nei and Li 1979)
14. Standard Error of pi assuming no recombination (SE_pi_LD)
15. Standard Error of pi assuming free recombination (SE_pi_LE)
16. Tajima's D (Tajima, 1989)
17. Fu and Li's D* (Fu and Li, 1993)
18. Fu and Li's F* (Fu and Li, 1993, Simonsen et al. 1995)
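
To make the definitions behind items 7 to 10, 13 and 16 concrete, here is a minimal Python sketch. It is not alnpi's implementation: it applies the textbook formulas (Watterson 1975; Nei and Li 1979; Tajima 1989) to columns that contain no gap character. The function names, the dictionary keys, and the choice of "-" as the gap symbol are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def gap_free_columns(seqs, gap="-"):
    """Return the alignment columns that contain no gap character."""
    return [col for col in zip(*seqs) if gap not in col]


def per_site_stats(seqs, gap="-"):
    """Textbook S, Watterson estimator, pi and Tajima's D on gap-free sites."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two aligned sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    cols = gap_free_columns(seqs, gap)
    l_gf = len(cols)
    if l_gf == 0:
        raise ValueError("no gap-free sites")

    # Segregating sites: gap-free columns with more than one state.
    seg = sum(1 for col in cols if len(set(col)) > 1)

    # Harmonic-type sums used by Watterson's estimator and Tajima's D.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    # Average number of pairwise differences over all sequence pairs.
    pairs = list(combinations(range(n), 2))
    diffs = sum(1 for i, j in pairs for col in cols if col[i] != col[j])
    k_bar = diffs / len(pairs)

    theta_w = seg / a1

    # Tajima's D; its variance term is zero for n < 4, so D is left as None.
    tajima_d = None
    if seg > 0 and n >= 4:
        b1 = (n + 1) / (3.0 * (n - 1))
        b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
        c1 = b1 - 1.0 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        tajima_d = (k_bar - theta_w) / sqrt(e1 * seg + e2 * seg * (seg - 1))

    return {
        "n": n,
        "L": len(seqs[0]),
        "L_gf": l_gf,
        "S": seg,
        "s": seg / l_gf,
        "Th_w_ps": theta_w / l_gf,
        "pi": k_bar / l_gf,
        "D": tajima_d,
    }


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ATGTTA", "ATGATA"]  # made-up alignment
    print(per_site_stats(toy))
```

The sketch omits the standard errors and the Fu and Li statistics, and alnpi's handling of ambiguity codes or multi-state sites may differ from this simple treatment.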

Invoked in absolute mode with alnpi --absolute, alnpi outputs these statistics instead:

1. Number of sequences (n)
2. Number of alleles/distinct sequences (k)
3. Number of gap-free sites (Len)
4. Number of segregating gap-free sites (S)
5. Watterson estimator for gap-free sites (Th_W, Watterson 1975)
6. Total alignment length (L)
7. Average pairwise number of differences among gap-free sites (Pi)
8. Eta S, (Fu and Li, 1993)
9. Eta, (Fu and Li, 1993)
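
As a rough illustration of the last two quantities, the Python sketch below counts Eta as the minimum number of mutations per gap-free site (distinct states minus one) and Eta S as the number of states seen in exactly one sequence, capped per site at that site's mutation count. This follows a common reading of Fu and Li (1993) for data without an outgroup; whether alnpi counts them in exactly this way is an assumption, as are the function name and the "-" gap symbol.

```python
from collections import Counter


def eta_counts(seqs, gap="-"):
    """Return (eta, eta_s) over gap-free columns of an alignment.

    eta   : sum over sites of (number of distinct states - 1)
    eta_s : singleton states, capped per site at that site's mutation count
    """
    eta = 0
    eta_s = 0
    for col in zip(*seqs):
        if gap in col:
            continue  # skip gap-containing sites
        counts = Counter(col)
        mutations = len(counts) - 1
        singletons = sum(1 for c in counts.values() if c == 1)
        eta += mutations
        eta_s += min(singletons, mutations)
    return eta, eta_s


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ATGTTA", "ATGATA"]  # made-up alignment
    print(eta_counts(toy))
```

The cap matters for very small samples: with two sequences, a differing site has two states that each occur once but represents only one mutation.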

Sliding window analysis supports output only for nucleotide diversity, the per-site Watterson estimator, and Tajima's D.

Options specific to alnpi:
  -s, --suppress          suppress table header output
  -x, --latex             print table with LaTeX formatting
  -p, --pairwise          calculate stats pairwise across sequences
  -w, --window=<string>   calculate stats in sliding windows
  --absolute              output statistics not normalized per-site
  --label=<string>        label input/output descriptively with <string>

Options general to FAST:
  -h, --help                        print a brief help message
  --man                             print full documentation
  --version                         print version
  -l, --log                         create/append to logfile
  -L, --logname=<string>            use logfile name <string>
  -C, --comment=<string>            save comment <string> to log
  --format=<format>                 use alternative format for input
  -m, --moltype=<dna|rna|protein>   specify input sequence type

INPUT AND OUTPUT

alnpi is part of FAST, the FAST Analysis of Sequences Toolbox, based on Bioperl. Most core FAST utilities expect input and return output in multifasta format. Input can occur in one or more files or on STDIN. Output occurs to STDOUT. The FAST utility fasconvert can reformat other formats to and from multifasta.

OPTIONS

-s, --suppress

Suppress header output

-x, --latex

LaTeX-style output

-p, --pairwise

Statistics are calculated pairwise over all sequences

-w, --window=<string>

Sliding window analysis. The option argument <string> is expected to be in the form "window-size:step-size:statistic", where window-size and step-size are positive integers and "statistic" is one of "p", "s" or "d" for nucleotide diversity, Watterson estimator or Tajima's D, respectively.
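
The argument format can be illustrated with a short Python sketch that validates a "window-size:step-size:statistic" string and lists window start positions. It is not alnpi's own parser. The function names are hypothetical, and the windowing rule (0-based starts, full-length windows only, trailing partial window dropped) is an assumption; alnpi may place or report windows differently.

```python
# Statistic codes as described for the --window option.
STATISTICS = {
    "p": "nucleotide diversity",
    "s": "Watterson estimator",
    "d": "Tajima's D",
}


def parse_window_spec(spec):
    """Split 'window-size:step-size:statistic' into (size, step, code)."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected window-size:step-size:statistic")
    size_text, step_text, code = parts
    size, step = int(size_text), int(step_text)  # ValueError if not integers
    if size <= 0 or step <= 0:
        raise ValueError("window-size and step-size must be positive")
    if code not in STATISTICS:
        raise ValueError("statistic must be one of p, s or d")
    return size, step, code


def window_starts(length, size, step):
    """0-based starts of full-length windows (assumed windowing rule)."""
    return list(range(0, length - size + 1, step))


if __name__ == "__main__":
    size, step, code = parse_window_spec("100:25:d")
    print(STATISTICS[code], window_starts(250, size, step))
```

Under these assumptions, a 250-column alignment with the argument 100:25:d yields seven windows, starting at columns 0, 25, 50, 75, 100, 125 and 150.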

--absolute

Output a smaller set of statistics not normalized by number of gap-free sites.

--label=<string>

Text label for the input data, to be placed in the output.

-h, --help

Print a brief help message and exit.

--man

Print the manual page and exit.

--version

Print version information and exit.

-l, --log

Creates, or appends to, a generic FAST logfile in the current working directory. The logfile records date/time of execution, full command with options and arguments, and an optional comment.

-L [string], --logname=[string]

Use [string] as the name of the logfile. Default is "FAST.log.txt".

-C [string], --comment=[string]

Include comment [string] in logfile. No comment is saved by default.

--format=[format]

Use alternative format for input. See man page for "fasconvert" for allowed formats. This is for convenience; the FAST tools are designed to exchange data in Fasta format, and "fasta" is the default format for this tool.

-m [dna|rna|protein], --moltype=[dna|rna|protein]

Specify the type of sequence on input (should not be needed in most cases, but sometimes Bioperl cannot guess and complains when processing data).

EXAMPLES

Generate a sliding-window analysis of Tajima's D, the data plotted in Fig. 4A of Ardell et al. (2003) Genetics 165:1761. The data files ship with FAST:

fasgrep -v "(AF194|349[06])" t/data/ArdellEtAl03_ncbi_popset_32329588.fas | alndegap | fastr --strict -a "-" | alnpi --window 100:25:d

Statistics for 5'UTRs, the last line in Table 1 of Ardell et al. (2003) Genetics 165:1761:

gbfalncut -K t/data/ArdellEtAl03_ncbi_popset_32329588.fas t/data/AF194338.1.gb 5.UTR | fasgrep -v "(AF194|349[06])" | alndegap | fastr --strict -a "-" | alnpi

SEE ALSO

man perlre
perldoc perlre

Documentation on Perl regular expressions.

man FAST
perldoc FAST

Introduction and cookbook for FAST

The FAST Home Page

CITING

If you use FAST, please cite Ardell (2013), FAST: FAST Analysis of Sequences Toolbox, Bioinformatics; and Bioperl, Stajich et al.