alnpi

alnpi - calculate molecular population genetic statistics from DNA alignments

SYNOPSIS

alnpi [OPTIONS] [MULTIFASTA-FILE...]

DESCRIPTION

alnpi takes multifasta-format alignment data as input and outputs molecular population genetic statistics. Options control which statistics are calculated, the mode of calculation, and the style of output. By default, per-site statistics are calculated over the entire alignment after gap-containing sites are removed. Optionally, statistics may be calculated pairwise across sequences, or in sliding windows of a specified length. The statistics calculated by default include:

1. Number of sequences (n)
2. Number of alleles/distinct sequences (k)
3. Heterozygosity (H)
4. Expected number of alleles given Watterson estimator (Ewens, 1972)
5. Probability of allelic configuration (H, Karlin and MacGregor 1972)
6. Total alignment length (L)
7. Number of gap-free sites (L_gf)
8. Number of gap-free segregating sites (S)
9. Fraction of gap-free segregating sites (s)
10. Watterson estimator per gap-free site (Th_w_ps, Watterson 1975)
11. Standard Error of Th_w_ps assuming no recombination (SE_Thwps_LD)
12. Standard Error of Th_w_ps assuming free recombination (SE_Thwps_LE)
13. Nucleotide Diversity in gap-free sites (pi, Nei and Li 1979)
14. Standard Error of pi assuming no recombination (SE_pi_LD)
15. Standard Error of pi assuming free recombination (SE_pi_LE)
16. Tajima's D (Tajima, 1989)
17. Fu and Li's D* (Fu and Li, 1993)
18. Fu and Li's F* (Fu and Li, 1993, Simonsen et al. 1995)
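
As a rough illustration of how the gap-free, per-site quantities above relate to one another, the following Python sketch computes L_gf, S, s, Th_w_ps and pi for a toy alignment using the textbook definitions. It is not alnpi's implementation; the function names and the toy sequences are hypothetical, and only "-" is assumed to mark a gap.

```python
from itertools import combinations


def gap_free_columns(seqs, gap="-"):
    """Return the alignment columns (as tuples) that contain no gap character."""
    return [col for col in zip(*seqs) if gap not in col]


def per_site_stats(seqs):
    """Textbook per-site statistics over gap-free sites (illustrative only)."""
    n = len(seqs)
    cols = gap_free_columns(seqs)
    l_gf = len(cols)

    # A site is segregating if it shows more than one distinct state.
    s_count = sum(1 for col in cols if len(set(col)) > 1)

    # Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i,
    # then normalized by the number of gap-free sites.
    a_n = sum(1.0 / i for i in range(1, n))
    theta_w_ps = s_count / a_n / l_gf

    # Nucleotide diversity: mean number of pairwise differences,
    # normalized by the number of gap-free sites.
    pairs = list(combinations(range(n), 2))
    total_diffs = sum(
        sum(1 for col in cols if col[i] != col[j]) for i, j in pairs
    )
    pi_ps = total_diffs / len(pairs) / l_gf

    return {
        "n": n,
        "L": len(seqs[0]),
        "L_gf": l_gf,
        "S": s_count,
        "s": s_count / l_gf,
        "Th_w_ps": theta_w_ps,
        "pi": pi_ps,
    }


# Hypothetical toy alignment: four sequences, one gap-containing column.
toy = ["ACGTAC-T", "ACGTACGT", "ACTTACGA", "ACGTTCGA"]
stats = per_site_stats(toy)
print(stats)
```

The column containing the gap is dropped before anything else is counted, which is why L_gf is smaller than L in the result.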

Invoked in absolute mode with alnpi --absolute, alnpi outputs these statistics instead:

1. Number of sequences (n)
2. Number of alleles/distinct sequences (k)
3. Number of gap-free sites (Len)
4. Number of segregating gap-free sites (S)
5. Watterson estimator for gap-free sites (Th_W, Watterson 1975)
6. Total alignment length (L)
7. Average pairwise number of differences among gap-free sites (Pi)
8. Eta S, (Fu and Li, 1993)
9. Eta, (Fu and Li, 1993)
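
The absolute (not per-site) quantities n, S and Pi are the inputs to the classic formula for Tajima's D. The Python sketch below follows the published formula of Tajima (1989); it is an illustration under that assumption, not alnpi's own code. The function name is hypothetical, and returning None for undefined cases is a choice made for this example only.

```python
import math


def tajimas_d(n, s, pi_abs):
    """Tajima's D from sample size n, segregating sites s, and the
    average pairwise number of differences pi_abs (all absolute values).

    Returns None where D is undefined (illustrative convention only).
    """
    if n < 4 or s == 0:
        # The variance term vanishes for n < 4, and D is undefined for s == 0.
        return None

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))

    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)

    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None

    # Numerator: difference between the two estimators of theta.
    return (pi_abs - s / a1) / math.sqrt(variance)


# Hypothetical values: 4 sequences, 3 segregating sites, Pi = 10/6.
d_value = tajimas_d(4, 3, 10 / 6)
print(d_value)
```

D is positive when the average pairwise difference exceeds the Watterson estimate S/a1 and negative when it falls below it.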

Sliding window analysis supports output only for nucleotide diversity, the per-site Watterson estimator, and Tajima's D.

Options specific to alnpi:

  -s, --suppress          suppress table header output
  -x, --latex             print table with LaTeX formatting
  -p, --pairwise          calculate stats pairwise across sequences
  -w, --window=<string>   calculate stats in sliding windows
  --absolute              output statistics not normalized per-site
  --label=<string>        label input/output descriptively with <string>

Options general to FAST:

  -h, --help                      print a brief help message
  --man                           print full documentation
  --version                       print version
  -l, --log                       create/append to logfile
  -L, --logname=<string>          use logfile name <string>
  -C, --comment=<string>          save comment <string> to log
  --format=<format>               use alternative format for input
  --moltype=<[dna|rna|protein]>   specify input sequence type

INPUT AND OUTPUT

alnpi is part of FAST, the FAST Analysis of Sequences Toolbox, based on Bioperl. Most core FAST utilities expect input and return output in multifasta format. Input can occur in one or more files or on STDIN. Output occurs to STDOUT. The FAST utility fasconvert can reformat other formats to and from multifasta.

OPTIONS

-s, --suppress: Suppress header output
-x, --latex: LaTeX-style output
-p, --pairwise: Statistics are calculated pairwise over all sequences
-w, --window=<string>: Sliding window analysis. Option argument <string> expected to be in the form "window-size:step-size:statistic" where window-size and step-size are positive integers and "statistic" may be one of "p", "s" or "d" for nucleotide diversity, Watterson estimator or Tajima's D respectively.
--absolute: Output a smaller set of statistics not normalized by number of gap-free sites.
--label: Text label for the input data, to be placed in the output.
-h, --help: Print a brief help message and exit.
--man: Print the manual page and exit.
--version: Print version information and exit.
-l, --log: Creates, or appends to, a generic FAST logfile in the current working directory. The logfile records date/time of execution, full command with options and arguments, and an optional comment.
-L [string], --logname=[string]: Use [string] as the name of the logfile. Default is "FAST.log.txt".
-C [string], --comment=[string]: Include comment [string] in logfile. No comment is saved by default.
--format=[format]: Use alternative format for input. See man page for "fasconvert" for allowed formats. This is for convenience; the FAST tools are designed to exchange data in Fasta format, and "fasta" is the default format for this tool.
-m [dna|rna|protein], --moltype=[dna|rna|protein]: Specify the type of sequence on input (should not be needed in most cases, but sometimes Bioperl cannot guess and complains when processing data).
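
To make the "window-size:step-size:statistic" argument of --window concrete, here is a small Python sketch that validates such a string and lists the window coordinates it implies. It illustrates the format only and is not how alnpi parses the option. The function names are hypothetical, and the choices to use 0-based half-open coordinates and to emit only full-length windows are assumptions, since the handling of a trailing partial window is not documented here.

```python
# Statistic codes as described for --window.
STAT_NAMES = {
    "p": "nucleotide diversity",
    "s": "Watterson estimator",
    "d": "Tajima's D",
}


def parse_window_spec(spec):
    """Split 'window-size:step-size:statistic' into (size, step, stat)."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected window-size:step-size:statistic")
    size, step, stat = int(parts[0]), int(parts[1]), parts[2]
    if size <= 0 or step <= 0:
        raise ValueError("window-size and step-size must be positive integers")
    if stat not in STAT_NAMES:
        raise ValueError("statistic must be one of p, s or d")
    return size, step, stat


def window_ranges(length, size, step):
    """List 0-based half-open (start, end) pairs for each full window.

    Assumption: windows that would run past the end are skipped.
    """
    return [(start, start + size) for start in range(0, length - size + 1, step)]


size, step, stat = parse_window_spec("100:25:d")
windows = window_ranges(200, size, step)
print(STAT_NAMES[stat], windows)
```

With a hypothetical alignment of 200 sites, the specification 100:25:d yields five overlapping windows, each offset by 25 sites from the previous one.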

EXAMPLES

Generate a sliding-window analysis of Tajima's D, the data plotted in Fig. 4A of Ardell et al. (2003) Genetics 165:1761. The data files ship with FAST:

fasgrep -v "(AF194|349[06])" t/data/ArdellEtAl03_ncbi_popset_32329588.fas | alndegap | fastr --strict -a "-" | alnpi --window 100:25:d

Statistics for 5'UTRs, the last line in Table 1 of Ardell et al. (2003) Genetics 165:1761:

CITING

If you use FAST, please cite Ardell (2013), "FAST: FAST Analysis of Sequences Toolbox", Bioinformatics, and Bioperl (Stajich et al.).
