NAME

fafind-eq-seq - find equal sequences

SYNOPSIS

    ./fafind-eq-seq [--help] [--eval 'perlcode'] <file1> [<file2> ... <fileN>] >file_with_results.txt

    ./fafind-eq-seq [--help] [--eval 'perlcode'] --filter <file1> [<file2> ... <fileN>] >file_with_only_unique_seqs.fasta

DESCRIPTION

Find identical sequences in a given set of FASTA files. Informational messages go to standard error (stderr), results to standard output (stdout).

The output written to file_with_results.txt consists of lines following the pattern

    <ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>
    <ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>
    <ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>
    <TAB><ID> <DESCRIPTION><TAB><FILE>

where each unindented line, together with the <TAB>-indented lines that follow it, marks one group of identical sequences.
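The grouped layout above lends itself to simple post-processing. The following is a minimal sketch, not part of fafind-eq-seq itself, that counts the members of each group with awk. The sample records (IDs, descriptions and file names) are made up for illustration, and the sketch assumes that descriptions contain no tab characters.

```shell
# Made-up sample in the documented layout (illustrative IDs and file names only).
results=$(mktemp)
printf 'seq1 protein A\ta.fasta\n\tseq7 protein A copy\tb.fasta\n\tseq9 unnamed\tb.fasta\nseq2 protein B\ta.fasta\n\tseq4 protein B\tb.fasta\n' > "$results"

# An unindented line starts a new group; a line starting with a tab
# is a further member of the current group.
summary=$(awk -F'\t' '
    /^\t/ { n++; next }                     # further member of the current group
    { if (head != "") print n "\t" head     # flush the previous group
      head = $1; n = 1 }                    # remember the group head, start counting
    END { if (head != "") print n "\t" head }
' "$results")

# One line per group: <member count><TAB><ID and description of the group head>
printf '%s\n' "$summary"
rm -f "$results"
```

Each output line holds the group size followed by the ID and description of the first sequence in that group.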

OPTIONS

--filter

Instead of printing the groups, print the sequences in FASTA format. Duplicate sequences are omitted. The resulting FASTA output is not checked for duplicate IDs or similar problems.

Synonyms: -f
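Because the filtered output is not checked for duplicate IDs, it may be worth checking it separately. The following is a minimal sketch of such a check using standard tools. It is not a feature of fafind-eq-seq, and the FASTA records are made-up stand-ins for the contents of file_with_only_unique_seqs.fasta. The sketch assumes the ID is the first whitespace-delimited word of each header line.

```shell
# Made-up stand-in for filtered output: all sequences differ,
# but the ID "seq1" occurs twice.
filtered=$(mktemp)
printf '>seq1 from file A\nMKVLAA\n>seq2\nMKVLAT\n>seq1 from file B\nMKVLAG\n' > "$filtered"

# Keep header lines, strip the leading ">", take the first word (the ID),
# then report every ID that occurs more than once.
dup_ids=$(grep '^>' "$filtered" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d)

printf 'duplicate ids: %s\n' "$dup_ids"
rm -f "$filtered"
```

An empty result means every ID in the file is unique.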

--help

Display this message.

Synonyms: -?, -h

--eval

Manipulate input sequences on the fly. The current sequence string is set to $_.

The modified sequence is used only for the comparison. The sequences written to the output, e.g. when filtering, are left unchanged.

This is handy for comparing amino acid sequences from two different files where one file marks the stop codon with a trailing * and the other does not:

    ./fafind-eq-seq --eval 's/\*$//' <file1> <file2> >file_with_results.txt

Synonyms: -e

AUTHOR

jw bargsten, <joachim.bargsten at wur.nl>
