
NAME

count - Counting utility for files consisting of a fixed number of fields, such as CSV

VERSION

version v0.1.1

SYNOPSIS

count -h

count --help

count [-g|--group <columns>] [-c|--count] [-s|--sum <columns>] [--min <columns>] [--max <columns>] [--avg|--ave <columns>] [-m|--map <map>] [-M|--map-file <filename>] [-r|--reorder <order>] [-t|--delimiter <delimiter>] files...

  # show brief usage instructions
  count -h

  # show POD
  count --help

  # count the number of records grouped by columns 1 and 2
  # column numbers are 1-based
  count -g 1,2 file

  # compute the sum of column 3 grouped by columns 1 and 2
  # the field delimiter is ','
  count -g 1 -g 2 -s 3 -t ',' file

  # output the min, max and average of columns 2 and 3 grouped by column 1
  count -g 1 --min 2 --max 2 --avg 2 --min 3 --max 3 --avg 3

  # keep all columns, append the value of column 1 looked up in foomap from map.yaml, then move it to the first column
  count -g '*' -M map.yaml -m 1,foomap -r -1

DESCRIPTION

I have written one-liners like the following over and over again to compute simple statistics.

  perl -e 'while(<>) { @t = split /\t/; ++$c{$t[0]}; } foreach my $k (keys %c) { print "$k,$c{$k}\n" }'

Admittedly, this can be shortened by making use of Perl's command-line options:

  perl -an -F "\t" -e '++$c{$F[0]}; END { foreach my $k (keys %c) { print "$k,$c{$k}\n" } }'

This is still verbose for what it does. With this script, you can simply write the following. Please NOTE that column numbers are 1-based.

  count -g 1 -t "\t"

Following the Unix philosophy, this script does NOT provide any sorting functionality. If you need sorted output, pipe it to the sort command.

  count -g 1 -t "\t" | sort -k1,1n

OPTIONS

-h

Show brief usage instructions.

--help

Show this POD.

-g|--group <columns>

Specify the grouping columns, like GROUP BY in SQL. This option can be given multiple times and/or with comma-separated column numbers.

Specifying only the character '*' means that all fields are used for grouping. If a row has 3 fields, -g '*' is equivalent to -g 1,2,3.
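The column specifications accepted by -g (and by the other options taking <columns>) could be expanded roughly as follows. This is a minimal sketch, not the script's actual code; the helper name expand_columns is hypothetical, and it returns 0-based indices so they can be used directly as Perl array subscripts.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the values collected from repeated -g (or -s,
# --min, ...) options into a flat list of 0-based array indices. $nfields is
# the number of fields in the current row, needed only to expand '*'.
sub expand_columns {
    my ($specs, $nfields) = @_;
    my @idx;
    for my $spec (@$specs) {
        if ($spec eq '*') {
            # '*' alone stands for every field of the row: 1..$nfields
            push @idx, 0 .. $nfields - 1;
            next;
        }
        # Otherwise a comma-separated list of 1-based column numbers
        push @idx, map { $_ - 1 } split /,/, $spec;
    }
    return @idx;
}

# -g 1 -g 2 and -g 1,2 both select the first two fields
my @separate = expand_columns(['1', '2'], 3);
my @joined   = expand_columns(['1,2'], 3);
print "@separate | @joined\n";    # prints "0 1 | 0 1"
```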

-c|--count

Output the number of records in each group. If no other output option is specified, this option is assumed.

-s|--sum <columns>

Output the sum of the specified columns. This option can be given multiple times and/or with comma-separated column numbers.

--min <columns>

Output the minimum value of the specified columns. This option can be given multiple times and/or with comma-separated column numbers.

--max <columns>

Output the maximum value of the specified columns. This option can be given multiple times and/or with comma-separated column numbers.

--avg|--ave <columns>

Output the average of the specified columns. This option can be given multiple times and/or with comma-separated column numbers.
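Taken together, -c, -s, --min, --max and --avg behave like SQL aggregate functions evaluated per group. The following is a minimal sketch under stated assumptions: rows are already split into fields, only one value column is aggregated, and every statistic is computed at once, whereas the script outputs only the requested ones. The name aggregate is hypothetical.

```perl
use strict;
use warnings;

# Sketch of GROUP BY-style aggregation over pre-split rows.
# $group_idx: 0-based grouping columns; $value_idx: 0-based value column.
sub aggregate {
    my ($rows, $group_idx, $value_idx) = @_;
    my %stat;
    for my $row (@$rows) {
        # Join the grouping fields to build a single hash key
        my $key = join "\t", @{$row}[@$group_idx];
        my $v   = $row->[$value_idx];
        # First row of a group initialises min and max with its value
        my $s = $stat{$key} //= { count => 0, sum => 0, min => $v, max => $v };
        $s->{count}++;
        $s->{sum} += $v;
        $s->{min} = $v if $v < $s->{min};
        $s->{max} = $v if $v > $s->{max};
    }
    # The average is derived from sum and count once all rows are seen
    $_->{avg} = $_->{sum} / $_->{count} for values %stat;
    return \%stat;
}

# Default input rule: fields separated by runs of whitespace
my @rows = map { [ split /\s+/ ] } ("a 1", "a 3", "b 5");
my $st = aggregate(\@rows, [0], 1);

# Groups come out in hash order; pipe through sort(1) if order matters
for my $k (keys %$st) {
    my $s = $st->{$k};
    print join("\t", $k, @{$s}{qw(count sum min max avg)}), "\n";
}
```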

-m|--map <map>

Output the value obtained by looking up the specified column in the map with the specified key. The argument is a comma-separated list of column and map-key pairs, like:

  -m 1,class,2,subclass

-M|--map-file <filename>

Specify the map file used by the -m option. The map file is a YAML file with the following structure:

  <key1>:
    <number11>: <value11>
    <number12>: <value12>
  <key2>:
    <number21>: <value21>
    <number22>: <value22>
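To illustrate how -m and -M fit together, here is a sketch in which an in-memory hash stands in for the parsed YAML map file. The map keys and values, the helper apply_map, and the choice to output an empty string for unmapped values are all assumptions made for illustration, not the script's documented behaviour.

```perl
use strict;
use warnings;

# Stand-in for the parsed YAML map file: map key -> { field value -> mapped value }.
# A real implementation would load this from the -M file with a YAML parser;
# the keys and values below are made up for illustration.
my %map = (
    class    => { 10 => 'fruit', 20 => 'vegetable' },
    subclass => { 11 => 'apple', 21 => 'carrot' },
);

# Hypothetical helper: apply a -m argument such as "1,class,2,subclass"
# (pairs of 1-based column number and map key) to one row, appending
# the looked-up values as new columns.
sub apply_map {
    my ($spec, $row) = @_;
    my @pairs = split /,/, $spec;
    my @out   = @$row;
    while (my ($col, $key) = splice @pairs, 0, 2) {
        my $v = $map{$key}{ $row->[$col - 1] };
        push @out, defined $v ? $v : '';    # assumption: unmapped values become empty
    }
    return @out;
}

# prints 10, 11, fruit and apple separated by tabs
print join("\t", apply_map('1,class,2,subclass', [10, 11])), "\n";
```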

-r|--reorder <order>

Specify the column order as a comma-separated list of integers. 5,4,3,2,1 means the output is ordered from the 5th column to the 1st column. Omitted positions are filled with the indices not used so far, in ascending order. Negative numbers are treated as indices relative to the last column. Trailing commas can be omitted. Thus, ,3,2 is treated as if 1,3,2,4,5,... were specified, and specifying just -1 means a one-step right rotation, that is, the last column moves to the first position.

This option is applied at the last stage of processing, regardless of where it appears on the command line.
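The reordering rule can be expressed as a short index computation. The following is a sketch written from the description above, not the script's own code; the helper name reorder_indices is hypothetical, and it returns the 1-based source column for each output position.

```perl
use strict;
use warnings;

# Sketch of the -r/--reorder rule: $spec is the option value, $n the number
# of columns in the row. Returns 1-based source columns in output order.
sub reorder_indices {
    my ($spec, $n) = @_;
    # Limit -1 keeps empty fields, e.g. the leading hole in ",3,2"
    my @slots = split /,/, $spec, -1;
    my (@order, %used);
    for my $s (@slots) {
        if ($s eq '') { push @order, undef; next; }    # hole, filled later
        my $i = $s < 0 ? $n + 1 + $s : $s;             # -1 is the last column
        push @order, $i;
        $used{$i} = 1;
    }
    # Omitted trailing positions are holes too
    push @order, undef while @order < $n;
    # Fill holes with the indices not used so far, in ascending order
    my @unused = grep { !$used{$_} } 1 .. $n;
    $_ //= shift @unused for @order;
    return @order;
}

# -r -1 on a 4-column row: the last column moves to the front
my @row = qw(a b c d);
print join(' ', @row[ map { $_ - 1 } reorder_indices('-1', scalar @row) ]), "\n";    # prints "d a b c"
```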

-t|--delimiter <delimiter>

Specify the field separator character, which is used for both input and output. Perl escape sequences such as '\x0D' are available, so take care with escaping. For example, to use a backslash as the delimiter, specify '\\'.

If omitted, a special rule applies: /\s+/ is used as the input separator and a tab ('\t') as the output separator.
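The delimiter handling could look roughly like the sketch below. It is an assumption-laden illustration: the helpers unescape and separators are hypothetical, only the escapes \t, \n, \r, \xHH and \\ are interpreted (the script may accept more), and the delimiter is matched literally on input.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a few Perl-style escapes in the -t argument.
# Only \t, \n, \r, \xHH and \\ are handled in this sketch.
sub unescape {
    my ($s) = @_;
    my %simple = (t => "\t", n => "\n", r => "\r", '\\' => '\\');
    $s =~ s{\\(?:x([0-9A-Fa-f]{2})|([tnr\\]))}
           {defined $1 ? chr(hex $1) : $simple{$2}}ge;
    return $s;
}

# Returns (input splitter, output separator) for an optional -t value.
sub separators {
    my ($opt) = @_;
    # Default rule: split on runs of whitespace, join with a tab
    return (qr/\s+/, "\t") unless defined $opt;
    my $d = unescape($opt);
    # \Q...\E so that characters such as '|' or '.' are matched literally
    return (qr/\Q$d\E/, $d);
}

my ($in, $out) = separators(',');
print join($out, split $in, "a,b,c"), "\n";    # prints "a,b,c"
```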

files...

Input files. If no files are specified, input is read from STDIN.

AUTHOR

Yasutaka ATARASHI <yakex@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Yasutaka ATARASHI.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.