NAME

split-data.pl - Divide a text file into N approximately equal parts

SYNOPSIS

Splits a given data file into N parts such that each part has approximately the same number of lines.

USAGE

split-data.pl [Options] DATA

Type 'split-data.pl --help' for a quick summary of the Options.

INPUT

Required Arguments:

DATA

DATA should be a plain text file in which each line contains a single training example.

Optional Arguments:

--parts N

Splits the DATA file into N approximately equal parts. If the DATA file has M lines, each part except the last will have int(M/N) lines, while the last part will have all the remaining lines, M - (N-1) * int(M/N).

Default N is 10.
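
The arithmetic above can be checked with a short sketch. This is an illustration in Python, not the script's own Perl code; the function name part_sizes and its arguments are hypothetical, and it assumes M is non-negative and N is at least 1.

```python
def part_sizes(m: int, n: int = 10) -> list[int]:
    """Return the line count of each of the n parts for a file of m lines."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base = m // n              # int(M/N) lines for parts 1 .. N-1
    last = m - (n - 1) * base  # the last part takes all remaining lines
    return [base] * (n - 1) + [last]


print(part_sizes(25, 10))   # [2, 2, 2, 2, 2, 2, 2, 2, 2, 7]
print(part_sizes(100, 10))  # ten parts of 10 lines each
```

Note that when M is not a multiple of N, the whole remainder goes to the last part, so it can be noticeably larger than the others (7 lines versus 2 in the first call).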

Other Options:

--help

Displays a quick summary of the options.

--version

Displays the version information.

OUTPUT

split-data.pl creates exactly N files in the current directory. If the DATA file is named DATA-file, the N files are named DATA-file1, DATA-file2, DATA-file3, ..., DATA-fileN. For example, if the DATA filename is ANC, the files created by split-data.pl are named ANC1, ANC2, ..., ANCN.
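
The naming scheme can be sketched as follows. This is an illustrative Python snippet, not taken from the script; output_names is a hypothetical helper, and it assumes the DATA argument is a bare filename with no directory component.

```python
def output_names(data_name: str, n: int = 10) -> list[str]:
    """Build the N output filenames by appending 1..N to the DATA filename."""
    return [f"{data_name}{i}" for i in range(1, n + 1)]


print(output_names("ANC", 3))  # ['ANC1', 'ANC2', 'ANC3']
```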

A DATA file containing a total of M lines is split into N parts such that each part/file contains approximately M/N lines.

Thus, if N = 1, the output file will be exactly the same as the given DATA file. If N = M, where N is the value of --parts and M is the number of lines in DATA, each part will have a single line.
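
Both edge cases can be confirmed with a minimal sketch of the split itself. This is a Python illustration under the rule documented for --parts, not the script's actual implementation; split_lines is a hypothetical name, and it works on an in-memory list of lines rather than on files.

```python
def split_lines(lines: list[str], n: int = 10) -> list[list[str]]:
    """Split lines into n consecutive parts; the last part gets the remainder."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base = len(lines) // n  # int(M/N)
    # Parts 1 .. N-1 each take `base` consecutive lines.
    parts = [lines[i * base:(i + 1) * base] for i in range(n - 1)]
    # The last part takes everything that is left.
    parts.append(lines[(n - 1) * base:])
    return parts


data = ["a\n", "b\n", "c\n", "d\n"]
print(split_lines(data, 1) == [data])                # True: N = 1 reproduces DATA
print(split_lines(data, 4) == [[x] for x in data])   # True: N = M gives one line per part
```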

AUTHOR

Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.

COPYRIGHT

Copyright (c) 2004,

Amruta Purandare, University of Minnesota, Duluth. pura0010@umn.edu

Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
