NAME

Split bigrams into pieces.


See perldoc



Required Arguments:


The input should be a bigram file generated with the tokenlist option. The output files keep the name of the input source file, and each split file is given a sequence number as its extension.
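The naming rule above can be illustrated with a small sketch. It assumes that the sequence number is appended after a dot and starts at 1. Both details are assumptions, since the text only says that each split file gets a sequence-number extension. The helper name `split_file_names` is hypothetical.

```python
def split_file_names(source: str, num_parts: int) -> list[str]:
    """Hypothetical naming scheme: '<source>.<seq>', with seq starting at 1.

    Assumption: a dot separator and 1-based numbering; the real tool's
    separator and starting index may differ.
    """
    if num_parts < 0:
        raise ValueError("num_parts must be non-negative")
    # Every part keeps the input file's name and adds its sequence number.
    return [f"{source}.{seq}" for seq in range(1, num_parts + 1)]


print(split_file_names("bigrams", 3))
```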

--split N

This parameter is required. huge-split divides the bigram tokenlist output into parts. Each part created with --split N contains N lines, except that the last part may contain fewer. Choose N so that later processing can run efficiently on any N-line part of the file containing all bigrams.
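The splitting step can be sketched in Python as a streaming split into N-line parts. This is not the tool's actual implementation. The function names `chunk_lines` and `split_by_lines` are hypothetical, and the `<source>.<seq>` output naming is assumed. The input is read lazily, so at most one part's lines are held in memory at a time.

```python
import itertools
from typing import Iterable, Iterator


def chunk_lines(lines: Iterable[str], n: int) -> Iterator[list[str]]:
    """Yield successive lists of at most n lines (the last may be shorter)."""
    if n <= 0:
        raise ValueError("n must be a positive number of lines")
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, n))  # take the next n lines
        if not chunk:  # input exhausted
            return
        yield chunk


def split_by_lines(source: str, n: int) -> list[str]:
    """Write each N-line part of `source` to '<source>.<seq>' (assumed naming)."""
    outputs = []
    with open(source, encoding="utf-8") as src:
        for seq, chunk in enumerate(chunk_lines(src, n), start=1):
            out_path = f"{source}.{seq}"
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            outputs.append(out_path)
    return outputs
```

With 7 input lines and N = 3, this sketch produces three parts holding 3, 3 and 1 lines.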

We suggest setting N equal to the amount of memory you have in KB. For example, if the computer has 8 GB of RAM, which is 8,000,000 KB, set N to 8000000.
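The rule of thumb above can be written as a short calculation. The sketch uses decimal units (1 GB = 1,000,000 KB) to match the 8 GB example. The helper name `suggest_split_n` is hypothetical.

```python
def suggest_split_n(memory_gb: float) -> int:
    """Suggested --split value: the memory size expressed in KB.

    Uses decimal units (1 GB = 1,000,000 KB), as in the 8 GB example.
    """
    if memory_gb <= 0:
        raise ValueError("memory must be positive")
    # round() guards against floating-point error for fractional sizes.
    return round(memory_gb * 1_000_000)


print(suggest_split_n(8))  # 8000000
```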

Other Options:


Displays this message.


Displays the version information.


Amruta Purandare, Ted Pedersen, Ying Liu. University of Minnesota at Duluth.


