
NAME

Parse::ExuberantCTags::Merge - Efficiently merge large exuberant ctags files

SYNOPSIS

  use Parse::ExuberantCTags::Merge;
  my $merger = Parse::ExuberantCTags::Merge->new();
  $merger->add_file('perltags.old',  sorted => 0);
  $merger->add_file('perltags.new',  sorted => 1);
  $merger->add_file('perltags.new2', sorted => 1);
  # potentially add more files...
  
  # sorting happens only when you call 'write':
  $merger->write('perltags.out');

DESCRIPTION

This Perl module merges multiple exuberant ctags files. The synopsis covers the entire interface. To be as efficient as possible, the module chooses a sort method based on the input data. In the general case, it uses the Sort::External module to process the data. There are a few exceptions:

Pre-sorted input files

If two or more input files contain sorted data, we use a merge sort to combine them efficiently before merging the result with the remaining data.
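The idea of merging pre-sorted inputs can be sketched as follows. This is only an illustration working on in-memory lists; `merge_sorted` is a hypothetical helper, not part of this module's API, and the module's home-grown merge sort may well work differently (for instance, by streaming from files).

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): merge any number of
# already-sorted lists of tag lines into one sorted list.
sub merge_sorted {
    my @lists = grep { @$_ } @_;    # ignore empty inputs
    my @pos   = (0) x @lists;       # current read position in each list
    my @out;
    while (1) {
        my $best;                   # index of the list with the smallest head
        for my $i (0 .. $#lists) {
            next if $pos[$i] >= @{ $lists[$i] };
            $best = $i
                if !defined $best
                or $lists[$i][ $pos[$i] ] lt $lists[$best][ $pos[$best] ];
        }
        last unless defined $best;  # every list is exhausted
        push @out, $lists[$best][ $pos[$best]++ ];
    }
    return \@out;
}

# Made-up sample tag lines, each list already sorted.
my $merged = merge_sorted(
    [ "bar\tb.pl\t/^sub bar/", "foo\ta.pl\t/^sub foo/" ],
    [ "baz\tc.pl\t/^sub baz/" ],
);
print "$_\n" for @$merged;
```

Each output line is chosen by comparing only the current head of every list, so no input has to be re-sorted.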

Small input files

If the total size of the input files is small, we load them into memory and use Perl's fast built-in sort function. Default limit: 2^21 B == 2 MB.
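A minimal sketch of the in-memory approach, assuming the inputs fit comfortably in RAM. The helper name `merge_in_memory` and the sample file names are made up for this illustration and are not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read every line of every input into memory,
# sort once with Perl's built-in sort, and write the result.
sub merge_in_memory {
    my ($out_file, @in_files) = @_;
    my @lines;
    for my $file (@in_files) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        push @lines, <$in>;
        close $in;
    }
    my @sorted = sort @lines;       # plain string comparison
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} @sorted;
    close $out or die "Cannot close $out_file: $!";
    return scalar @sorted;
}

# Demo with two tiny, made-up tag files in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'a.tags' => "foo\ta.pl\t/^sub foo/\nbar\ta.pl\t/^sub bar/\n",
    'b.tags' => "baz\tb.pl\t/^sub baz/\n",
);
for my $name (sort keys %sample) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $sample{$name};
    close $fh;
}
my $count = merge_in_memory("$dir/out.tags",
                            map { "$dir/$_" } sort keys %sample);
print "wrote $count lines\n";
```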

Super-small input files

If the total size of the input files is extremely small, we ignore whether they're sorted or not and simply resort to Perl's sort. Default limit: 2^17B == 128kB.
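One way to read the three exceptions above is as a small decision procedure. The sketch below is an interpretation of the prose, not the module's actual logic: `choose_strategy`, its option names, and the returned keys are all hypothetical, and treating "under the threshold" as a strict less-than is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's API.
# Each input is a hash ref: { size => <bytes>, sorted => 0 or 1 }.
sub choose_strategy {
    my ($inputs, %opt) = @_;
    my $small       = $opt{small}       // 2**21;  # exponent quoted in the text
    my $super_small = $opt{super_small} // 2**17;  # 128 kB

    my $total = 0;
    $total += $_->{size} for @$inputs;
    my $n_sorted = grep { $_->{sorted} } @$inputs;  # count of sorted inputs

    my %plan;
    if ($total < $super_small) {
        # Super-small: sort everything in memory, ignoring the sorted flags.
        %plan = (sorter => 'perl_sort', premerge_sorted => 0);
    }
    else {
        %plan = (
            # Small inputs use Perl's sort; the general case is external.
            sorter          => ($total < $small ? 'perl_sort' : 'sort_external'),
            # Merging pre-sorted inputs only pays off with two or more.
            premerge_sorted => ($n_sorted >= 2 ? 1 : 0),
        );
    }
    return \%plan;
}

my $plan = choose_strategy([
    { size => 3 * 2**20, sorted => 1 },
    { size => 5 * 2**20, sorted => 1 },
    { size => 1 * 2**20, sorted => 0 },
]);
print "$plan->{sorter}, premerge_sorted=$plan->{premerge_sorted}\n";
```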

The sorting modules are loaded at run-time on demand only.
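Loading a module on demand is commonly done with a run-time `require`. The sketch below shows that general idiom; whether this module does it exactly this way is an assumption, and `load_on_demand` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name only when it is first needed.
# Returns 1 on success and 0 if the module cannot be loaded.
sub load_on_demand {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# List::Util ships with Perl, so this should succeed on any installation.
print "List::Util loaded\n" if load_on_demand('List::Util');
```

A program that never reaches the general case then never pays the cost of compiling the external sorting module.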

METHODS

new

Creates a new merger object.

add_file

Adds a file to the merging process. The first argument must be the file name. It may be followed by the optional named argument 'sorted' (default: false), which affects how the data is merged. Mixing sorted and unsorted files is possible and still produces sorted output.

Pre-sorted files are naturally somewhat faster to merge.

small_size_threshold

Set this to the threshold under which the total size of the input files is to be considered small enough to be sorted in memory (see above). The default should be fine.

super_small_size_threshold

Set this to the threshold under which the total size of the input files is to be considered small enough to be sorted in memory regardless of whether the input was partly sorted (see above). The default should be fine.

This makes more sense than it sounds. Perl's sort function is fast. For small amounts of data, its low overhead wins significantly over the sort complexity.

tempdir

You can use this to set the location of the temporary files that are used for sorting and merging large files. By default, they go into File::Spec->tmpdir().
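To see which directory that default resolves to on a given system, the following sketch calls File::Spec->tmpdir() and creates a scratch file there with File::Temp. The file name template is made up for this example, and how the module itself names its temporary files is not shown here.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# The directory File::Spec considers the system's temporary directory.
my $default_dir = File::Spec->tmpdir();

# Create a scratch file there; UNLINK => 1 removes it when the program exits.
my ($fh, $path) = tempfile('ctags-merge-XXXXXX',
                           DIR    => $default_dir,
                           UNLINK => 1);
print {$fh} "foo\ta.pl\t/^sub foo/\n";
close $fh or die "Cannot close $path: $!";

print "temporary files would go into: $default_dir\n";
```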

TODO

Benchmark.

SEE ALSO

Exuberant ctags homepage: http://ctags.sourceforge.net/

Wikipedia on ctags: http://en.wikipedia.org/wiki/Ctags

Module that can produce ctags files from Perl code: Perl::Tags

Module that can parse exuberant ctags files: Parse::ExuberantCTags

Sorting modules: Sort::External, File::MergeSort (though we use a home-grown merge-sort)

File::PackageIndexer

AUTHOR

Steffen Mueller, <smueller@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009 by Steffen Mueller

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.6 or, at your option, any later version of Perl 5 you may have available.