The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Sort::Bucket - a fast XS bucket sort

SYNOPSIS

  use Sort::Bucket qw(inplace_bucket_sort);

  inplace_bucket_sort(@some_array);

DESCRIPTION

A fast XS implementation of the Bucket Sort algorithm. If your data is well suited to a bucket sort then this can be significantly faster than Perl's builtin sort() function.

A bucket sort works best when there is a large amount of variation in the first few bytes of the strings to be sorted.

LIMITATIONS

Only in-place sorting is implemented so far.

Limited to sorting in standard string comparison order.

Sort::Bucket can be used to sort either byte strings or character strings, but not a mixture. If you apply Sort::Bucket to an array containing both byte strings and character strings then it may sort them into the wrong order, so don't do that.

There is no locale support, i.e. use locale will not effect the order into which this module sorts your strings as it does with Perl's builtin sort.

If the array is already partially sorted, Perl's builtin sort() can take advantage of that to sort it very quickly. For this reason, Sort::Bucket can be slower than Perl's builtin sort on partially sorted input, even with data that's well suited to a bucket sort.

EXPORTS

Exports nothing by default. The following functions are available for import:

inplace_bucket_sort( ARRAY_OF_STRINGS [,BUCKET_BITS] )

Sorts ARRAY_OF_STRINGS in-place.

BUCKET_BITS specifies the number of bits from the start of each string to use to select the bucket into which to place the string. If set, it must be an integer from 1 to 31.

The elements of the array will be distributed into 2**BUCKET_BITS buckets, and then sorted within each bucket. If BUCKET_BITS is omitted then a sensible value will be selected based on the size of the array.

ALGORITHM

For each element in the array, a major bucket and a minor bucket value is computed. The major bucket is the first BUCKET_BITS bits of the string, and the minor bucket is the next 32 bits.

The elements are then distributed into 2**BUCKET_BITS major buckets, according to their major bucket values. The minor bucket value is stored along with each element.

For each of the 2**BUCKET_BITS possible major bucket values, the elements in that bucket are sorted by minor bucket. Any that compare equal by minor bucket are further sorted using Perl's native sort.

The extra step of sorting by minor bucket before falling back to Perl's sort speeds up the process, since comparing 32-bit integers is much faster than comparing Perl scalars.

SEE ALSO

"sort" in perlfunc

There are several other fast sorting modules on CPAN, any of which may be faster than Sort::Bucket for your data. For example, Sort::Key. Sort::Maker, Sort::Merge.

If the data to be sorted does not fit into memory, you probably want Sort::External.

AUTHOR

Nick Cleaton, <nick@cleaton.net>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Nick Cleaton

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.