Sort::Naturally



Module Version: 1.03  


Sort::Naturally -- sort lexically, but sort numeral parts numerically


  use Sort::Naturally;
  @them = nsort(qw(
   foo12a foo12z foo13a foo 14 9x foo12 fooa foolio Foolio Foo12a
  ));
  print join(' ', @them), "\n";

Prints:

  9x 14 foo fooa foolio Foolio foo12 foo12a Foo12a foo12z foo13a

(Or "foo12a" + "Foo12a" and "foolio" + "Foolio" might be switched, depending on your locale.)


This module exports two functions, nsort and ncmp; they are used in implementing my idea of a "natural sorting" algorithm. Under natural sorting, numeric substrings are compared numerically, and other word-characters are compared lexically.
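To make the idea concrete, here is a minimal pure-Perl sketch of a comparison in this spirit. It is not the module's implementation: the name natural_cmp_sketch is hypothetical, and it simply assumes that splitting each string into alternating runs of non-digits and digits is a good enough approximation for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Sort::Naturally: compares two strings
# by splitting each into alternating runs of non-digits and digits.
sub natural_cmp_sketch {
    my ($left, $right) = @_;

    # A capturing group in split keeps the digit runs in the result,
    # e.g. "foo12a" becomes ("foo", "12", "a").
    my @l = split /(\d+)/, $left;
    my @r = split /(\d+)/, $right;

    while (@l and @r) {
        my ($x, $y) = (shift @l, shift @r);
        my $result =
            ($x =~ /^\d+$/ and $y =~ /^\d+$/)
          ? $x <=> $y            # both numeric: compare by value
          : lc($x) cmp lc($y);   # otherwise: compare as lowercased text
        return $result if $result;
    }

    # All shared chunks were equal, so the string with fewer chunks sorts first.
    return @l <=> @r;
}

my @sorted = sort { natural_cmp_sketch($a, $b) } qw(foo13a foo12z foo9 foo12a);
print "@sorted\n";    # foo9 foo12a foo12z foo13a
```

A plain sort would put foo12a before foo9, because it compares "1" with "9" character by character; comparing the digit runs by value is what moves foo9 to the front.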

This is the way I define natural sorting:

The nsort function

This function takes a list of strings, and returns a copy of the list, sorted.

This is what most people will want to use:

  @stuff = nsort(...list...);

When nsort needs to compare non-numeric substrings, it uses Perl's cmp operator in the scope of a "use locale". And when nsort needs to lowercase things, it uses Perl's lc function in the scope of a "use locale". If you want nsort to use other functions instead, you can specify them in an arrayref as the first argument to nsort:

  @stuff = nsort( [
                    \&string_comparator,   # optional
                    \&lowercaser_function  # optional
                  ],
                  ...list...
                );

If you want to specify a string comparator but no lowercaser, then the options list is [\&comparator, ''] or [\&comparator]. If you want to specify no string comparator but a lowercaser, then the options list is ['', \&lowercaser].

Any comparator you specify is called as $comparator->($left, $right), and, like a normal Perl cmp replacement, must return -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.

Any lowercaser function you specify is called as $lowercased = $lowercaser->($original). The routine must not modify its $_[0].
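As an illustration of these two calling conventions, here is a small sketch of a comparator and a lowercaser that follow the rules above. Both routines are hypothetical examples (the length-first ordering and the ASCII-only lowercasing are arbitrary choices made for the demonstration), and the snippet only exercises them directly rather than through nsort.

```perl
use strict;
use warnings;

# Hypothetical comparator: orders strings by length first, then stringwise.
# Called as $comparator->($left, $right); returns -1, 0, or 1.
sub string_comparator {
    my ($left, $right) = @_;
    return length($left) <=> length($right) || $left cmp $right;
}

# Hypothetical lowercaser: works on a copy, so the caller's $_[0] is untouched.
sub lowercaser_function {
    my ($original) = @_;         # copies the argument
    $original =~ tr/A-Z/a-z/;    # ASCII-only lowercasing of the copy
    return $original;
}

my $word = 'Foolio';
print lowercaser_function($word), "\n";         # foolio
print $word, "\n";                              # Foolio (unchanged)
print string_comparator('foo', 'fooa'), "\n";   # -1
```

Following the options-list form described above, these would be handed over as [\&string_comparator, \&lowercaser_function] in the first argument to nsort.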

The ncmp function

Often, when sorting non-string values like this:

   @objects_sorted = sort { $a->tag cmp $b->tag } @objects;

...or even in a Schwartzian transform, like this:

   @strings =
     map  { $_->[0] }
     sort { $a->[1] cmp $b->[1] }
     map  { [$_, make_a_sort_key_from($_)] }
     @_
   ;

...you might want something that replaces not sort, but cmp. That's what Sort::Naturally's ncmp function is for. Call it with the syntax ncmp($left, $right) instead of $left cmp $right, but otherwise it's a fine replacement:

   @objects_sorted = sort { ncmp($a->tag,$b->tag) } @objects;

   @strings =
     map  { $_->[0] }
     sort { ncmp($a->[1], $b->[1]) }
     map  { [$_, make_a_sort_key_from($_)] }
     @_
   ;
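For a complete, runnable variant of the same pattern, the sketch below sorts a few hash records by a key. It assumes Sort::Naturally is installed from CPAN; the record layout and the field names tag and pages are hypothetical and chosen only for the demonstration.

```perl
use strict;
use warnings;
use Sort::Naturally qw(ncmp);   # assumes the module is installed from CPAN

# Hypothetical records; only the "tag" key matters for ordering.
my @records = (
    { tag => 'chapter10', pages => 12 },
    { tag => 'chapter2',  pages => 30 },
    { tag => 'chapter1',  pages => 8  },
);

# ncmp stands in for cmp inside the sort block.
my @by_tag = sort { ncmp($a->{tag}, $b->{tag}) } @records;

print join(' ', map { $_->{tag} } @by_tag), "\n";
# chapter1 chapter2 chapter10
```

With plain cmp in the sort block, chapter10 would land between chapter1 and chapter2.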

Just as nsort can take a different string comparator and/or lowercaser, you can do the same with ncmp, by passing an arrayref as the first argument:

  ncmp( [
          \&string_comparator,   # optional
          \&lowercaser_function  # optional
        ],
        $left, $right
      )

You might get string comparators from Sort::ArbBiLex.



Copyright 2001, Sean M. Burke, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.


Sean M. Burke

