NAME

Compress::BraceExpansion - create a human-readable compressed string suitable for shell brace expansion

VERSION

This document describes Compress::BraceExpansion version 0.1.5. This is a beta release.

SYNOPSIS

    use Compress::BraceExpansion;

    # output: ab{c,d}
    print Compress::BraceExpansion->new( qw( abc abd ) )->shrink();

    # output: aabb{cc,dd}
    print Compress::BraceExpansion->new( qw( aabbcc aabbdd ) )->shrink();

    # output: aa{bb{cc,dd},eeff}
    print Compress::BraceExpansion->new( qw( aabbcc aabbdd aaeeff ) )->shrink();

DESCRIPTION

Shells such as bash and zsh have a feature called brace expansion, which lets users write a single expression that generates a series of strings containing similar patterns. For example:

  $ echo a{b,c}
  ab ac

  $ echo aa{bb,xx}cc
  aabbcc aaxxcc

  $ echo a{b,x}c{d,y}e
  abcde abcye axcde axcye

  $ echo a{b,x{y,z}}c
  abc axyc axzc

This module was designed to take a list of strings with similar patterns (e.g. the output of a shell expansion) and generate an un-expanded expression. Given a reasonably sized array of similar strings, this module will generate a single compressed string that can be comfortably parsed by a human.

The current algorithm only works for groups of input strings that start with and/or end with similar characters. See the BUGS AND LIMITATIONS section for more details.
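
Because the output is itself a brace expression, a result can be sanity-checked by expanding it again. The following is a minimal sketch, not part of this module: it assumes Perl's core glob(), which performs csh-style brace expansion, and it hard-codes an expression from the SYNOPSIS so that it runs without the module installed.

```perl
use strict;
use warnings;

# Expression and input list taken from the SYNOPSIS. They are hard-coded
# here so the sketch does not depend on the module being installed.
my $compressed = 'aa{bb{cc,dd},eeff}';
my @original   = qw( aabbcc aabbdd aaeeff );

# Core glob() performs csh-style brace expansion. Words that contain no
# wildcard characters (*, ?, [) are returned even if no such file exists.
my @expanded = glob $compressed;

# Compare as sorted lists so the check does not depend on expansion order.
my $round_trip_ok = "@{[ sort @expanded ]}" eq "@{[ sort @original ]}";

print $round_trip_ok ? "round trip ok\n" : "round trip failed\n";
```

A check like this only makes sense for inputs that contain no whitespace, braces, commas, or glob wildcards, since those characters have special meaning during expansion.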

WHY?

My initial motivation for writing this module was to reduce the number of characters needed to display a list of server names, e.g. in the subject of a text message sent to a pager or mobile phone. Suppose I start with a long list of servers that follow a standard naming convention, e.g.:

    app-dc-srv01 app-dc-srv02 app-dc-srv03 app-dc-srv04 app-dc-srv05
    app-dc-srv06 app-dc-srv07 app-dc-srv08 app-dc-srv09 app-dc-srv10

After running through this module, they can be displayed much more efficiently on a pager as:

    app-dc-srv{0{1,2,3,4,5,6,7,8,9},10}

The algorithm can also be useful for directories:

    /usr/local/{bin,etc,lib,man,sbin}
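
To put a number on the saving for the server list above, the two forms can simply be measured. This is an illustrative sketch only; the compressed string is copied from the example rather than generated, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Rebuild the ten server names from the example above.
my @servers = map { sprintf 'app-dc-srv%02d', $_ } 1 .. 10;

# The uncompressed form is the names joined by single spaces.
my $full = join ' ', @servers;

# Compressed form copied verbatim from the example above.
my $compressed = 'app-dc-srv{0{1,2,3,4,5,6,7,8,9},10}';

printf "%d characters uncompressed, %d compressed\n",
    length($full), length($compressed);
# prints: 129 characters uncompressed, 35 compressed
```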

BRACE EXPANSION?

Despite the name, this module does not perform brace expansion. If it did, it probably should have been located in the Shell:: hierarchy. It attempts to do the opposite, which might be referred to as 'brace compression', hence its location in the Compress:: hierarchy. The strings it generates could be used in a shell, but are more likely to be useful as a (potentially) human-readable compressed string. I chose the name BraceExpansion since that's the common term, so hopefully it will be more recognizable than if it were named BraceCompression.

CONSTRUCTOR

new( )

Returns a reference to a new Compress::BraceExpansion object.

May be initialized with a hash of options:

    Compress::BraceExpansion->new( { strings => [ qw( abc abd ) ] } );

Or with an array ref:

    Compress::BraceExpansion->new( [ qw( abc abd ) ] );

Or with an array:

    Compress::BraceExpansion->new( qw( abc abd ) );

This is an inside-out Perl class. For more info, see "Perl Best Practices" by Damian Conway.

METHODS

shrink( )

Perform brace compression on strings. Returns a string that is suitable for brace expansion by the shell.

This method was not designed to be called multiple times on the same Compress::BraceExpansion object. If you call shrink() more than once on the same object, you're on your own.

enable_debug( )

Enable various internal data structures to be printed to stdout.

BUGS AND LIMITATIONS

The current algorithm is pretty ugly, and will only compress strings that start and/or end with similar text. I've been working on a new algorithm that uses a weighted trie.
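
The weighted-trie algorithm is not described in this document. As a rough illustration of why a trie suits the problem, here is a sketch of a plain, unweighted character trie that merges shared prefixes only. It is not the algorithm used by this module, and the helper names build_trie, render and trie_shrink are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Compress::BraceExpansion:
# build a character trie (nested hashes) from the input strings.
sub build_trie {
    my @strings = @_;
    my %root;
    for my $string (@strings) {
        my $node = \%root;
        $node = $node->{$_} ||= {} for split //, $string;
        $node->{''} = 1;    # the empty key marks the end of a string
    }
    return \%root;
}

# Render a trie node as a brace expression. A node with one branch needs
# no braces; a node with several branches becomes {branch,branch,...}.
sub render {
    my ($node) = @_;
    my @alternatives;
    for my $char ( sort keys %{$node} ) {
        push @alternatives,
            $char eq '' ? '' : $char . render( $node->{$char} );
    }
    return $alternatives[0] if @alternatives == 1;
    return '{' . join( ',', @alternatives ) . '}';
}

sub trie_shrink { return render( build_trie(@_) ) }

print trie_shrink(qw( abc abd )), "\n";                 # ab{c,d}
print trie_shrink(qw( aabbcc aabbdd aaeeff )), "\n";    # aa{bb{cc,dd},eeff}
```

This sketch never merges common suffixes, so an input such as "aabbcc aaxxcc" would not come out as aa{bb,xx}cc. It also assumes a non-empty input list whose strings contain no braces or commas.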

If multiple identical strings are supplied as input, they will only be represented once in the resulting compressed string. For example, if "aaa aaa aab" was supplied as input to shrink(), then the result would simply be "aa{a,b}".
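
One consequence is that expanding the result does not give back the original number of strings. Callers who care about duplicates can detect them before compressing. The sketch below is an illustration only: it uses a common order-preserving de-duplication idiom and Perl's core glob() to expand the compressed form quoted above, and it does not call this module.

```perl
use strict;
use warnings;

my @input = qw( aaa aaa aab );

# Order-preserving de-duplication: keep the first occurrence of each string.
my %seen;
my @unique = grep { !$seen{$_}++ } @input;

# Expanding the compressed form from the text yields each string only once.
my @expanded = glob 'aa{a,b}';

printf "%d input strings, %d unique, %d after expansion\n",
    scalar @input, scalar @unique, scalar @expanded;
# prints: 3 input strings, 2 unique, 2 after expansion
```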

This module performs reasonably well with up to at least 1000 input strings. I've run several tests where I cut a 10k-word slice from /usr/share/dict/words and have consistently achieved around 50% compression. However, even for strings that are very similar, the output rapidly loses human readability beyond a couple hundred characters.

Please report problems to VVu@geekfarm.org.

Patches and suggestions are welcome!

SEE ALSO

  - brace-compress - included command line script in scripts/ directory

  - http://www.gnu.org/software/bash/manual/bashref.html#SEC27

  - http://zsh.sourceforge.net/Doc/Release/zsh_13.html#SEC60

AUTHOR

Alex White <vvu@geekfarm.org>

LICENCE AND COPYRIGHT

Copyright (c) 2006, Alex White <vvu@geekfarm.org>. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

- Neither the name of the geekfarm.org nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.