NAME

Text::Fold - Turn “unicode” and “byte” string text into lines of a given width, soft-hyphenating broken words

VERSION

This document describes Text::Fold version 0.5

SYNOPSIS

    use Text::Fold;

    my $text_78_wide = fold_text($long_lines);

    my $text_42_wide = fold_text($long_lines, 42);

DESCRIPTION

This simple folding mechanism is intended to turn a long string of text (possibly containing multiple lines) into multiple lines not exceeding a certain width.

It should work consistently with Unicode strings and Byte strings.

See the rest of this document for further details.

What this is/does and what this is/does not

Before you worry that this module is superfluous and send me hate mail, consider the context of this module, then decide:

What it is meant for

  • Handling Unicode strings (e.g. "Perl is the \x{32b7}\x{2122}") and byte strings (e.g. "Perl is the \xe3\x8a\xb7\xe2\x84\xa2" or "Perl is the ㊷™")

    All 3 formats should be considered 14 characters long. You should get back the same type of string you passed in.

  • Folding long text (possibly containing multiple lines) into multiple lines not exceeding the given width in characters

  • connecting words that span the width limit with a (loose) soft hyphen
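The character-versus-byte distinction above can be checked with core Perl alone. The following is a minimal sketch that uses only Encode and length(); it illustrates how the 14-character count arises and is not taken from the module's source. The variable names are chosen for illustration.

```perl
use strict;
use warnings;
use Encode ();

# The same text as a Unicode (character) string and as UTF-8 bytes.
my $unicode = "Perl is the \x{32b7}\x{2122}";
my $bytes   = "Perl is the \xe3\x8a\xb7\xe2\x84\xa2";

# length() on the Unicode string counts characters.
my $uni_len = length $unicode;

# length() on the byte string counts bytes: 12 ASCII bytes plus
# two 3-byte UTF-8 sequences.
my $byte_len = length $bytes;

# Decoding the bytes as UTF-8 recovers the intended character count.
my $char_len = length Encode::decode( 'UTF-8', $bytes );

print "unicode: $uni_len, bytes: $byte_len, decoded: $char_len\n";
# prints "unicode: 14, bytes: 18, decoded: 14"
```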

What it is not meant for

  • Understanding locale specific items, whitespace beyond the very basic, or special character behavior

  • folding in column or byte width context

  • normalization or implied understanding of your context

    e.g. a tab is a single character, if you mean it to stand for 4 spaces then normalize it first into 4 spaces

    Your data should be encoded properly before folding it. If you really want the original encoding then re-encode the results.
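The normalization and encoding points above could be handled before and after the call, roughly as in this sketch. The ISO-8859-1 input, the four-space tab width, and the variable names are assumptions made for illustration; substitute whatever your data actually uses.

```perl
use strict;
use warnings;
use Encode ();
use Text::Fold;

# Hypothetical input: Latin-1 encoded bytes that contain tabs.
my $latin1_bytes = "caf\xe9\tau lait, " x 10;

# Decode to a Unicode string so each character is counted once.
my $text = Encode::decode( 'ISO-8859-1', $latin1_bytes );

# A tab is a single character to fold_text(); expand it first if it
# is meant to stand for four spaces.
$text =~ s/\t/    /g;

my $folded = fold_text( $text, 42 );

# Re-encode only if the original encoding is needed downstream.
my $folded_latin1 = Encode::encode( 'ISO-8859-1', $folded );
```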

See Also

Here are some other modules that do similar things that you might like to use instead and the reasons I opted to do a different one.

Text::LineFold

Lines longer than the 'ColumnsMax' are not chunked. In other words you will get text wider than what you want.

Too many options/functionality for what the goal of this module was.

Did not soft-hyphen broken words.

Text::Wrap

Behavior set by global vars. (I know, I too used them back in the day when they were all the rage, I am working on rectifying that!)

Unintuitive interface.

Too many options/functionality for what the goal of this module was.

Did not soft-hyphen broken words.

Text::Format

Too many options/functionality for what the goal of this module was.

Did not soft-hyphen broken words.

Text::WrapI18N

Too many options/functionality for what the goal of this module was.

Did not soft-hyphen broken words.

EXPORTS

It exports fold_text() unless you bring it in a non-import() way, i.e.:

    use Text::Fold; # we now have fold_text() in this package since its import() was called

    use Text::Fold (); # we do not have fold_text() in this package since its import() was not called, we have Text::Fold::fold_text()

    require Text::Fold; # we do not have fold_text() in this package since its import() was not called, we have Text::Fold::fold_text()

INTERFACE

It has a single function: fold_text()

fold_text()

The first argument is the string to fold (either a Unicode string or a Byte string).

Additional arguments are described in the following list:

no hashref argument

The second argument (optional) is the width. It defaults to 78.

The third argument (optional) is the string to join the chunks back together again. It defaults to "\n".

    my $text_78_wide = fold_text( $string ); # 78-character-wide lines separated by \n

    my $text_42_wide = fold_text( $string, 42, "\n\n\n" ); # 42-character-wide lines separated by 3 newlines

    my $text_78_spaced = fold_text( $string, undef, "\n\n\n" ); # 78-character-wide lines separated by 3 newlines

hashref argument

The second or third argument, both optional, can instead be a hashref with behavior options.

If a hashref is passed as the second argument the default width is used and any third argument is ignored.

If a hashref is passed as the third argument the second argument remains the same as when used under “no hashref argument” style calls.

The keys to the hashref are all optional and can be:

join

The same value as the third argument when not using a hashref argument.

soft_hyphen_threshold

The value should be the maximum number of “non-whitespace” characters (see the note about whitespace under “What it is not meant for”) that a sequence can contain before it is soft-hyphenated.

Passing "0E0" uses a default value of about 20% of the width (“about” because it is $width/5 passed through int()).

Not passing this key (the default) leaves this behavior disabled, so a chunk of any size (i.e. 2 or more characters) is soft-hyphenated.

    # sequences of 15 (i.e. int(78/5)) characters or fewer are moved to the next line
    my $text_78_wide = fold_text( $string, { 'soft_hyphen_threshold' => '0E0' } );

    # sequences of 20 characters or fewer are moved to the next line
    my $text_42_wide = fold_text( $string, 42, { 'soft_hyphen_threshold' => '20', 'join' => "\n\n\n" } );
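To make the "0E0" default concrete, the following sketch applies the int($width/5) formula described above to a few widths. It only reproduces the documented arithmetic and is not the module's own code; the widths other than 78 are arbitrary examples.

```perl
use strict;
use warnings;

# Threshold that "0E0" is documented to select: the width divided by 5,
# truncated toward zero by int().
my %threshold_for;
for my $width ( 78, 42, 20 ) {
    $threshold_for{$width} = int( $width / 5 );
    print "width $width => threshold $threshold_for{$width}\n";
}
# prints:
# width 78 => threshold 15
# width 42 => threshold 8
# width 20 => threshold 4
```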

Regardless of the string type, the intended character counts as 1 character. For example, the Unicode string "Perl is the \x{32b7}\x{2122}" and the Byte strings "Perl is the \xe3\x8a\xb7\xe2\x84\xa2" and "Perl is the ㊷™" are all considered 14 characters long.

It returns a string of multiple lines, each of which does not exceed the width.

Words that crossed the width boundary are (loosely) notated with a soft hyphen taking into account the 'soft_hyphen_threshold' setting if any.

The string is the same type you passed in (either a Unicode string or a Byte string).
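One way to inspect the returned string is to split it on the join string and look at each line. This sketch assumes the documented defaults (a "\n" join string) and passes a Unicode string so that length() counts characters; the sample text and the width of 30 are arbitrary, and no particular output is claimed here.

```perl
use strict;
use warnings;
use Text::Fold;

binmode STDOUT, ':encoding(UTF-8)';

# A Unicode string: the \x{...} escapes make Perl store it as characters.
my $string = "Perl is the \x{32b7}\x{2122} " x 12;

my $width  = 30;
my $folded = fold_text( $string, $width );

# The default join string is "\n", so splitting on it recovers the lines.
for my $line ( split /\n/, $folded ) {
    printf "%2d chars: %s\n", length $line, $line;
}
```

If a custom join string was passed, split on that string instead.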

DIAGNOSTICS

Throws no warnings or errors of its own.

DEPENDENCIES

Encode

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

Please report any bugs or feature requests to bug-text-fold@rt.cpan.org, or through the web interface at http://rt.cpan.org.

TODO

Would anyone find array context useful?

Do you know of a better/faster/etc way to do what it does?

AUTHOR

Daniel Muey <http://drmuey.com/cpan_contact.pl>

LICENCE AND COPYRIGHT

Copyright (c) 2010, Daniel Muey <http://drmuey.com/cpan_contact.pl>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.