The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
NAME
    Lingua::EN::Titlecase - Titlecase English words by traditional editorial
    rules.

VERSION
    0.14

SYNOPSIS
     use Lingua::EN::Titlecase;
     my $tc = Lingua::EN::Titlecase->new("CAN YOU FIX A TITLE?");
     print $tc->title(), $/;

     $tc->title("and again but differently");
     print $tc->title(), $/;

     $tc->title("cookbook don't work, do she?");
     print "$tc\n";

DESCRIPTION
    Editorial titlecasing in English is the initial capitalization of
    regular words minus inner articles, prepositions, and conjunctions.

    This is one of those problems that is somewhat easy to solve for the
    general case but impossible to solve for all cases. Hence the lack of
    module till now. This module takes an optimistic approach, assuming that
    some words, unless there are clues to the contrary, are likely to be
    correct already. Most titlecase implementations, for example, convert
    everything to lowercase first. This is obviously flawed for many common
    cases like proper names and abbreviations.

    Simple techniques like--

     $data =~ s/(\w+)/\u\L$1/g;

    Fail on words like "can't" and don't always take into account editorial
    rules or cases like--

    compound words -- Perl-like
    abbreviations -- USA
    mixedcase and proper names -- eBay: nEw KEyBOArD
    all caps -- SHOUT ME DOWN

    Lingua::EN::Titlecase attempts to cater to the general cases and provide
    hooks to address the special.

INTERFACE
    $tc = Lingua::EN::Titlecase->new
        The string to be titlecased can be set three ways. Single argument
        to new. The "original" hash element to "new". With the "title"
        method.

         $tc = Lingua::EN::Titlecase->new("this should be titlecased");

         $tc = Lingua::EN::Titlecase->new(original => "no, this is",
                                          mixed_threshold => 0.5);

         $tc->title("i beg to differ");

        The last is to be able to reuse the Titlecase object.

        Lingua::EN::Titlecase objects stringify to their processed
        titlecase, if they have a string, the ref of the object otherwise.

    $tc->original
        Returns the original string.

    $tc->title
        Set the original string, returns the titlecased version. Both can be
        done at once.

         print $tc->title("did you get that thing i sent?"), "\n";

    $tc->titlecase
        Returns the titlecased string. Croaks if there is no original set
        via the constructor or the method "title".

    $tc->uppercase
        Returns the list of uppercase letters found. Includes those mixed
        case letters. Chiefly used internally for determining if string has
        exceeded the set threshold to be considered "all caps."

    $tc->word_punctuation(qr/['-]/)
        Sets the regex which will be used to allow punctuation inside words.
        The default is "[:punct:]." This is more reasonable that it might
        sound as word boundaries generally have either a space or more than
        one piece of punctuation. Any instance of the word_punctuation is
        allowed inside a "word" if it is surrounded by [:alpha:]s. E.g.,
        [:punct:] makes all these one "word" for titlecasing--

         can't
         cpan.org
         cow-catcher
         M!M

        Set on construction or reset it to change the behavior--

         Lingua::EN::Titlecase->new(word_punctuation => "['-]");

         $tc->word_punctuation(qr/['-]/)
         # "can't" and "cow-catcher" are still one word
         # "cpan.org" is now two and the "Org" will get titlecased

    $tc->lexer
        Get/set the lexer sub ref. You should probably ignore this. If you
        think otherwise, read the source for more.

  STRATEGIES
    One of the hardest parts of properly titlecasing input is knowing if
    part of it is already correct and should not be clobbered. E.g.--

     Old MacDonald had a farm

    Is partly right and the proper name MacDonald should be left alone.
    Lowercasing the whole string and then title casing would yield--

      Old Macdonald Had a Farm

    So, to determine when to flatten a title to lowercase before processing,
    we check the ratio of mixedcase and the ratio of caps.

    $tc->mixed_threshold
        Set/get. The ratio of mixedcase to letters which triggers
        lowercasing the whole string before trying to titlecase. The
        built-in threshold to clobber is 0.25. Example breakpoints.

         0.09 --> Old Macdonald Had a Farm
         0.10 --> Old MacDonald Had a Farm

         0.14 --> An Ipod with Low Ph on Ebay
         0.15 --> An iPod with Low pH on eBay

    $tc->uc_threshold
        Same as mixed but for "all" caps. Default threshold is 0.95.

    $tc->mixed_case
        Scalar context returns count of mixedcase letters found. All caps
        and initial caps are not counted. List context returns the letters.
        E.g.--

         my $tc = Lingua::EN::Titlecase->new();
         $tc->title("tHaT pROBABly Will nevEr BE CorrectlY hanDled");
         printf "%d, %s\n",
             scalar($tc->mixedcase),
             join(" ", $tc->mixedcase);

        Yields--

         11, H T R O B A B E C Y D

        This is useful for determining if a string is overly mixed.
        Substrings like "pH" crop up now and then but they should never
        compose a high percentage of a properly cased title.

    $tc->wc
        "Word" count. Scalar context returns count of "words." List returns
        them.

    $tc->mixedcase
        Count/list of mixedcase letters found.

    $tc->whitespace
        Count/list of whitespace -- \s+ -- found.

DIAGNOSTICS
    No diagnostics for you!
      [Non-existent description of error here]

TODO
    Dictionary hook to allow BIG lists of proper names and lc to be applied.

    Handle internal punctuation like an em-dash as the equivalent of "--"?

    Handle hypens; user hooks.

    Move to Mouse or Moose?

    Handle classes of things to be left alone if of a case. Like Roman
    numerals? Better to have it be rule based where each rule is used to
    find a thing, apply a threshold map, possibly convert lc/uc, and then
    titlecase or accept. This could get much messier than a dictionary and
    might cause problems with overlap like i v I.

    Allow a grammar parser object (on demand, if available) to correctly
    identify a word's part of speech before applying casing. "To" might be a
    proper name, for example, and "A" might be a grade.

    Debug ability. Log object or to carp?

    Recipes. Including TT2 "plugin" recipe. Mini-scripts to test strings or
    accomplish custom configuration goals.

    Take out Class::Accessor...? For having it all in one place, checking
    args, and slight speed gain.

    Add ignore classes? Like \bhttp://...

    Bigger test suite.

SEE ALSO
    Lingua::EN::Titlecase::HTML for titlecasing text with markup.

RECIPES
   Passing L::E::T object to TT2
     use Template;
     use CGI "header";
     use Lingua::EN::Titlecase;
     my @titles = (
                   "orphans of the sky",
                   "childhood's end",
                   "the many-colored land",
                   "llana of gathol",
                   );
 
     print header(-content_type => "text/plain");
     my $tt2 = Template->new();
 
     $tt2->process(\*DATA,
                   { tc => Lingua::EN::Titlecase->new(),
                     title => \@titles }
                   );
     __DATA__
     [%-USE format %]
     [%-pretty_print = format('%30s : %s') %]
     [%-FOR t IN title %]
     [% pretty_print( t, tc.title(t) ) %]
     [%-END %]

CONFIGURATION AND ENVIRONMENT
    Lingua::EN::Titlecase requires no configuration files or environment
    variables.

DEPENDENCIES
    Perl 5.6 or better to support POSIX regex classes.

BUGS AND LIMITATIONS
    Please report any bugs or feature requests to
    "bug-lingua-en-titlecase@rt.cpan.org", or through the web interface at
    <http://rt.cpan.org>.

AUTHOR
    Ashley Pond V "<ashley@cpan.org>".

LICENSE AND COPYRIGHT
    Copyright (c) 2008-2009, Ashley Pond V "<ashley@cpan.org>".

    This module is free software; you can redistribute it and modify it
    under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY
    Because this software is licensed free of charge, there is no warranty
    for the software, to the extent permitted by applicable law. Except when
    otherwise stated in writing the copyright holders and other parties
    provide the software "as is" without warranty of any kind, either
    expressed or implied, including, but not limited to, the implied
    warranties of merchantability and fitness for a particular purpose. The
    entire risk as to the quality and performance of the software is with
    you. Should the software prove defective, you assume the cost of all
    necessary servicing, repair, or correction.

    In no event unless required by applicable law or agreed to in writing
    will any copyright holder, or any other party who may modify or
    redistribute the software as permitted by the above license, be liable
    to you for damages, including any general, special, incidental, or
    consequential damages arising out of the use or inability to use the
    software (including but not limited to loss of data or data being
    rendered inaccurate or losses sustained by you or third parties or a
    failure of the software to operate with any other software), even if
    such holder or other party has been advised of the possibility of such
    damages.