The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
package Pod::Clipper;
use Moose;

use Pod::Clipper::Block;
use Text::Trim;

BEGIN {
    our $VERSION = '0.01';
}

has 'data' => (
    is       => 'rw',
    isa      => 'Str',
    required => 1, 
);

has 'newline_seq' => (
    is       => 'rw',
    isa      => 'Str',
    default  => "\n",
);

has [qw/from_file ignore_whitespace ignore_leading_whitespace ignore_trailing_whitespace/] => (
    is       => 'rw',
    isa      => 'Bool',
);

has 'ignore_invalid_pod' => (
    is       => 'rw',
    isa      => 'Bool',
    default  => 1,
);

has 'append_newline' => (
    is       => 'rw',
    isa      => 'Bool',
    default  => 0,
);

has '_blocks' => (
    is       => 'rw',
    isa      => 'ArrayRef[Pod::Clipper::Block]',
    default  => sub { [] },
    init_arg => undef,
);

sub BUILD {
    my $self = shift;
    $self->_build;
}

sub rebuild {
    my $self = shift;
    $self->_build;
}

sub _build {
    my $self = shift;
    my $nl = $self->newline_seq;
    my $data;
    if ($self->from_file) {
        open(FH, $self->data) or die $!;
        # save those precious bytes of memory! (i.e. no @lines = <FH>;)
        while (<FH>) {
            $data .= $_;
        }
        close(FH);
    }
    else { $data = $self->data; }
    if (!defined($self->ignore_whitespace) || $self->ignore_whitespace) {
        $data = trim($data);
    }
    else {
        $data = ltrim($data) if $self->ignore_leading_whitespace;
        $data = rtrim($data) if $self->ignore_trailing_whitespace;
    }
    my @lines = split $nl, $data;
    my $in_pod = 0;
    my @block;
    undef @{$self->_blocks};
    foreach my $l (@lines) {
        if ($l =~ m/\A=cut/) {
            if (!$in_pod) {
                next if $self->ignore_invalid_pod;
                push @block, $l;
                next;
            }
            $in_pod = 0;
            push @block, $l;
            push @block, '' if $self->append_newline;
            my $b = Pod::Clipper::Block->new({ data => join($nl, @block), is_pod => 1 });
            push @{$self->_blocks}, $b;
            undef @block;
            next;
        }
        if ($l =~ m/\A=[a-zA-Z]/ && !$in_pod) {
            $in_pod = 1;
            if (@block) {
                push @block, '' if $self->append_newline;
                my $b = Pod::Clipper::Block->new({ data => join($nl, @block), is_pod => 0 });
                push @{$self->_blocks}, $b;
                undef @block;
            }
        }
        push @block, $l;
    }
    if (@block) {
        push @block, '' if $self->append_newline;
        my $b = Pod::Clipper::Block->new({ data => join($nl, @block), is_pod => $in_pod });
        push @{$self->_blocks}, $b;
    }
}

sub all {
    my $self = shift;
    return $self->_blocks;
}

sub pod {
    my $self = shift;
    my @pod;
    map { push @pod, $_ if $_->is_pod } @{$self->_blocks};
    return \@pod;
}

sub non_pod {
    my $self = shift;
    my @non_pod;
    map { push @non_pod, $_ if !$_->is_pod } @{$self->_blocks};
    return \@non_pod;
}

=head1 NAME

Pod::Clipper - Extract blocks of POD from a text document

=head1 SYNOPSIS

  use Pod::Clipper;
  my $clipper = Pod::Clipper->new({ data => $data });
  my $all_blocks = $clipper->all;
  foreach (@{$all_blocks}) {
      # do something with $_->data
      if ($_->is_pod) {
          # POD block. do something with the POD data...
          # e.g. convert it to HTML
      }
      else {
          # non-POD block (code etc). do something else with it...
          # e.g. syntax-highlight it
      }
  }

=head1 DESCRIPTION

This module allows you to divide a document/string into POD and non-POD
blocks of text. This is useful for extracting POD data (or code) from a
"mixed" document, like most perl modules on CPAN.

POD data is identified as per the L<perlpodspec|perlpodspec(1)> manpage.
Invalid POD is simply ignored. The only case for this is if a line
matched C</\A=cut/> without a starting POD command (e.g. C<=head1, =head2,>
etc). That line will be completely ignored. If you want such lines to be
included as part of the non-POD blocks, set C<ignore_invalid_pod> to false.

Please note that C<Pod::Clipper> doesn't check the POD data itself for
validity. For example, you may have a mismatched bracket in your POD
like C<CE<lt>E<lt>mismatchedE<gt>>. C<Pod::Clipper> only cares about the
POD commands that mark the beginning and end of your blocks (i.e. where
these blocks should be I<clipped>). It doesn't care about the actual POD
data. Hence, the only case for invalid POD that C<Pod::Clipper> can detect
is if you have a dangling C<=cut> command (explained in the previous
paragraph and L<below|/ignore_invalid_pod>).

By default, leading and trailing whitespace characters are ignored.
To change this, set C<ignore_whitespace> to false.
You can also use C<ignore_leading_whitespace> and
C<ignore_trailing_whitespace> for more control. See below.

=head1 METHODS

=head2 new

This is the C<Pod::Clipper> constructor. As with many perl modules,
configuration options are passed as a hash reference. The available
options are:

=over

=item data

The data you want to process into POD and non-POD blocks. This is a required option.

=item from_file

If this option is set (to true), C<data> is treated as a filename and your
data is pulled from there. If an error occurs (file doesn't exist etc), an
exception will be thrown.

  eval { my $c = Pod::Clipper->new({ data => 'Test.pm', from_file => 1 }); };
  if ($@) {
      # some IO error occurred. the caught error string ($@) is set to
      # whatever perl passed in $! after open() failed
  }

=item newline_seq

The I<line separator> that should be used. The default newline sequence used
to separate lines is C<\n>. In most cases this will do the right thing (perl
treats C<\n> differently depending on what platform it is running on -- see
L<binmode(1)>). However, sometimes you may want to use a different newline
sequence. For example, you're running a script on *nix and trying to read a
file that was created on Windows. In that case, set newline_seq to C<\r\n>
in order to get the correct results. If you're running perl 5.10.x or
newer, you can use C<\R> as your newline sequence and everything should
magically work regardless of where the file was created and what
platform perl is running on.

=item append_newline

By default, C<Pod::Clipper> excludes the last newline character in each
block. For example, if you have the following:

  # line 1
  # line 2
  =pod
  
  test
  
  =cut

C<Pod::Clipper> would divide the text above into these two blocks:

  # line 1
  # line 2 <--- block ends here (no newline)

and

  =pod
  
  test
  
  =cut <--- block ends here (no newline)

If you set C<append_newline> to true, you would get the following blocks
instead:

  # line 1
  # line 2
  <--- block ends here

and

  =pod
  
  test
  
  =cut
  <--- block ends here

Please remember that (the last line of) your data may not necessarily end
with a newline character, so setting C<append_newline> to true may tack an
extra one to the last block.

=item ignore_whitespace

Ignore leading and trailing whitespace characters in your data. Default is
true.

=item ignore_leading_whitespace

Ignore leading whitespace characters in your data. If C<ignore_whitespace>
is set (to true) this option will be ignored.

=item ignore_trailing_whitespace

Ignore trailing whitespace characters in your data. If C<ignore_whitespace>
is set (to true) this option will be ignored.

=item ignore_invalid_pod

Defaults to true which completely ignores "dangling" C<=cut> commands
in the data. Setting this option to false causes such lines to be treated
as non-POD text.

=back

Each of the options listed above also have an accessor/mutator method by
the same name. For example:

  my $data = $clipper->data; # get
  $clipper->data($new_data); # set

=head2 rebuild

If you want to resuse the same C<Pod::Clipper> object for a different
document/string, make sure to call C<rebuild()> after you update your
data and other parameters so that C<all(), pod(),> and C<non_pod()>
return the correct results. For example:

  $clipper->data($new_data);
  $clipper->ignore_whitespace(0);
  $clipper->rebuild; # parse $new_data and build the new blocks
  # $clipper->all now reflects the new data

=head2 all

This method returns an array reference to all blocks, i.e. the POD
and non-POD ones. The order of these blocks as they appear in the
original data is preserved.

=head2 pod

Same as C<all()>, but returns only the POD blocks.

=head2 non_pod

Same as C<all()>, but returns only the non-POD blocks.

=head1 BUGS

There are no known bugs. If you find one, please report it
to me at the email address listed below. Any other suggestions
or comments are also welcome.

=head1 AUTHOR

Yousef H. Alhashemi <yha@cpan.org>

=head1 COPYRIGHT

This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.


=head1 SEE ALSO

L<Pod::Clipper::Block|Pod::Clipper::Block>

=cut

1; # leave this here!