
Filter::Heredoc - Search and filter embedded here documents


Version 0.02


    use 5.010;
    use Filter::Heredoc qw( hd_getstate hd_init hd_labels );
    use Filter::Heredoc::Rule qw( hd_syntax );
    my $line;
    my %state;
    # Get the defined labels to compare with the returned state
    my %label = hd_labels();

    # Read a file line-by-line and print only the here document
    while (defined( $line = <DATA> )) {
        %state = hd_getstate( $line ); 
        print $line if ( $state{statemarker} eq $label{heredoc} );
        if ( eof ) {
            close( ARGV ); 
            hd_init(); # Prevent state errors from propagating to the next file
        }
    }

    # Test a line (is this an opening delimiter line?)
    $line = q{cat <<END_USAGE};
    %state = hd_getstate( $line ); 
    print "$line\n" if ( $state{statemarker} eq $label{ingress} );
    # Load a syntax helper rule (shell script is built in)
    hd_syntax ( 'pod' );


This is the core module for Filter::Heredoc. If you're not looking to extend or alter the behavior of this module, you probably want to look at filter-heredoc instead.

Filter::Heredoc provides subroutines to search and print here documents. Here documents (also called "here docs") allow a type of input redirection from some following text. This is often used to embed short text messages (or configuration files) within shell scripts.

This module extracts here documents from POSIX IEEE Std 1003.1-2008 compliant shell scripts. Perl has derived a similar here document syntax, but it differs in many details.

Rules can be added to enhance here document extraction, i.e. to prevent "false positives". Filter::Heredoc::Rule exports an additional subroutine to load and unload rules.

This version supports a basic POD rule. The current subroutines can be tested on Perl scripts if the code constructs use a form of here documents close to the POSIX one. That said, don't rely on the current version for Perl, since Perl support is still in a very early phase of development.

Concept to parse here documents.

This is a line-by-line state machine design. Reading a script from beginning to end results in the following state changes:

    Source --> Here document --> Source

What tells a source line apart from a here document line? Nothing! However, by adding an opening and a closing delimiter state and tracking the previous state, we can identify what is source and what is a here document:

    Source --> Ingress --> Here document --> Egress --> Source
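The four-state idea above can be sketched in a few lines of plain Perl. This is only an illustration of the concept, not the implementation used by Filter::Heredoc: the helper name label_lines is hypothetical, and it assumes at most one here document per ingress line and a simple word delimiter after << or <<-. Because it applies no syntax rules, it will also report false positives (for example a << inside a comment).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Filter::Heredoc: label each line as
# S (source), I (ingress), H (here document) or E (egress).
sub label_lines {
    my @lines = @_;
    my $state      = 'S';    # state assigned to the previous line
    my $delimiter  = q{};    # terminator of the region we are inside
    my $strip_tabs = 0;      # true after a <<- redirector
    my @labelled;

    for my $line (@lines) {
        my $text = $line;
        chomp $text;
        if ( $state eq 'I' or $state eq 'H' ) {
            # Inside a region: only the bare delimiter line ends it
            my $candidate = $text;
            $candidate =~ s/^\t+// if $strip_tabs;
            $state = $candidate eq $delimiter ? 'E' : 'H';
        }
        elsif ( $text =~ /<<(-?)\s*(['"]?)(\w+)\2/ ) {
            # Opening delimiter line: remember what closes the region
            ( $strip_tabs, $delimiter ) = ( $1 eq '-', $3 );
            $state = 'I';
        }
        else {
            $state = 'S';
        }
        push @labelled, [ $state, $text ];
    }
    return @labelled;
}

my @script = (
    "#!/bin/sh\n",
    "cat <<END\n",
    "Hello\n",
    "END\n",
    "echo done\n",
);
printf "%s]   %s\n", @{$_} for label_lines(@script);
# Prints:
# S]   #!/bin/sh
# I]   cat <<END
# H]   Hello
# E]   END
# S]   echo done
```

Note that the label of a line depends on the previous line's state, which is why the lines have to be fed in order.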

In reality there are a few more state changes defined by POSIX. An example of this is the script below, shown with added state labels:

    S]   #!/bin/bash --posix
    I]   cat <<eof1; cat <<eof2
    H]   Hi,
    E]   eof1
    H]   Helene.
    E]   eof2

Naturally, when bash runs this only the here document is printed:

    Hi,
    Helene.

Filter::Heredoc exports the following subroutines only on request.

    hd_getstate   # returns a label based on the argument (text line)
    hd_labels     # reads out and (optionally) defines new labels
    hd_init       # flushes the internal state machine

Filter::Heredoc::Rule exports one subroutine to load and unload syntax rules.

    hd_syntax             # load/unload a script syntax rule


This routine determines the new state, based on the last state and the new text line passed as the argument.

    %state = hd_getstate( $line );

Returns a hash with the following keys/values:

    statemarker :      Holds a label that represents the state of the line.
    blockdelimiter:    Holds the delimiter which belongs to a 'region'.
    is_tabremovalflag: If the redirector had a trailing minus this
                       value is true for the actual line.

A here document 'region' is defined as all here document lines bracketed by the ingress (opening delimiter) and the egress (terminating delimiter) lines. The delimiter of a region may or may not be unique within the file.
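As an illustration of how the returned keys might be combined, the sketch below collects the here document lines and removes leading tabs where is_tabremovalflag is true, which mirrors what a POSIX shell does for the <<- redirector. The helper name heredoc_text is hypothetical. To keep the sketch independent of the module, the state function is passed in as a code reference that is assumed to follow the hd_getstate calling convention described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Filter::Heredoc.
# $getstate      : code reference called as $getstate->($line), returning
#                  a hash with the keys statemarker and is_tabremovalflag
# $heredoc_label : the label of the 'heredoc' state, e.g. $label{heredoc}
sub heredoc_text {
    my ( $getstate, $heredoc_label, @lines ) = @_;
    my @text;
    for my $line (@lines) {
        my %state = $getstate->($line);
        next if $state{statemarker} ne $heredoc_label;
        my $copy = $line;
        # Leading tabs are dropped only when the redirector was <<-
        $copy =~ s/^\t+// if $state{is_tabremovalflag};
        push @text, $copy;
    }
    return @text;
}
```

With the module loaded, a call would presumably look like heredoc_text( \&hd_getstate, $label{heredoc}, @lines ), where %label comes from hd_labels().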

To prevent unreliable results, only pass a text line as an argument. Use file test operators if reading input lines from a file:

    if ( -T $file ) {
      print "$file 'looks' like a plain text file to me.\n";
    }

This function throws exceptions on a few fatal internal errors. These are trappable. See ERRORS below for messages printed.
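Since the exceptions are trappable, a caller can wrap the call in eval. The wrapper below is a minimal sketch under that assumption; the name try_getstate is hypothetical, and the state function is again passed as a code reference so the sketch does not depend on the module being installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of Filter::Heredoc.
# Returns ( \%state, undef ) on success or ( undef, $error ) on failure.
sub try_getstate {
    my ( $getstate, $line ) = @_;
    my %state;
    # eval returns 1 only if the call completed without dying
    my $ok = eval { %state = $getstate->($line); 1 };
    return ( undef, $@ || 'unknown error' ) if !$ok;
    return ( \%state, undef );
}
```

A caller could then use my ( $state, $error ) = try_getstate( \&hd_getstate, $line ); and decide how to recover when $error is defined, for instance by calling hd_init() before continuing.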


Gets or optionally sets a new unique label for the four possible states.

    %label = hd_labels();
    %label = hd_labels( %newlabel );

The hash keys define the default internal label assignments.

    %label = (
            source  => 'S',
            ingress => 'I',
            heredoc => 'H',
            egress  => 'E',
    );

Returns a hash with the current label assignment.
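Because the labels are unique, the returned hash can be inverted to translate a statemarker back into a state name. The sketch below uses a literal copy of the default assignment so that it runs without the module; with the module loaded, %label would come from hd_labels() instead. The variable name %state_name is only illustrative.

```perl
use strict;
use warnings;

# Literal copy of the documented default labels (an assumption made
# for this sketch; normally obtained with: my %label = hd_labels();)
my %label = (
    source  => 'S',
    ingress => 'I',
    heredoc => 'H',
    egress  => 'E',
);

# Unique values allow a lossless inversion: label => state name
my %state_name = reverse %label;

print "$state_name{H}\n";    # prints "heredoc"
```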


Sets the internal state machine to 'source' and empties all internal state arrays.


When reading more than one file, call this function before the next file to prevent state faults from propagating into the next file's input. It currently always returns an $EMPTY_STR (q{}), but this may change to indicate a state error from the previous file.
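One way to arrange the reset between files is sketched below. The driver name heredoc_lines_per_file is hypothetical; both the state function and the reset function are passed in as code references (for example \&hd_getstate and \&hd_init) so that the sketch stays self-contained.

```perl
use strict;
use warnings;

# Hypothetical driver, not part of Filter::Heredoc.
# Returns a hash: file name => array reference of here document lines.
sub heredoc_lines_per_file {
    my ( $getstate, $init, $heredoc_label, @files ) = @_;
    my %found;
    for my $file (@files) {
        # Only plain text files give reliable results
        next unless -f $file && -T $file;
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while ( defined( my $line = <$fh> ) ) {
            my %state = $getstate->($line);
            push @{ $found{$file} }, $line
                if $state{statemarker} eq $heredoc_label;
        }
        close $fh;
        # Reset so an unterminated here document cannot leak into the next file
        $init->();
    }
    return %found;
}
```

Calling the reset after every file, rather than only on errors, keeps the loop simple at the cost of one extra call per file.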


hd_getstate throws the following exceptions.


Filter::Heredoc only requires Perl 5.10 (or any later version).


Bertil Kronlund, <bkron at>


Filter::Heredoc complies with the here document syntax of *nix POSIX shells. Non-compliant shells, e.g. on the MSWin32 platform, are not supported.

Please report any bugs or feature requests to or at <bug-filter-heredoc at>.


Overview of here documents and their usage:

The IEEE Std 1003.1-2008 standards can be found here:

Filter::Heredoc::Rule, filter-heredoc

Filter::Heredoc::Cookbook discusses, for example, how to embed POD as here documents in shell scripts so that the scripts carry their own documentation.


Copyright 2011-12, Bertil Kronlund

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See for more information.
