App::Framework::Extension::Filter - Script filter application object
use App::Framework '::Filter' ;
Application that filters a file or files to produce some other output
This extension modifies the normal call flow for the application subroutines. The extension calls the subroutines for each input file being filtered. Also, the main 'app' subroutine is called for each of the lines of text in the input file.
The pseudo-code for the extension is:
    FOREACH input file
        <init variables, state HASH>
        call 'app_start' subroutine
        FOREACH input line
            call 'app' subroutine
        END
        call 'app_end' subroutine
    END
For each input file, a state HASH is created and passed as a reference to the application subroutines. The state HASH contains various values maintained by the extension, but the application may add its own additional values to the HASH. These values will be passed unmodified to each of the application subroutine calls.
The state HASH contains the following fields:
num_files
Total number of input files.
file_number
Current input file number (1 to num_files)
file_list
ARRAY ref. List of input filenames.
vars
HASH ref. Empty HASH created so that any application-specific variables may be stored here.
line_num
Current line number of line being processed (1 to N).
output_lines
ARRAY ref. List of the output lines that are to be written to the output file (maintained by the extension).
file
Current file name of the file being processed.
line
String of line being processed.
output
Special variable used by application to tell extension what to output (see "Output").
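The call flow and the state HASH fields listed above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the extension's implementation: run_filter is a hypothetical helper, the input "files" are in-memory arrays, and the $app and $opts_href arguments are passed as placeholders.

```perl
use strict ;
use warnings ;

# Hypothetical stand-in for the extension's main loop (NOT the real implementation).
# $files maps a file name to an ARRAY ref of its lines; $subs holds the
# 'app_start', 'app' and 'app_end' callbacks.
sub run_filter
{
    my ($files, $subs) = @_ ;

    my @names = sort keys %$files ;
    my @all_output ;
    my $file_number = 0 ;

    foreach my $file (@names)
    {
        # a fresh state HASH is created for each input file
        my $state_href = {
            num_files    => scalar(@names),
            file_number  => ++$file_number,
            file_list    => \@names,
            vars         => {},
            line_num     => 0,
            output_lines => [],
            file         => $file,
            line         => undef,
            output       => undef,
        } ;

        # placeholders: undef for the application object, {} for the options HASH
        $subs->{app_start}->(undef, {}, $state_href) ;
        foreach my $line (@{ $files->{$file} })
        {
            $state_href->{line_num}++ ;
            $state_href->{line}   = $line ;
            $state_href->{output} = undef ;    # reset before every 'app' call
            $subs->{app}->(undef, {}, $state_href, $line) ;
            push @{ $state_href->{output_lines} }, $state_href->{output}
                if defined $state_href->{output} ;
        }
        $subs->{app_end}->(undef, {}, $state_href) ;

        push @all_output, @{ $state_href->{output_lines} } ;
    }
    return @all_output ;
}

# Prefix each line with its file name and line number, taken from the state HASH
my @out = run_filter(
    { 'a.txt' => ['one', 'two'], 'b.txt' => ['three'] },
    {
        app_start => sub {
            my ($app, $opts_href, $state_href) = @_ ;
            $state_href->{vars}{seen} = 0 ;
        },
        app => sub {
            my ($app, $opts_href, $state_href, $line) = @_ ;
            $state_href->{vars}{seen}++ ;
            $state_href->{output} = "$state_href->{file}:$state_href->{line_num}: $line" ;
        },
        app_end => sub { },
    },
) ;
print "$_\n" foreach @out ;
```

Note that line_num restarts at 1 for each file because the state HASH is rebuilt per input file, which is also why anything stored under vars does not carry over between files in this sketch.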
The state HASH reference is passed to all 3 of the application subroutines. In addition, the input line of text is also passed to the main 'app' subroutine. The interface for the subroutines is:
app_start
Called once for each input file, at the start of processing. Allows any setting up of variables stored in the state HASH.
Arguments are:
app_end
Called once for each input file, at the end of processing. Allows for any end-of-file tidy up, data sorting etc.
By default, each time the extension calls the 'app' subroutine it sets the output field of the state HASH to undef. The 'app' subroutine must set this field to some value for the extension to write anything to the output file.
For example, the following simple 'app' subroutine outputs every line of every input file in upper case:
    sub app
    {
        my ($app, $opts_href, $state_href, $line) = @_ ;

        # uppercase
        $state_href->{output} = uc $line ;
    }
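Because the output field is reset to undef before each call, an 'app' subroutine can also drop a line simply by not setting it. The following is a minimal sketch of that idea; the loop at the bottom is a hand-written stand-in for the extension (an assumption, for illustration only), so the block runs without App::Framework installed.

```perl
use strict ;
use warnings ;

# Same signature as the 'app' subroutine in the example above;
# keeps only lines that contain the word 'error' (case-insensitive).
sub app
{
    my ($app, $opts_href, $state_href, $line) = @_ ;

    # leaving {output} as undef drops the line
    $state_href->{output} = $line if $line =~ m/\berror\b/i ;
}

# Hand-written driver standing in for the extension: reset {output}
# before each call, then collect whatever the subroutine set.
my $state_href = { output => undef } ;
my @kept ;
foreach my $line ('all ok', 'Error: disk full', 'still ok', 'fatal error')
{
    $state_href->{output} = undef ;
    app(undef, {}, $state_href, $line) ;
    push @kept, $state_href->{output} if defined $state_href->{output} ;
}
print "$_\n" foreach @kept ;
```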
If no "outfile" option is specified, then all output will be written to STDOUT. Also, normally the output is written line-by-line after each line has been processed. If the "buffer" option has been specified, then all output lines are buffered (into the state variable "output_lines") then written out at the end of processing all input. Similarly, if the "inplace" option is specified, then buffering is used to process the complete input file then overwrite it with the output.
The "outfile" option may be used to set the output filename. This may include variables that are specific to the Filter extension, where each variable's value is updated for each input file being processed. The following Filter-specific variables may be used:
    $filter{'filter_file'} = $state_href->{file} ;
    $filter{'filter_filenum'} = $state_href->{file_number} ;
    my ($base, $path, $ext) = fileparse($file, '\..*') ;
    $filter{'filter_name'} = $base ;
    $filter{'filter_base'} = $base ;
    $filter{'filter_path'} = $path ;
    $filter{'filter_ext'} = $ext ;
NOTE: Specifying these variables for options at the command line will require you to escape the variables per the operating system you are using (e.g. use single quotes ' ' around the value in Linux).
For example, with the command line arguments:
-outfile '/tmp/$filter_name-$filter_filenum.txt' afile.doc /doc/bfile.text
the extension processes './afile.doc' into '/tmp/afile-1.txt', and '/doc/bfile.text' into '/tmp/bfile-2.txt'.
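The expansion in this example can be reproduced with a short sketch. expand_outfile is a hypothetical helper written for illustration; it uses the same fileparse call and variable names shown above, but the substitution itself is an assumption about how the template might be expanded, not the extension's own code.

```perl
use strict ;
use warnings ;
use File::Basename qw(fileparse) ;

# Hypothetical helper: expand $filter_* variables in an output filename template
sub expand_outfile
{
    my ($template, $file, $file_number) = @_ ;

    my ($base, $path, $ext) = fileparse($file, '\..*') ;
    my %filter = (
        filter_file    => $file,
        filter_filenum => $file_number,
        filter_name    => $base,
        filter_base    => $base,
        filter_path    => $path,
        filter_ext     => $ext,
    ) ;

    # replace each $name that is a known variable; leave anything else untouched
    (my $outfile = $template) =~ s/\$(\w+)/exists $filter{$1} ? $filter{$1} : "\$$1"/ge ;
    return $outfile ;
}

# single quotes stop Perl interpolating the template, just as they stop the shell
my $template = '/tmp/$filter_name-$filter_filenum.txt' ;
my $first  = expand_outfile($template, 'afile.doc', 1) ;
my $second = expand_outfile($template, '/doc/bfile.text', 2) ;
print "$first\n$second\n" ;
```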
As an example, here is a script that filters one or more HTML files to strip out unwanted sections (they are actually Doxygen HTML files that I wanted to convert into a pdf book):
    #!/usr/bin/perl
    #
    use strict ;
    use App::Framework '::Filter' ;

    # VERSION
    our $VERSION = '1.00' ;

    ## Create app
    go() ;

    #----------------------------------------------------------------------
    # Called at the start of each input file
    #
    sub app_start
    {
        my ($app, $opts_href, $state_href, $line) = @_ ;

        # force in-place editing
        $app->set(inplace => 1) ;

        # set to start state
        $state_href->{vars} = {
            'state' => 'start',
        } ;
    }

    #----------------------------------------------------------------------
    # Main execution
    #
    sub app
    {
        my ($app, $opts_href, $state_href, $line) = @_ ;

        my $ok = 1 ;
        if ($state_href->{'vars'}{'state'} eq 'start')
        {
            if ($line =~ m/<!-- Generated by Doxygen/i)
            {
                $ok = 0 ;
                $state_href->{'vars'}{'state'} = 'doxy-head' ;
            }
        }
        elsif ($state_href->{'vars'}{'state'} eq 'doxy-head')
        {
            $ok = 0 ;
            if ($line =~ m/<div class="contents">/i)
            {
                $ok = 1 ;
                $state_href->{'vars'}{'state'} = 'contents' ;
            }
        }
        elsif ($state_href->{'vars'}{'state'} eq 'contents')
        {
            if ($line =~ m/<hr size="1"><address style="text-align: right;"><small>Generated/i)
            {
                $ok = 0 ;
                $state_href->{'vars'}{'state'} = 'doxy-foot' ;
            }
        }
        elsif ($state_href->{'vars'}{'state'} eq 'doxy-foot')
        {
            $ok = 0 ;
            if ($line =~ m%</body>%i)
            {
                $ok = 1 ;
                $state_href->{'vars'}{'state'} = 'end' ;
            }
        }

        # only output if ok to do so
        $state_href->{'output'} = $line if $ok ;
    }

    #=================================================================================
    # SETUP
    #=================================================================================
    __DATA__

    [SUMMARY]

    Filter Doxygen created html removing frames etc.

    [DESCRIPTION]

    B<$name> does some stuff.
The script takes in HTML of the form:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    <title>rctu4_test: File Index</title>
    <link href="doxygen.css" rel="stylesheet" type="text/css">
    <link href="tabs.css" rel="stylesheet" type="text/css">
    </head><body>
    **<!-- Generated by Doxygen 1.5.5 -->
    **<div class="navigation" id="top">
    ** <div class="tabs">
    ** <ul>
    ..
    ** </div>
    **</div>
    <div class="contents">
    <h1>File List</h1>Here is a list of all files with brief descriptions:<table>
    <tr><td class="indexkey">src/<a class="el" href="rctu4__tests_8c.html">rctu4_tests.c</a></td><td class="indexvalue"></td></tr>
    <tr><td class="indexkey">src/common/<a class="el" href="ate__general_8c.html">ate_general.c</a></td><td class="indexvalue"></td></tr>
    ...
    <tr><td class="indexkey">src/tests/<a class="el" href="test__star__daisychain__specific_8c.html">test_star_daisychain_specific.c</a></td><td class="indexvalue"></td></tr>
    <tr><td class="indexkey">src/tests/<a class="el" href="test__version__functions_8c.html">test_version_functions.c</a></td><td class="indexvalue"></td></tr>
    </table>
    </div>
    **<hr size="1"><address style="text-align: right;"><small>Generated on Fri Jun 5 13:43:31 2009 for rctu4_test by
    **<a href="http://www.doxygen.org/index.html">
    **<img src="doxygen.png" alt="doxygen" align="middle" border="0"></a> 1.5.5 </small></address>
    </body>
    </html>
And removes the lines beginning '**'.
The script does in-place updating of the HTML files and can be run as:
filter-script *.html
This extension adds the following additional command line options to any application:
Do not process empty lines (lines that contain only whitespace)
Remove spaces from start and end of lines
Remove any comments from the line, starting from the comment string to the end of the line
Read file, process, then overwrite original input file with processed output
Write file(s) into the specified directory rather than into the same directory as the input file
Specify the output filename, which may include variables (see "Output Filename")
Specify the comment start string. Used in conjunction with "-trim_comment".
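The line clean-up options described above can be pictured with the following plain Perl sketch. preprocess_line is a hypothetical helper, not part of the extension; the HASH keys other than trim_comment, the default comment string '#', and the order in which the steps are applied are all assumptions made for illustration.

```perl
use strict ;
use warnings ;

# Hypothetical helper mirroring the option descriptions above.
# Returns the cleaned line, or undef if the line should be skipped.
sub preprocess_line
{
    my ($line, %opt) = @_ ;

    # remove from the comment string to the end of the line
    if ($opt{trim_comment})
    {
        # '#' as the default comment string is an assumption
        my $comment = defined $opt{comment} ? $opt{comment} : '#' ;
        $line =~ s/\Q$comment\E.*$// ;
    }

    # remove spaces from the start and end of the line
    $line =~ s/^\s+|\s+$//g if $opt{trim_space} ;

    # skip lines that contain only whitespace
    return undef if $opt{skip_empty} && $line !~ /\S/ ;

    return $line ;
}

my $cleaned = preprocess_line('  x = 1 ; # set x  ', trim_comment => 1, trim_space => 1) ;
print "[$cleaned]\n" ;
```

With both options enabled, a line that holds only a comment is reduced to an empty string, so combining them with the skip-empty behaviour would drop comment-only lines as well.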
This extension sets the following additional command line arguments for any application:
Specify one or more input files to be processed. If no files are specified on the command line, input is read from STDIN.
Note that the fields match with the command line options.
Store output lines into a buffer, then write out file at end of processing
Specify the comment start string. Used in conjunction with "trim_comment".
Read only. File handle of current output file.
Create a new App::Framework::Extension::Filter.
The %args are specified as they would be in the set method, for example:
'mmap_handler' => $mmap_handler
The full list of possible arguments is:
'fields' => Either ARRAY list of valid field names, or HASH of field names with default values
Initialises the object class variables.
Filter the specified file(s) one at a time.
Application interface for writing out extra lines
Start of output file
Write out line (if required)
End of output file
Open the file (or STDOUT) depending on settings
Close the file if open
Setting the debug flag to level 1 prints out (to STDOUT) some debug messages, setting it to level 2 prints out more verbose messages.
Steve Price <sdprice at cpan.org>
None that I know of!