Module Version: 0.11

NAME

File::Canonicalizer - ASCII file canonicalizer

SYNOPSIS

   use File::Canonicalizer;

   $aref = [ 'replaced_pattern1', 'replacement1',
             'replaced_pattern2', 'replacement2',
             ... ];

   file_canonicalizer ('input_file','canonical_output_file', '',4,5,6,7,8,9,10, $aref);

DESCRIPTION

Sometimes files must be compared semantically, that is, by their contents rather than by their form. The following two files differ in form but contain identical information:

file_A

   First name -        Barack

   Last name  -        Obama

   Birth Date -        1961/8/4

   Profession -        President 

file_B

   last name : Obama
   first name: Barack
   profession: president   # not sure

   Birth Date: 1961/08/04

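The forms of these files differ in several ways: the order of the lines, letter case, the separator between key and value ('-' versus ':'), the amount of white space, the presence of empty lines, a trailing comment in file_B, and leading zeroes in the date.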

file_canonicalizer reduces both of these files to a common canonical form, so that they can be compared with each other directly.
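The idea can be sketched in plain Perl without the module. The helper canonical_lines below is hypothetical: it is not part of File::Canonicalizer and makes no claim about how the module is implemented. It assumes that '#' starts a comment and that the key/value separator is passed in as a regular expression.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): reduce lines to a canonical form.
sub canonical_lines {
    my ($separator_re, @lines) = @_;
    my @out;
    for my $line (@lines) {
        $line =~ s/#.*//;                      # drop a comment starting with '#'
        $line =~ s/[ \t]+/ /g;                 # collapse runs of tabs and spaces
        $line =~ s/^\s+|\s+$//g;               # trim both edges
        next if $line eq '';                   # skip empty lines
        $line = lc $line;                      # ignore letter case
        $line =~ s/\b0+(?=\d)//g;              # strip leading zeroes in numbers
        $line =~ s/\s*$separator_re\s*/ : /;   # unify the first separator
        push @out, $line;
    }
    return sort @out;                          # ignore line order
}

my @file_a = (
    '   First name -        Barack',   '',
    '   Last name  -        Obama',    '',
    '   Birth Date -        1961/8/4', '',
    '   Profession -        President ',
);
my @file_b = (
    '   last name : Obama',
    '   first name: Barack',
    '   profession: president   # not sure',
    '',
    '   Birth Date: 1961/08/04',
);

my @canon_a = canonical_lines(qr/-/, @file_a);
my @canon_b = canonical_lines(qr/:/, @file_b);

my $verdict = join("\n", @canon_a) eq join("\n", @canon_b)
            ? 'same content' : 'different content';
print "$_\n" for @canon_a;
print "$verdict\n";
```

For the two sample files both lists come out equal, which is the property that makes a plain textual comparison meaningful.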

SUBROUTINES

file_canonicalizer

   file_canonicalizer ( <input_file>                                   # 1 default is STDIN
                      , <output_file>                                  # 2 default is STDOUT 
                      , remove_comments_started_with_<regular_express> # 3 if empty, ignore comments
                      , 'replace_adjacent_tabs_and_spaces_with_1_space'# 4
                      , 'replace_adjacent_slashes_with_single_slash'   # 5
                      , 'remove_white_characters_from_line_edges'      # 6
                      , 'remove_empty_lines'                           # 7
                      , 'convert_to_lower_cased'                       # 8
                      , 'remove_leading_zeroes_in_numbers'             # 9
                      , 'sort_lines_lexically'                         #10
                      , array_reference_to_pairs_replaced_replacement  #11
   );

All parameters, beginning with the 3rd, are interpreted as Boolean values, true or false. (The 3rd and 11th parameters additionally carry data: a regular expression and an array reference.) An action is executed only if its parameter value is true. This means that each of the literals between apostrophes '' can be shortened to a single arbitrary character or a digit 1-9, but not to '0', which Perl treats as false.

The list of parameters can be shortened, that is, any number of trailing parameters can be omitted. In this case the actions corresponding to the omitted parameters are not executed.
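The Boolean interpretation can be illustrated in plain Perl. The helper enabled_steps below is hypothetical and not part of File::Canonicalizer; it only reports which positional arguments Perl evaluates as true, and shows that omitted trailing arguments arrive as undef and therefore count as false.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: returns the positions (3..11)
# of the arguments that Perl evaluates as true.
sub enabled_steps {
    my @args = @_;
    return grep { $args[$_ - 1] } 3 .. 11;
}

# Long literals, short literals and digits select the same steps;
# the empty string at position 8 switches that step off.
my @long  = enabled_steps('in', 'out', '#', 4, 5, 'e', 'empty_lin', '', 9, 'sort');
my @short = enabled_steps('in', 'out', '#', 1, 1, 1, 1, '', 1, 1);
print "@long\n";     # 3 4 5 6 7 9 10
print "@short\n";    # 3 4 5 6 7 9 10

# Only three arguments: positions 4..11 are undef, so those steps are skipped.
my @few = enabled_steps('', '', '#');
print "@few\n";      # 3

# '0' and '' are both false in Perl, so neither enables a step.
my @zero = enabled_steps('', '', '#', 0, '');
print "@zero\n";     # 3
```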

EXAMPLES

Read from STDIN, write to STDOUT, and remove all substrings beginning with '#':

   file_canonicalizer ('','','#');

Create a canonicalized cron table (on UNIX/Linux). The following three calls are equivalent:

   file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',4,5,'e','empty_lin','',9,'sort');
   file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',4,5, 6,    7,       '',9, 10);
   file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',1,1, 1,    1,       '',1, 1);

Canonicalization of files 'file_A' and 'file_B', shown in the section "DESCRIPTION":

   file_canonicalizer('file_A','file_A.canonic','#',1,5,1,1,1,1,10, ['\s*-\s*',' : ', '^','<', '$','>']);
   file_canonicalizer('file_B','file_B.canonic','#',1,5,1,1,1,1,10, ['\s*:\s*',' : ', '^','<', '$','>']);

creates two identical files 'file_A.canonic' and 'file_B.canonic':

   <birth date : 1961/8/4>
   <first name : barack>
   <last name : obama>
   <profession : president>
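One way to confirm that two canonical files are identical is the core module File::Compare, which is independent of File::Canonicalizer. The sketch below writes two temporary stand-in files with the content listed above, so that it runs without the module being installed; in practice the arguments to compare would be 'file_A.canonic' and 'file_B.canonic'.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempfile);

# Stand-in content for file_A.canonic and file_B.canonic.
my $content = <<'END';
<birth date : 1961/8/4>
<first name : barack>
<last name : obama>
<profession : president>
END

# Write the same content to two temporary files (removed at program exit).
my @paths;
for my $n (1 .. 2) {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close failed: $!";
    push @paths, $path;
}

# compare() returns 0 if the files are equal, 1 if they differ, -1 on error.
my $result  = compare($paths[0], $paths[1]);
my $verdict = $result == 0 ? 'identical' : 'not identical';
print "$verdict\n";
```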

AUTHOR

Mart E. Rivilis, rivilism@cpan.org

BUGS

Please report any bugs or feature requests to bug-file-canonicalizer@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=File-Canonicalizer. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

   perldoc File::Canonicalizer

You can also look for information at:

LICENSE AND COPYRIGHT

Copyright 2013 Mart E. Rivilis.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0).
