NAME

greple - extensible grep with lexical expression and region handling

SYNOPSIS

greple [-Mmodule] [ -options ] pattern [ file... ]

PATTERN
  pattern              'and +must -not ?alternative &function'
  -e pattern           pattern match across line boundary
  -r pattern           pattern cannot be compromised
  -v pattern           pattern not to be matched
  --le pattern         lexical expression (same as bare pattern)
  --re pattern         regular expression
  --fe pattern         fixed expression
  --file file          file contains search pattern
MATCH
  -i                   ignore case
  --need=[+-]n         required positive match count
  --allow=[+-]n        acceptable negative match count
STYLE
  -l                   list filename only
  -c                   print count of matched block only
  -n                   print line number
  -H, -h               do or do not display filenames
  -o                   print only the matching part
  -m n[,m]             max count of blocks to be shown
  -A,-B,-C [n]         after/before/both match context
  --join               delete newline in the matched part
  --joinby=string      replace newline in the matched text by string
  --nonewline          do not add newline character at block end
  --filestyle=style    how filename printed (once, separate, line)
  --linestyle=style    how line number printed (separate, line)
  --separate           set filestyle and linestyle both "separate"
  --format LABEL=...   define line number and file name format
FILE
  --glob=glob          glob target files
  --chdir              change directory before search
  --readlist           get filenames from stdin
COLOR
  --color=when         use terminal color (auto, always, never)
  --nocolor            same as --color=never
  --colormap=color     R, G, B, C, M, Y etc.
  --colorful           use default multiple colors
  --ansicolor=s        ANSI color 16, 256 or 24bit
  --[no]256            same as --ansicolor 256 or 16
  --regioncolor        use different color for inside/outside regions
  --uniqcolor          use different color for unique string
  --random             use random color each time
  --face               set/unset visual effects
BLOCK
  -p                   paragraph mode
  --all                print whole data
  --block=pattern      specify the block of records
  --blockend=s         specify the block end mark (Default: "--\n")
REGION
  --inside=pattern     select matches inside of pattern
  --outside=pattern    select matches outside of pattern
  --include=pattern    reduce matches to the area
  --exclude=pattern    reduce matches to outside of the area
  --strict             strict mode for --inside/outside --block
CHARACTER CODE
  --icode=name         specify file encoding
  --ocode=name         specify output encoding
FILTER
  --if,--of=filter     input/output filter command
  --pf=filter          post process filter command
  --noif               disable default input filter
RUNTIME FUNCTION
  --print=func         print function
  --continue           continue after print function
  --begin=func         call function before search
  --end=func           call function after search
  --prologue=func      call function before command execution
  --epilogue=func      call function after command execution
OTHER
  --norc               skip reading startup file
  --man                display command or module manual page
  --show               display module file
  --require=file       include perl program
  --conceal=type       conceal run time errors
  --persist            continue even after encoding error
  -d flags             display info (f:file d:dir c:color m:misc s:stat)

DESCRIPTION

MULTIPLE KEYWORDS

greple has almost the same function as the Unix command egrep(1), but the search is done in a manner similar to an Internet search engine. For example, the next command prints lines that contain all of `foo', `bar' and `baz'.

greple 'foo bar baz' ...

Each word can appear in any order and at any place in the string, so this command finds all of the following texts.

foo bar baz
baz bar foo
the foo, bar and baz

If you want to use OR syntax, prepend a question mark (`?') to each token, or use a regular expression.

greple 'foo bar baz ?yabba ?dabba ?doo'
greple 'foo bar baz yabba|dabba|doo'

This command prints lines that contain all of `foo', `bar' and `baz', and one or more of `yabba', `dabba' or `doo'.

The NOT operator can be specified by prefixing the token with a minus (`-') sign. The next example shows lines that contain both `foo' and `bar' but none of `yabba', `dabba' or `doo'.

greple 'foo bar -yabba -dabba -doo'

The same search can be written using the -e and -v options.

greple -e foo -e bar -v yabba -v dabba -v doo
greple -e foo -e bar -v 'yabba|dabba|doo'
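
The matching rule described so far can be modeled in a few lines of Perl. This is only a sketch of the semantics as stated in this section, not greple's actual implementation: the function name lexical_match is hypothetical, the `+' prefix is treated like a plain token, and --need/--allow counting and `&function' tokens are left out.

```perl
use strict;
use warnings;

# Hypothetical model of a bare (lexical) pattern applied to one block of text.
# Plain tokens must all match, '-' tokens must not match, and if any '?'
# tokens are present, at least one of them must match.
sub lexical_match {
    my ($expr, $text) = @_;
    my (@and, @not, @or);
    for my $token (split ' ', $expr) {
        if    ($token =~ /^-(.+)/)  { push @not, $1 }
        elsif ($token =~ /^\?(.+)/) { push @or,  $1 }
        elsif ($token =~ /^\+(.+)/) { push @and, $1 }     # simplified: same as plain
        else                        { push @and, $token }
    }
    for my $re (@and) { return 0 unless $text =~ /$re/ }
    for my $re (@not) { return 0 if     $text =~ /$re/ }
    my $alternatives_hit = grep { $text =~ /$_/ } @or;
    return 0 if @or && !$alternatives_hit;
    return 1;
}

for my $line ("baz bar foo", "foo bar yabba") {
    printf "%-15s %s\n", $line,
        lexical_match('foo bar -yabba', $line) ? 'match' : 'no match';
}
```

Each token is used as a regular expression, so a token such as `yabba|dabba|doo' behaves as an alternation, as in the example above.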

If `+' is prefixed to a positive matching pattern, that pattern is marked as required, and the required match count is automatically set to the number of required patterns. So

greple '+foo bar baz'

command implicitly sets the option --need 1, and consequently prints all lines that include `foo'. If you want to search for lines that include `foo' and either or both of `bar' and `baz', use it like this:

greple '+foo bar baz' --need 2
greple '+foo bar baz' --need +1

FLEXIBLE BLOCKS

By default, the data block that greple searches and prints is a line. With the --paragraph (or -p for short) option, a series of text lines delimited by an empty line is treated as a record block. So the next command prints each whole paragraph that contains the words `foo', `bar' and `baz'.

greple -p 'foo bar baz'

Option --all treats the whole file as a single block. So the next command finds files that contain these strings and prints their entire contents.

greple --all 'foo bar baz'

A block can also be defined by a pattern. The next command searches and prints the mail header, ignoring the mail body text.

greple --block '\A(.+\n)+'

You can also define arbitrarily complex blocks by writing a script.

greple --block '&your_original_function' ...

MATCH AREA CONTROL

Using the --inside and --outside options, you can specify the text area in which the match should occur. The next commands search only in the mail header and body area, respectively. In these cases, the data block is not changed, so they print lines that contain the pattern in the specified area.

greple --inside '\A(.+\n)+' pattern

greple --outside '\A(.+\n)+' pattern

The --inside/--outside options can be used repeatedly to expand the area to be matched. The similar --include/--exclude options are used to trim down the area instead.

These four options also accept a user-defined function, so a region of any complexity can be used.
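
The idea of limiting matches to a region can be illustrated with plain offset arithmetic. The sketch below is a simplified model, not greple's code: select_inside and select_outside are hypothetical helpers, and a match is counted as inside only when it lies entirely within a region, so greple's own handling of partially overlapping matches may differ.

```perl
use strict;
use warnings;

# Regions and matches are both [ start, end ] offset pairs.
# Hypothetical helper: keep matches that lie entirely within some region.
sub select_inside {
    my ($regions, $matches) = @_;
    grep {
        my ($s, $e) = @$_;
        grep { $_->[0] <= $s && $e <= $_->[1] } @$regions;
    } @$matches;
}

# Hypothetical helper: keep matches that are not inside any region.
sub select_outside {
    my ($regions, $matches) = @_;
    grep {
        my ($s, $e) = @$_;
        !grep { $_->[0] <= $s && $e <= $_->[1] } @$regions;
    } @$matches;
}

# Made-up sample: a one-line mail header, an empty line, then the body.
my $text = "Subject: foo\n\nfoo in body\n";
my @header;
@header = ([ $-[0], $+[0] ]) if $text =~ /\A(.+\n)+/;
my @found;
push @found, [ $-[0], $+[0] ] while $text =~ /foo/g;

my @in  = select_inside(\@header, \@found);
my @out = select_outside(\@header, \@found);
printf "inside header: %d, outside header: %d\n", scalar @in, scalar @out;
```

The header region is computed with the same '\A(.+\n)+' pattern used in the commands above.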

LINE ACROSS MATCH

greple searches for the pattern across line boundaries. This is especially useful for handling Asian multi-byte text, Japanese in particular. Japanese text can be broken by a newline at almost any place, so the search pattern may be spread over multiple lines.

As for an ASCII word list, a space character in the pattern matches any kind of whitespace, including a newline. The next example searches for the word sequence `foo', `bar' and `baz', even if the words are spread over multiple lines.

greple -e 'foo bar baz'

Option -e is necessary because a space is taken as a token separator in a bare or --le pattern.
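
One way to picture this behavior is a regex in which every gap between words matches any run of whitespace. The following is a sketch under that assumption; phrase_regex is a hypothetical helper, and the way greple actually builds its pattern (in particular for Japanese text) is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a space-separated word sequence into a regex
# in which each gap matches one or more whitespace characters, newline included.
sub phrase_regex {
    my ($phrase) = @_;
    my $re = join '\s+', map { quotemeta($_) } split ' ', $phrase;
    return qr/$re/;
}

my $re = phrase_regex('foo bar baz');
print "matched across lines\n" if "foo\nbar  baz" =~ $re;
```

Note that the words are quoted with quotemeta in this sketch, so they are matched literally rather than as regular expressions.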

MODULE AND CUSTOMIZATION

Users can define default and original options in ~/.greplerc. The next example always enables color output and defines a new option using macro processing.

option default --color=always

define :re1 complex-regex-1
define :re2 complex-regex-2
define :re3 complex-regex-3
option --newopt --inside :re1 --exclude :re2 --re :re3

A specific set of functions and an option interface can be implemented as a module. Modules are invoked by the -M option immediately after the command name.

For example, greple does not have a recursive search option, but one can be implemented with the --readlist option, which accepts the target file list from standard input. Using the find module, it can be written like this:

greple -Mfind . -type f -- pattern

The dig module implements a more complex search. It can be used as simply as this:

greple -Mdig --dig .

but this command is finally translated into the following option list.

greple -Mfind . ( -name .git -o -name .svn -o -name RCS ) -prune -o 
    -type f ! -name .* ! -name *,v ! -name *~ 
    ! -iname *.jpg ! -iname *.jpeg ! -iname *.gif ! -iname *.png 
    ! -iname *.tar ! -iname *.tbz  ! -iname *.tgz ! -iname *.pdf 
    -print --

OPTIONS

PATTERNS

If no specific option is given, greple takes the first argument as a search pattern specified by the --le option. All of these patterns can be specified multiple times.

The command itself is written in Perl, and any kind of Perl-style regular expression can be used in patterns. See perlre(1) for details.

Note that the multi-line modifier (m) is set when the pattern is executed, so put (?-m) at the beginning of the regex if you want to explicitly disable it.

The order of capture groups in the pattern is not guaranteed. Please avoid direct indexes, and use relative or named capture groups instead. For example, a repeated character can be written as (\w)\g{-1} or (?<c>\w)\g{c}.
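
The reason for this advice can be seen in a small experiment. The sketch below assumes, purely for illustration, that a pattern ends up embedded after one extra capture group; the $prefix group is hypothetical and does not reflect how greple combines patterns internally.

```perl
use strict;
use warnings;

# Hypothetical extra capture group placed in front of the user's pattern.
my $prefix = '(z)';

my $absolute = qr/$prefix(\w)\1/;          # \1 now refers to (z), not to (\w)
my $relative = qr/$prefix(\w)\g{-1}/;      # the nearest preceding group
my $named    = qr/$prefix(?<c>\w)\g{c}/;   # looked up by name

for my $re ($absolute, $relative, $named) {
    print "zaa" =~ $re ? "match\n" : "no match\n";
}
```

Only the absolute form stops finding the repeated character once the group numbering shifts; the relative and named forms are unaffected.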

STYLES

FILES

COLORS

BLOCKS

REGIONS

CHARACTER CODE

FILTER

RUNTIME FUNCTIONS

For these run-time functions, an optional argument list can be given in the form of key or key=value items, connected by commas. These arguments are passed to the function as a key => value list. A sole key gets the value one. The name of the file being processed is also passed, under the key given by the FILELABEL constant. As a result, an option in either of the following forms:

--begin function(key1,key2=val2)
--begin function=key1,key2=val2

will be transformed into the following function call:

function(&FILELABEL => "filename", key1 => 1, key2 => "val2")

As described earlier, the FILELABEL parameter is not given to a function specified with the module option. So

-Mmodule::function(key1,key2=val2)
-Mmodule::function=key1,key2=val2

simply becomes:

function(key1 => 1, key2 => "val2")

The function can be defined in .greplerc or in modules. Assign the arguments to a hash, and you can then access the argument list as members of the hash. It is safe to delete the FILELABEL key if you expect arbitrary parameters to be given. The content of the target file can be accessed through $_. The ampersand (&) is required to keep the hash key from being interpreted as a bare word.

sub function {
    my %arg = @_;
    my $filename = delete $arg{&FILELABEL};
    $arg{key1};             # 1
    $arg{key2};             # "val2"
    $_;                     # contents
}
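
The transformation from an option string to a key => value list can be sketched as follows. This is a minimal model of the rule described above, not greple's parser: parse_func_spec is a hypothetical name, only simple function names are handled, and the FILELABEL entry is not added.

```perl
use strict;
use warnings;

# Hypothetical parser for the two forms shown above:
#   name(key1,key2=val2)   and   name=key1,key2=val2
# Returns the function name followed by a key => value list.
sub parse_func_spec {
    my ($spec) = @_;
    $spec =~ /^(\w+)(?:\((.*)\)|=(.*))?\z/
        or die "unrecognized spec: $spec\n";
    my ($name, $args) = ($1, $2 // $3 // '');
    my @params = map {
        my ($key, $value) = split /=/, $_, 2;
        ($key, $value // 1);              # a sole key gets the value 1
    } split /,/, $args;
    return ($name, @params);
}

my ($name, %arg) = parse_func_spec('function(key1,key2=val2)');
print "$name: key1=$arg{key1} key2=$arg{key2}\n";
```

Both option forms produce the same list in this sketch, which mirrors the statement that they are equivalent.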

OTHERS

ENVIRONMENT and STARTUP FILE

The environment variable GREPLEOPTS is used as default options. They are inserted before the command line options.

Before starting execution, greple reads the file named .greplerc in the user's home directory. The following directives can be used.

Environment variable substitution is done for strings specified by the `option' and `define' directives. Use the Perl syntax $ENV{NAME} for this purpose. You can use this to make a portable module.

When greple finds a __PERL__ line in the .greplerc file, the rest of the file is evaluated as a Perl program. You can define your own subroutines there, which can be used by the --inside/--outside, --include/--exclude and --block options.

For those subroutines, the file content is provided in the global variable $_. The subroutine is expected to return a list of array references, each made up of a start and end offset pair.

For example, suppose that the following function is defined in your .greplerc file. The start and end offsets of each pattern match can be taken from the array elements $-[0] and $+[0].

__PERL__
sub odd_line {
    my @list;
    my $i;
    while (/.*\n/g) {
        push(@list, [ $-[0], $+[0] ]) if ++$i % 2;
    }
    @list;
}

You can use the next command to search for the pattern in odd-numbered lines.

% greple --inside '&odd_line' pattern files...
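
A region function of this kind can be checked outside greple by setting $_ by hand and printing the offsets it returns. The sketch below repeats odd_line from above so that it runs on its own; the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Same function as in the .greplerc example above.
sub odd_line {
    my @list;
    my $i;
    while (/.*\n/g) {
        push(@list, [ $-[0], $+[0] ]) if ++$i % 2;
    }
    @list;
}

# greple supplies the file content in $_; here it is set by hand.
local $_ = "one\ntwo\nthree\n";
for my $region (odd_line()) {
    my ($start, $end) = @$region;
    printf "%d-%d: %s", $start, $end, substr($_, $start, $end - $start);
}
```

Each returned pair covers one odd-numbered line including its trailing newline, which is the shape the region options expect.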

MODULE

You can extend the greple command using modules. Module files are placed in the App/Greple/ directory in the Perl library, and therefore have an App::Greple::module package name.

On the command line, a module has to be specified before any other options, in the form of -Mmodule. However, it can also be specified at the beginning of an option expansion.

If the package name is declared properly, the __DATA__ section in the module file is interpreted in the same way as .greplerc file content, so you can declare module-specific options there. Functions declared in the module can be used from those options, which makes highly extensible option/programming interaction possible.

Using -M without a module argument prints the list of available modules. Option --man displays the module documentation when used with the -M option. Use the --show option to see the module itself. Option --path prints the path of the module file.

See this sample module code. This sample defines options to search in the pod, comment and other segments of a Perl script. These capabilities can be implemented both as functions and as macros.

package App::Greple::perl;

use Exporter 'import';
our @EXPORT      = qw(pod comment podcomment);
our %EXPORT_TAGS = ( );
our @EXPORT_OK   = qw();

use App::Greple::Common;
use App::Greple::Regions;

my $pod_re = qr{^=\w+(?s:.*?)(?:\Z|^=cut\s*\n)}m;
my $comment_re = qr{^(?:[ \t]*#.*\n)+}m;

sub pod {
    match_regions(pattern => $pod_re);
}
sub comment {
    match_regions(pattern => $comment_re);
}
sub podcomment {
    match_regions(pattern => qr/$pod_re|$comment_re/);
}

1;

__DATA__

define :comment: ^(\s*#.*\n)+
define :pod: ^=(?s:.*?)(?:\Z|^=cut\s*\n)

#option --pod --inside :pod:
#option --comment --inside :comment:
#option --code --outside :pod:|:comment:

option --pod --inside '&pod'
option --comment --inside '&comment'
option --code --outside '&podcomment'

You can use the module like this:

greple -Mperl --pod default greple

greple -Mperl --colorful --code --comment --pod default greple

If a special subroutine initialize() is defined in the module, it is called at the beginning with a Getopt::EX::Module object as the first argument. The second argument is a reference to @ARGV, and you can modify the actual @ARGV through it. See the find module for a sample.
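
As a rough illustration of the calling convention, a module might consume part of the argument list through the reference it receives. Everything in this sketch is hypothetical (the module name App::Greple::example, the @own_args variable and the rule of stopping at `--'); it is not taken from the find module.

```perl
package App::Greple::example;   # hypothetical module name

use strict;
use warnings;

our @own_args;   # arguments this module keeps for itself (hypothetical)

# $mod  : the Getopt::EX::Module object (unused in this sketch)
# $argv : reference to @ARGV; changes made through it affect the real @ARGV
sub initialize {
    my ($mod, $argv) = @_;
    # Consume arguments up to "--"; the "--" itself is dropped.
    while (@$argv) {
        my $arg = shift @$argv;
        last if $arg eq '--';
        push @own_args, $arg;
    }
}

1;
```

After such a call, whatever remains in the argument list would be processed as ordinary options and patterns.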

HISTORY

Most of greple's capabilities are derived from the mg command, which has been under development since the early 1990s by the same author. Because the modern standard grep family of commands has come to have similar capabilities, it is time to clean up the entire functionality, totally remodel the option interfaces, and change the command name. (2013.11)

SEE ALSO

grep(1), perl(1)

github

Getopt::EX

AUTHOR

Kazumasa Utashiro

LICENSE

Copyright 1991-2018 Kazumasa Utashiro

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.