NAME

File::Globstar - Globbing With Double Asterisk Expansion

SYNOPSIS

    use File::Globstar qw(globstar fnmatchstar);

    @files = globstar '**/*.css';
    @files = globstar 'css/**/*.css';
    @files = globstar 'scss/**';

    print "Match!\n" if fnmatchstar '*.pl', 'hello.pl';
    print "Case-insensitive match!\n" 
        if fnmatchstar '*.pl', 'Makefile.PL', ignoreCase => 1;
    
    $re = File::Globstar::translatestar('**/*.css');

    $safe_pattern = File::Globstar::quotestar($config_srcdir);

DESCRIPTION

Shortcut: If you want to implement file inclusion or exclusion in the style of .gitignore, have a look at File::Globstar::ListMatch.

Many globbing implementations have been recently extended to accept the pattern ** in place of a path element. The double asterisk matches all files and directories in the current directory and all of its descendants including the current directory itself (in other words: the directory . is included in the match, the parent directory .. is not).

The convention is especially popular in the Node.js ecosystem and is also used by git when evaluating ignore patterns (see https://git-scm.com/docs/gitignore).

FUNCTIONS

globstar PATTERN[, DIRECTORY]

Return all files and directories matching PATTERN in DIRECTORY. DIRECTORY defaults to the current directory if empty or undefined.

An invalid PATTERN matches nothing and globstar() returns an empty list for it.

The function only expands the sequence "**" (at the appropriate positions). All other heavy lifting is done by File::Glob::bsd_glob() (which is also the backend for the standard Perl glob operator <HANDLE>).

Currently, only the (forward!) slash is accepted as a path separator!
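
The following is a minimal sketch of how globstar() might be exercised, assuming File::Globstar is installed. The directory tree and all file names are made up for the illustration; the script builds them in a temporary directory and changes into it so that the patterns are resolved relative to the current directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use File::Globstar qw(globstar);

# Build a small throw-away tree (all names are invented for this example).
my $root = tempdir(CLEANUP => 1);
make_path("$root/css/sub");
for my $name ('c.css', 'css/a.css', 'css/sub/b.css', 'css/readme.txt') {
    open my $fh, '>', "$root/$name" or die "Cannot create $name: $!";
    close $fh;
}

my $old_cwd = getcwd();
chdir $root or die "Cannot chdir to $root: $!";

# Collect every *.css file at any nesting level below the current directory.
my @css = sort { $a cmp $b } globstar('**/*.css');
print "$_\n" for @css;

# A double asterisk glued to other characters is invalid and yields
# an empty list.
my @none = globstar('css/foo**bar');
print "invalid pattern returned ", scalar(@none), " entries\n";

# Leave the temporary directory again so that it can be cleaned up.
chdir $old_cwd or die "Cannot chdir back to $old_cwd: $!";
```

The explicit sort block avoids the Perl pitfall where "sort NAME LIST" would treat the function name as a comparison routine.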

fnmatchstar PATTERN, STRING[, OPTIONS]

Returns 1 if STRING matches PATTERN, false otherwise. If the option "ignoreCase" is passed with a truthy value, case is ignored for the match.

OPTIONS is an optional hash of named arguments. The only supported option is "ignoreCase" at the moment.

Invalid PATTERNs never match.

Unlike globstar(), the function does not rely on File::Glob and is implemented entirely in Perl. The semantics of PATTERN are as follows:

*

A single asterisk stands for zero or more arbitrary characters except for the slash /.

?

The question mark stands for exactly one character except for the slash /.

**

A double asterisk stands for an arbitrary sequence of 0 or more characters. It is only allowed when preceded by either the beginning of the string or a slash. Likewise it must be followed by a slash or the end of the pattern.

[RANGE], [!RANGE]

A character range or a negated character range that is preceded by an exclamation mark. A range cannot be empty.

The following features are supported:

CHARACTER

By default, all CHARACTERs stand for themselves.

CHARACTER1-CHARACTER2

All characters from CHARACTER1 to CHARACTER2 according to the collation valid for the currently selected locale.

[:CLASS:]

A character class like "[:print:]", "[:upper:]", "[:lower:]". Example:

    [0-9[:lower:]]

This pattern stands for exactly one character, either one of the ASCII digits 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 or any lowercase character (according to the current locale).

\CHARACTER

Any character with a special meaning can be escaped with a backslash.

You can include a hyphen in a range by using it as the last character. You can include a closing square bracket by using it as the first character. This is implied by the above rules.

Example:

    []-]

This pattern matches either a closing square bracket "]" or a hyphen "-".

Note that POSIX collation classes (for example "[.ch.]") and POSIX equivalence classes (for example "[=a=]") are not supported. Rationale: bsd_glob() does not support them and in Perl regular expression character classes they are currently not supported and their usage throws an exception.
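
The pattern semantics above can be tried out with a small table of checks. This is only a sketch that assumes File::Globstar is installed; the patterns and strings are illustrative and were chosen to mirror the rules described in this section.

```perl
use strict;
use warnings;
use File::Globstar qw(fnmatchstar);

# Each entry: pattern, string, then optional named arguments.
my @checks = (
    ['*.pl',     'hello.pl'],            # star within one path element
    ['*.pl',     'lib/hello.pl'],        # star does not cross a slash
    ['**/*.pl',  'lib/deep/hello.pl'],   # double star spans directories
    ['?.txt',    'a.txt'],               # exactly one character
    ['[]-]',     ']'],                   # bracket first, hyphen last
    ['*.pl',     'Makefile.PL'],         # case matters by default
    ['*.pl',     'Makefile.PL', ignoreCase => 1],
    ['foo**bar', 'foo**bar'],            # invalid pattern never matches
);

my @results;
for my $check (@checks) {
    my ($pattern, $string, %options) = @$check;
    # Normalize the return value to 1 or 0 for printing.
    my $matched = fnmatchstar($pattern, $string, %options) ? 1 : 0;
    push @results, $matched;
    printf "%-10s %-20s %s\n", $pattern, $string,
        $matched ? 'match' : 'no match';
}
```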

pnmatchstar PATTERN, STRING[, OPTIONS]

Returns 1 if STRING matches PATTERN, false otherwise.

Similar to fnmatchstar() but PATTERN and STRING are subject to some pre-processing and matching is eventually done "ascendingly". pnmatchstar() is the function underneath File::Globstar::ListMatch. You can think of it as a matching engine for one pattern (line) in a .gitignore or similar file.

PATTERN is either a (scalar) string that will be preprocessed and then compiled with translatestar() or a compiled regular expression. In the latter case, the compilation step is skipped. You should pass a regular expression that was returned by a translatestar() call with the option pathMode.

The pre-processing of PATTERN goes through the steps outlined below. The intermediate result for the input PATTERN "!/src/**/.deps/" is given in parentheses after each step.

* If PATTERN begins with an exclamation mark, the exclamation mark is discarded, and the PATTERN is marked as negated (current "/src/**/.deps/").
* If PATTERN ends with a slash, the slash is discarded, and the PATTERN is marked as a "directory match" (current "/src/**/.deps").
* If PATTERN begins with a slash, the slash is discarded, and the PATTERN is marked as a "full path match" (current "src/**/.deps").
* If PATTERN is not already marked as a "full path match", any remaining slash will activate "full path match" (no modification in this last step).
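
The four steps above can be sketched in plain Perl as follows. The helper preprocess_pattern() and the flag names are hypothetical and exist only for this illustration; they are not part of File::Globstar, whose internal implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::Globstar. It only mirrors the
# four documented pre-processing steps.
sub preprocess_pattern {
    my ($pattern) = @_;
    my %flags = (negated => 0, directory => 0, full_path => 0);

    $flags{negated}   = 1 if $pattern =~ s{\A!}{};   # leading "!"
    $flags{directory} = 1 if $pattern =~ s{/\z}{};   # trailing slash
    $flags{full_path} = 1 if $pattern =~ s{\A/}{};   # leading slash
    $flags{full_path} = 1 if $pattern =~ m{/};       # any remaining slash

    return ($pattern, %flags);
}

my ($stripped, %flags) = preprocess_pattern('!/src/**/.deps/');
print "$stripped\n";    # prints "src/**/.deps"
print "$_: $flags{$_}\n" for sort keys %flags;
```

For the documented input all three flags end up set, and the remaining pattern is the one shown in the last step.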

Note that the pre-processing step is skipped if you pass a compiled regular expression instead of a (scalar) string for PATTERN. And to be honest, pnmatchstar() does not really do the pre-processing. It is in fact done by translatestar() when called with the option pathMode. See below for details.

The STRING argument (for example "lib/File/Globstar/") is always pre-processed:

* A possible trailing slash is discarded, and the STRING is then marked as representing a "directory". The example string would now be "lib/File/Globstar" without the trailing slash.
* If full path mode is NOT active (it is activated by a leading or inner slash in PATTERN), a possible leading "directory portion" of STRING is discarded. The "directory portion" is everything up to and including the last slash in STRING. In that case, the example string would be truncated to just "Globstar".

The resulting STRING is then compared to PATTERN. If it matches, and PATTERN was not negated, 1 is returned. Likewise, 1 is returned if it doesn't match and PATTERN was negated.

In "directory mode" (PATTERN ended with a slash), a match also requires that STRING is considered a directory. It either needs a trailing slash or you must pass the option isDirectory with a truthy value.

In full path mode, the comparison continues and "ascends": The "base name" of STRING, that is, the last slash and everything following it, is discarded, and pnmatchstar() is called recursively, always with the option isDirectory set to 1, until a match is found. If it never matches, the function ultimately returns false.
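
The handling of STRING can be pictured with two small plain-Perl helpers. Both preprocess_string() and ascend() are hypothetical names invented for this sketch and are not part of File::Globstar; the real function may organize this work differently.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::Globstar: strip a trailing slash
# (marking a directory) and, outside full path mode, the directory portion.
sub preprocess_string {
    my ($string, $full_path) = @_;
    my $is_directory = ($string =~ s{/\z}{}) ? 1 : 0;
    $string =~ s{\A.*/}{} unless $full_path;
    return ($string, $is_directory);
}

# Hypothetical helper: the strings that would be tried in turn when the
# comparison "ascends" by repeatedly discarding the base name.
sub ascend {
    my ($string) = @_;
    $string =~ s{/\z}{};
    my @tried = ($string);
    while ($string =~ s{/[^/]*\z}{} && length $string) {
        push @tried, $string;
    }
    return @tried;
}

my ($basename, $is_dir) = preprocess_string('lib/File/Globstar/', 0);
print "$basename (directory: $is_dir)\n";   # Globstar (directory: 1)

print join(' -> ', ascend('src/simple/hello.pl')), "\n";
# src/simple/hello.pl -> src/simple -> src
```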

You can pass the following optional named arguments to pnmatchstar():

ignoreCase TRUE|FALSE

Compile the regular expression with the "i" modifier so that it matches in a case-insensitive manner.

The option is ignored when a compiled regular expression was passed as the first argument instead of a scalar PATTERN.

isDirectory TRUE|FALSE

Mark STRING as a directory, if TRUE.

The option is ignored when a compiled regular expression was passed as the first argument instead of a scalar PATTERN. It is also ignored (forced to TRUE) when a trailing slash was removed from STRING. Rationale: The name of a non-directory can never contain a slash.

The function was introduced in version 0.2.

translatestar PATTERN[, OPTIONS]

Compiles PATTERN into a Perl regular expression or throws an exception in case of failure. This function is used internally by fnmatchstar() and pnmatchstar().

OPTIONS is an optional hash of named arguments:

ignoreCase TRUE|FALSE

If TRUE, return a regular expression with the "i" modifier set, so that case is ignored when matching.

pathMode TRUE|FALSE

Remove a possible leading exclamation mark from PATTERN and mark the regular expression in that case as negated. If a trailing slash could be removed, the regular expression will be marked as a "directory match". If PATTERN contains more slashes, the regular expression is marked as a "full path match".

Inner slashes are preserved. A leading slash is discarded, and only triggers "full path match" mode.

See pnmatchstar() above or the section "EXAMPLES" below for more details.

The option was introduced in version 0.2.
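
As a rough illustration of translatestar() without pathMode, the sketch below compiles a pattern once and applies the result with the ordinary match operator. It assumes File::Globstar is installed; the paths are invented for the example.

```perl
use strict;
use warnings;
use File::Globstar qw(translatestar);

# Compile once, then reuse the regular expression for several strings.
my $re = translatestar('src/**/*.c');
for my $path ('src/a/x/file.c', 'lib/file.c') {
    printf "%-16s %s\n", $path, $path =~ $re ? 'match' : 'no match';
}

# With ignoreCase the compiled expression ignores case.
my $re_i = translatestar('*.PL', ignoreCase => 1);
print "Makefile.pl matches case-insensitively\n" if 'Makefile.pl' =~ $re_i;

# Invalid patterns throw an exception, so guard the call with eval.
my $ok = eval { translatestar('foo**bar'); 1 };
print "invalid pattern rejected\n" unless $ok;
```

Wrapping the call in eval is one way to treat an invalid pattern as "no match" in calling code.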

quotestar STRING[, NEGATABLE]

Escapes all characters special to globstar patterns. These are:

\\ the backslash
* the asterisk/star
[ the opening square bracket
] the closing square bracket (actually unneeded but ...)

If the optional argument NEGATABLE is truthy, a leading exclamation mark will also be backslash-escaped.
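
A short sketch of how quotestar() might be used to match a string literally, assuming File::Globstar is installed. The file name is a made-up example containing characters that would otherwise form an invalid pattern.

```perl
use strict;
use warnings;
use File::Globstar qw(fnmatchstar quotestar);

# A literal name that is not a valid pattern as it stands.
my $literal = 'foo***bar';
my $quoted  = quotestar($literal);
print "$quoted\n";    # foo\*\*\*bar

# The quoted form matches the literal string, the raw form does not.
my $quoted_hit = fnmatchstar($quoted,  $literal) ? 1 : 0;
my $raw_hit    = fnmatchstar($literal, $literal) ? 1 : 0;
print "quoted: $quoted_hit, raw: $raw_hit\n";

# With a truthy second argument a leading exclamation mark is escaped too.
my $negatable = quotestar('!keep', 1);
print "$negatable\n";    # \!keep
```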

EXAMPLES

We start with some examples for the double asterisk pattern.

**/*.css

All files matching the name *.css in the current directory and all its descendants, for example main.css, styles/body.css, styles/body/footer.css, styles/footer/whatever.css and so on.

src/**

The directory src and all files and directories underneath src, for example src/a, src/b, src/a/x/y, and so on.

src/**/*.c

All *.c files in the directory src or one of its descendants, for example src/file.c, src/a/file.c, src/b/file.c, src/a/x/file.c, and so on.

Recursively find all Perl source files in lib:

    @sources = globstar 'lib/**/*.p[lm]';

Is hello.pl in the current directory a Perl source file?

    fnmatchstar '*.p[lm]', 'hello.pl';

Is src/simple/hello.pl a Perl source file?

    require Carp;
    fnmatchstar '*.p[lm]', 'src/simple/hello.pl'
        or Carp::croak("bug alert!");

The above does not work because a star does not match a slash. You need a different pattern:

    fnmatchstar '**/*.p[lm]', 'src/simple/hello.pl';
    fnmatchstar 'src/**/*.p[lm]', 'src/simple/hello.pl';

Both patterns work for the example but the second pattern only matches in directory 'src' and its descendants.

With pnmatchstar() you have more options:

    pnmatchstar '*.p[lm]', 'src/simple/hello.pl';

Returns 1 because src/simple is discarded for matching. Only the basename portion takes part in the match.

    pnmatchstar '!*.p[lm]', 'src/simple/hello.pl';

Returns false because the pattern is negated.

    pnmatchstar 'src/', 'src/simple/hello.pl';

Returns 1! src/simple/hello.pl does not match. src/simple is tried next but again to no avail. src ultimately matches.

The following example may now be surprising:

    pnmatchstar 'src/', 'src';

Does not match because pnmatchstar() does not consider src a directory. Do this instead:

    pnmatchstar 'src/', 'src/';
    pnmatchstar 'src/', 'src', isDirectory => 1;

Both above examples work, because it is now clear that the string stands for a directory.

Now for full path matching.

    pnmatchstar 'lib/File', 'lib/File/Globstar/ListMatch.pm';

Matches because the leading part lib/File matches. But remember that this does not work:

    pnmatchstar 'File/Globstar', 'lib/File/Globstar/ListMatch.pm';
    pnmatchstar 'File/Globstar/ListMatch.pm', 'lib/File/Globstar/ListMatch.pm';

Full path mode is full path mode. A partial match is not enough.

If you use the same patterns multiple times, you can cache them:

    $regex = translatestar 'src/**/*.p[lm]', pathMode => 1;
    foreach $file (@files) {
        print "$file matches\n" if pnmatchstar $regex, $file;
    }

You should have realized by now that pnmatchstar() is hard to describe but easy to understand.

GORY DETAILS

The function globstar() is a wrapper around bsd_glob (see File::Glob) that extends the functionality with the double asterisk semantics. A double asterisk is only allowed in the following cases:

**/...

At the beginning of a pattern, when followed by a slash.

.../**/...

Anywhere inside a pattern, when preceded and followed by a slash.

.../**

At the end of a pattern, when preceded by a slash.

**

The pattern ** expands to all files and directories in the current directory with arbitrary nesting level.

All other usages of two consecutive asterisks are considered illegal. An illegal pattern does not match any file. The file foo***bar for example does not match the pattern foo***bar. It matches, however, foo\*\*\*bar.

Of course, all other features of File::Glob::bsd_glob() are supported as well. The module also suffers from the same bugs and incompatibilities.

BUGS AND CAVEATS

Other than the ones currently unknown, be prepared for the following:

COMPATIBILITY

The module assumes that the (forward!) slash "/" is the only valid path separator. Especially the backslash "\\" is only used for escaping and never as a path separator. On the other hand, the underlying function bsd_glob() from File::Glob does support other path separators than a slash.

This behavior may change in the future. The main reason why the backslash is not supported as a path separator is that I have no idea how bsd_glob() behaves exactly, for example under Windows. Patches are welcome!

UNIFORM BEHAVIOR OF GLOBSTAR() AND FNMATCHSTAR()

The functions should theoretically behave consistently. If globstar() returns a filename for a certain pattern, then fnmatchstar() should return true for the same pattern and the filename passed as a string.

Unfortunately, the two implementations are completely independent, and this cannot be guaranteed. Please file a bug if you find a discrepancy.

CASE-INSENSITIVE FILE SYSTEMS

The behavior of globstar() completely depends on the behavior of the underlying File::Glob. For fnmatchstar() you can pass the option "ignoreCase" to specify whether the match should be done case-insensitively or not.

For real-world scenarios you should be aware that on non-Windows systems, every directory level could behave differently: While "/media/disk1/backup" can be case-sensitive, "/media/disk1/backup/data" could be case-preserving and "/media/disk2/mp3" could be case-insensitive.

COPYRIGHT

Copyright (C) 2016-2017 Guido Flohr <guido.flohr@cantanea.com>, all rights reserved.

SEE ALSO

File::Glob(3pm), File::Globstar::ListMatch(3pm), glob(3), glob(7), fnmatch(3), glob(1), perl(1)