The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=head1 NAME

makepp_scanning -- How makepp finds include files and other hidden dependencies

=for vc $Id: makepp_scanning.pod,v 1.21 2012/02/07 22:26:15 pfeiffer Exp $

=head1 DESCRIPTION

Makepp can determine additional dependencies or targets for certain commands
that it knows something about.  This is especially important for C/C++
compilation, where it is too error-prone to list manually all of the include
files that a given source file depends on.  By looking at the compilation
command and the source files themselves, makepp is able to determine
accurately which object files need to be rebuilt when some include file
changes.

Example: Given a rule

    foo.o:			# Usually %.o: %.c, just for illustration
 	time -p /bin/libtool -bar /usr/bin/cc -c -I somewhere foo.c

makepp knows that C<time> and C<libtool> must be skipped and that C<cc> is the
actual command to be parsed here.  It understands that F<foo.c> is the input
file and thus a dependency of this rule.  Moreover it will scan that file
looking for include statements, also in directory F<somewhere>, because it
understood the command options.

Actually there are three steps to what is historically known as scanning:

=over

=item 1.

The rule action is split into lines (continuation lines count as one).  Each
line (except builtins and Perl blocks) is B<lexically analyzed> as one or more
Shell commands.  Redirections are recognized as inputs or outputs to this
rule.  The first word of each command is looked up (with its directory part
but, if not found, again without it) to find a parser for it.  These become
optional dependencies, they are built if possible, but ignored if not found,
as makepp can't know which part of a complex command is actually run.

Commands in backquotes are analyzed but not executed.  (Often execution is
important, but this would be a major interference by makepp.)  It is better
style to avoid them.  Instead have makepp run the command at most once by
assigning it in this special way:

    XYZFLAGS ;= $(shell pkg-config --cflags xyz)

Currently there is only one lexer class, which understands Bourne Shell.  To
better handle C Shell or C<command.com>, subclasses might be created.  However,
much syntax is similar enough to not warrant this.  Get in touch if you want
to contribute either.

=item 2.

For known commands the corresponding B<command parser> (also referred to just
as parser) analyzes the important options and arguments.  The available ones
are L<described below|/SCANNERS (PARSERS)>.

Even if no specialized parser was found, the generic one makes the command
executable an input of this rule.  You can change that with the
L<--no-path-executable-dependencies command
option|makepp_command/no_path_exe_dep>.

=item 3.

If the parser recognized any input files, they get sent to the B<scanner>
chosen by the parser.  It finds further inputs by looking for C<#include> or
comparable statements.

This is the most expensive step.  All the results get cached to avoid
repeating it unnecessarily.

=back

If makepp thinks it's compiling a C/C++ source but can't find a parser, it
will give a warning message to let you know.  This usually means that you
buried your compiler command too deeply in the action for makepp to find it.
For example, I have seen rules like this:

    %.o: %.c
 	@echo Compiling $< now; obscure_wrapper gcc -c $< $(CFLAGS) -o $@

The first words of the actions here are C<echo> and C<obscure_wrapper>, for
which there are no parsers, so makepp will not scan for include files in
this case.  You can ignore the prefixed command by:

    register-parser obscure_wrapper skip-word

The following sections document the built in parsers and scanners.  In the
name you can use C<-> interchangeably with C<_>.

=head1 SCANNERS (PARSERS)

The various scanners must be chosen by a command parser, which is given in
parentheses:

=head2 C/C++ compilation (c-compilation, gcc-compilation)

The C/C++ scanner, handles both languages indifferently.  In fact it looks
only at preprocessor statements, so it can be used for quite a few languages.
The parser that activates it has a special variant for gcc's many options,
which gets chosen if the command name includes the string C<gcc> or g++.  If
compilers for other languages with C preprocessor use the same options as the
C compiler (at least C<-I>) then this parser works fine.

It looks at the command for C<-Idir> options specifying the include path
or C<-Ldir> options specifying the link path.  It then scans any source
files for C<#include> directives, and also looks at the command line to
see if there are any source files or libraries mentioned which are not
listed as dependencies.  It recognizes these by their extension.

This scanner gives a warning message if files included with S<C<#include
"file.h">> are not found, or not buildable by makepp, in the include path, or
in the directory containing the file which is C<#includ>ing, or in
F</usr/include>.  No warning is given if a file included with S<C<< #include
<file.h> >>> is not found.  Makepp assumes it is in some system include
directory that the compiler knows about, and that files in system include
directories won't change.

In addition, files in F</usr/include>, F</usr/local/include>,
F</usr/X11R6/include>, and any other directory which is not writable are
not scanned to see what they include.  Makepp assumes that these files
won't change.  (If you're running as root, the writability test is
performed with the UID and GID of the directory you ran makepp from.
This is so compiling a program as an ordinary user and then doing
S<C<makepp install>> as root won't cause extra directories to be scanned.)

This is a fairly simple-minded scanner.  It will get confused if you do
things like this:

    #ifdef INCLUDE_THIS
    #include "this.h"
    #endif

because it doesn't know about preprocessor conditionals.  This is
usually harmless; it might cause additional extra files to be labeled
as dependencies (occasionally causing unnecessary rebuilds), or else it
might cause makepp to warn that the include file was not found.  You can
either ignore the warning messages, or put an empty file C<this.h> out
there to shut makepp up.

If your compiler has a funny name, you can say either of

    register-parser obscure_c_compiler c-compilation
    register-parser obscure_gcc_alias gcc-compilation

=head2 Embedded SQL C/C++ compilation (esql-compilation)

These commands, which come with the various databases, preprocess special
sections in otherwise C/C++-like sources, and produce C/C++ headers and
sources.  This finds EXEC SQL INCLUDE "filename" or $INCLUDE "filename"
directives.

These preprocessors are recognized: Altibase APRE*C/C++ (F<apre>), CASEMaker
DBMaker (F<dmppcc>), Firebird / InterBase (F<gpre>), IBM DB2 (F<db2
precompile, db2 prep>) & Informix ESQL/C (F<esql>), Ingres (F<esqlc>), Mimer
(F<esql>), Oracle (F<proc>), PostgreSQL (F<ecpg>) & YARD (F<yardpc>).  If your
preprocessor is not recognized, you can say

    register-parser obscure_esqlc_preprocessor esql-compilation

This will however only handle the style common to Informix and others: Command
arguments ending in C<.ec> are files to be scanned, C<-I> defines the include
path and EXEC SQL INCLUDE directives without a suffix get C<.h> appended.

=head2 Swig (swig)

Swig (Simplified Wrapper and Interface Generator, http://www.swig.org/)
is a program that converts a C/C++ header file into the wrapper
functions needed to make your code callable from a variety of other
languages, such as Perl, Python, Tcl, C#, Ruby, OCaml, and probably some
others that I don't know about.

Makepp understands and parses the swig command line, looking for C<-I> and
C<-l> options.  It also knows how to scan swig's interface definition files
(F<.i> files) looking for C<%include>, C<%import>, and also C<#include>
if C<-includeall> is in effect.

If your swig has a funny name, you can say

    register-parser obscure_swig_alias swig

=head2 Vera and Verilog (vcs_compilation)

If you design hardware, this will come in handy.

=head2 Ignorable wrappers (skip-word, shell)

Makepp recognizes the following command words and many more and skips over them
appropriately in in its search for the correct scanner:
C<condor_compile>, C<distcc>, C<ignore_error>, C<libtool>, C<noecho>
C<purify>.

There is a variant of this which finds the nested commands in C<sh -c
'command1; command2'>.

If you have more such commands, you can say

    register-parser command skip-word

=head3 Libtool

Libtool is a very clever compilation system that greatly simplifies
making shared libraries by hiding all the system-dependent details away
in a shell script.  The only difficulty is that the library binary files
are not actually stored in the same directory as the output
file--libtool actually creates a subdirectory, C<.libs>, which contains
the real files.  This is ordinarily not a problem, but makepp has to
know where the real binaries are if it is to link them in from a
repository.  At the moment, libtool libraries (C<.la> files) are not
linked in from repositories; they are always rebuilt if needed.  Also,
makepp at the moment is not able to use the dependency information that
is stored inside the C<.la> file itself.  This will hopefully change
soon.

=head2 Suppressing the scan (none)

Sometimes you may not want a rule or a certain command to be parsed.  You can
turn off parsing and thereby scanning with

    register-parser cc none

=head1 RELATED OPTIONS

=head2 Quickscan and smartscan

The C<:quickscan> and C<:smartscan> rule options, if applicable, affect the way
that files are scanned.

In C<:quickscan> mode (the default), all include directives are assumed
active. This allows for very efficient scanning.

In C<:smartscan> mode, an attempt is made to interpret macros and expressions
so that inactive include directives are ignored.
For example, the executable produced by compiling the following C program
ought I<not> to depend on F<foo.h>:

    #if 0
    #include "foo.h"
    #endif
    int main() { return 0; }

=head1 CUSTOM SCANNERS

You can specify your own parser either in a L<rule
option|makepp_rules/parser_parser> like C<:parser foo>, or by using the
L<C<register_parser> or
C<register_command_parser>|/register_command_parser_command_word_parser>
statements.

Either way, as described under
L<C<register_parser>|/register_command_parser_command_word_parser>, there you
must directly or indirectly (via a class) specify a function that creates a
parser object.  This object will usually create a scanner object for files,
and feed it with its findings from the command line options.  These two
objects will call the parser's C<add_*_dependency> methods which forward the
information to the somewhat more complicated C<Mpp::Lexer::add_*_dependency>
utility functions.

However your parser function can also do this work itself for simple cases.
There are a couple of special return values if this function doesn't return a
parser object:

=over

=item C<undef>

The scan info is not cacheable and must be recalculated next time the rule's
target needs to be built.

=item C<p_none, p_skip_word> or C<p_shell>

These are in fact numeric constants, which tell the lexer to do the work of
these pseudo-parsers.

=item any reference, e.g. C<\1>

This is equivalent to returning a parser object of the C<Mpp::CommandParser>
base class, which will only additionally make the command executable itself a
dependency.

=back

In most cases, objects of type C<Mpp::CommandParser> should instantiate at least
one object of type C<Mpp::Scanner>.
The C<Mpp::Scanner> base class takes care of the distinction between quickscan
and smartscan.
Note that the behavior of C<Mpp::Scanner> can be markedly affected by this
distinction, but that should be transparent to the derived class if it
is well-formed.
New derived C<Mpp::Scanner> classes ought to be tested in both modes.

If you write your own C<Mpp::Scanner> class, you should also base your rescanning
decision on the build info C<RESCAN>.  This gets set by C<makeppreplay> after
signing files without scanning.  So despite the signatures being consistent, a
rescan is still necessary.  If your C<Mpp::Scanner> uses the inherited
C<scan_file1> method, you're probably fine.

For more details, refer to the respective class documentation.  For examples,
see C<Mpp::CommandParser::Gcc> and C<Mpp::CommandParser::Vcs>.  Look at the
C<p_> functions in F<Mpp/Subs.pm> which get aliased into their respective
classes as C<factory> when loaded.



=head2 Caching scanner info

If the all of the scanner's important side effects are effected through calls
to methods of the C<Mpp::CommandParser> base class, then those side effects can
be cached in the build info file, so that they can be played back by a
subsequent invocation of makepp without doing all of the costly scanning work.
This can save quite a bit of time, especially in smartscan mode.

If the scanner has other important side effects, then it should call the
C<Rule> object's mark_scaninfo_uncacheable method.  Otherwise, the scanner
info retrieved from the build info may be inaccurate, causing the build result
possibly to be incorrect.  This method is called automatically when a value
from the %parsers hash does not return an object of type
C<Mpp::CommandParser>, or when the parser is specified with a rule option and
the C<p_*> routine does not return an object of type C<Mpp::CommandParser>.

Cached scan info is invalidated using criteria similar to those used for
determining when the target is out of date.
Similarly, it can be retrieved from a repository using criteria similar
to those used for determining when a target can be linked in from a
repository.

You can force makepp to ignore the cached scanner info with the
C<--force-rescan> option.
This is useful when a broken scanner may have caused incorrect scanner info
to be cached.

=head2 Ad Hoc Scanner

Often you will have just one or few files which contain dependency
information.  You don't want to write this into a makefile redundantly (since
redundancy later often leads to inconsistencies when one update gets
forgotten).  But you also don't want to write a Mpp::Scanner?  As a workaround you
can generate an include file on the fly.  For example Qt has F<.qrc> files
which can look like:

    <RCC>
      <qresource prefix="...">
 	<file>abc</file>
 	<file>xyz</file>
 	...

If you adhere to the above layout, you can transform the relevant lines into a
makepp include file, which gets automatically created by being included.

    %.qrc.makepp: %.qrc
 	&grep 's!<RCC>\n!$(stem).cc:! || s! *<file>! ! && s!</file>\n!!' $(input) -o $(output)

    include $(wildcard *.qrc)	# .makepp is appended automatically

The drawback is that you begin building while reading the makefile.  So the
L<--stop-after-loading|makepp_command/stop> command option will not be very
useful.