=head1 NAME

makepp_signatures -- How makepp knows to rebuild files

=head1 DESCRIPTION

Each file is associated with a I<signature>, which is a number or string that
changes if the file has changed.  Makepp compares signatures to see whether it
needs to rebuild anything.  The default signature for files is the file's
modification time, unless you're executing a C/C++ compilation command, in
which case the default signature is a cryptographic checksum on the file's
contents, ignoring comments and whitespace.  If you want, you can switch to a
different method, or you can define your own signature functions.

In addition to the file's signature, it is also possible to control how makepp
compares these signature values.  For example, the C<exact_match> method
requires that file signatures be exactly the same as on the last build,
whereas the C<target_newer> method only requires that all dependencies be
older than the target.

If makepp is building a file, and you don't think it should be, you might want
to check the build log (C<.makepp_log>).  Makepp writes an explanation of what
it thought each file depended on, and why it chose to rebuild.

At present, there are four signature checking methods included in
makepp.  Usually, makepp picks the most appropriate signature method
automatically.  However, you can change the signature method for an
individual rule by using the C<:signature> modifier on the rule (see
L<makepp_rules/signature>), for all rules in a makefile by using the
C<signature> statement (see L<makepp_statements/signature>), or for all
makefiles at once using the C<-m> or C<--signature-method> command line
option (see L<makepp_command/-m>).

=head2 Signature methods included in the distribution

=over 4

=item exact_match

This method uses each file's modification date as its signature.  It
rebuilds the targets unless all of the following conditions are true:

=over 4

=item *

The signature of each dependency is the same as it was on the
last build.

=item *

The signature of each target is the same as it was on the last
build.

=item *

The build command has not changed.

=item *

The default directory for the build command has not changed.

=item *

The machine architecture (or what perl thinks it is) has not
changed.

=back

Makepp stores all the signature information and the build command from
the last build, so that it can do these comparisons.  (All the stored
information is in the subdirectory C<.makepp> of each directory.)
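
The checks listed above can be illustrated with a minimal Python sketch.
It is not makepp's implementation (which is written in Perl); the
C<BuildRecord> layout, its field names and the C<rebuild_reason> function
are hypothetical and only model the comparison against the information
remembered from the last build.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BuildRecord:
    """Information remembered about one build (hypothetical layout)."""
    dependency_signatures: Dict[str, str]
    target_signatures: Dict[str, str]
    command: str
    directory: str
    architecture: str


def rebuild_reason(previous: Optional[BuildRecord],
                   current: BuildRecord) -> Optional[str]:
    """Return why a rebuild is needed, or None if everything matches."""
    if previous is None:
        # Nothing was stored, e.g. the build information was deleted.
        return "no information from the last build"
    if previous.dependency_signatures != current.dependency_signatures:
        return "dependency signature changed"
    if previous.target_signatures != current.target_signatures:
        return "target signature changed"
    if previous.command != current.command:
        return "build command changed"
    if previous.directory != current.directory:
        return "default directory changed"
    if previous.architecture != current.architecture:
        return "architecture changed"
    return None
```

Because the comparison is for equality rather than for "newer than", a
dependency whose signature moves backwards in time is reported as changed
just like one that moves forwards.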

This is makepp's default algorithm unless it is trying to rebuild a
makefile or compile C/C++ code.  This is a highly reliable way of
ensuring correct builds, and is almost always what you want.  However,
it does have a few side effects that may be surprising:

=over 4

=item *

If you've been compiling with the traditional make, and then switch to
makepp, everything is recompiled the first time you run makepp.

=item *

If you damage makepp's information about what happened on the last build
(e.g., you delete the subdirectory C<.makepp>, or don't copy it when you
copy everything else), then a rebuild is triggered.

=item *

If you replace a file with an older version, a rebuild is triggered.
This is normally what you want, but it might be surprising.

=item *

If you modify a file outside of the control of makepp (e.g., you run the
compilation command yourself), then makepp will rebuild the file next
time.

=item *

Architecture-independent files are rebuilt when you switch to a
different architecture.  This is usually not a problem, because they
often don't take long to build.  The reason why all files are tagged
with the architecture, instead of just binary files, is that even ASCII
files are often architecture-dependent.  For example, output from
the Solaris C<lex> program won't compile on Linux (or at least this used
to be true the last time I tried it).

=back

=item target_newer

Rebuilds the target only if at least one of its dependencies is newer
than the target.  The dependencies may change their time stamp, but as
long as they are all older than the target, the target is not rebuilt.
The target is also not rebuilt even if the command or the architecture
has changed.  (This is the signature method that the traditional make
uses.)

This is makepp's default algorithm if it is trying to build the makefile
before reading it in.  (It loads the makefile and checks for a rule
within the makefile to rebuild itself, and if such a rule is present and
the makefile needs rebuilding, it is rebuilt and then reread.)  This is
because it is common to modify a makefile using commands that are not
under the control of makepp, e.g., running a configure procedure.  Thus
makepp doesn't insist that the last modification to the makefile be made
by itself.

Compared to C<exact_match>, C<target_newer> has the following
disadvantages:

=over 4

=item *

Makepp can be confused by clock synchronization problems or by bogus
dates.  For example, if a file somehow gets a date in the far future,
anything that depends on it will always be rebuilt, no matter what.

=item *

Replacing a file with an older version of the same file won't trigger a
rebuild.

=item *

Changing the build command (e.g., changing compilation options) won't
trigger a rebuild.

=item *

Changing the architecture (e.g., switching from Linux to
Solaris) won't trigger a rebuild.

=back
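
As a rough illustration, the C<target_newer> rule can be sketched in
Python as follows.  The function name and its arguments are hypothetical,
and treating equal time stamps as "rebuild" is an assumption of this
sketch, not something stated about makepp itself.

```python
from typing import Iterable, Optional


def target_newer_needs_rebuild(target_mtime: Optional[float],
                               dependency_mtimes: Iterable[float]) -> bool:
    """Decide from modification times alone whether to rebuild."""
    if target_mtime is None:
        # The target does not exist yet, so it has to be built.
        return True
    # Up to date only if every dependency is older than the target.
    # Ties count as "rebuild" here (an assumption of this sketch).
    return any(dep >= target_mtime for dep in dependency_mtimes)
```

Only time stamps enter the decision, so a dependency replaced by an older
copy goes unnoticed, and a dependency dated in the far future forces a
rebuild on every run.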

=item md5

This is the same as C<exact_match>, except that instead of using the
file date as the signature, an MD5 checksum of the file's contents is
used.  This means that if you change the date on the file but don't
change its contents, makepp won't try to rebuild anything that depends
on it.

This is particularly useful if you have some file which is often
regenerated during the build process that other files depend on, but
which usually doesn't change.  If you use the C<md5> signature checking
method, makepp will realize that the file's contents haven't changed
even if the file's date has changed.  (Of course, this won't help if the
files have a timestamp written inside of them, as archive files do for
example.)

For C/C++ source code, you should use C<c_compilation_md5> instead since
it achieves the same thing but in a more powerful way.
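
The difference between a date-based and a content-based signature can be
seen in a small Python sketch.  The helper names C<mtime_signature> and
C<md5_signature> are hypothetical; they only mimic the idea and say
nothing about how makepp stores or formats its signatures.

```python
import hashlib
import os


def mtime_signature(path: str) -> int:
    """Date-based signature: the file's modification time in nanoseconds."""
    return os.stat(path).st_mtime_ns


def md5_signature(path: str) -> str:
    """Content-based signature: MD5 checksum of the file's bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If a file is regenerated with identical contents, C<mtime_signature>
returns a new value while C<md5_signature> stays the same, so nothing
that depends on the file would be considered out of date.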

=item c_compilation_md5

This is the same as C<exact_match>, except that signatures for files
which look like C or C++ source files are computed by an MD5 checksum of
the file, ignoring comments and whitespace.  (Technically, comments are
replaced by a single space, and multiple whitespace characters are
collapsed to a single space, before computing the MD5 checksum.)
Ordinary file times are still used for signatures for object files, and
any other files that don't have an extension typical of a C or C++
source file.  (A file is considered to be source code if it has an
extension of C<c>, C<h>, C<cc>, C<hh>, C<cxx>, C<hxx>, C<hpp>, C<cpp>,
C<h++>, C<c++>, C<moc>, or upper case versions of these.)  If you use
this, you can reindent your code or add or change comments without
triggering a rebuild.
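
The normalization described above (comments replaced by a single space,
runs of whitespace collapsed to a single space, then an MD5 checksum) can
be approximated in Python as shown below.  This is a simplified sketch
with hypothetical names: unlike a real C scanner, it does not protect
comment-like text inside string literals, and it is not makepp's own
code.

```python
import hashlib
import re

# /* ... */ comments (possibly spanning lines) and // comments to end of line.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def c_source_signature(source: str) -> str:
    """MD5 of C/C++ source text, ignoring comments and whitespace layout."""
    # Step 1: replace every comment by a single space.
    without_comments = _COMMENT.sub(" ", source)
    # Step 2: collapse each run of whitespace to a single space.
    normalized = _WHITESPACE.sub(" ", without_comments)
    # Step 3: checksum the normalized text.
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Two versions of a file that differ only in indentation or comments
produce the same signature, whereas any change to the code itself
produces a different one.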


This method is particularly useful for the following situations:

=over 4

=item *

You want to make changes to the comments in a commonly included header
file, or you want to reformat or reindent part of it.  For one project
that I worked on a long time ago, we were very unwilling to correct
inaccurate comments in a common header file, even when they were
seriously misleading, because doing so would trigger several hours of
rebuilds.  With this signature method, this is no longer a problem.

=item *

You like to save your files often, and your editor (unlike emacs) will
happily write a new copy out even if nothing has changed.

=item *

You have C/C++ source files which are generated automatically by other
build commands (e.g., yacc or some other preprocessor).  For one system
I work with, we have a preprocessor which (like yacc) produces two
output files, a C<.cxx> and a C<.h> file:

    %.h %.cxx: %.qtdlg $(HLIB)/Qt/qt_dialog_generator
    	$(HLIB)/Qt/qt_dialog_generator $(input)

However, most of the time when the input file changes, the resulting
C<.h> file contents are unchanged (except for a comment about the build
time written by the preprocessor), although its date will change.  This
could trigger unnecessary rebuilds of many modules without this kind of
cryptographic signature checking.

=back

This is the default signature method for C or C++ compilation.  It
overrides any default specified with the C<-m> or C<--signature-method>
command line option, but is overridden by any signature method specified
by the C<signature> statement or the C<:signature> rule modifier.
Makepp determines that you are doing a C/C++ compilation if it
recognizes your command line as an invocation of a C/C++ compiler (see
L<makepp_scanning>).

=back

=head2 Custom methods

You can, if you want, define your own methods for calculating file
signatures and comparing them.  You will need to write a perl module to
do this.  Have a look at the comments in C<Signature.pm> in the
distribution, and also at the existing signature algorithms in
C<Signature/*.pm> for details.