=head1 NAME

makepp_sandboxes -- How to partition a makepp build

=for vc $Id: makepp_sandboxes.pod,v 1.4 2010/07/16 21:15:23 pfeiffer Exp $

=head1 DESCRIPTION

=for genindex '[-$][-?\w]+' makepp_sandboxes.pod

B<D:>E<nbsp>L<--do-build|/dont_build_path>,
  L<--dont-build|/dont_build_path>,
  L<--dont-read|/dont_read_path>,
  L<--do-read|/dont_read_path>,E<nbsp>
B<I:>E<nbsp>L<--in-sandbox|/sandbox_path>,
  L<--inside-sandbox|/sandbox_path>,E<nbsp>
B<O:>E<nbsp>L<--out-of-sandbox|/sandbox_path>,E<nbsp>
B<S:>E<nbsp>L<--sandbox|/sandbox_path>,
  L<--sandbox-warn|/sandbox_warn>,
  L<--sandbox-warning|/sandbox_warn>,E<nbsp>
B<V:>E<nbsp>L<--virtual-sandbox|/virtual_sandbox>

There are a couple of reasons that you might want to partition the file tree
for a makepp build:

=over 4

=item 1.

If you know that the majority of the tree is not affected by any changes made
to source files since the previous build, then you can tell makepp to assume
that files in those parts of the tree are already up-to-date, which means not
even implicitly loading their makefiles, let alone computing and checking
their dependencies.  (Explicitly loaded makefiles are still loaded,
however.)

=item 2.

If you have multiple makepp processes accessing the same tree, then you want
makepp to raise an error if it detects that two concurrent processes are
writing the same part of the tree, or that one process is reading a part of
the tree that
a concurrent process is writing.  Either way, you have a race condition in
which the relative order of events in two concurrent processes (which cannot
be guaranteed) may affect the result.

=back

Makepp has sandboxing facilities that address both concerns.

=head2 Sandboxing Options

The following makepp options may be used to set the sandboxing properties
of the subtree given by I<path> and all of its files and potential files:

=over 4

=item --dont-build I<path>

=item --do-build I<path>

Set or reset the "dont-build" property.  Any file with this property set is
assumed to be up-to-date already, and no build checks will be performed.  The
default is reset (i.e. "do-build"), except if you have a C<RootMakeppfile>, in
which case everything outside of its subtree is "dont-build".

=item --sandbox I<path>

=item --in-sandbox I<path>

=item --inside-sandbox I<path>

=item --out-of-sandbox I<path>

Set or reset the "in-sandbox" property.  An error is raised if makepp would
otherwise write a file with this property reset.  Build checks are still
performed, unless the "dont-build" property is also set.  The default is set
(i.e. "in-sandbox"), unless there are any B<--sandbox> options, in which case
the default for all other files is reset (i.e. "out-of-sandbox").

=item --sandbox-warn

=item --sandbox-warning

Downgrade violations of "in-sandbox" and "dont-read" to warnings instead of
errors.  This is useful when there are hundreds of violations, so that you can
collect all of them in a single run and take appropriate corrective action.
Otherwise, you see only one violation per makepp invocation, and you don't
know how many are left until they're all fixed.

=item --dont-read I<path>

=item --do-read I<path>

Set or reset the "dont-read" property.  An error is raised if makepp would
otherwise read a file with this property set.  The default is reset
(i.e. "do-read").

=item --virtual-sandbox

Don't rewrite the build info of files that were not created by this makepp
process.
This is useful when running concurrent makepp processes with overlapping
sandboxes, and you are certain that no two processes will attempt to build
the same target.  Makepp will then refrain from caching additional information
about files that it reads, because there might be other concurrent readers.

=back
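
To make the interplay of the three properties concrete, here is a minimal
Python model of the checks described above.  It is only a sketch of the
documented behavior, not makepp's implementation (makepp is written in Perl);
the names C<Props> and C<check_access> and the action strings C<check>,
C<read> and C<write> are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Props:
    """Sandboxing properties of one file (hypothetical model)."""
    dont_build: bool = False   # --dont-build / --do-build
    in_sandbox: bool = True    # --sandbox / --out-of-sandbox
    dont_read: bool = False    # --dont-read / --do-read


def check_access(props: Props, action: str, sandbox_warn: bool = False) -> str:
    """Return 'ok', 'skip', 'warning' or 'error' for one file access.

    action is 'check' (a build check), 'read' or 'write'.
    """
    if action == "check":
        # dont-build files are assumed up-to-date: no build checks at all.
        # Note that in_sandbox does not affect build checks.
        return "skip" if props.dont_build else "ok"
    if action == "write":
        violated = not props.in_sandbox
    elif action == "read":
        violated = props.dont_read
    else:
        raise ValueError(f"unknown action: {action!r}")
    if not violated:
        return "ok"
    # --sandbox-warn downgrades in-sandbox and dont-read violations.
    return "warning" if sandbox_warn else "error"


# Writing outside the sandbox fails unless --sandbox-warn is given.
print(check_access(Props(in_sandbox=False), "write"))
print(check_access(Props(in_sandbox=False), "write", sandbox_warn=True))
```

The model keeps the three properties independent: "in-sandbox" only governs
writes, "dont-read" only governs reads, and only "dont-build" suppresses
build checks.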

Each of the three properties applies to the entire subtree, including files
that do not yet exist.  More specific paths override less specific paths.  A
specified path may be an individual file, even if the file does not yet exist.

If a property is both set and reset on the exact same path, then the option
that appears furthest to the right on the command line takes precedence.
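
The precedence rules can be sketched as a longest-prefix match in which, for
equally specific paths, the later option wins.  This Python sketch assumes the
options have already been collected as (option, absolute path) pairs in
command-line order, so a relative path such as C<.> must be made absolute
first; it ignores the C<RootMakeppfile> default, and treating B<--in-sandbox>
and B<--inside-sandbox> as synonyms of B<--sandbox> is an assumption based on
the option list above.  The names C<resolve> and C<covers> are hypothetical.

```python
import posixpath

# Option name -> (property, value).  Treating --in-sandbox and
# --inside-sandbox as synonyms of --sandbox is an assumption.
OPTIONS = {
    "--dont-build": ("dont_build", True),
    "--do-build": ("dont_build", False),
    "--sandbox": ("in_sandbox", True),
    "--in-sandbox": ("in_sandbox", True),
    "--inside-sandbox": ("in_sandbox", True),
    "--out-of-sandbox": ("in_sandbox", False),
    "--dont-read": ("dont_read", True),
    "--do-read": ("dont_read", False),
}
SANDBOX_OPTS = {"--sandbox", "--in-sandbox", "--inside-sandbox"}


def covers(path, target):
    """True if target is path itself or lies anywhere below it."""
    return path == "/" or target == path or target.startswith(path + "/")


def resolve(options, target):
    """Resolve dont_build / in_sandbox / dont_read for one target path.

    options: (option, absolute_path) pairs in command-line order.
    """
    target = posixpath.normpath(target)
    # Defaults from the option descriptions (RootMakeppfile case ignored).
    props = {
        "dont_build": False,
        "in_sandbox": not any(opt in SANDBOX_OPTS for opt, _ in options),
        "dont_read": False,
    }
    best = {}  # property -> depth of the most specific matching path so far
    for opt, path in options:
        prop, value = OPTIONS[opt]
        path = posixpath.normpath(path)
        if not covers(path, target):
            continue
        depth = 0 if path == "/" else path.count("/")
        # A deeper path wins; ">=" also lets a later option on the
        # same path override an earlier one.
        if depth >= best.get(prop, -1):
            best[prop] = depth
            props[prop] = value
    return props


# "--dont-build /. --do-build ." run from /home/me/proj:
opts = [("--dont-build", "/."), ("--do-build", "/home/me/proj")]
print(resolve(opts, "/home/me/proj/src/a.c")["dont_build"])  # False
print(resolve(opts, "/usr/include/stdio.h")["dont_build"])   # True
```

Because only paths that are ancestors of (or equal to) the target can match,
two matching paths with the same depth are necessarily the same path, which is
why comparing depths is enough to implement "more specific wins".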

=head2 Sandboxing for Acceleration

If you want to prevent makepp from wasting time processing files that you
know are already up-to-date (in particular, files that are generated by a
build tool other than makepp), then B<--dont-build> is the option for you.

By far the most common case for such an optimization is that you know that
everything not at or below the starting directory is already up-to-date.
This can be communicated to makepp using "B<--dont-build /. --do-build .>".

=head2 Sandboxing for Concurrent Processes

One technique that can reduce build latency is to have multiple makepp
processes working on the same tree.  This is quite a bit more difficult to
manage than using the B<-j> option, but it can also be substantially more
effective because:

=over 2

=item *

With sandboxing, the processes may be running on multiple hosts, for example,
via a job queuing system.  Increasing the B<-j> limit eventually exhausts the
CPU resources of a single host, and can even slow the build due to excessive
process forking.

=item *

B<-j> does not currently parallelize some of makepp's time-consuming tasks
such as loading makefiles, scanning, building implicit dependencies while
scanning, and checking dependencies.

=back

The biggest risk with this approach is that the build can become
nondeterministic if processes that might be concurrent interact with one
another.  This leads to build systems that produce incorrect results
sporadically, and with no simple mechanism to determine why it happens.

To address this risk, it is advisable to partition the tree among concurrent
processes such that if any process accesses the filesystem improperly, then an
error is deterministically raised immediately.  Normally, this is accomplished
by assigning to each concurrent process a "sandbox" in which it is allowed to
write, where the sandboxes of no two concurrent processes may overlap.

In addition, each process marks the sandboxes of any other possibly concurrent
processes as "dont-read."  If a process reads a file that another concurrent
process is responsible for writing (and which therefore might not yet be
written), then an error is raised immediately.
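
The partitioning scheme above can be sketched as a small planning step run by
whatever orchestrates the processes: verify that no two sandboxes overlap,
then give each process B<--sandbox> for its own area and B<--dont-read> for
every other area.  This Python sketch assumes every sandbox is given as an
absolute path; the function names C<overlaps> and C<concurrent_args> are
hypothetical, and how each argument list is handed to a job queuing system is
left open.

```python
import posixpath
from itertools import combinations


def overlaps(a, b):
    """True if one absolute directory is equal to or nested inside the other."""
    a, b = posixpath.normpath(a), posixpath.normpath(b)
    return (a == b
            or a.startswith(b.rstrip("/") + "/")
            or b.startswith(a.rstrip("/") + "/"))


def concurrent_args(sandboxes):
    """Return one list of makepp sandboxing options per concurrent process."""
    # The sandboxes of no two concurrent processes may overlap.
    for a, b in combinations(sandboxes, 2):
        if overlaps(a, b):
            raise ValueError(f"sandboxes overlap: {a} and {b}")
    plans = []
    for mine in sandboxes:
        # Write only inside our own sandbox ...
        args = ["--sandbox", mine]
        # ... and never read what another concurrent process may be writing.
        for other in sandboxes:
            if other != mine:
                args += ["--dont-read", other]
        plans.append(args)
    return plans


for args in concurrent_args(["/w/lib/a", "/w/lib/b", "/w/app"]):
    print(" ".join(args))
```

Checking for overlap up front turns a mistake in the partitioning itself into
an immediate, deterministic failure, in the same spirit as the sandbox errors.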

=head2 Sandboxing for Sequential Processes

When the build is partitioned for concurrent makepp processes, there is also
usually a sequential relationship between various pairs of processes.  For
example, there may be a dozen concurrent compile processes, followed by a
single link process that cannot begin until all of the compile processes have
completed.  Such sequential relationships must be enforced by whatever
mechanism is orchestrating the various makepp processes (for example, the job
queuing system).

When processes have a known sequential relationship, there is normally no need
to raise an error when they access the same part of the tree, because the
result is nonetheless deterministic.

However, it is generally beneficial to specify B<--dont-build> options to the
dependent process (the link process in our example) that notify it of the
areas that have already been updated by the prerequisite processes (the
compile jobs in our example).  In this manner, we avoid most of the
unnecessary work of null-building targets that were just updated.
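
For the dependent process, the options can be derived from the sandboxes of
its prerequisite processes.  In this Python sketch, the function name
C<dependent_args> and its parameters are hypothetical; no B<--dont-read> is
added for the prerequisite areas because, as noted above, reading them after
the prerequisites have completed is deterministic.

```python
def dependent_args(prerequisite_sandboxes, own_sandbox=None):
    """Build makepp options for a process that runs after its prerequisites."""
    args = []
    if own_sandbox is not None:
        # The dependent process may still be confined to its own area.
        args += ["--sandbox", own_sandbox]
    for path in prerequisite_sandboxes:
        # Already updated by a prerequisite process: skip the null build.
        args += ["--dont-build", path]
    return args


# A link process running after two compile processes:
print(dependent_args(["/w/obj/a", "/w/obj/b"], own_sandbox="/w/bin"))
```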

=head1 AUTHOR

Anders Johnson (anders@ieee.org)