The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

TITLE

Using optimizer to do Method Munging. (as done in Log::Log4perl::AutoCategorize)

Munging

Munging is a term borrowed from C++, where the technique was used to disambiguate same-named functions in different classes, or with different argument or return types. Munging allowed a C++ front-end to use a C compiler back-end.

 Example:

 public class A {
        int foo();
        int foo(int);
 }
 public class B : public A {
        String foo();
        String foo(String);
 }

The reasons we do it are different, but the operation is the same; it 'alters' the symbol table, in a way thats 'invisible'

OK, but why bother

The normal way of getting a method to behave differently for each caller/client is to parameterize the call, or the object its acting upon; each client tells the method or object what it wants to do.

  # parameterize the ctor -> resulting object
  $sock = IO::Socket::INET->new
    ( PeerPort  => 9999,
      PeerAddr  => inet_ntoa(INADDR_BROADCAST),
      Proto     => udp,
      LocalAddr => 'localhost',
      Broadcast => 1 );

  # alter some socket properties via method
  print $sock->sockopt($opt, $val);
    

But this requires participation of the client, which isnt always convenient. Consider a customer-service case;

  cust> hello, the book I ordered hasnt arrived !
  serv> whats your order number ?
  cust> I dunno
  serv> whats your name ?
  cust> Cant u just look it up ?
  serv> whats your credit-card number ?
  cust> Aww Cripes, wheres my wallet ! 

In this case, what you (the server) want is Caller-ID, feeding an automatic service-order retrieval, which pushes the complete customer bio to the service agent. This way, you can magically do everything the customer wants.

Motivating Application: Logging

I used this technique for Log::Log4perl::AutoCategorize, which is a convenience wrapper around Log::Log4perl. Log::Log4perl provides a powerful, flexible way to redirect and filter logging via config-file.

Log::Log4perl example

    # in your code, load a config-string or file
    Log::Log4perl->init(\ qq{
           log4perl.category.Candy.Twix = DEBUG, Screen
           log4perl.appender.Screen = Log::Dispatch::Screen
           log4perl.appender.Screen.layout = SimpleLayout
           });

    # now use that logger
    package Candy::Twix;

    sub new {
        my $logger = Log::Log4perl->new("Candy::Twix::new");
        $logger->debug("Creating a new Twix bar");
        bless {}, shift;
    }

Notes about above example

new() has its own category, implying that each method would also have its own. This represents a coding burden on the client.

The category must be manually added, even though its known to perl.

Youre wise to keep category consistent with (ie the same as) the package and method.

The config-string has a logging category for Bar.Twix only, not for each of its methods; detailed control of logging is not used here, but is possible later because methods use distinct categories.

Doing things the ':easy' way

    package main;
    use Log::Log4perl qw(:easy);
    Log::Log4perl->easy_init($ERROR);  # Set priority of root logger to ERROR

    package Candy::Snickers;
    use Log::Log4perl qw(:easy);
    my $logger = get_logger(); # package wide category obtained transparently

    sub foo {
        $logger->fatal("This will get logged.");
        $logger->debug("This won't.");
    }

The easy way automatically determines the category from your __PACKAGE__, but it precludes the detailed control that would be possible if __SUBROUTINE__ existed.

OK, so how do we get the control and the ease ?

caller() will give you the all that detail; package, subroutine, logging-level, even line number, yielding a category like Bar.Twix.new.debug.14. Just call it for every logging invocation.

Note: To save typing, I do this in AUTOLOAD, which handles all logging calls: fatal, error, warning, info, debug.

Damn, thats a lot of overhead

Indeed. For a application-wide logger, its unacceptable. And thats where optimizer comes in.

By munging the methods at compile time, we provide a unique method which may be called later (if the code is reached).

AUTOLOAD can now do the time-consuming (and detailed) category resolution once, then 'remember' it in a code block thats immediately added to the Symbol Table.

Thereafter, the methods are dispatched directly; AUTOLOAD will not be called any more for them.

Ahh. So how do you do it ?

First, a caveat.

Because munging is done at compile-time, blessed logger objects are not available to help recognize a method which should be munged.

I define this as a non-problem by insisting that clients invoke all logging by using Class->method style. This makes it simple and reliable to check for the right Class before munging the method.

But this means changing my existing code-base.

Hey, nothings perfect.

Besides, its easy enough to get 90% switched over, the rest you can shake out with a couple of test-runs.

    perl -pi.bak \
    -e s/(my $logger = (Log::Log4perl::)?get_logger)/#$1/ \
    -e s/$logger->(debug|info|warn|error|fatal)/Logger->$1/ \
    *.pm

OK. So how do you use optimizer

Well, Arthur Bergman is here to tell us all the finer points tomorrow, but heres the quick-and-dirty.

    use optimizer 'extend-c' => sub { opchain-recognizer-code }

The sub is called for every $op, and can march through the chain of linked opcodes, by calling $op->next() successively. It must find the chains of interest, by examining each in turn.

What we look for in the optimizer routine

Here are a few variations of Logger-debug(@args)>, the source-code that we need to recognize and munge in the optimizer routine. Note that there are invariant parts; these are what we need to focus upon.

simplest invocation (no args)

  $> perl -MO=Concise,-exec -e 'Logger->debug()'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <$> method_named(PVIV "debug") 
  6  <1> entersub[t1] vKS/TARG
  7  <@> leave[t1] vKP/REFC
  -e syntax OK

scalar arg

  $> perl -MO=Concise,-exec -e 'Logger->debug(1)'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <$> const(IV 1) sM
  6  <$> method_named(PVIV "debug") 
  7  <1> entersub[t1] vKS/TARG
  8  <@> leave[t1] vKP/REFC
  -e syntax OK

list arg

  $> perl -MO=Concise,-exec -e 'Logger->debug(1,2)'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <$> const(IV 1) sM
  6  <$> const(IV 2) sM
  7  <$> method_named(PVIV "debug") 
  8  <1> entersub[t1] vKS/TARG
  9  <@> leave[t1] vKP/REFC
  -e syntax OK

arrayref arg

  $> perl -MO=Concise,-exec -e 'Logger->debug([1,"a"])'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <0> pushmark s
  6  <$> const(IV 1) s
  7  <$> const(PV "a") s
  8  <@> anonlist sKRM/1
  9  <1> srefgen sKM/1
  a  <$> method_named(PVIV "debug") 
  b  <1> entersub[t1] vKS/TARG
  c  <@> leave[t1] vKP/REFC
  -e syntax OK

hashref arg

  $> perl -MO=Concise,-exec -e 'Logger->debug({1,2})'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <0> pushmark s
  6  <$> const(IV 1) s
  7  <$> const(IV 2) s
  8  <@> anonhash sKRM/1
  9  <1> srefgen sKM/1
  a  <$> method_named(PVIV "debug") 
  b  <1> entersub[t1] vKS/TARG
  c  <@> leave[t1] vKP/REFC
  -e syntax OK

nested call, to builtin

  $> perl -MO=Concise,-exec -e 'Logger->debug(localtime)'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <0> localtime[t1] lM
  6  <$> method_named(PVIV "debug") 
  7  <1> entersub[t2] vKS/TARG
  8  <@> leave[t1] vKP/REFC
  -e syntax OK

nested call, to builtin, (version 2)

  $> perl -MO=Concise,-exec -e 'Logger->debug(localtime())'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <0> localtime[t1] lM*
  6  <$> method_named(PVIV "debug") 
  7  <1> entersub[t2] vKS/TARG
  8  <@> leave[t1] vKP/REFC
  -e syntax OK

nested call, to your routine

  $> perl -MO=Concise,-exec -e 'Logger->debug(foo())'
  1  <0> enter 
  2  <;> nextstate(main 1 -e:1) v
  3  <0> pushmark s
  4  <$> const(PV "Logger") sM/BARE
  5  <0> pushmark s
  6  <$> gv(*foo) s/EARLYCV
  7  <1> entersub[t1] lKMS/NO(),TARG,INARGS,1
  8  <$> method_named(PVIV "debug") 
  9  <1> entersub[t2] vKS/TARG
  a  <@> leave[t1] vKP/REFC
  -e syntax OK

nested call, to your (predefined) routine

  $ perl -MO=Concise,-exec -e 'sub foo{2,3}; Logger->debug(foo())'
  7  <0> enter 
  8  <;> nextstate(main 2 -e:1) v
  9  <0> pushmark s
  a  <$> const(PV "Logger") sM/BARE
  b  <0> pushmark s
  c  <$> gv(*foo) s
  d  <1> entersub[t1] lKMS/NO(),TARG,INARGS,1
  e  <$> method_named(PVIV "debug") 
  f  <1> entersub[t2] vKS/TARG
  g  <@> leave[t1] vKP/REFC
  -e syntax OK

Munging results in Log::Log4perl::AutoCategorize

This example illustrates some of the debug flags which print various details of the munging and subroutine creation process.

perftests/tdebug.pl -d <flags> passes <flags> thru to the module; the full set of which are eminently understandable (by reading the code ;-)

Lots of Subroutine B::PADOP::padix redefined warnings are elided here, I welcome help on suppressing them :-)

$> perl perftests/tdebug.pl -db

This enumerates the subroutines that are vivified by AUTOLOAD. Method munging has already occurred (as you can see).

 building enabled sub: Logger.warn_00004
 building enabled sub: Logger.info_00005
 building enabled sub: Logger.warn_00001
 building enabled sub: Logger.warn_00002
 building disabled sub: Logger.debug_00003
 building enabled sub: Logger.info

Subroutines are built enabled & disabled according to how theyre specified in the log-config; this determination is incorporated into the vivified routines.

$> perl perftests/tdebug.pl -dc

The 'c' flag displays the category determined by AUTOLOAD in the 1st invocation.

 cat: main.main.warn.32
 cat: main.main.info.33
 cat: main.foo.warn.41
 cat: A.bar.warn.51
 cat: A.bar.debug.52
 cat: Log.Log4perl.AutoCategorize.END.info.308

$> perl perftests/tdebug.pl -dca

Here, the 'a' prints the results of AUTOLOAD caller(0), caller(1), the 0: 1: lines are those respective lines, which are combined into the category, which is reprised here.

 0: main,perftests/tdebug.pl,32,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: main.main.warn.32
 0: main,perftests/tdebug.pl,33,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: main.main.info.33
 1: main,perftests/tdebug.pl,34,main::foo
 0: main,perftests/tdebug.pl,41,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: main.foo.warn.41
 1: main,perftests/tdebug.pl,35,A::bar
 0: A,perftests/tdebug.pl,51,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: A.bar.warn.51
 1: main,perftests/tdebug.pl,35,A::bar
 0: A,perftests/tdebug.pl,52,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: A.bar.debug.52
 1: main,perftests/tdebug.pl,0,Log::Log4perl::AutoCategorize::END
 0: Log::Log4perl::AutoCategorize,/usr/local/lib/perl5/site_perl/5.8.0/Log/Log4perl/AutoCategorize.pm,308,Log::Log4perl::AutoCategorize::AUTOLOAD
 cat: Log.Log4perl.AutoCategorize.END.info.308

Munging sub in Log::Log4perl::AutoCategorize

  my $munged = '00000'; # preincremented for each munge

  # sub method_munger
  use optimizer 'extend-c' => sub {
    my $opp = shift;

    # look for op-chains which start with pushmark & const == __PACKAGE__
    # Scan until method_named is reached, while keeping track of inner
    # stack manipulations (iow monitor balance of push, pop on @opstack)
    
    my $n1 = $opp->name();
    $opp = $opp->next();
    return if ref $opp eq 'B::NULL';

    my $n2 = $opp->name();
    # by Policy, use Class method invocation only, hence const
    return unless $n1 eq "pushmark" and $n2 eq "const";

    # the Class->method requirement allows code to demand a const
    # opcode, with value == __PACKAGE.

    my $class = '';
    eval { $class = $opp->sv->PV };
    return unless $class eq $Alias or $class =~/^$MyPkg/;

    $DB::single = 1 if $opt->{D};
    $opp->dump if $opt->{d};

    # OK: weve seen 2 required ops; pushmark, const='Logger'.  Now we
    # track stack activity, so that nested meth_named ops dont
    # prematurely end the scan which guards the munge.

    my (@opchain, @opstack, $name);

    while (@opstack or $opp->name ne 'method_named') {

        return if ref $opp eq 'B::NULL';
        $opp = $opp->next();
        return if ref $opp eq 'B::NULL';
        
        push @opchain, $opp;
        $name = $opp->name;
        
        if ($name eq 'pushmark') {
            push @opstack, $opp;
            printf "pushed: %s\n", opNames(\@opchain) if $opt->{s};
        }
        if ($name =~ /refgen|entersub/) {
            printf "popping: %s\n", opNames(\@opchain) if $opt->{s};
            pop @opstack;
        }
    }
   printf "found op-chain: $class => %s\n", opNames(\@opchain) if $opt->{f};

    # this should be proper end of chain
    my ($meth) = $opchain[-1];

    unless ($meth->name eq 'method_named') {
        # this is a sign of problems.
        printf "junk op-chain: $class => %s\n", opNames(\@opchain) if $opt->{j};
        dumpchain(\@opchain) if $opt->{J};
        return;
    }

    printf "matched op-chain: $class => %s\n", opNames(\@opchain) if $opt->{m};
    dumpchain(\@opchain) if $opt->{M};

    my $fnname = $meth->sv->PV;
    unless (get_loglevel($fnname)) {
        print "$fnname is ineligible for munging\n" if $opt->{v};
        return;
    }
    # now do the munge
    print "func: $fnname\n" if $opt->{r};
    $munged++;
    $meth->sv->PV("${fnname}_$munged");
    print "munged func name: ${fnname}_$munged\n" if $opt->{r};

    {
        # record munged fnname, and where its called (to aid test-coverage review)
        no warnings 'uninitialized';
        $unseenCat{"${fnname}_$munged"} = join(',', (caller(0))[0..2]);
    }
    $meth->dump if $opt->{z};
  };

Conclusion: Other uses, Limitations

Munging by itself is not useful; it needs AUTOLOAD to handle the calls to munged (and non-existent) subroutines. AUTOLOAD must also eval the subroutines into existence, and put them in the symbol table, otherwise theres no point to doing the method munging in the 1st place.

The Munge then AUTOLOAD paradigm (there, I snuck it in ;-), is only valuable if you have something clever to do with the detailed lexical knowledge that it makes efficiently available.

AUTOLOAD could do lots of things;

It could build sub warn_00003() to delegate back to warn(@customArgs) instead of creating an entirely new function. In fact, this is approximately what it should do, because creating a lot of large, unique subroutines would greatly enlarge the executable size of your program.

It could query a database to determine what each custom routine should do. This is essentially what Log::Log4perls config file is doing for AutoCategorize.

It could parse the method-name to extract additional information. Some possible, but unimplemented (and probably unwise), examples are;

  debug_dump_20 ("reason: $reson", $data)       # to limit dump depth
  debug_if1 ($if, "heres data", $data)          # if youre exceptionally lazy
  debug_yaml ($data)                            # 
  

But.. theres a limit to what the category can tell you,

    caller (and therefore category) only gives lexical information
    you must add the contextual knowledge of whats important.