The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
=head1 Frozen Perl 2010

The Perl Compiler

B<rurban> - Reini Urban <br>
Graz, Austria

=head1 What's new?

=over

=item Fixed most bugs I<(in work)> <br> bytecode: 12=>0, c: 6=>1, cc: 9=>5

=item 5.10 and 5.12, non-threaded favored I<(faster)>

=item .plc platform compatible, almost version compatible I<(.plc header change)>

=item added testsuite

=item more and better optimisations I<(in work)>

=item removed B::Stash bloat from perlcc, -stash [optional]

=back

=head1 Who am I

rurban maintains B<cygwin perl> since 5.8.8 
and 3-4 modules, guts, B<B::*> => 5.10

Mostly doing B<LISP> B<Perl> and B<PHP>, and support for custom HW,
windows + linux + real-time systems in real-life.
Coding in winter, surfing in summer.

1995 first on CPAN with the F<perl5.hlp> file and converter for Windows.

=head1 Contents

Started 1995 by Malcom Beattie, abandoned 2007 by p5p, revived 2008 by me

Very dynamic language. eval "require $foo;" -> which packages?

=over

=item Overview 

=item Status

=item Plans

=back

=head1 Why use B::C / perlcc?

=over

=item Improved startup time, esp. significant with larger code.  

=item Reduced memory usage. <br><small> 9% less memory w/ 25000 lines</small>

=item Distribute binary only versions

=over 8

=item No need to ship an entire perl install

=item Self contained application

=item But you could also use a "Packager", like L<perl2exe>, L<perlapp>, L<PAR> <br> <small>They are no compilers, slower startup </small>

=back

=item And with B::CC - Improve run-time

=back

=head1 Overview

In the Perl Compiler suite B<B::C> are three seperate compilers:

=over

=item B::Bytecode / ByteLoader (I<freeze/thaw> to F<.plc> + F<.pmc>)

=item B::C  (I<freeze/thaw> to F<.c>)

=item B::CC (I<optimising> to F<.c>)

=back

perl toke.c/op.c  -  B::C  - perl B<op walker> run.c

Eliminate the whole parsing and dynamic allocation time.

=head1 The Walker

After compilation walk the "op tree" - F<run.c>

=begin perl

  int
  Perl_runops_standard(pTHX)
  {
	dVAR;
	while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
		PERL_ASYNC_CHECK();
	}
	TAINT_NOT;
	return 0;
  }

=end perl

=head1 The Walker

=begin perl

	while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
		PERL_ASYNC_CHECK();
	}

=end perl

=head2 Observation

1. The op tree is not a B<"tree">, it is reduced to a simple B<linked list> of ops.
Every "op" (a C<pp_&lt;opname&gt;> function) returns the next op.

2. PERL_ASYNC_CHECK is called after every single op.

=head1 Perl Phases - the "Perl Compiler"

=over

=item => B<Parse + Compile> to op tree I<(in three phases, see> L<perlguts> and L<perloptree>)
<br>

=item BEGIN I<(use ...)>

=item CHECK I<(O modules)>

=item INIT I<(main phase)>

=item END I<(cleanup, perl destructors)>

=back

Normal Perl functions start at B<INIT>, after BEGIN and CHECK. <br>
The B<O> modules start at B<CHECK>, and skip INIT.

=head1 Perl Phases - the "B Compilers"

=over

=item Parse + Compile to op tree I<(in three phases)>

=item BEGIN I<(use ...)>

=item => B<CHECK> (O) => freeze

=item compiled I<INIT (main phase)>

=item compiled I<END (cleanup, perl destructors)>

=back

=head1 Perl Phases - the "B Compilers"

The B Compilers, invoked via B<O>, freeze the state in B<CHECK>,
and invoke then the walker.

  $ perl -MO=C,-omyprog.c myprog.pl <br>
  $ cc_harness -o myprog myprog.c <br>
  $ ./myprog

=head1 B::CC - Unoptimised / the walker

=begin C

  while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
  }

=end C

=head1 B::CC - The optimiser / unrolled

=begin C

  while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
  }
  is unrolled to:
  lab_15de5b8:
	PUSHMARK(sp);
	XPUSHs(GvSVn((GV*)PL_curpad[1]));
	PL_op = (OP*)&listop_list[0];
	PUTBACK; PL_op = pp_print(); SPAGAIN;
  lab_1248c18:
	PP_UNSTACK;
	PERL_ASYNC_CHECK();
  /* next basic block */

=end C

</font>

=head1 B::CC - The optimiser / unrolled 

<br><br><br>

=over

=item no CALL_FPTR - call by ref

=item static direct function call

=item prefetched into CPU cache!

=item no unneeded stack handling

=item PERL_ASYNC_CHECK only after every basic block

=back

=head1 Status

5.6.2 and 5.8.9 non-threaded B::C are quite usable and have the
least known bugs, but 5.10 and 5.12 became also pretty stable
now.

=head2 Targets:

=over

=item Bugfixes for B<B::C>

=item Test top100 CPAN modules I<(3-4 fail)>

=item Isolate bugs into simple tests I<(35 cases)>

=item Test the perl cores suite I<(~20 fails)> <br> Estimated 3-4 more open bugs.

=back

=head1 Status

=over

=item 5.6.2 + 5.8.9 are almost bug free, with B::Bytecode and B::C

=item B::C >=5.10 threaded (I<pads>) in work <br> I<2-3 minor bugs with certain modules>

=item With debugging perls there seem to be less bugs than with releases. <small>I<Normally it 's the other way round></small>

=item B::CC has some limitations and some known bugs

I<See testsuite and> F<STATUS>

=back

=head1 Projects

Which software is compiler critical?

=head1 Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

=head1 Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radical faster

=head1 Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radical faster.

Web Apps with fast response times - 

B<1 sec more or less =E<gt> good or bad software>

=head1 Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radical faster.

Web Apps with fast response times - 

Optimise static initialization - strings and arrays

=head1 New Optimisations

Optimise static initialization - strings and arrays

non-threaded ! +10-20% performance

C<ltrace> reveils C<Gthr_key_ptr>, C<gv_fetchpv>, C<savepvn>,
C<av_extend> and C<safesysmalloc> as major culprits, the later
three at B<startup-time>.

=head1 New Optimisations

Optimise static initialization - strings and arrays

non-threaded ! +10-20% performance

common constant strings with gcc -Os =E<gt> automatically optimised

=head1 New Optimisations

Optimise static initialization - strings and arrays

non-threaded ! +10-20% performance

common constant strings with gcc -Os =E<gt> automatically optimised

av_extend - run-time malloc =E<gt> static arrays ?

=head1 New Optimisations

av_extend - run-time malloc => static arrays ?

static arrays are impossible if not B<Readonly>

can not be extended at run-time, need to be realloc'ed into the heap.

=head1 New Optimisations

av_extend - run-time malloc =E<gt> static arrays ?

pre-allocate faster with B<-fav-init> or B<-O3>

I<at least this is the idea>. Same for hashes I<(nyi)>.

=head1 Real Life Applications

B<cPanel> has used B::C compiled 5.6 for a decade, 
and wants to switch to 5.8.9 (or later).

B<cPanel> offers web hosting automation software that manages
provider data, domains, emails, webspace. A typical large webapp.
Perl startup time can be too slow for many AJAX calls which need 
fast initial response times.

=head1 Benchmarks (by cPanel)

Larger code base => more significant startup improvements

=over

=item 18.78x faster startup for large production applications. (~ 70000 lines)

=item 3.52x faster startup on smaller applications.   (~8000 lines)

=item 3x faster startup on tiny applications < 1024 lines of code

=item 2x faster startup for very tiny applications

=item Guessed: 2x-10x faster run-time for CC optimised code, esp. arithmetic.

=back

=head1 Benchmarks (by cPanel)

    Web Service Daemon <br>

    Resident Size (perlcc)  9072 <br>
    Resident Size (perl)    9756 <br> <br>

    DNS Settings Client <br>

    Startup Time (perl)   0.074 <br>
    Startup Time (perlcc) 0.021 <br> <br>

    HTML Template Processor <br>

    Startup Time (perlcc) 0.037 <br>
    Startup Time (perl)   0.695 <br>

=head1 Plans

2010: Find and fix all remaining bugs

2010: Faster testsuite (Now C<8 min - 40min - 2 days>)

2011: CC type and sub optimisations

2012: CC unrolling => jit within perl (C<perl -j>)

Emit parrot pir.

=head1 B::CC Limitations

run-time ops vs compile-time ...

dynamic range 1..$foo

goto/next/last $label

Undetected modules behind eval "require": <br>
use B<-uModule> to enforce scanning these

=head1 B::CC Limitations

run-time ops vs compile-time<br>
BEGIN blocks only compile-time side-effects.

  BEGIN { <br>
&nbsp;&nbsp;    use Package;   # okay <BR>
&nbsp;&nbsp;    chdir "dir";   # not okay. <BR>
&nbsp;&nbsp;                   # only done at compile-time, not at the user<BR>
&nbsp;&nbsp;    print "stuff"; # okay, only at compile-time <BR>
&nbsp;&nbsp;    eval "what";   # hmm; depends <br>
  }

Move eval "require Package;" to BEGIN

=head1 B::CC Bugs

Custom sort BLOCK is buggy, I<wrong queue implementation>

=head1 B::CC Bugs

Custom sort BLOCK is buggy, I<wrong queue implementation, causing an endless loop>

  sort { $a <=> $b }  <br>
  <small>is optimised away, ok</small><br><br>

  sort { $hash{$a} <=> $hash{$b} } <br>
  <small>maybe?</small><br><br>

  sort { $hash{$a}->{field} <=> $hash{$b}->{field} }  <br>
  <small>for sure not</small>

=head1 Testsuite

B<user> make test I<(via cpan)>:

35x bytecode + c -O0 - O4 + cc -O0 - O2

=> B<8 min>

=head1 Testsuite

B<author> make test:

35x bytecode + c -O0 - O4 + cc -O0 - O2 I<(8 min)>

B<modules.t top100> B<(16 min)>

+ B<testcore.t> B<(16 min)>

=> B<~40 min>

=head1 Testsuite

author make test B<40 min>

for B<5-10> perls I<(5.6, 5.8, 5.10, 5.11 / threaded + non-threaded)> 4*2=8

on B<5 platforms> (cygwin, debian, centos, solaris, freebsd)

=> 26 h (8*5*40 = 1600min) = B<1-2 days>, similar to the gcc testsuite.

=head1 Testsuite

top100 modules?

See webpage or svn repo for results for all tested perls / modules

With 5.8 non-threaded 3 fails B<Attribute::Handlers> B<B::Hooks::EndOfScope> B<YAML> B<MooseX::Types>

With blead non-threaded 4 fails B<Attribute::Handlers> B<File::Temp> B<ExtUtils::Install>

unpredictable results: e.g. threaded 5.10 B<39/98> (cygwin release) vs B<3/80> (a test version) fails. Innocent change => fatal consequences.

=head1 CC

=over

=item Sub calls - Opcodes

What can we statically leave out per pp_?

Now: arguments passing, return values for 50% ops

Planned: more + direct xsub calls.

=item Types - understand declarations

Now: Unroll for known static types pp_opname completely into simple arithmetic.

Known static types at compile-time? User declarations or L<Devel::TypeCheck>

=back

=head1 CC - Type declarations

Currently:

  my $E<lt>nameE<gt>_i;  IV integer <br>
  my $E<lt>nameE<gt>_ir; IV integer in a pseudo register <br>
  my $E<lt>nameE<gt>_d;  NV double 

<hr>

Future ideas are type qualifiers such as <br>
	<code>my (int $foo, double $foo_d); </code>

or attributes such as <br>
	<code>my ($foo:Cint, $foo:Cintr, $foo:Cdouble);</code>

or L<MooseX::Types>

=head1 Code

L<http://search.cpan.org/dist/B-C/>

L<http://code.google.com/p/perl-compiler/>

Planned:

L<http://compiler.perl.org/>

L<mailto:compiler@perl.org>

=head1 Questions?

=cut

rurban Minneapolis, 2010-02-07

__END__
Local Variables:
  fill-column:65
End: