
Frozen Perl 2010

The Perl Compiler

rurban - Reini Urban, Graz, Austria

What's new?

Fixed most bugs (work in progress); bytecode: 12=>0, c: 6=>1, cc: 9=>5
5.10 and 5.12, non-threaded favored (faster)
.plc platform compatible, almost version compatible (.plc header change)
added testsuite
more and better optimisations (work in progress)
removed B::Stash bloat from perlcc, -stash [optional]

Who am I

rurban has maintained cygwin perl since 5.8.8, plus 3-4 modules, guts, B::* => 5.10

Mostly doing LISP, Perl and PHP, and support for custom HW, windows + linux + real-time systems in real-life. Coding in winter, surfing in summer.

1995 first on CPAN with the perl5.hlp file and converter for Windows.

Contents

Started 1995 by Malcolm Beattie, abandoned 2007 by p5p, revived 2008 by me

Very dynamic language. eval "require $foo;" -> which packages?

Overview
Status
Plans

Why use B::C / perlcc?

Improved startup time, especially significant with larger code.
Reduced memory usage (9% less memory with 25000 lines).
Distribute binary-only versions
No need to ship an entire perl install
Self-contained application
But you could also use a "packager" like perl2exe, perlapp or PAR. These are not compilers and start up slower.
And with B::CC: improved run-time

Overview

The Perl Compiler suite B::C contains three separate compilers:

B::Bytecode / ByteLoader (freeze/thaw to .plc + .pmc)
B::C (freeze/thaw to .c)
B::CC (optimising to .c)

perl toke.c/op.c - B::C - perl op walker run.c

Eliminate the whole parsing and dynamic allocation time.

The Walker

After compilation walk the "op tree" - run.c

Observation

1. The op tree is not a "tree", it is reduced to a simple linked list of ops. Every "op" (a pp_<opname> function) returns the next op.

2. PERL_ASYNC_CHECK is called after every single op.
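
The two observations can be sketched as a toy run loop in C. This is a simplified model, not perl's run.c: the names toy_op, toy_runops and toy_async_check are invented for illustration, and an int accumulator stands in for the perl stack.

```c
#include <stddef.h>

/* Toy model of the walker; none of these names are perl's real API. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*pp)(toy_op *self);  /* the "pp_<opname>" function for this op */
    toy_op *next;                 /* successor in execution order */
    int     arg;                  /* operand used by this toy op */
};

static int accumulator;   /* stands in for the perl stack */
static int async_checks;  /* counts how often the async check ran */

/* Each pp function does its work and returns the next op to run. */
static toy_op *pp_const(toy_op *self) { accumulator = self->arg;  return self->next; }
static toy_op *pp_add(toy_op *self)   { accumulator += self->arg; return self->next; }

/* Stand-in for PERL_ASYNC_CHECK. */
static void toy_async_check(void) { async_checks++; }

/* The walker: follow the linked list until an op returns NULL. */
static int toy_runops(toy_op *op) {
    accumulator = 0;
    async_checks = 0;
    while ((op = op->pp(op)) != NULL)
        toy_async_check();
    return accumulator;
}

/* Build and run the three-op program "1 + 2 + 3". */
int toy_demo(void) {
    toy_op add3 = { pp_add,   NULL,  3 };
    toy_op add2 = { pp_add,   &add3, 2 };
    toy_op one  = { pp_const, &add2, 1 };
    return toy_runops(&one);
}
```

Calling toy_demo() walks the three ops in list order. Every step pays for an indirect call through op->pp, and the async check runs between ops.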

Perl Phases - the "Perl Compiler"

=> Parse + Compile to op tree (in three phases, see perlguts and perloptree)
BEGIN (use ...)
CHECK (O modules)
INIT (main phase)
END (cleanup, perl destructors)

Normal Perl functions start at INIT, after BEGIN and CHECK. The O modules start at CHECK, and skip INIT.

Perl Phases - the "B Compilers"

Parse + Compile to op tree (in three phases)
BEGIN (use ...)
=> CHECK (O) => freeze
compiled INIT (main phase)
compiled END (cleanup, perl destructors)

The B Compilers, invoked via O, freeze the state in CHECK, and then invoke the walker.

  $ perl -MO=C,-omyprog.c myprog.pl
  $ cc_harness -o myprog myprog.c
  $ ./myprog

B::CC - Unoptimised / the walker

B::CC - The optimiser / unrolled

no CALL_FPTR (no call through a function pointer)
static direct function call
prefetched into CPU cache!
no unneeded stack handling
PERL_ASYNC_CHECK only after every basic block
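
A minimal sketch of what unrolling means, written as a toy model in C rather than as real B::CC output. The op bodies, the accumulator and toy_async_check are hypothetical stand-ins. The point is only the shape: direct static calls in sequence, with one async check per basic block.

```c
/* Toy model only: these are not perl's real ops or macros. */
static int accumulator;   /* stands in for the perl stack */
static int async_checks;  /* counts how often the async check ran */

/* Stand-in for PERL_ASYNC_CHECK. */
static void toy_async_check(void) { async_checks++; }

/* Op bodies as plain functions: no op struct, no function pointer. */
static void pp_const(int value) { accumulator = value; }
static void pp_add(int value)   { accumulator += value; }

/* "1 + 2 + 3" unrolled into one basic block of direct static calls. */
int toy_unrolled(void) {
    async_checks = 0;
    pp_const(1);
    pp_add(2);
    pp_add(3);
    toy_async_check();   /* once per basic block, not once per op */
    return accumulator;
}
```

Because the callee of every call is known at compile time, the C compiler is free to inline the op bodies, which a loop over function pointers does not allow.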

Status

5.6.2 and 5.8.9 non-threaded B::C are quite usable and have the fewest known bugs, but 5.10 and 5.12 have also become pretty stable.

Targets:

Bugfixes for B::C
Test top100 CPAN modules (3-4 fail)
Isolate bugs into simple tests (35 cases)
Test the perl core suite (~20 fails). Estimated 3-4 more open bugs.

Status

5.6.2 + 5.8.9 are almost bug free, with B::Bytecode and B::C
B::C >=5.10 threaded (pads) in work; 2-3 minor bugs with certain modules
With debugging perls there seem to be fewer bugs than with releases. Normally it's the other way round.
B::CC has some limitations and some known bugs

See testsuite and STATUS

Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radically faster.

Web apps need fast response times:

1 sec more or less => good or bad software

New Optimisations

Optimise static initialization - strings and arrays

non-threaded: +10-20% performance

ltrace reveals Gthr_key_ptr, gv_fetchpv, savepvn, av_extend and safesysmalloc as major culprits, the latter three at startup time.

common constant strings with gcc -Os => automatically optimised

av_extend - run-time malloc => static arrays ?

Static arrays are impossible unless Readonly:

they cannot be extended at run-time, but need to be realloc'ed into the heap.
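
The problem can be illustrated with a toy array header in C. This is a sketch under assumptions, not perl's AV and not what B::C emits: toy_av, its is_static flag and the helper names are made up here. It shows why compiled-in storage has to be copied into the heap before it can grow.

```c
#include <stdlib.h>
#include <string.h>

/* Toy array header; perl's real AV is more involved. */
typedef struct {
    int   *items;
    size_t len;       /* elements in use */
    size_t cap;       /* elements allocated */
    int    is_static; /* 1 while items points at compiled-in storage */
} toy_av;

/* Compiled-in initial contents: no malloc needed at startup. */
int    toy_init_items[3] = { 10, 20, 30 };
toy_av toy_array = { toy_init_items, 3, 3, 1 };

/* Grow to at least new_cap elements. Static storage cannot be handed
 * to realloc, so the first extension copies it into the heap.
 * Returns 0 on success, -1 on allocation failure. */
int toy_av_extend(toy_av *av, size_t new_cap) {
    int *heap;
    if (new_cap <= av->cap)
        return 0;
    if (av->is_static) {
        heap = malloc(new_cap * sizeof *heap);
        if (heap == NULL)
            return -1;
        memcpy(heap, av->items, av->len * sizeof *heap);
        av->is_static = 0;
    } else {
        heap = realloc(av->items, new_cap * sizeof *heap);
        if (heap == NULL)
            return -1;
    }
    av->items = heap;
    av->cap = new_cap;
    return 0;
}

/* Append one element, extending first when the array is full. */
int toy_av_push(toy_av *av, int value) {
    if (av->len == av->cap && toy_av_extend(av, av->cap * 2 + 1) != 0)
        return -1;
    av->items[av->len++] = value;
    return 0;
}
```

A first toy_av_push(&toy_array, 40) moves the three initial elements to the heap and clears is_static. A read-only array never takes that path, which is why only such arrays can stay static.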

pre-allocate faster with -fav-init or -O3

At least this is the idea. Same for hashes (not yet implemented).

Real Life Applications

cPanel has used B::C compiled 5.6 for a decade, and wants to switch to 5.8.9 (or later).

cPanel offers web hosting automation software that manages provider data, domains, emails, webspace. A typical large webapp. Perl startup time can be too slow for many AJAX calls which need fast initial response times.

Benchmarks (by cPanel)

Larger code base => more significant startup improvements

18.78x faster startup for large production applications. (~ 70000 lines)
3.52x faster startup on smaller applications. (~8000 lines)
3x faster startup on tiny applications < 1024 lines of code
2x faster startup for very tiny applications
Guessed: 2x-10x faster run-time for CC optimised code, esp. arithmetic.

Benchmarks (by cPanel)

    Web Service Daemon
      Resident Size (perl)      9756
      Resident Size (perlcc)    9072

    DNS Settings Client
      Startup Time (perl)       0.074
      Startup Time (perlcc)     0.021

    HTML Template Processor
      Startup Time (perl)       0.695
      Startup Time (perlcc)     0.037

Plans

2010: Find and fix all remaining bugs

2010: Faster testsuite (now 8 min - 40 min - 2 days)

2011: CC type and sub optimisations

2012: CC unrolling => jit within perl (perl -j)

Emit parrot pir.

B::CC Limitations

run-time ops vs compile-time ...

dynamic range 1..$foo

goto/next/last $label

Undetected modules behind eval "require": use -uModule to enforce scanning these

B::CC Limitations

run-time ops vs compile-time: BEGIN blocks have only compile-time side-effects.

  BEGIN {
      use Package;   # okay
      chdir "dir";   # not okay: only done at compile-time,
                     # not on the user's side
      print "stuff"; # okay, only at compile-time
      eval "what";   # hmm; depends
  }

Move eval "require Package;" to BEGIN

B::CC Bugs

Custom sort BLOCK is buggy, wrong queue implementation, causing an endless loop

  sort { $a <=> $b }                                   # is optimised away, ok
  sort { $hash{$a} <=> $hash{$b} }                     # maybe?
  sort { $hash{$a}->{field} <=> $hash{$b}->{field} }   # for sure not

Testsuite

user make test (via cpan):

35x bytecode + c -O0 to -O4 + cc -O0 to -O2

=> 8 min

Testsuite

author make test:

35x bytecode + c -O0 to -O4 + cc -O0 to -O2 (8 min)

modules.t top100 (16 min)

+ testcore.t (16 min)

=> ~40 min

Testsuite

author make test 40 min

for 5-10 perls (5.6, 5.8, 5.10, 5.11 / threaded + non-threaded) 4*2=8

on 5 platforms (cygwin, debian, centos, solaris, freebsd)

=> ~27 h (8*5*40 = 1600 min), i.e. 1-2 days, similar to the gcc testsuite.

Testsuite

top100 modules?

See webpage or svn repo for results for all tested perls / modules

With 5.8 non-threaded 3 fails: Attribute::Handlers, B::Hooks::EndOfScope, YAML, MooseX::Types

With blead non-threaded 4 fails: Attribute::Handlers, File::Temp, ExtUtils::Install

Unpredictable results: e.g. threaded 5.10 fails 39/98 (cygwin release) vs 3/80 (a test version). Innocent change => fatal consequences.

CC

Sub calls - Opcodes

What can we statically leave out per pp_?

Now: argument passing, return values for 50% of ops

Planned: more + direct xsub calls.

Types - understand declarations

Now: for known static types, unroll pp_opname completely into simple arithmetic.

Known static types at compile-time? User declarations or Devel::TypeCheck
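
What a known static type buys can be sketched with a toy value type in C. This is an illustrative model only, not perl's SV and not code generated by B::CC: toy_value, toy_add_generic and toy_add_int are hypothetical names, and perl's real add also has to deal with overflow, strings and overloading.

```c
/* Toy value type; perl's SV with IV/NV slots is far richer. */
typedef enum { TOY_INT, TOY_DOUBLE } toy_tag;

typedef struct {
    toy_tag tag;
    union { long i; double d; } u;
} toy_value;

/* Generic add: must inspect the tags of both operands at run-time. */
toy_value toy_add_generic(toy_value a, toy_value b) {
    toy_value r;
    if (a.tag == TOY_INT && b.tag == TOY_INT) {
        r.tag = TOY_INT;
        r.u.i = a.u.i + b.u.i;
    } else {
        double x = (a.tag == TOY_INT) ? (double)a.u.i : a.u.d;
        double y = (b.tag == TOY_INT) ? (double)b.u.i : b.u.d;
        r.tag = TOY_DOUBLE;
        r.u.d = x + y;
    }
    return r;
}

/* With both operands declared integer, the op collapses to one addition. */
long toy_add_int(long a, long b) { return a + b; }
```

The typed version has no tag checks and no boxed result, which is the kind of saving the arithmetic speedups rely on.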

CC - Type declarations

Currently:

  my $<name>_i;   # IV integer
  my $<name>_ir;  # IV integer in a pseudo register
  my $<name>_d;   # NV double

Future ideas are type qualifiers such as

  my (int $foo, double $foo_d);

or attributes such as

  my ($foo:Cint, $foo:Cintr, $foo:Cdouble);

or MooseX::Types

Code

http://search.cpan.org/dist/B-C/

http://code.google.com/p/perl-compiler/

Planned:

http://compiler.perl.org/

mailto:compiler@perl.org

Questions?
