Fixing Perl5 Core Bugs: Report for Months 33 & 34
Tue, 15-Jan-2013 by
Karen Pauley
edit post
_Dave Mitchell writes:_
The first part of November was spent finishing off the PADRANGE
optimisation and merging it into blead.
Here's the commit message. After it, I'll discuss timings.
bq. [MERGE] add PADRANGE op and $B::overlay
bq. This commit implements three optimisations and one new feature.
bq. The new feature is $B::overlay, which can be set to a hash ref, indexed by op address, that allows you to override the values returned by the various B::*OP methods for a particular op. This specifically allows Deparse to be tricked into seeing a pre-optimisation view of the optree, and so makes adding new optimisations (like the one in this commit) a lot easier.
bq. As regards optimisations: first, a new save type is added: SAVEt_CLEARPADRANGE, which is like SAVEt_CLEARSV but specifies a range of targs to be cleared. The save type, target base and range all fit within a single integer pushed on the save stack.
bq. Second, a pushmark followed by one or more pad[ahs]v ops (and possibly some mixed-in null, list and nextstate ops) will sometimes be replaced by a single padrange op. Like other pad ops, this specifies a targ, but in addition the bottom 7 bits of op_private indicate a target range. pp_padrange has two main actions: with OPpLVAL_INTRO, it pushes a SAVEt_CLEARPADRANGE onto the save stack and turns off SvPADSTALE on all the lexicals; and in non-void context, it pushes all the lexicals onto the stack.
bq. Third, for the specific case of the construct my(...) = @_, the ops to push @_ onto the stack (pushmark/gv[*_]/rv2sv) are skipped, and the OPf_SPECIAL flag on the padrange op is set: this tells pp_padrange to push @_ directly.
bq. Note that not sequences of pad ops are consolidated into a single padrange op; the chief constraints are that:
* they must form a list (i.e. start with pushmark);
* the targs must form a contiguous range;
* the flags of the ops must all be similar; e.g. all INTRO or all not, all void or all not, etc;
* only a subset of flags are allowed; e.g. we don't optimise with OPpPAD_STATE or OPpMAYBE_LVSUB present.
bq. For the specific case of void/INTRO, we consolidate across nextstate boundaries (keeping only the last nextstate); i.e.
bc. my ($a,$b); my @c; my (%d,$e,$f)
bq. becomes a single padrange op. Note that the padrange optimisation is particularly efficient for the void/INTRO combination: formerly, my($a,$b,@c,%d); would be compiled as
bc. pushmark; padsv[$a]; padsv[$b]; padav[@c]; padhv[%d]; list; nextstate
bq. which would have the effect of pushing $a, $b onto the stack, then pushing the (non-existent) elements of @c, then pushing the %d HV; then pp_list would pop all the elements except the last, %h; finally, nextstate would pop %h. Instead, padrange skips all the pushing and popping in void context. Note that this means that there is one user-visible change caused by this optimisation:
bc. f();
my @a;
sub f { tie @a, ...; push @a, .... }
bq. Here, @a is tied and already has elements by the time the 'my @a' is executed; formerly, FETCH would be called repeatedly to push the elements of @a onto the stack, then they would all be popped again at the end of the 'my @a' statement; now FETCH is never called.
bq. The optimisation itself is implemented by converting the initial pushmark op into a padrange, and updating its op_next. The skipped ops are *not* cleared; this makes it easier for S_find_uninit_var() and Deparse.pm to do their stuff. Deparse is implemented by using the new $B::overlay facility to make the padrange op look like a pushmark op again; the rest of the Deparse code just sees the original unoptimised optree and so doesn't require any knowledge of the padrange op.
====================================================
Timings:
consider the following code:
bc. sub f { ... }
bc. for (1..10_000_000) {
f(1,2,3);
}
Running that with dumbbench for various bodies of f gives:
bc. orig padrange %speedup
------ -------- --------
0.4716 0.4850 -2.8% empty loop: skip call to f()
2.4136 2.5536 -5.4% sub f { }
3.9072 3.7901 3.1% sub f { $_[0] + $_[1] * $_[2] }
4.1850 3.3682 24.3% sub f { my ($x,$y,$z); 1 }
5.514 4.8030 14.8% sub f { my ($x,$y,$z) = @_; }
6.2401 5.4497 14.5% sub f { my ($x,$y,$z) = @_; $x+$y*$z; }
Numbers are average wallclock times in secs, lower is better. (x86_64, gcc -O2, lots of code alignment options supplied to help give consistent results). The times are the complete time to run the program, including startup and loop/function-call overhead.
The first three results are essentially noise, since they don't exercise code paths affected by the optimisation; but they also give you an idea of what consumes the CPU. The third one represents a classic code example where direct access to @_ elements is done for speed. This should be compared with the last example, which uses lexical vars for the same end. The lexical variant is still slower, but the gap has been considerably narrowed (40% slower rather than 60% slower).
Finally, a code example specifically chosen to make this optimisation look good :-) ...
bc. for my $i (1..10_000_000) {
my ($a,$b,$c);
my $d;
my (@e,%f);
my $g;
1;
}
This gets particularly good marks, as all those my's are coalesced into a single padrange op, and all the void context stack pushes and nextstates are skipped. In this case, the loop runs 91% faster :-) [ in the mathematical sense that 100% faster == takes half as long to run ]
-----------------------------------------------------------------
The rest of November and the first part of December was mostly spent doing non-TPF stuff. Then later on in December, I looked at the three address-sanitizer bugs that Reini Urban had reported against 5.14.3, and confirmed that none of them were in fact regressions, nor security issues; but I patched maint-5.14 anyway.
I fixed bug #116148: if two successive patterns were executed, the first being utf8 and the second not, and if the first match failed or succeeded purely using re_intuit_start() (this is the 'quick guess' initial stage of the regex engine), then in the second pattern's call to re_intuit_start(), it would be treated as utf8 too. This is a long-standing bug, but only affects a small number of cases (which varies over time based on what optimisations can be handled).
The bug turned out to be that the RF_utf8 flag in PL_reg_flags was set by re_intuit_start() but never unset, in contrast to regexec_flags(), which initially set or reset it as appropriate.
The direct fix was fairly trivial, but I also took the opportunity to eliminate PL_reg_flags altogether. This is one of a number of global (per-interpreter) variables that contain state for the current match, and that ideally need eliminating and replacing with local vars. They are a relic of the original (perl 3 era) Spencer regex package that wasn't designed to be re-entrant, and which kept the state for the current match in a bunch of static vars within regexec.c.
------
Over the last 2 months I have averaged 4 hours per week
As of 2012/12/31: since the beginning of the grant:
bq. 147.1 weeks
1586.5 total hours
10.8 average hours per week
There are 113 hours left on the grant.
Report for period 2012/11/01 to 2012/12/31 inclusive
**Summary**
Effort (HH::MM):
bq. 8:50 diagnosing bugs
26:05 fixing bugs
0:00 reviewing other people's bug fixes
0:00 reviewing ticket histories
0:00 review the ticket queue (triage)
-----
**34:55 Total**
**Numbers of tickets closed:**
bq. 5 tickets closed that have been worked on
0 tickets closed related to bugs that have been fixed
0 tickets closed that were reviewed but not worked on (triage)
-----
**5 Total**
**Short Detail**
bq. 19:05 [perl #114536] Optimize assigning to scalars from @_
1:00 [perl #115602] Moose fails in List::MoreUtils::all use-after-free
3:15 [perl #115990] 3 new severe ptr errors in 5.14.3 (non-threaded asan)
4:00 [perl #115992] 5.14.3 use-after-free in t/op/local.t op_lvalue
0:20 [perl #115994] 5.14.3 S_join_exact global-buffer-overflow
7:15 [perl #116148] Pattern utf8ness sticks around globally
Category:
(none)
Comments (0)