Fixing Perl5 Core Bugs: Report for Month 38
Fri, 10-May-2013 by
Karen Pauley
_Dave Mitchell writes:_
This month I worked on three 5.18 blocker tickets, all three of which were regressions related to my jumbo re_eval fix back in 5.17.1.
The first, which I continued working on from last month, was the "Regexp::Grammars" bug.
Basically, my reworking of the /(?{})/ implementation assumed that a constant string segment like "foo" in /foo..../ would indeed be constant; but in the presence of
bc. use overload::constant qr => sub { bless [], ... }
the "constant" can be anything but, including an overloaded or REGEXP object. So the concatenation of the pattern's string segments didn't handle all the extra stuff like doing overloading properly or extracting out pre-compiled code blocks from qr// objects.
This is now fixed.
The second issue concerned handling arrays embedded within literal regexes, e.g. /...@a.../. This was partially to fix a regression from 5.16.x where, if @a contained a qr/...(?{...}).../, then suddenly you'd need a 'use re eval' where you didn't need one before: RT #115004. But it also improves on the 5.16.x behaviour of array interpolation, especially relating to closures and overloading.
Basically, the traditional behaviour of run-time patterns such as /a${b}c/ was to concatenate the pattern components together, then pass it to the regex engine. My 5.17.1 jumbo re_eval fix changed that so that the list of args was preserved and passed as-is to the regex engine. This meant that the engine could do things like extract out existing optrees from code blocks in something like $b = qr/...(?{...}).../, rather than having to recompile them. So closures work properly.
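As a minimal sketch of what "closures work properly" means here (the names make_re, @seen and $tag are hypothetical, chosen only for illustration, and a perl of 5.18 or later is assumed): each qr// object below should capture its own copy of $tag, and interpolating it into a literal pattern should reuse the already-compiled code block rather than recompiling it from a string.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper: returns a qr// whose code block closes over $tag.
sub make_re {
    my ($tag) = @_;
    return qr/b(?{ push @seen, $tag })/;
}

my $b1 = make_re("first");
my $b2 = make_re("second");

# Interpolating qr// objects into a literal pattern; no 'use re "eval"'
# is needed because the code blocks were compiled when the qr// was.
"abc" =~ /a${b1}c/;
"abc" =~ /a${b2}c/;

print "@seen\n";
```

On a perl where the closure handling described above is in place, each code block pushes the $tag value its own qr// was created with.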
The thing I missed back then was applying the same new handling to arrays as well as scalars. Until my fix, /a@{b}c/ would be parsed as
bc. regcomp('a', join($", @b), 'c')
This meant that the array was flattened and its contents stringified before hitting the regex engine.
I've now changed it so that it is parsed as
bc. regcomp('a', @b, 'c')
(although the array isn't flattened; rather, the AV itself is pushed onto the stack, cf. push @b, ....).
As well as handling closures properly, it also means that 'qr' overloading is now handled with interpolated arrays as well as with scalars:
bc. use overload 'qr' => sub { return qr/a/ };
my $o = bless [];
my @a = ($o);
"a" =~ /^$o$/; # always worked
"a" =~ /^@a$/; # now works too
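The closure side of the same change can be sketched in a similar way. This is an illustrative example only (the names @parts and $hits are hypothetical, and perl 5.18 or later is assumed): an array holding a qr// with a code block is interpolated into a literal pattern without 'use re "eval"', which is the case reported in RT #115004.

```perl
use strict;
use warnings;

my $hits = 0;

# One element carries a code block; elements are joined with $" (a space
# by default) when the array is interpolated into the pattern.
my @parts = (qr/foo(?{ $hits++ })/, qr/bar/);

my $matched = "foo bar" =~ /^@parts$/ ? 1 : 0;

print "matched=$matched hits=$hits\n";
```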
As well as the new handling of arrays, the pattern concatenation code within Perl_re_op_compile was heavily reworked, resulting in fixing a utf8 edge case, and generally simplifying the code, including enabling the removal of a clunky if (0) { label: ... } bit of code.
This issue is now fully fixed.
The third issue concerned how caller() and __SUB__ work within regex code blocks. It turns out that since my re_eval jumbo fix, code blocks in literal matches were displaying an extraneous stack frame. This code:
bc. #!/usr/bin/perl
use Carp;
sub f3 { croak() }
sub f2 { "a" =~ /a(?{f3(3)})/ }
sub f1 { f2(2) }
f1(1);
gives the following results:
bc. 5.16.3:
main::f3(3) called at (re_eval 1) line 1
main::f2(2) called at /home/davem/tmp/p line 6
main::f1(1) called at /home/davem/tmp/p line 7
bc. 5.17.10:
main::f3(3) called at /home/davem/tmp/p line 5
main::f2 called at /home/davem/tmp/p line 5
main::f2(2) called at /home/davem/tmp/p line 6
main::f1(1) called at /home/davem/tmp/p line 7
bc. blead:
main::f3(3) called at /home/davem/tmp/p line 5
main::f2(2) called at /home/davem/tmp/p line 6
main::f1(1) called at /home/davem/tmp/p line 7
In addition, the __SUB__ token, which returns a reference to the current subroutine, was returning a ref to the hidden anonymous sub which is now used to implement closure behaviour correctly for code blocks within qr//'s; that is,
bc. $r = qr/foo(?{...})bar/;
is supposed to behave like
bc. $r = sub { /foo/ && do {...} && /bar/ }
as far as closures are concerned. The trouble is, the anon sub was never designed to be called directly, and in fact perl SEGVs if you do attempt to call it. The workaround for this is to skip regex calls on the context stack when looking for the CV for __SUB__; this has the effect of __SUB__ always returning the sub which executed the pattern match, regardless of what direct code blocks (/(?{})/), or indirect code blocks ( $r = qr/(?{})/; /a$r/ ) have been called. I have documented this as subject to change for now.
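A small sketch of the behaviour described above, under the assumption that it still holds on the perl being used (it is documented as subject to change); the names matcher and $inside are hypothetical. __SUB__ requires the 'current_sub' feature.

```perl
use strict;
use warnings;
use feature 'current_sub';

my $inside;

# Hypothetical sub that runs a match whose code block records __SUB__.
sub matcher {
    "a" =~ /a(?{ $inside = __SUB__ })/;
    return $inside;
}

my $got = matcher();

# With the workaround in place, __SUB__ inside the code block refers to
# the sub that executed the match, not to a hidden anonymous sub.
print( ($got == \&matcher) ? "same sub\n" : "different sub\n" );
```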
Over the last month I have averaged 8.8 hours per week.
As of 2013/04/30: since the beginning of the grant:
bq. 164.5 weeks
1656.6 total hours
10.1 average hours per week
There are 43 hours left on the grant.
Report for period 2013/04/01 to 2013/04/30 inclusive
**Summary**
Effort (HH:MM):
bq. 3:08 diagnosing bugs
35:27 fixing bugs
0:00 reviewing other people's bug fixes
0:00 reviewing ticket histories
0:00 review the ticket queue (triage)
-----
**38:35 Total**
**Numbers of tickets closed:**
bq. 3 tickets closed that have been worked on
0 tickets closed related to bugs that have been fixed
0 tickets closed that were reviewed but not worked on (triage)
-----
**3 Total**
**Short Detail**
bq. 7:35 [perl #113928] caller behaving unexpectedly in re-evals
19:33 [perl #115004] perl 5.17.x can't use @var in regexp, but only $var
11:27 [perl #116823] Regexp::Grammars broken since 5.17.1