Fixing Perl5 Core Bugs: Report for Month 38

Dave Mitchell writes:

This month I worked on three 5.18 blocker tickets, all three of them regressions related to my jumbo re_eval fix back in 5.17.1.

The first, which I continued working on from last month, was the "Regexp::Grammars" bug.

Basically, my reworking of the /(?{})/ implementation assumed that a constant string segment like "foo" in /foo..../ would indeed be constant; but in the presence of

    overload::constant qr => sub { bless [], ... }

the "constant" can be anything but constant, including an overloaded object or a REGEXP object. So the code that concatenates the pattern's string segments didn't handle the extra work this requires, such as applying overloading properly or extracting pre-compiled code blocks from qr// objects.

This is now fixed.
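As a rough illustration of why a "constant" segment can't be trusted to stay constant, here is a minimal sketch of a qr handler installed with overload::constant. The <digit> token and the handler body are invented for this example, and the handler returns a plain string for simplicity; modules such as Regexp::Grammars return objects instead, which is the case that tripped up the concatenation code.

```perl
use strict;
use warnings;
use overload ();

BEGIN {
    # Hypothetical handler: it is called with the source text of each
    # constant regex segment compiled in the rest of this file, and
    # whatever it returns is used in place of that segment.
    overload::constant('qr' => sub {
        my ($source) = @_;
        (my $expanded = $source) =~ s/<digit>/[0-9]/g;
        return $expanded;
    });
}

# The literal text "<digit>" below is rewritten before compilation.
my $matches_digit  = ("x7" =~ /^x<digit>\z/) ? 1 : 0;
my $matches_letter = ("xa" =~ /^x<digit>\z/) ? 1 : 0;

print "digit: $matches_digit, letter: $matches_letter\n";
```

The handler is installed from a BEGIN block so that it is in effect while the patterns that follow are being compiled; a real module would normally do this from its import method.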

The second issue concerned handling arrays embedded within literal regexes, e.g. /...@a.../. This was partly to fix a regression from 5.16.x where, if @a contained a qr/...(?{...}).../, then suddenly you'd need a 'use re eval' where you didn't need one before: RT #115004. But it also enhances the behaviour of array interpolation relative to 5.16.x, especially relating to closures and overloading.

Basically, the traditional behaviour of run-time patterns such as /a${b}c/ was to concatenate the pattern components together, then pass it to the regex engine. My 5.17.1 jumbo re_eval fix changed that so that the list of args was preserved and passed as-is to the regex engine. This meant that the engine could do things like extract out existing optrees from code blocks in something like $b = qr/...(?{...}).../, rather than having to recompile them. So closures work properly.
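The closure behaviour described above can be sketched as follows. This is only an illustration, assuming a perl with the 5.17.1 rework (5.18 or later); make_rx is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: each call returns a qr// whose code block
# closes over that call's own copy of $n.
sub make_rx {
    my ($n) = @_;
    return qr/x(?{ $n })/;
}

my $r1 = make_rx(1);
my $r2 = make_rx(2);

# Interpolating a precompiled qr// that contains a code block does not
# require 'use re "eval"'. After a successful match, $^R holds the
# value of the last code block that was executed.
my @results;
for my $r ($r1, $r2) {
    if ("x" =~ /^${r}\z/) {
        push @results, $^R;
    }
}

print "@results\n";
```

Because the existing optree is reused rather than recompiled from the stringified pattern, each code block still sees the $n it was created with.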

The thing I missed back then was applying the same new handling to arrays as well as scalars. Until my fix, /a@{b}c/ would be parsed as

    regcomp('a', join($", @b), 'c')

This meant that the array was flattened and its contents stringified before hitting the regex engine.
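The user-visible side of that join can be sketched as below: the elements of an interpolated array are separated by the current value of $", which is a single space by default. The variable names are just for illustration.

```perl
use strict;
use warnings;

my @b = ('x', 'y');

# With the default $" (a space), /a@{b}c/ behaves like /ax yc/.
my $default_sep = ("ax yc" =~ /^a@{b}c\z/) ? 1 : 0;

# Localising $" changes the separator; '|' inside a group turns the
# array into an alternation.
my $alternation;
{
    local $" = '|';
    $alternation = ("y" =~ /^(?:@b)\z/) ? 1 : 0;
}

print "default: $default_sep, alternation: $alternation\n";
```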

I've now changed it so that it is parsed as

    regcomp('a', @b, 'c')

(except that the array isn't flattened; rather, the AV itself is pushed onto the stack, cf. push @b, ....).
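A minimal sketch of the case from RT #115004, assuming a perl that includes this fix: an array holding a qr// with a code block is interpolated into a literal pattern without 'use re "eval"'. The counter variable is only there to show that the code block really runs.

```perl
use strict;
use warnings;

my $count = 0;

# The code block is compiled once, here, as part of the qr//.
my @a = (qr/a(?{ $count++ })/);

# No 'use re "eval"' is in scope; the precompiled code block is
# picked up from the qr// object held in the array.
my $matched = ("a" =~ /^@a\z/) ? 1 : 0;

print "matched: $matched, code block runs: $count\n";
```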

As well as handling closures properly, it also means that 'qr' overloading is now handled with interpolated arrays as well as with scalars:

    use overload 'qr' => sub { return qr/a/ };
    my $o = bless [];
    my @a = ($o);
    "a" =~ /^$o$/; # always worked
    "a" =~ /^@a$/; # now works too

As well as the new handling of arrays, the pattern concatenation code within Perl_re_op_compile was heavily reworked, resulting in fixing a utf8 edge case, and generally simplifying the code, including enabling the removal of a clunky if (0) { label: ... } bit of code.

This issue is now fully fixed.

The third issue concerned how caller() and __SUB__ work within regex code blocks. It turns out that since my re_eval jumbo fix, code blocks in literal matches were displaying an extraneous stack frame. This code:

    use Carp;
    sub f3 { croak() }
    sub f2 { "a" =~ /a(?{f3(3)})/ }
    sub f1 { f2(2) }
    f1(1);

gives the following results. Before the re_eval jumbo fix:

        main::f3(3) called at (re_eval 1) line 1
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7

After the jumbo fix, with the extraneous main::f2 frame:

        main::f3(3) called at /home/davem/tmp/p line 5
        main::f2 called at /home/davem/tmp/p line 5
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7

And now:

        main::f3(3) called at /home/davem/tmp/p line 5
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7

In addition, the __SUB__ token, which returns a reference to the current subroutine, was returning a ref to the hidden anonymous sub which is now used to implement closure behaviour correctly for code blocks within qr//'s; that is,

    $r = qr/foo(?{...})bar/;

is supposed to behave like

    $r = sub { /foo/ && do {...} && /bar/ }

as far as closures are concerned. The trouble is, the anon sub was never designed to be called directly, and in fact perl SEGVs if you do attempt to call it. The workaround for this is to skip regex calls on the context stack when looking for the CV for __SUB__; this has the effect of __SUB__ always returning the sub which executed the pattern match, regardless of what direct code blocks (/(?{})/), or indirect code blocks ( $r = qr/(?{})/; /a$r/ ) have been called. I have documented this as subject to change for now.
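A small sketch of the resulting behaviour for a direct code block, assuming a perl with this workaround and the current_sub feature enabled. Since the behaviour is documented as subject to change, treat this as an illustration rather than a guarantee; matcher is a made-up name.

```perl
use strict;
use warnings;
use feature 'current_sub';

# Hypothetical sub that records what __SUB__ returns inside a
# code block in a literal pattern.
sub matcher {
    my $seen;
    "a" =~ /a(?{ $seen = __SUB__ })/;
    return $seen;
}

# With the workaround, the reference should be to matcher itself,
# i.e. the sub that executed the pattern match.
my $same = (matcher() == \&matcher) ? 1 : 0;

print "same sub: $same\n";
```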

Over the last month I have averaged 8.8 hours per week.

As of 2013/04/30: since the beginning of the grant:

164.5 weeks
1656.6 total hours
10.1 average hours per week

There are 43 hours left on the grant.

Report for period 2013/04/01 to 2013/04/30 inclusive


Effort (HH:MM):

3:08 diagnosing bugs
35:27 fixing bugs
0:00 reviewing other people's bug fixes
0:00 reviewing ticket histories
0:00 review the ticket queue (triage)
38:35 Total

Numbers of tickets closed:

3 tickets closed that have been worked on
0 tickets closed related to bugs that have been fixed
0 tickets closed that were reviewed but not worked on (triage)
3 Total

Short Detail

7:35 [perl #113928] caller behaving unexpectedly in re-evals
19:33 [perl #115004] perl 5.17.x can't use @var in regexp, but only $var
11:27 [perl #116823] Regexp::Grammars broken since 5.17.1

About this Entry

This page contains a single entry by Karen Pauley published on May 10, 2013 3:02 AM.
