June 2013 Archives

In accordance with the terms of my grant from TPF this is the monthly
report for my work on improving Devel::Cover covering May 2013.

In May Perl 5.18.0 was released. 5.18.0 introduced a new padrange operator
and I had failed to note the full implications for Devel::Cover. So, a
couple of days after 5.18.0 was released I released Devel::Cover 1.03 which
works correctly with the new operator, and contains a new test to ensure it
stays that way.

A new major release of Perl also means that I was able to clean out all the
tests for 5.17.x, and start testing for 5.19.0. I also updated cpancover.com
to use 5.18.0.

And speaking of cpancover, I put together the work which was done during the
QA Hackathon on perl5cover and got it up and running on cpancover.com.
Perl5cover is the project to provide useful Perl and C level code coverage for
the Perl core. There is a link at the bottom of the main page to
http://cpancover.com/blead/latest/coverage.html where you can see the results.
There is still work to be done in this area. In particular, some of the
results are lower than they should be. I think there is a problem with
collecting coverage for modules which Devel::Cover itself uses. This doesn't
usually show up because I am quite conservative in the CPAN modules I use, but
I haven't worried about the core modules.

There are also too many of the core tests failing under coverage. This is not
surprising, and it's hopefully something which we will be able to improve on
in the future. Jim Keenan is the driving force behind this work, much of
which is being built upon code from Abe Timmerman.

At the Satellite QA Hackathon in Tokyo, Kan Fushihara put together
Devel::Cover::Report::Coveralls. If you haven't tried it out yet, you should
have a look. In order to get the travis/coveralls integration correct he
needed some changes to the cover program, for which he sent me patches which
were applied this month.

To integrate coveralls into your CI process, register your project at
travis-ci.org, and add the following lines to .travis.yml (seasoned to taste):

** cpanm -n Devel::Cover::Report::Coveralls**
** perl Makefile.PL && make && cover -test -report coveralls**

I fixed a long-standing bug in pod coverage in which subroutines in different
packages but within the same file were not covered correctly. This turned out
to be one of those bugs where all the effort was in the search, and the fix
was trivial.

And finally I started investigating speeding up Devel::Cover. Devel::Cover
goes to some effort to have as minimal an overhead as possible. One way it
does this is to try to collect coverage only about the files in which you are
interested. But the checks to determine whether an op is in a file for which
coverage is being collected are also reasonably expensive. There is scope for
improvement here. In particular, the subroutine that does the ultimate check
is written in Perl. Rewriting that in XS would be advantageous. But reducing
the number of times the subroutine is called is also useful. More about this
in June's report though.

Merged pull requests:

  • 61 fix return code (need shift)
  • 57 cover -test exit code is always success, when test fails

Closed RT tickets:

  • #34888 Fix pod coverage for multiple packages in a file.

You can see the commits at https://github.com/pjcj/Devel--Cover/commits/master

Hours worked:

  • 18.05 2:45
  • 20.05 3:30
  • 27.05 1:00
  • 30.05 2:30
  • 31.05 2:15
  • Total 12:00
  • Total hours worked on grant: 269:35

Nicholas Clark writes:

As per my grant conditions, here is a report for the April period.

I started the month looking at the Unicode Names code. At Karl's suggestion I changed it to parse the UnicodeData.txt file properly. Previously it had hardcoded various constants, particularly related to the CJK ideographs and Hangul syllables. The CJK ranges in Unicode have increased in the past, and so it's possible that they will increase again. Not only is it (more) future proof, it also made it simpler to detect significant gaps in the allocated character ranges, which is useful for size optimisations. By handing the gaps better I reduced the data size by 13K, and by using two sizes of arrays for the trie structure, saved a further 25K.

The intent of all this is to provide the data needed for the \N{} syntax directly as C code and static data, to avoid the tokeniser needing to load charnames if it sees \N{}. Given that the C code in question is generated by Perl, but to compile the Perl you need the C code, there's a potential bootstrapping problem here. Not wishing to ship >1M of generated code if avoidable, I experimented to see whether the \N{} escape syntax is needed by miniperl. It turns out that if you replace the \N{} handler by abort() in miniperl, you can still build perl perfectly. Excellent! Also, telling the Makefile to build distinct toke.o and tokemini.o is a one line change - it's nice when easy things are easy.

Frustratingly the work is not yet ready to merge into blead, as it's not yet finished enough, and other things keep taking priority.

We had a bit of a digression involving perl on s390. Merijn was given a CD for a linux distribution for s390, soon had it running on the emulator "Hercules". What does one do next? Obviously - try to build Perl. So he build blead (which worked) and ran the tests (which mostly worked). My initial reaction was:

Is anyone actually using Perl on it?

In that, we've not had any bug reports about this before, and for a lot of these somewhat esoteric platforms I'm sort of wondering at what point do they stop being "fun", and start being "work". Right now, I think it's more at the "fun" level, and it might be telling us something interesting about portability, as it might be that all these tests failing are down to the same problem.

The problems all seemed to be involve conversion of large floating point values to integers. Which we do quite a lot of, and should work, but historically we've been skirting the boundary of what's conformant ANSI C, sometimes on the wrong side. So Merijn and I exchanged code and results as I tried to remote debug what the cause was. We tried to establish whether it was even handling large unsigned integers correctly (it was). We tried to rule out unions and ? : ternaries (which have confused compilers in the past). Nope. In the end, we ascertained that it was a bug in the supplied gcc 4.3.4 - it generated bad code for casting from unsigned integers to doubles.

Dave Mitchell writes:

This month was mostly spent on removing global state from the regex engine, making re-entrantcy less error-prone. The extract from the merge commit description below gives you all the details you could ever want.

Apart from that I spent a few hours re-enabling Copy-on_Write by default post the 5.18.0 release, plus a few other bits and pieces.

It turns out that I have finally used up all the hours on my grant plus extensions. I really must get round to applying for a new grant sometime soon!

commit 7d75537ea64f99b6b8b8049465b6254f5d16c693
Merge: 3a74e0e 28d03b2
Author:     David Mitchell <[email protected]>
AuthorDate: Sun Jun 2 20:59:58 2013 +0100

[MERGE] get rid of (most) regex engine global state

Historically, perl's regex engine was based on Henry Spencer's regex code, which was all contained within a single file and used a bunch of static variables to maintain the state of the current regex compile or execution.

This was perfectly adequate when only a single thread could execute a regex, and where the regex engine couldn't be called re-entrantly.

In 5.0, these vars were promoted to be full global vars as perl became embeddable; then in 5.5 they became part of the perl interpreter struct when MULTIPLICITY was introduced.

In 5.6, the Perl_save_re_context() function was introduced that did a whole bunch of SAVEPPTR type stuff, and was called in various places where it was possible that the engine may be re-entered, to avoid overwriting the global state of the currently executing regex. This was particularly important now that Unicode had been introduced, and certain character classes could trigger a call to the perl-level SWASH code, which could itself execute a regex; and where /(?{ ... })/ code blocks could be called which could do likewise.

In 5.10, The various PL_foo variables became fields within the new re_save_state struct, and a new interpreter var, PL_reg_state, was introduced which was of type struct re_save_state. Thus, all the individual vars were still global state, but it became easier to save them en-mass in Perl_save_re_context() by just copying the re_save_state stuct onto the save stack and marking it with the new SAVEt_RE_STATE type. Perl_save_re_context() was also expanded to be responsible for saving all the current $1 values.

Up until now, that is roughly how things have remained, except for bug fixes such as discovering more places where Perl_save_re_context() needs to be called.

Note that, philosophically speaking at least, this is broken in two ways. First, there's no good reason for the internal current state of the executing regex engine to be stored in a bunch of global vars; and secondly we're relying on potential callers of the regex engine (like the magic tie code for example), to be responsible for being aware that they might trigger re-entrancy in the regex engine, and to thus do Perl_save_re_context() as a precaution. This is error-prone and hard to prove correct. (As an example, Perl_save_re_context() is only called in the tie code if the tie code in question is doing a tied PRINT on STDERR; clearly an unusual use case that someone spotted was buggy at some point).

The obvious fix, and the one performed by the series of commits in this merge, is to make all the global state local to the regex engine instead. Indeed, there is already a struct, regmatch_info, that is allocated as a local var in regexec(), then passed as an argument to the various lower-level functions called from regexec(). However, it only had limited use previously, so here we expand the number of functions where it is passed as an argument. In particular, it is now also created by re_intuit_start(), the other main run-time entry point to the regex engine.

However, there is a problem with this, in that various regex vars need cleaning up on croak (e.g. they point to a malloced buffer). Since the regmatch_info struct is just a local var on the C stack, it will be lost by the longjmp done by a croak() before leave_scope() can clear up.

To handle this, some fields that logically should go in regmatch_info, are instead added to two new structs: regmatch_info_aux and regmatch_info_aux_eval; the first represents all the normal fields that need some sort of cleanup handling; the second represents extra fields (also possibly needing cleanup) that are only only needed if the pattern contains /(?{})/ code blocks. These two structs are allocated in the next two free PL_regmatch_state stack slots - since these slots are allocated in 4K slabs anyway, they are essentially free of charge. A single destructor function, S_cleanup_regmatch_info_aux() is then used with SAVEDESTRUCTOR_X() to perform all cleanup at the end of execution.

In addition, all state and cleanup setup has been consolidated into a single point near the start of regexec(); previously it was spread across regexec(), regtry() and regmatch(). This used also to result in various inefficencies, such as PL_regmatch_state stack freeing all higher unused slabs at the end of each call to regmatch(), which might be called multiple times by regexec(). Now it just frees once.

As part of this series of fixes it was necessary to change the API of Perl_re_intuit_start(). This is because the API was broken: unlike Perl_regexec_flags(), it didn't have a strbeg arg, and would try to guess it from the SV (if any) passed to it. This could fail on overloaded SVs for example, or where its called without an SV (not done from core, but officially supported by the API). Note that this is likely to break Re::Engine plugins, plus any code which directly calls intuit.

Finally, note that although struct re_save_state and SAVEt_RE_STATE are gone, Perl_save_re_context() still does something useful: the equivalent of local($1,$2...). Fixing that properly is a whole separate kettle of fish, not addressed here.

As far as I'm aware, the only remaining global vars associated with the
regex engine are

    PL_reg_curpm, PL_regmatch_state, PL_regmatch_slab, PL_colors, PL_colorse

None of these are effected by re-entrancy. The state stack is, erm, a stack, so it can handle re-entrancy quite happily, and the others are safe too.

Over the last month I have averaged 9.3 hours per week

As of 2013/05/31: since the beginning of the grant:

168.7 weeks
1700.0 total hours
10.1 average hours per week

There are 0 hours left on the grant.

Report for period 2013/05/01 to 2013/05/31 inclusive


Effort (HH::MM):

3:30 diagnosing bugs
39:51 fixing bugs
0:00 reviewing other people's bug fixes
0:00 reviewing ticket histories
0:00 review the ticket queue (triage)
43:21 Total

Numbers of tickets closed:

1 tickets closed that have been worked on
0 tickets closed related to bugs that have been fixed
0 tickets closed that were reviewed but not worked on (triage)
1 Total

Short Detail

33:29 [perl #114878] Regular Expression matching in signal handler
2:00 [perl #116407] SvPVutf8 != SvPVX, and sv_2pvutf8
4:33 [perl #116569] Re: 5.17.7 breaks rules of assignment
1:49 [perl #117135] v5.17.9-80-g9f351b4 breaks SARTAK/Path-Dispatcher-1.04.tar.gz
1:00 [perl #117917] Bug in Regex-Engine (?)
0:30 [perl #118213] /$qr/p broken under 5.18.0

About TPF

The Perl Foundation - supporting the Perl community since 2000. Find out more at www.perlfoundation.org.

About this Archive

This page is an archive of entries from June 2013 listed from newest to oldest.

May 2013 is the previous archive.

July 2013 is the next archive.

Find recent content on the main index or look in the archives to find all content.


OpenID accepted here Learn more about OpenID
Powered by Movable Type 6.2.2