Improving Perl 5: Grant Report for Month 18

No Comments

Nicholas Clark writes:

gcc 4.8.0 was released on 22nd March. This version of gcc has integrated Address Sanitizer, a Google project to provide a fast runtime heap and stack validation tool. I set it off building gcc from source, which pleasingly worked first time on the system I chose for the task. In turn blead built with it without problems, which is good in itself, and a known good starting point for building with Address Sanitizer enabled.

I didn't expect to find many problems with gcc's ASAN, as I believe we're currently clean when running under valgrind, and when building with the ASAN in clang 3.1. Fortunately I was right - I didn't find many. Unfortunately it wasn't zero, as I did find 3 problems, all of which are now fixed in blead.

Firstly, PerlIO_find_layer() was using the wrong length for memEQ(). So if it was looking for "perlio", it would compare 6 bytes, even if the current layer was named "unix". I believe that it's unlikely to be a problem in production (in this example, as "u" != "p", the comparison will fail without needing data from beyond the end of the shorter name). It would only exhibit itself if someone had layers named where one is initial substring of other and the random memory read happened to be identical to the longer layer, and if so would result in the shorter named layer being used where the longer should have been found. It is now fixed in blead and will be in v5.18.0.

Secondly, a test for heredoc terminators was sometimes reading beyond the end in a memNE(). The failing test case was for an unterminated heredoc used inside code a string eval, something which used actually to SEGV until fixed by Father Chrysostomos post v5.16.0. I think that the bug isn't triggered without something similarly obscure, but it is now fixed.

Thirdly, the memory referenced by the CV for &DynaLoader::boot_DynaLoader to record information about the C code used to generate it was pointing to the C stack, and so anything reading from it would pick up garbage. The tests for B::Deparse happened to read this name while searching for a different CV. The worst case would be that code using B would have a Perl string containing some of the contents of the C stack, when they were expecting something sensible. This is only going to cause bugs if that code was very interested in &DynaLoader::boot_DynaLoader specifically, and it's fixed now.

Something which might seem to be simple, but is actually an inherently complex problem, is the build system. The distribution needs to bootstrap itself up from source files to a complete working tested perl, without assuming that there is already a perl installed on the system (else you have a chicken and egg problem), but also without getting confused if there is a perl installed on the system (as it is probably a different version, or has incompatible build options). Moreover, most of the build system is written in Perl (being the same code as the CPAN tool chain), so that has to run against an uninstalled perl, and without touching files outside the build directory (so it can't use ~/.cpan) or talking to the Internet. Finally, there's no single tool you can rely on existing with which to start the bootstrap (DCL on Win32? Shell scripts on a vanilla VMS install? Batch files on *nix?), so there's more than one of it to get your head around.

[For completeness, I should also note that

  1. there's a clear distinction between the steps for configuration, building and testing*,
  2. that all tests, core and modules, are run as one stage with the results reported together
  3. that where possible, the build and tests run in parallel, which speeds up the build & test cycle by a factor of at least 8 on decent current hardware, which is a significant productivity boost if you're doing this on a daily or hourly basis

]

Anyway, the reason this is relevant is that Zefram submitted a patch which provides XS implementations for some of the key methods in File::Spec. Whilst we agreed that for something as fundamental as File::Spec, it's too close to v5.18.0 to safely put it in, I did try to make a branch of blead, to test that it worked in blead.

This turned out to be rather more involved that I thought it would be.

I expected it to be potentially interesting, because File::Spec and Cwd are used early during the bootstrap of the toolchain, which means that a bunch of scripts early in the build process need to be able to run it (pure-Perl) from dist/

But it proved to be a bit more interesting that just that. For starters, I hit a failure mode that I wasn't expecting. We have this rule to create lib/buildcustomze.pl, the setup file which primes miniperl to be able to run uninstalled and change directory. (Which you need for the build):

    lib/buildcustomize.pl: $(MINIPERL_EXE) write_buildcustomize.pl
            $(MINIPERL) write_buildcustomize.pl >lib/buildcustomize.pl

Problem is that the upgrade caused a compile time error in write_buildcustomize.pl, because Cwd now requires constant and re. It's roughly akin to this:

    $ git diff
    diff --git a/write_buildcustomize.pl b/write_buildcustomize.pl
    index ec3b36e..e8efec1 100644
    --- a/write_buildcustomize.pl
    +++ b/write_buildcustomize.pl
    @@ -1,5 +1,6 @@
     #!./miniperl -w
    +die;
     use strict;
     if (@ARGV) {
         my $dir = shift;
    $ make lib/buildcustomize.pl
    ./miniperl -Ilib write_buildcustomize.pl >lib/buildcustomize.pl
    Died at write_buildcustomize.pl line 3.
    make: *** [lib/buildcustomize.pl] Error 255
    $ echo $?
    2
    $ ls -l lib/buildcustomize.pl
    -rw-r--r-- 1 nick nick 0 Mar  2 16:15 lib/buildcustomize.pl

ie the build fails, but the generated file is not removed. So if you attempt to continue by running make again, that file is assumed to be good, and something ugly fails soon after for all the wrong reasons, spewing error messages that aren't related to the original problem.

So, having figured out that there is a second bug obscuring the real bug, it becomes easier to fix the actual causes :-) Although the bad news is that this means changes to the Win32 and VMS makefiles too. I pushed a tentative fix to the Makefile bootstrapping logic in smoke-me/nicholas/build-bootstrap, which George Greer's Win32 smoker seems fine with. However, I think I can see how to improve it by more use of buildcustomize (removing -Ilib), but things have been sufficiently busy that I've not had a chance to look further. In particular, we don't have any VMS smokers, so I'll have to test things manually on VMS, which makes it all more time consuming.

I also worked on a couple of things which ultimately didn't make the cut for code freeze. Partly this was due to to not being around for a chunk of March due to a trip to my parents and then a trip to Berlin for the German Perl Workshop.

[The German Perl Workshop was excellent. It had probably the best ever venue
for a conference social event - the Computer Games Museum:

http://www.computerspielemuseum.de/1210_Home.htm
]

gcc provides a function named __builtin_expect(), which is intended to permit the programmer to tell give an optimisation hint to the compiler's code generator about which code path is more common, so that the faster path through the generated code is for the more common case.

We added a probe for this a while back, and some macros (LIKELY and UNLIKEY) in perl.h, which expand to use __builtin_expect() when compiling with a gcc which supports it, and (obviously enough) everywhere else don't use it. However, we didn't then use it very much, as it turned out to be very hard to measure that these actually helped by speeding up code, and it seemed a bad use of time churning the source code, possibly introducing errors or inefficiencies, for no measurable gain.

Steffen Mueller recently added UNLIKELY and LIKELY (mostly UNLIKELY) to a few places in the code where one can be confident that they hold true. (And, as much, that the design and code presumes that this is the case, so they are documenting assumptions.) I had a look at the generated code on ARM, and it's quite different with the use of the macros. There is rather too much to figure out, but as best I could tell from the first diff hunk, it's inverting the sense of branches. I don't know how much branch prediction current ARMs have (15 years ago they had none), so this is going to make a difference of a cycle per branch avoided. Which isn't going to be much, but is in the right direction provided the annotation is more correct than the compiler's guesswork. Which is probably going to be true for the easy to reason cases.

This caused Zefram to notice that the macros as added, are actually subtly wrong:

    #ifdef HAS_BUILTIN_EXPECT
    #  define EXPECT(expr,val)                  __builtin_expect(expr,val)
    #else
    #  define EXPECT(expr,val)                  (expr)
    #endif
    #define LIKELY(cond)                        EXPECT(cond,1)
    #define UNLIKELY(cond)                      EXPECT(cond,0)

The intent is that one passes them an expression that is likely true, or something that is likely false, and by implication that that expression is considered to be a boolean. However, the documentation of __builtin_expect() is

    long __builtin_expect (long exp, long c)
    The return value is the value of exp, which should be an integral
    expression. The semantics of the built-in are that it is expected that
    exp == c.

ie, that long integer comparison is used for the evaluation. This raises the potential of bugs. In particular, in C, NULL pointers are false, non-NULL pointers are true. So, one would expect to write LIKELY to indicate that one expects that pointer is non-NULL. However, the way the code currently is, that will compare the pointer with the integer 1 (cast to a pointer), and declare to the code generator that this comparison is likely to be true. Which, of course, it almost certainly is not. So for that case it's completely counter-productive.

Those macros should change to this:

    #define LIKELY(cond)                        EXPECT(cBOOL(cond),TRUE)
    #define UNLIKELY(cond)                      EXPECT(cBOOL(cond),FALSE)

(using our cast-to-bool macro, to avoid other problems some C99 compilers have with the bool type), and they are now pushed to a smoke-me branch to verify that they are OK, and ensure that they don't get lost.

However, that also changes the return value of LIKELY and UNLIKELY - before it was the value passed in, now it's TRUE or FALSE. This might just matter, so I searched CPAN to ensure that nothing was using the return value of our macros. All seemed clear, although the search was complicated by the fact that several XS modules define their own macros. However, as it's not a regression, and it is a potential behaviour change, it didn't seem the right time to change them on blead with the code freeze and release imminent.

The other thing I worked on which didn't make the code freeze was a fix for RT #114576. The bug is an unintended performance regression introduced as a side effect of a space optimisation. It's not new - it dates from three years ago, and shipped with v5.12.0. The desired to optimise (and hence make trade offs) comes because their is only one "hash" type in Perl 5, and it's horribly overloaded. It's used for 3 arguably disjoint purposes

  1. the program's associative arrays
  2. the internals of objects
  3. the interpreter's symbol tables

and sadly those three have differing needs. If a program is storing data in a few associative arrays, the fixed overhead of each is not significant, but access and manipulation costs are. If a program is creating lots of objects, then the fixed overheads rapidly start to mount up, and can be comparable in size to the object members themselves. Plus, almost no code does things like iterate over the members of an object, but plenty of code does just that with an associative array.

The second tension is that the value of a hash in scalar context is documented as being 0 if the hash is empty, otherwise a truthful value that is fiddly to format, and requires overhead to calculate. Almost no code is actually using the details of that truthful value, but it's documented behaviour, longstanding, and we try to avoid changing things that would break working code.

So the optimisation assumed that nearly all boolean cases were spotted by the peephole optimiser, which would permit the implementation to "cheat" and return 0 or "truth", rather than the documented value. This would mean that it was viable to avoid storing information on the number of hash chains populated in every hash, shrinking every hash (and hence every object) by one integer.

The regression is that there are cases where this doesn't work out. I have a fix (which avoids increasing the size of objects, but restores performance for large hashes) pushed to a smoke-me branch, but as I was away and busy just before the code freeze, and with the priority being on security related hash fixes, it didn't get considered for merging. I believe that it's technically viable to backport it later as a bug fix, if it meets the other review criteria.

Sadly not all progress is forwards.

I worked on on RT #116943, "deprecating ** and friends", attempting to get it ready for Ricardo to approve it going into blead before v5.17.10.

The magic variable $* was used to enable multi-line matching, but has been deprecated since 5.000, superseded by the /s and /m modifiers. The magic was removed in v5.10.0, and the (then default-off) deprecation warning when using it replaced by a default-on warning that it no longer does anything:

    $ ~/Sandpit/5000/bin/perl -lwe '$* = 1
    Use of $* is deprecated at -e line 1.
    $ ~/Sandpit/5100/bin/perl -e '$* = 1'
    $* is no longer supported at -e line 1.

The other variables in the typeglob, @*, &*, ** and %*, have never had any special meaning.

The goal of the changes is to permit the parser to be modified in future to stop treating $* etc as magic variables, and instead enable * to be used as part of new future syntax, for example $*foo, @*bar and %*baz. Searching CPAN for existing users is tricky, as strings such as @* and %* seem to produce a lot of false positives, but after weeding out non-Perl code, and format strings, it seemed that there are virtually no users, hence the trade-off seemed worthwhile - remove something unused by virtually all production code, to enable new features that will be of use to production code.

So I set off to replicate the existing code for warning on use of $* (the SCALAR in the typeglob) into the other users of the typeglob. On the way I discovered that several regression tests are using @*. Some use it because it's a convenient punctuation variable to test, and others were using it to exploit a bug - mention @* first, and $* doesn't warn:

    $ ~/Sandpit/5000/bin/perl -lwe '$* = 1; print $]'
    Use of $* is deprecated at -e line 1.
    5.000
    $ ~/Sandpit/5000/bin/perl -lwe '0 if @*; $* = 1; print $]'
    5.000

So I fixed that bug, verified that the other tests could be rewritten using other punctuation variables, and so was now ready to "deprecate" the rest, as there were now no use cases outside the tests for warnings.

This I did, pushed the code as a smoke-me, and then when that exposed no problems, checked with Ricardo and merged it to blead in time for v5.15.10

Unfortunately, that then revealed the actual problems with this approach. Which, I guess at least proves me right on one thing - I wanted to get it into a blead release to be confident that it wasn't going to cause surprises. It's much better to get your surprises now, rather than in a release candidate, or even worse, after you ship and you can't turn the clock back. But the problems are for a future report.

At the start of March, because it was fresh in my head, I continued working on Unicode name lookup, finishing off the Perl prototype code by implementing the algorithmic lookup for CJK ideographs and Hangul syllables. CJK ideographs are easy, Hangul syllables less so. Currently Perl 5 uses a regex for them, but accessing the regex engine from C is, well, harder work than setting up the call stack and calling back into Perl. Plus neither are pretty, nor are they simple. Instead I found a way to implement them in fairly simple C, partly by calling back into the trie lookup code used for all the other names, and completed a prototype of this in Perl, Perl written to be easy to transcode to C. With this milestone completed, I stopped.

Three weeks later, needing a change after head-hurting C debugging, I returned to it. A Perl prototype is all very well, but it had but no C data structures [to validate that my estimates of size were correct], and no C code to read them [to validate that it would work at all! :-)] So I fixed this, by writing code to generate C data structures for the needed tables, and then by transcoding the prototype lookup code from Perl to C. This approach worked particularly well - as I had all this done in two days, and the C code worked almost first time (the only error was actually in one of the constants in the code for Hangul syllables). The trick is to write your Perl like C - obviously no regular expressions, but maybe less obviously, write really clunky code that chews through strings using ord, s/\A.// and substr, and expresses all comparisons using nothing more complex than == or eq.

I cross checked the generated C code against a set of "Golden Results" generated from Perl and it performed perfectly. (I was hoping for this as the Perl prototype code also performed perfectly, but a lot lot more slowly.) As I was on a roll, I continued by implementing some space optimisations for data representation which cut the size of the tables down from about 900K to 400K. At this point I have small stand alone executable which can lookup Unicode code points by name. All that's left is the small matter of integrating it into toke.c, and the build system. All...

I also tried to build and test blead on my Raspberry Pi. This used to work (it takes a few hours, but runs to completion). It doesn't now - it aborts somewhere near the end of the tests, seemingly due to running out of memory. So what changed? I set the pi off smoking to figure it out:

bqc export TEST_JOBS=1; bisect.pl -j1 --start 9f68b0f7acd1bcb04e7baa4bdb7cfec8e5d985c8 --end blead^ -Doptimize=-g -Dusethreads -Duse64bitint -- sh -c 'cd t && ./perl harness; echo $?; ls -l test_state'

ie bisect.pl builds up to C (the default), and then the test case is to run the entire test suite under t/harness, and see whether t/harness runs to completion and writes out the test state in t/test_state.

Given that each iteration of the bisection will take something like 7¼ hours, this was going to take a while...

It eventually produced this, after more than 3 days non-stop on the job:

    HEAD is now at 3283393 Restrict the valid identifier syntax, fix some identifier bugs.
    bad - non-zero exit from sh -c cd t && ./perl harness; echo $?; ls -l test_state
    32833930e32dc619abdaaab54e88de2a2765fb86 is the first bad commit
    commit 32833930e32dc619abdaaab54e88de2a2765fb86
    Author: Brian Fraser <[email protected]>
    Date:   Tue Mar 5 18:18:49 2013 -0300
        Restrict the valid identifier syntax, fix some identifier bugs.
        Fixes:
            * Length-one identifiers are now restricted to
                [\p{XIDS}\p{POSIX_Punct}\p{POSIX_Digit}\p{POSIX_Cntrl}]
              plus, if under 'no utf8', the 128 non-ASCII characters in the
              Latin1 range.
            * Identifiers that start with ASCII letters can be followed with
              XIDC characters
        (The committer made some small edits in the pod)
    :100644 100644 e8f5402a77dd011f065f7d91db354fdfd6da6830 8ac08abf50107cbfba3a8296df34556449134ba6 M      gv.c
    :040000 040000 9ae3a108394119259ca10d38ef44ef23c237fb91 3a55dc5feeea97da7346fc74df0601d3200e32e3 M      pod
    :040000 040000 695729610bd917237944558cfe729d529dfb6514 714c16587ae882fc635c17fd82362e5ff804f16b M      t
    :100644 100644 27485462ffca6939c895961092a743775401fa14 1ea0da7fde5aa50521ffe68378a87980048578cd M      toke.c
    bisect run success
    That took 274968 seconds

If I reverse just the test cases added by that commit then the tests all pass:

    Test Summary Report
    -------------------
    uni/variables.t                                                 (Wstat: 0 Tests: 5 Failed: 0)
      TODO passed:   5
    Files=2388, Tests=620014, 18222 wallclock secs (2020.18 usr 55.35 sys + 13406.55 cusr 290.67 csys = 15772.75 CPU)
    Result: PASS

whereas with that test restored again, the harness fails:

    ../lib/charnames.t ................................................ ok
    Could not execute (./perl -I.. -MTestInit=U1 ../lib/diagnostics.t): open3: fork failed: Cannot allocate memory 
at ../lib/TAP/Parser/Iterator/Process.pm line 168.
    make: *** [test_harness] Error 12

If I double the amount of swap available, everything passes once more:

    All tests successful.
    Files=2388, Tests=685775, 18699 wallclock secs (2216.11 usr 60.91 sys + 14095.49 cusr 300.45 csys = 16672.96 CPU)
Result: PASS

So, yes, it's to do with the amount of memory available. Test::Builder is innocent. That test itself is innocent. All it does wrong is to add a lot of testcases:

    -plan (tests => 5);
    +plan (tests => 65850);

Which makes me wonder - just how much memory is TAP::Parser using as it remembers the results from all the test scripts run?

I've sent my findings to the perl-qa list, in the hope that someone finds it an interesting problem to dig into, and produces a patch to the code that reduces it for the common case. Source code being here:

https://github.com/Perl-Toolchain-Gang/Test-Harness

A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/
BBC is "bleadperl breaks CPAN" - Andreas König's test reports for CPAN modules

HoursActivity
1.25ASAN
EBCDIC
2.75File::Spec & bootstrap
2.00LIKELY/UNLIKELY
0.50PL_stderrgv
6.50RT #114576
1.50RT #116477, RT #116615
0.75RT #116833
14.00RT #116943 -- deprecating ** and friends
1.00RT #116971
0.25RT #117141
0.25RT #117315
37.00Unicode
Unicode Names
0.75bison 2.7
1.00clang ASAN
9.75gcc 4.8.0
gcc 4.8.0 ASAN
4.25pahole interpreter struct
2.00process, scalability, mentoring
0.75randomised iteration
34.00reading/responding to list mail
0.25regcomp/setjmp
0.50smoke pi
smoke pi (when did blead exhaust swap)

121.00 hours total

* ie you don't end up with the *build* finishing with "oh, I couldn't actually build this extension. See some buried log for what you need to do before cleaning and restarting". Which was a source of much merriment with one other language which I found myself iteratively re-building until I could get it to build the extension that was a requirement for the tool I ultimately needed.

Leave a comment

About TPF

The Perl Foundation - supporting the Perl community since 2000. Find out more at www.perlfoundation.org.

About this Entry

This page contains a single entry by Karen published on April 25, 2013 5:22 AM.

YAPC::NA Call for Sponsors was the previous entry in this blog.

2013Q2 GP: Next Release of Pinto With Key Features is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.38