Improving Perl 5: Grant Report for Month 20


Nicholas Clark writes:

As per my grant conditions, here is a report for the June period.

The nice thing about standards is that you have so many to choose from (Andrew S. Tanenbaum).

I guess the same can be said about build systems.

So the structural intent of the build is

  1. Permit the user to choose configuration options
  2. Build the package (which may take some time, and shouldn't need user intervention)
  3. Test the package, and collate all test results into one report at the end (an excuse for a second tea break)
  4. Install the package (which probably runs with elevated privileges)

As well as trying to avoid a long period where a human needs to babysit the build in case it stops to ask a question, this approach also has the benefit that you find out by the end of configuration what extensions the build stage should be producing. Or, more importantly (compared with at least one other similar language), you don't need to wait until the end of the build run to discover that an extension you really needed isn't built, and then have to iterate the entire configure & build steps until you figure out the correct form of rubber chicken sacrifice to make it all work.

Of course, the problem is that for step 1 you can't assume that a copy of Perl is already available (because how did it get built?), so the configuration system has to run using native tools. And the more platforms the package is ported to, the more variations of native tools you have.

So, on *nix and VMS, where the OS, architecture and even the make utility will vary, the configuration script figures out which extensions are shipped by scanning the file system, because even the Makefile has to be programmatically generated to cope with platform quirks. On Win32 the variations are far fewer, so it's viable to ship a pair of Makefiles which between them cover all the common make variants. Hence on Win32 configuration is implemented by changing options in the appropriate Makefile, and the build determines which extensions are wanted by combining those options with a scan done by the (uninstalled) FindExt module.
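To make that concrete, here is a minimal sketch of the kind of scan involved - not FindExt's real code, just its shape, assuming the classification rule that an extension counts as XS if it ships any .xs source:

    #!/usr/bin/perl
    # Illustrative sketch only -- not FindExt's actual code. Walk the
    # three extension trees and classify each extension by whether it
    # ships any XS source.
    use strict;
    use warnings;
    use File::Find;

    my (@xs, @nonxs);
    for my $tree (qw(ext dist cpan)) {
        opendir my $dh, $tree or next;
        for my $dir (sort grep { !/^\./ && -d "$tree/$_" } readdir $dh) {
            my $has_xs = 0;
            find(sub { $has_xs = 1 if /\.xs\z/ }, "$tree/$dir");
            # Directory names map to extension names: VMS-Stdio => VMS/Stdio
            (my $name = $dir) =~ s!-!/!g;
            push @{ $has_xs ? \@xs : \@nonxs }, $name;
        }
    }
    print "XS:     @xs\n", "non-XS: @nonxs\n";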

So that's a Perl module, right? Which means that we can test it in a platform-independent way. Which turned out to be useful back in 2009 when I was working out how to move modules to cpan/, dist/ and ext/ as part of the big rearranging to make dual life a lot simpler, as I could mostly verify that my changes were going to work on Win32 without having any direct access to a Win32 system to test them. The tests written for that purpose were robust enough that they were moved to t/porting and are run as standard, which verifies that the logic in FindExt is consistent with that of Configure.
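The shape of that consistency check is roughly as follows - a hedged sketch, since the real test is t/porting/FindExt.t and the FindExt entry points shown here are assumed rather than guaranteed:

    use strict;
    use warnings;
    use Config;
    use lib 'win32';    # FindExt ships with the Win32 build files
    use FindExt;

    # Assumed entry points: scan the extension trees, then compare what
    # FindExt found against what Configure recorded in %Config.
    FindExt::scan_ext($_) for qw(ext dist cpan);
    my @from_configure = sort split ' ', $Config{extensions};
    my @from_findext   = sort FindExt::extensions();
    print "FindExt and Configure disagree!\n"
        unless "@from_configure" eq "@from_findext";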

However, we weren't able to test everything. We couldn't correctly test the list of static extensions due to various problems, and the list of dynamically built extensions failed to match due to two discrepancies between Configure's logic and FindExt's.

Firstly, due to a typo in checking defines in %Config::Config, FindExt thought that I18N::Langinfo would never be built (whereas it is built on most *nix systems). So I fixed that, and everything then passed on *nix. However, the test still failed on Win32, thanks to a problem that was a bit more convoluted. In replicating Configure's logic, FindExt thought that ODBM_File *would* be built on Win32, because the Win32 canned configs had i_rpcsvcdbm set to 'define'. What on Earth is i_rpcsvcdbm?

This variable conditionally defines the I_RPCSVC_DBM symbol, which indicates to the C program that <rpcsvc/dbm.h> exists and should be included. Some System V systems might need this instead of <dbm.h>.

Eh? Win32 is most definitely not an ancient System V Unix, and won't repeat the same old quirks (it has brave new quirks instead). It turned out that FindExt was quite correct, and the canned configs (and header files) had been wrong since 1997. The problem hadn't been spotted because the Win32 configuration explicitly says not to build ODBM_File. Now it's correct. Combine all this with fixes by (at least) Steve Hay and Tony Cook, and it's now possible to test that FindExt and Configure agree on which extensions are to be built, which are dynamically linked, which are statically linked, and which are non-XS. While these changes are of low utility in themselves, all this would prove useful in unravelling more of the build complexity.
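Stripped of Configure's shell case statements, the decision that FindExt was (correctly) replicating amounts to something like this Perl sketch (variable names as in the canned configs):

    use Config;

    # ODBM_File is wanted if either <dbm.h> or <rpcsvc/dbm.h> was found,
    # i.e. if either Configure symbol came out as 'define'. A canned
    # config wrongly setting i_rpcsvcdbm therefore makes ODBM_File look
    # buildable, even on Win32.
    my $want_odbm = grep { ($Config{$_} // '') eq 'define' }
                    qw(i_dbm i_rpcsvcdbm);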

I spotted a way to remove a few more tangles from the build, on *nix, VMS and Win32. It's always fun having to juggle three different objects together, and this was no exception.

The build has never depended on having Perl installed. Perl's portability was able to scale to multiple architectures and OSes by

  1. having the configuration system compile and run test programs to find out what works, and what needs to be worked around
  2. bootstrapping as quickly as possible to a minimally working perl, and then writing as much as possible of the rest of the build infrastructure just once, in Perl.

Attempting to adapt that to also permit cross-compiling is hard, which is why it hasn't happened (yet). But all our build tools cross-compile nicely. (On *nix, that would be sh, sed, awk, grep, make and cc.) Hence one can bootstrap Perl 5 onto a new platform, albeit in a rather roundabout way, by first bootstrapping a native toolchain.

The various platform Makefiles contain the logic to try to get from some C source to a "working miniperl" as rapidly as possible. Part of the fun is that a lot of the modules needed for it to "work" are actually dual-life, hence shipped in dist/ or cpan/, and some modules, most importantly Config, need to be generated from the platform-specific build files. Additionally, the build needs to be able to run in parallel*, which means that

  1. it's beneficial to split build tasks as small as possible to maximise concurrency
  2. it's necessary for every task to know its pre-requisites, so that make won't accidentally run a rule before something it depends on has been built

(or, how this actually manifests - the build fails some of the time due to a race condition caused by a missing dependency, and it's very hard to recreate and track down.)

Hence the build rules for things early in the build ended up being quite tightly coupled to everything else early in the build, because as soon as one changes where a file is located, or how it is built, all its explicit and implicit dependencies have to be updated.

One particularly "big" dependency (because it is very early) is the file lib/buildcustomize.pl. This is a key part of enabling the build to work at all. If "$INC[0]/buildcustomize.pl" exists, then it's loaded by miniperl. The trick is that lib/buildcustomize.pl sets @INC to the absolute paths of all the toolchain modules in ext/, dist/ and cpan/, so that the toolchain can be shipped in an easy-to-maintain layout, but is capable of being loaded in order to install each module into lib/ without first being in lib/. In turn, lib/buildcustomize.pl is written by write_buildcustomize.pl using the pure-Perl code in Cwd, building on the existing cross-platform nature of the Perl code to avoid having to produce 3 (or more) platform-specific ways of converting directories to absolute paths.
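For illustration, a generated lib/buildcustomize.pl might look something like this - a cut-down, hypothetical example; the paths and the module list are invented, and the real machine-written file is considerably longer:

    # lib/buildcustomize.pl (hypothetical, cut down). The generator used
    # the pure-Perl Cwd to turn each toolchain directory into an absolute
    # path at build time, so this file needs nothing outside miniperl.
    @INC = (
        '/home/nick/perl/lib',
        '/home/nick/perl/dist/Exporter/lib',
        '/home/nick/perl/dist/PathTools',
        '/home/nick/perl/dist/PathTools/lib',
        '/home/nick/perl/cpan/ExtUtils-MakeMaker/lib',
        # ...and the rest of the toolchain modules...
    );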

Once lib/buildcustomize.pl is in place, just running `./miniperl -Ilib` is enough to make the otherwise unbuilt distribution behave enough like a "normal" installed perl that the rest of the build system doesn't need to set up anything special. The upshot of all this is that there's one small piece of code which works everywhere (a win for the Perl build scripts), but every rule in the Makefile (and the Win32 Makefiles, and DESCRIP.MMS) needs to ensure that it exists.

What I realised was that by removing one little bit of concurrency it would be possible to simplify quite a lot of the other rules. Not just the direct simplification of having only one dependency, but also a more subtle one: once lib/buildcustomize.pl is in place, Cwd is in @INC (being one of the toolchain modules that write_buildcustomize.pl locates). Hence various other rules which previously invoked miniperl with multiple -I options, to ensure that the pure-Perl Cwd could be loaded from dist/, could have all those extra -I options eliminated, as -Ilib does it all once lib/buildcustomize.pl exists.

Specifically, by combining the rule that links miniperl with the rule that generates lib/buildcustomize.pl, all this simplification would fall out. And, somewhat perversely, it's actually conceptually simpler to have the rule "officially" be for lib/buildcustomize.pl, with the miniperl rule depending on it, than the other way round, as this means that the rest of the Makefile(s) can simply depend on miniperl, which is much easier to skim.

Of course, all this is only obvious in hindsight, and inevitably the devil is in the detail when it comes to actually getting it to work, and work reliably.

While removing the dependencies on [.lib]buildcustomize.pl from the VMS makefile I noticed that for VMS there was a second dependency that featured heavily - [.lib.VMS]Filespec.pm - thanks to a requirement to copy it from [.vms.ext] before it could be used. And, as a bonus, more code to copy its test to [.t.lib]. All this was special-case code, which could be completely eliminated if both files could be moved into a regular extension in the directory ext/VMS-Filespec, similar to ext/VMS-DCLsym and ext/VMS-Stdio, and like them only built on VMS. The only thing to add would be one line in write_buildcustomize.pl to put ext/VMS-Filespec/lib on the toolchain @INC.

Of course, all this should be simple. But if it were simple, how come VMS::Filespec isn't already in ext/? After all, VMS::DCLsym and VMS::Stdio were both previously in vms/ext/, so how come all three weren't moved at the same time? After all, *nix and Win32 already know to not try to build or test VMS::DCLsym and VMS::Stdio, so why not add a third?

The answer (as ever) turns out to be another yak that needs shaving. VMS::DCLsym and VMS::Stdio are XS modules. The build and test infrastructure is quite capable of skipping XS modules. It has to be, because not all XS modules can be built everywhere. But for various reasons, none of which were really designed, it's not capable of not building a pure-Perl module. I was aware of this already, but now that it was preventing me from implementing a real use case, it was irritating enough that I had reason to fix it. Of course, it wasn't a small job, and consumed a good chunk of a second week too...

So, what prevents us from having a pure-Perl extension in ext/ but not building it? And how did it happen?

The situation we had reached was that there were 5 configuration variables:

dynamic_ext:        built dynamically linked XS modules
static_ext:         built statically linked XS modules
nonxs_ext:          built pure-Perl modules (from ext/, dist/ and cpan/)
extensions:         "$dynamic_ext $static_ext $nonxs_ext"
known_extensions:   *just* the XS modules shipped in ext/, dist/ and cpan/

with the upshot that "extensions" is typically much larger than "known_extensions". Daft.
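You can inspect the lists your own perl was built with directly from %Config; a quick sketch (the exact contents vary by platform and configuration) makes the oddity visible:

    use strict;
    use warnings;
    use Config;

    # With the historical behaviour, 'extensions' ends up far larger than
    # 'known_extensions', because non-XS extensions appear only in the
    # former.
    for my $key (qw(dynamic_ext static_ext nonxs_ext
                    extensions known_extensions)) {
        my @list = split ' ', ($Config{$key} // '');
        printf "%-17s %3d entries\n", $key, scalar @list;
    }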

This situation has come about through "organic growth", rather than design. I guess it's summarised as

  1. Perl 5 predates CPAN
  2. Originally ext/ only held XS code
  3. Originally there was no concept of dual-life - if you wanted the extensions in ext/, you had to build them with perl (there wasn't even a toolchain - you could add other extensions into ext/ and they would be built)
  4. 15 years ago Configure was patched to add nonxs_ext (commit 4318d5a0158916ac), ready to support Errno (Errno was added about two weeks later in commit eab60bb1f2e96e20 [curiously, that commit adds Errno to known_extensions but not to extensions])
  5. A few days later commit bfb7748a896459cc updates Configure so that nonxs_ext are in extensions, but are not in known_extensions. The description of the change is:

Explicitly split list of extensions into 3 kinds: dynamic, static,
and non-xs. The Configure variable $extensions now holds all three.
(The only current non-xs extension is Errno).

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-07/msg00136.html

It also updates Porting/Glossary, explicitly changing the description of known_extensions from "list of all extensions included" to "list of all XS extensions included", and extensions from "all extension files linked into the package" to "all extension files (both XS and non-xs) linked into the package".

[Note that Errno is architecture-specific, so it gets installed into the same directory subtree as all the shared objects.]

Fast forward from 1998 to 2006:

6. Commit 1d8961043b9b86e1 (or thereabouts) in April 2006 regenerates the sample config.sh to this:

       nonxs_ext='Compress/IO/Base Compress/IO/Zlib Compress/Zlib Errno'

at which point we have 3 more non-XS extensions, all of which are architecture-independent.

Subsequent re-arranging of dual-life modules in 2009 means that we've got a lot more.

Effectively, the term "extensions" has meant "things we build via Makefile.PL" for at least 7 years, if not 15, despite what all the documentation tries to claim.

So after a lot of figuring out the why and how, and what it would likely break (answer, nothing), I patched the *nix and Win32 build systems to fix this. (I chickened out of figuring out enough DCL to deal with VMS. Craig Berry was kind enough to deal with that.)

So why did this even matter? Because whilst the build system was quite happy not building a pure-Perl module, all the tests for it would still be run (and fail), due to implementation details of how t/TEST (and thus also t/harness) decides what to skip. It refuses to skip anything unless it's in "known_extensions" but missing from "extensions". As Andy Dougherty observed after I submitted the patches to fix the build, nothing after Configure should actually use known_extensions. Hence t/TEST is arguably buggy and needs fixing. Maybe I could have used a smaller hammer if I had spotted the correct problem to hit. :-)
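Paraphrased into standalone Perl (a sketch - t/TEST's real code is structured quite differently), the skip rule amounts to this:

    use strict;
    use warnings;
    use Config;

    # t/TEST's rule, paraphrased: skip an extension's tests only if
    # Configure knew of it (known_extensions) yet it wasn't built
    # (absent from extensions).
    my %known = map { $_ => 1 } split ' ', $Config{known_extensions};
    my %built = map { $_ => 1 } split ' ', $Config{extensions};

    sub skip_tests_for {
        my ($ext) = @_;
        return $known{$ext} && !$built{$ext};
    }

    # Before the fix a non-XS extension never appeared in
    # known_extensions, so its tests could never be skipped -- even when
    # it wasn't built.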

However, it's done now, and the distribution is saner for it. And it permitted the tests for FindExt to be made more comprehensive (and have fewer special cases and skips).

Whilst looking at the *nix Makefile a lot while trying to figure out how to resolve the problems above, I noticed that there are quite a lot of short-cut targets. These are targets added to simplify running various commands, which I don't think anyone uses any more. For example, there were targets related to profiling and testing tools for Tru64 and IRIX (pixie and Third Degree), for purify, quantify and purecov, targets to run the tests through B::Deparse, to convert the tests to UTF-8 or UTF-16 before running them, and to run the tests with -t to flag up taint warnings. (Plus, in some cases, targets combining two of the above actions.)

It's still perfectly possible to run any of the above programs by "hand" - no underlying functionality has been removed from the Makefile. It's just got a little bit shorter and a little bit clearer.

We also discovered a problem with the previously described refactoring of the initial build rules. While Father Chrysostomos was trying something out (which seriously broke the ability of miniperl to even parse code), his make went into an infinite loop calling itself recursively. Effectively, a fork bomb. This isn't supposed to happen - a build failure is supposed to stop, not take out one's machine.

The problem is that the *nix Makefile contains a lot of places where it calls back to itself in the same directory to build a different target. I'd been bitten by these some time ago. If things don't go as intended, you can end up with an infinite loop, as each recursive invocation of make decides that the same thing needs doing first, and calls make again with the same arguments. It gets even worse running make in parallel.

I think that historically things had been done this way as a means to have various little utility commands or command sequences available, without having to clutter the build directory with a shell script for each desired "program", or repeat the same commands in multiple places in the Makefile. Even if you get it right (i.e. avoid the above problems), I feel that it actually makes the build less clear, because you have to scan back through the same Makefile, and then work out whether the target requested is stand-alone, or going to have more side effects. Hence I'd considered these a pain point some time ago, and had been trying to eliminate them.

They can even work directly against correctness. The miniperl build rule used to end with this:

$(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -MExporter -e '<?>' || $(MAKE) minitest

The intent is to be "helpful" and automatically run minitest if miniperl fails a basic sanity test. The problem is that minitest then looks like this:

# Can't depend on lib/Config.pm because that might be where miniperl
# is crashing.
minitest: $(MINIPERL_EXE) minitest.prep
        - cd t && (rm -f $(PERL_EXE); $(LNS) ../$(MINIPERL_EXE) $(PERL_EXE)) \
                && $(RUN_PERL) TEST base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t </dev/tty

with a dependency on minitest.prep, which looks like this:

minitest.prep:
        -@test -f lib/Config.pm || $(MAKE) lib/Config.pm $(unidatafiles)
        @echo " "
        @echo "You may see some irrelevant test failures if you have been unable"
        @echo "to build lib/Config.pm, or the Unicode data files."
        @echo " "

Hence to avoid a recursive loop when attempting to helpfully run minitest automatically, it needs to recurse to a third level, and to skip doing so if lib/Config.pm already exists. Note, exists, not "is up to date".

I.e. correctness has been sacrificed, although it's not immediately obvious. Done this way, if you update the pre-requisites of lib/Config.pm, make won't automatically re-build it. Meaning that you may get bogus results if you edit them, and then re-run minitest to check your changes.

The problem here seemed to be that my other changes had made things more fragile, and the fork bomb a lot more likely to trip. For now, I've removed the automatic recursion (with make) to run minitest, as that removes the fragility. Given that running minitest is one line, albeit a rather long one (as shown above), I think that it should be possible to have the Makefile run it directly (without calling back to make to do it), but I can't quite see how. It feels like it ought to be possible to merge it with the shell script that runs the regular tests, but I can't yet see a way of merging the two that uses less code than doing it separately. Something is eluding me.

I also found a small but representative example of how the best of intentions don't always produce the best solution to a problem, actually increasing clutter.

a2p, the awk to Perl converter, is written in C. It dates from perl 1 time, so two years before the first ANSI C standard, and like perl 1 it started with the then classic 3-argument main() function:

main(argc,argv,env)
register int argc;
register char **argv;
register char **env;
{

K&R style was converted to ANSI style with commit f0f333f455368029 back in 1997, and it has stayed fundamentally the same ever since, although the register declarations have been removed and const added. The perl interpreter's main() function has evolved in the same way.

Hence in 2005, when Jarkko cranked up the strictness on the Tru64 compiler, and fixed all issues that it warned about, he added the relevant pragma to both perl and a2p to stop the compiler warning about the non (ANSI) standard third parameter. Seems sane.

What no-one noticed was that unlike perl, a2p's main() doesn't actually use the env parameter, so a better solution is to remove it. Which means that the pragma can be removed too. So that's 4 lines gone, and 1 line simplified.

Each of these sorts of things on its own isn't really a problem, and really isn't a priority to find, let alone fix. But there are potentially many things which could be terser, tidier and clearer, and the sum of all the little bits of suboptimal verbosity mounts up, making the core's code harder for everyone to follow. Hence it seems sane to tackle them as and when they are found, if there's an obvious, simple, safe fix.

The end of the month was quiet because we were visiting my parents. It was only planned to be partly a holiday, but no plan survives contact with the enemy (or good weather).

As the network at my parents' is whatever we bring with us, I concentrated on things that could be done locally. George's clang smoker** had been showing failures for configurations with -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE when built with clang's address sanitizer. PERL_GLOBAL_STRUCT_PRIVATE is a build option intended for extremist embedding - not just no global variables, but even the variable holding the address of the structure that wraps the globals is itself hidden behind a function.

I had thought that the problems were fundamentally insoluble, due to conflicting requirements between freeing that structure within global destruction, versus code needing to look into it (to get the thread-local context) in the routine that called perl_destruct(). However, it turned out that there is no fundamental conflict. The "use after free" error was more obscure than that - it was actually code run by atexit() after main() returns which was the problem, and that code wasn't defensive enough to cope. As it was just checking a flag, the fix was as simple as setting a variable to NULL after calling free(), and adding a NULL check to the routine called by atexit().

Other problems that ASAN reported were mostly caused as a side effect of how PERL_GLOBAL_STRUCT_PRIVATE is the only configuration that allocates storage for the globals using malloc(). Every other configuration has the globals as actual globals (either individual variables, or a structure which is global), which results in them being zero initialised. Hence the setup code for PERL_GLOBAL_STRUCT_PRIVATE needs to zero a couple more globals.

The final problem it revealed wasn't specific to PERL_GLOBAL_STRUCT_PRIVATE, but we hadn't noticed it before on any other configuration with the existing test cases. There had been a long-standing bug whereby perl didn't cope correctly with a here-doc at the end of the script without a final newline (RT #65838). The fix for this could in some cases end up reading from free()d memory, if a particular buffer needed to be resized. However, the code in question is only run if the Perl program ends with a heredoc (which is an unusual structure), and if the last line of the file on disk has no terminating newline character (which is also unusual, as many editors default to adding a final newline). Hence it's pretty rare to hit it.
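The triggering shape is easy to sketch; the crucial detail, which a pasted example can't really show, is the missing final newline:

    # A whole program ending in a heredoc: the shape behind RT #65838.
    # Imagine this file saved with no newline after the final EOT; the
    # code handling that case could read freed memory if the buffer it
    # was using needed to be resized.
    print <<'EOT';
    some text
    EOT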

A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/

Hours    Activity
 1.00    APIs
 1.00    ASAN causing Makefile loop
 0.25    File::Spec XS
 0.25    FindExt
14.25    Makefile target pruning
 0.25    Porting/Maintainers.pl
 3.75    RE_TRACK_PATTERN_OFFSETS/parse_start
 0.25    RT #109744
 0.25    RT #114576
 0.25    RT #118175
 0.25    RT #118195
10.50    RT #118283
 0.50    RT #118365
 1.00    RT #118509
 3.50    RT #118549
 2.00    RT #118603
 0.25    RT #118653
 1.00    RT #38812
 0.25    RT #40403
 0.25    RT #47467
 1.00    RT #67114
 0.25    Regexp::Grammars
 6.50    Storable
         Storable (HP/UX)
 6.25    VMS
         VMS-Filespec and known_extensions
 1.50    Win32 & i_rpcsvcdbm
 2.25    Win32/FindExt
 0.50    a2p
 2.75    dots
 3.25    failures under -DPERL_GLOBAL_STRUCT_PRIVATE
21.25    known_extensions, unbuilt non-XS extensions
 0.25    lib/perlmodlib.PL
16.25    miniperl Makefile bootstrap ordering
 2.75    process, scalability, mentoring
12.75    reading/responding to list mail
 0.50    smoke-me branches
 0.25    static build on Win32
 0.50    static extensions
 4.00    toke.c heredoc at EOF
 0.25    utils/
 0.50    what does "deprecated" mean?

124.50 hours total

* Being able to run the build in parallel is cheaper than increasing the number of hours in the day. Although I'm sure if you ask nicely on the Internet, someone will offer to take money from you to implement the latter solution. :-)

** http://m-l.org/~perl/smoke/perl/linux/blead_clang_sanitize=address/?C=M;O=D
