_Nicholas Clark writes:_
As per my grant conditions, here is a report for the June period.
bq. The nice thing about standards is that you have so many to choose from (Andrew S. Tanenbaum).
I guess the same can be said about build systems.
So the structural intent of the build is to:
# Permit the user to choose configuration options
# Build the package (which may take some time, and shouldn't need user intervention)
# Test the package, and collate all test results into one report at the end (an excuse for a second tea break)
# Install the package (which probably runs with elevated privileges)
As well as trying to avoid a long period where a human needs to babysit the build in case it stops to ask a question, this approach also has the benefit that you find out by the end of configuration which extensions the build stage should be producing. Or, more importantly (compared with at least one other similar language), you don't need to wait until the end of the build run to discover that an extension you really needed isn't built, and then have to repeat the entire configure & build cycle until you figure out the correct form of rubber chicken sacrifice to make it all work.
Of course, the problem is that for step 1 you can't assume that you already have a copy of Perl (because how did it get built?), so the configuration system has to run using native tools. And the more platforms the package is ported to, the more variations of native tools you have.
So, on *nix and VMS, where the OS, architecture and even the make utility will vary, the configuration script figures out which extensions are shipped by scanning the file system, because even the Makefile has to be programmatically generated to cope with platform quirks. On Win32 there is a lot less variation, so it's viable to ship a pair of Makefiles which between them cover all the common make variants. Hence on Win32, configuration is implemented by changing options in the appropriate Makefile, and the build determines which extensions are wanted by combining those options with a scan done by the (uninstalled) FindExt module.
So that's a Perl module, right? Which means that we can test it in a platform-independent way. That turned out to be useful back in 2009, when I was working out how to move modules to cpan/, dist/ and ext/ as part of the big rearrangement to make dual-life a lot simpler, as I could mostly verify that my changes were going to work on Win32 without having any direct access to a Win32 system to test them. The tests written for that purpose were robust enough that they were moved to t/porting and run as standard, verifying that the logic in FindExt is consistent with that of Configure.
However, we weren't able to test everything. We couldn't correctly test the list of static extensions due to various problems, and the list of dynamically built extensions failed to match due to two discrepancies between Configure's logic and FindExt's.
Firstly, due to a typo in checking defines in %Config::Config, FindExt thought that I18N::Langinfo would never be built (whereas it is built on most *nix systems). So I fixed that, and everything now passed on *nix. However, the test still failed on Win32, thanks to a problem that was a bit more convoluted. In replicating Configure's logic, FindExt thought that ODBM_File *would* be built on Win32, because the Win32 canned configs had i_rpcsvcdbm set to define. What on Earth is i_rpcsvcdbm?
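To make the Win32 approach concrete, here is a minimal Python sketch of a FindExt-style scan that is combined with options switched off in the Makefile. FindExt itself is Perl, and the classification rules below are simplifying assumptions rather than its real logic. The assumptions are: every immediate subdirectory of ext/, dist/ and cpan/ is one distribution; it counts as XS if a *.xs file sits at its top level; and a directory name like Foo-Bar maps to the extension name Foo/Bar.

```python
import os
import tempfile


def find_extensions(root, disabled=()):
    """Classify every distribution under ext/, dist/ and cpan/.

    Hypothetical simplification of a FindExt-style scan: each immediate
    subdirectory is one distribution, it is XS if it has a *.xs file at
    its top level, and "Foo-Bar" maps to the extension name "Foo/Bar".
    Names in `disabled` model options switched off in the Makefile.
    Returns (xs_extensions, non_xs_extensions), both sorted.
    """
    xs, nonxs = [], []
    for area in ("ext", "dist", "cpan"):
        base = os.path.join(root, area)
        if not os.path.isdir(base):
            continue
        for entry in sorted(os.listdir(base)):
            path = os.path.join(base, entry)
            if not os.path.isdir(path):
                continue
            name = entry.replace("-", "/")
            if name in disabled:
                continue  # the user turned this one off
            if any(f.endswith(".xs") for f in os.listdir(path)):
                xs.append(name)
            else:
                nonxs.append(name)
    return sorted(xs), sorted(nonxs)


def demo():
    # Build a tiny fake source tree in a temporary directory and scan it.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "ext", "Foo-Bar"))
        open(os.path.join(root, "ext", "Foo-Bar", "Bar.xs"), "w").close()
        os.makedirs(os.path.join(root, "dist", "Baz", "lib"))
        os.makedirs(os.path.join(root, "cpan", "Qux"))
        return find_extensions(root, disabled={"Qux"})
```

Because the scan is ordinary code with no platform-specific calls, it can be exercised on any OS. That portability is what made it possible to test the Win32 logic without a Win32 machine.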
bq. This variable conditionally defines the I_RPCSVC_DBM symbol, which indicates to the C program that <rpcsvc/dbm.h> exists and should be included. Some System V systems might need this instead of <dbm.h>.
Eh? Win32 is most definitely not an ancient System V Unix, and won't repeat the same old quirks (it has brave new quirks instead). It turned out that FindExt was quite correct, and the canned configs (and header files) had been wrong since 1997. The problem hadn't been spotted because the Win32 configuration explicitly says not to build ODBM_File. Now it's correct. Combine all this with fixes by (at least) Steve Hay and Tony Cook, and it's now possible to test that FindExt and Configure agree on which extensions are to be built, which are dynamically linked, which are statically linked, and which are non-XS. While these changes are of low utility in themselves, all this would prove useful in unravelling more of the build complexity.
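The agreement check boils down to comparing two classifications category by category. Below is a hedged Python sketch of that kind of comparison; it is not the actual t/porting test code. It assumes each side is given as space-separated lists in the style of config.sh values such as dynamic_ext.

```python
CATEGORIES = ("dynamic", "static", "nonxs")


def disagreements(configure, findext):
    """Compare two classifications of extensions, category by category.

    Each argument maps "dynamic", "static" or "nonxs" to a space-separated
    list of extension names (config.sh style, e.g. 'Fcntl I18N/Langinfo').
    Returns {category: (only_in_configure, only_in_findext)} for every
    category where the two sides differ; an empty dict means they agree.
    """
    result = {}
    for category in CATEGORIES:
        a = set(configure.get(category, "").split())
        b = set(findext.get(category, "").split())
        if a != b:
            result[category] = (sorted(a - b), sorted(b - a))
    return result
```

With a typo like the I18N::Langinfo one described above, a comparison of this shape would report I18N/Langinfo as present only on the Configure side of the "dynamic" category.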
I spotted a way to remove a few more tangles from the build, on *nix, VMS and Win32. It's always fun having to juggle three different objects together, and this was no exception.
The build has never depended on having Perl installed. Perl's portability was able to scale to multiple architectures and OSes by
# having the configuration system compile and *run* test programs to find out what works, and what needs to be worked around
# bootstrapping as quickly as possible to a minimally working perl and then writing as much of the rest of the build infrastructure once, in Perl.
Attempting to adapt that to also permit cross-compiling is hard, which is why it hasn't happened (yet). But all our build tools cross-compile nicely. (On *nix, that would be sh, sed, awk, grep, make and cc.) Hence one can bootstrap Perl 5 onto a new platform, albeit in a rather roundabout way, by first bootstrapping a native toolchain.
The various platform Makefiles contain the logic to try to get from some C source to "working miniperl" as rapidly as possible. Part of the fun is that a lot of the modules that are needed to "work" are actually dual-life, hence are shipped in dist/ or cpan/, and some modules, most importantly Config, need to be generated from the platform-specific build files. Additionally, the build needs to be able to run in parallel, which means that
# it's beneficial to split the build into tasks that are as small as possible, to maximise concurrency
# it's necessary for every task to know its pre-requisites, so that make won't accidentally run a rule before something it depended on gets built
(Or rather, how it actually manifests: the build fails some of the time due to a race condition caused by a missing dependency, which is very hard to reproduce and track down.)
Hence the build rules for things early in the build ended up being quite tightly coupled to everything else early in the build, because as soon as one changes where a file is located, or how it is built, all its explicit and implicit dependencies have to be updated.
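A missing dependency of this kind can be modelled as a recipe that reads a file built by another rule without declaring it, directly or transitively. The Python sketch below is illustrative only. The rule table is hypothetical, and the "reads" sets are an assumption, since make itself has no idea what a recipe actually reads. The sketch uses graphlib (Python 3.9+) to group targets into batches that could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical rule table, for illustration only: "deps" is what the
# Makefile declares, "reads" is what the recipe really uses at run time.
EXAMPLE_RULES = {
    "miniperl": {"deps": set(), "reads": set()},
    "lib/build_customize.pl": {"deps": {"miniperl"}, "reads": {"miniperl"}},
    "lib/Config.pm": {"deps": {"miniperl"},
                      "reads": {"miniperl", "lib/build_customize.pl"}},
}


def declared_closure(rules, target):
    """All prerequisites make will build before `target`, direct or indirect."""
    seen, stack = set(), [target]
    while stack:
        for dep in rules.get(stack.pop(), {}).get("deps", ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def missing_dependencies(rules):
    """(target, file) pairs where a recipe reads a built file it never declared."""
    problems = []
    for target, rule in rules.items():
        guaranteed = declared_closure(rules, target)
        for used in rule["reads"]:
            if used in rules and used not in guaranteed:
                problems.append((target, used))
    return sorted(problems)


def parallel_waves(rules):
    """Group targets into batches whose prerequisites are all already built."""
    sorter = TopologicalSorter({t: r["deps"] for t, r in rules.items()})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

In this toy table, lib/Config.pm and lib/build_customize.pl land in the same wave. That is the race in miniature: under a parallel make, the first may be started before the second exists, and only some of the time.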
One particularly "big" dependency (because it is needed very early) is the file lib/build_customize.pl. This is a key part of enabling the build to work at all. If "$INC/build_customize.pl" exists, then it's loaded by miniperl. The trick is that lib/build_customize.pl sets @INC to the absolute paths of all the toolchain modules in ext/, dist/ and cpan/, so that the toolchain can be shipped in an easy-to-maintain layout, yet still be loaded to install each module into lib/ without first being in lib/. In turn, lib/build_customize.pl is written by write_buildcustomize.pl using the pure-Perl code in Cwd, building on the existing cross-platform nature of the Perl code to avoid having to produce three (or more) platform-specific ways of converting directories to absolute paths.
Once lib/build_customize.pl is in place, just running `./miniperl -Ilib` is enough to make the otherwise unbuilt distribution behave sufficiently like a "normal" *installed* perl that the rest of the build system doesn't need to set up anything special. The upshot of all this is that there's one small piece of code which works everywhere (a win for the Perl build scripts), but every rule in the Makefile (and the Win32 Makefiles, and DESCRIP.MMK) needs to ensure that it exists.
What I realised was that by removing one little bit of concurrency it would be possible to simplify quite a lot of the other rules. The gain is not just the direct simplification of only having one dependency, but also a more subtle one. Once lib/build_customize.pl is in place, Cwd is in @INC (being one of the toolchain modules that write_buildcustomize.pl locates). Hence various other rules, which previously invoked miniperl with multiple -I options to ensure that the pure-Perl Cwd could be loaded from dist/, could have all those extra -I options eliminated, as -Ilib does it all once lib/build_customize.pl exists.
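The idea behind write_buildcustomize.pl can be sketched in a few lines: compute absolute paths once, with portable code, and emit a small Perl file that sets @INC. The Python below is a hypothetical illustration. The layout of the generated file and the directory names are invented, and os.path.abspath stands in for the pure-Perl Cwd.

```python
import os


def perl_single_quote(text):
    # In a Perl '...' literal only backslash and single quote need escaping.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def buildcustomize_source(root, toolchain_dirs):
    """Return Perl source that sets @INC to absolute toolchain lib/ paths.

    Hypothetical sketch: the generated layout is invented for illustration;
    only the idea (absolute paths computed once by portable code) follows
    the description of write_buildcustomize.pl.
    root: top of the perl source tree; toolchain_dirs: paths relative to it.
    """
    entries = [os.path.abspath(os.path.join(root, d, "lib")) for d in toolchain_dirs]
    entries.append(os.path.abspath(os.path.join(root, "lib")))  # the build's own lib/
    lines = ["# Generated file, do not edit (illustrative format)", "@INC = ("]
    lines += ["    " + perl_single_quote(p) + "," for p in entries]
    lines += [");", "1;"]
    return "\n".join(lines) + "\n"
```

For example, buildcustomize_source("/tmp/perl-src", ["cpan/Example-One", "dist/Example-Two"]) produces a file listing /tmp/perl-src/cpan/Example-One/lib, /tmp/perl-src/dist/Example-Two/lib and /tmp/perl-src/lib as absolute entries. Every later rule then only needs -Ilib to find the whole toolchain.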
Specifically, by combining the rule that links miniperl with the rule to generate lib/build_customize.pl, all this simplification would fall out. And, somewhat perversely, it's actually conceptually simpler to have the rule "officially" be for lib/build_customize.pl, with the miniperl rule depending on it, than the other way round, as this means that the rest of the Makefile(s) can depend on miniperl, which is much simpler to skim.
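Here is a toy model of that argument, using hypothetical and heavily trimmed target names. It counts how many rules must name lib/build_customize.pl explicitly. It also checks that, after the merge, depending on miniperl is enough to reach lib/build_customize.pl transitively.

```python
def closure(graph, target):
    """Everything `target` depends on, directly or transitively."""
    seen, stack = set(), [target]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def explicit_mentions(graph, name):
    """How many rules list `name` directly as a prerequisite."""
    return sum(1 for deps in graph.values() if name in deps)


# Hypothetical, heavily trimmed rule sets for illustration only.
BEFORE = {
    "miniperl": ["miniperlmain.o"],
    "lib/build_customize.pl": ["miniperl", "write_buildcustomize.pl"],
    "lib/Config.pm": ["miniperl", "lib/build_customize.pl", "configpm"],
    "lib/re.pm": ["miniperl", "lib/build_customize.pl"],
}
AFTER = {
    # One rule now both links miniperl and writes lib/build_customize.pl;
    # miniperl "officially" depends on it, everything else on miniperl.
    "lib/build_customize.pl": ["miniperlmain.o", "write_buildcustomize.pl"],
    "miniperl": ["lib/build_customize.pl"],
    "lib/Config.pm": ["miniperl", "configpm"],
    "lib/re.pm": ["miniperl"],
}
```

In the real Makefiles the "before" count is much larger than two, which is where the simplification pays off.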
Of course, all this is only obvious in hindsight, and inevitably the devil is in the detail when it comes to actually getting it to work, and work reliably.
While removing the dependencies on [.lib]build_customize.pl from the VMS makefile, I noticed that for VMS there was a second dependency that featured heavily, [.lib.VMS]Filespec.pm, thanks to a requirement to copy it from [.vms.ext] before it could be used. And, bonus, more code to copy its test to [.t.lib]. All this was special-case code, which could be completely eliminated if both files could be moved into a regular extension in the directory ext/VMS-Filespec, similar to ext/VMS-DCLsym and ext/VMS-Stdio, and like them only built on VMS. The only thing added would be one line in write_buildcustomize.pl to add ext/VMS-Filespec/lib to the toolchain @INC.
Of course, all this should be simple. But if it were simple, how come VMS::Filespec isn't already in ext/? After all, VMS::DCLsym and VMS::Stdio were both previously in vms/ext/, so why weren't all three moved at the same time? *nix and Win32 already know not to try to build or test VMS::DCLsym and VMS::Stdio, so why not add a third?
The answer (as ever) turns out to be another yak that needs shaving. VMS::DCLsym and VMS::Stdio are XS modules. The build and test infrastructure is quite capable of skipping XS modules. It has to be, because not all XS modules can be built everywhere. But for various reasons, none of which were really designed, it's not capable of not building a pure-Perl module. I was aware of this already, but now that I had a real use case that it was preventing me from implementing, it was irritating enough to give me reason to fix it. Of course, it wasn't a small job, and it consumed a good chunk of a second week too...
So, what prevents us from having a pure-Perl extension in ext/ but not building it? And how did it happen?
The situation we had reached was that there were 5 configuration variables:
bc. dynamic_ext: built dynamically linked XS modules
static_ext: built statically linked XS modules
nonxs_ext: built pure-Perl modules (from ext/, dist/ and cpan/)
extensions: "$dynamic_ext $static_ext $nonxs_ext"
known_extensions: *just* the XS modules shipped in ext/, dist/ and cpan/
with the upshot that "extensions" is typically much larger than "known_extensions". Daft.
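The relationship between these variables, as they stood before the fix, can be modelled in a few lines of Python. The extension names below are illustrative and not taken from a real config.sh. The model shows how a built pure-Perl extension ends up in "extensions" but not in "known_extensions".

```python
def configure_variables(dynamic, static, nonxs, shipped_xs):
    """Model the five Configure variables as they stood before the fix.

    extensions is the concatenation of the three "built" lists, while
    known_extensions lists only the XS extensions shipped with the source.
    """
    return {
        "dynamic_ext": " ".join(dynamic),
        "static_ext": " ".join(static),
        "nonxs_ext": " ".join(nonxs),
        "extensions": " ".join(dynamic + static + nonxs),
        "known_extensions": " ".join(sorted(shipped_xs)),
    }


# Illustrative values only; a real config.sh lists many more extensions.
EXAMPLE = configure_variables(
    dynamic=["Fcntl", "POSIX"],
    static=[],
    nonxs=["Errno", "File/Temp"],
    shipped_xs=["Fcntl", "POSIX", "VMS/Stdio"],
)


def built_but_unknown(config):
    """Extensions that are built, yet absent from known_extensions."""
    return sorted(set(config["extensions"].split())
                  - set(config["known_extensions"].split()))
```

Every non-XS extension shows up in built_but_unknown(EXAMPLE), which is the "daft" situation described above.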
This situation has come about through "organic growth", rather than design. I guess it can be summarised as:
# Perl 5 predates CPAN
# Originally ext/ only held XS code
# Originally there was no concept of dual-life - if you wanted the extensions in ext/, you had to build them with perl. (There wasn't even a toolchain - you could add other extensions into ext/ and they would be built.)
# 15 years ago, commit 4318d5a0158916ac added nonxs_ext, ready to support Errno (Errno was added about two weeks later in commit eab60bb1f2e96e20; curiously, that commit adds Errno to known_extensions but not to extensions)
# A few days later commit bfb7748a896459cc updates Configure so that nonxs_ext *are* in extensions, but are *not* in known_extensions. The description of the change is:
bq. Explicitly split list of extensions into 3 kinds: dynamic, static,
and non-xs. The Configure variable $extensions now holds all three.
(The only current non-xs extension is Errno).
bq. It also updates Porting/Glossary, explicitly changing the description of known_extensions from "list of all extensions included" to "list of all XS extensions included", and extensions from "all extension files linked into the package" to "all extension files (both XS and non-xs) linked into the package."
[Note that Errno *is* architecture-specific, so it gets installed into the same directory subtree as all the shared objects.]
Fast forward from 1998 to 2006
bq. 6. Commit 1d8961043b9b86e1 (or thereabouts) in April 2006 regenerates the sample config.sh to this:
bc. nonxs_ext='Compress/IO/Base Compress/IO/Zlib Compress/Zlib Errno'
bq. at which point, we have 3 more non-XS extensions, all of which are architecture independent.
Subsequent re-arranging of dual-life modules in 2009 means that we've got a lot more.
Effectively, the term "extensions" has been meaning "things we build via Makefile.PL" for at least 7 years, if not 15, despite what all the documentation tries to claim.
So after a lot of figuring out the why and how, and what it would likely break (answer: nothing), I patched the *nix and Win32 build systems to fix this. (I chickened out of figuring out enough DCL to deal with VMS. Craig Berry was kind enough to deal with that.)
So why did this even matter? Because whilst the build system was quite happy not building a pure-Perl module, all the tests for it would still be run (and fail), due to implementation details of how t/TEST (and thus also t/harness) decides what to skip. It refuses to skip anything unless it's in "known_extensions" but missing from "extensions". As Andy Dougherty observed after I submitted the patches to fix the build, nothing after Configure should actually use known_extensions. Hence t/TEST is arguably buggy and needs fixing. Maybe I could have used a smaller hammer if I had spotted the correct problem to hit. :-)
However, it's done now, and the distribution is saner for it. And it permitted the tests for FindExt to be made more comprehensive (and have fewer special cases and skips).
Whilst looking at the *nix Makefile a lot, trying to figure out how to resolve the problems above, I noticed that there are quite a lot of short-cut targets. These are targets added to simplify running various commands, which I don't think anyone uses. For example, there were targets related to profiling and testing tools for Tru64 and Irix (pixie and Third Degree), for purify, quantify and purecov, targets to run the tests through B::Deparse, to convert the tests to UTF-8 or UTF-16 before running them, and to run the tests with -t to flag up taint warnings. (Plus, in some cases, targets to combine two of the above actions.) So I removed them.
It's still perfectly possible to *run* any of the above programs by "hand" - no underlying functionality has been removed from the Makefile. It's just got a little bit shorter and a little bit clearer.
We also discovered a problem with the previously described refactoring of the initial build rules. While Father Chrysostomos was trying something out (which seriously broke the ability of miniperl to even parse code), his make went into an infinite loop calling itself recursively. Effectively, a fork bomb. This isn't supposed to happen - a build failure is supposed to stop, not take out one's machine.
The problem is that the *nix Makefile contains a lot of places where it calls back to itself *in the same directory* to build a different target. I'd been bitten by these some time ago. If things don't go as intended, you can end up with an infinite loop as each recursive invocation of make decides that the same thing needs doing first, and calling make again with the same arguments. It gets even worse running make in parallel.
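The loop can be reproduced with a small Python model. This is a simplified, hypothetical model of the combined rule, under these assumptions: the recipe links miniperl, runs it to write lib/build_customize.pl, and on failure falls back to $(MAKE) minitest; and minitest (via miniperl) needs lib/build_customize.pl first. The depth cap is an addition for the simulation and stands in for the machine running out of processes.

```python
class RunawayRecursion(Exception):
    """Raised when the simulated make nests deeper than max_depth."""


def simulate(miniperl_works, max_depth=20):
    """Model recursive make calls for the combined miniperl rule.

    Returns (targets requested in order, whether the recursion ran away).
    """
    files, calls = set(), []

    def make(target, depth):
        calls.append(target)
        if depth > max_depth:
            raise RunawayRecursion(target)
        if target == "lib/build_customize.pl":
            files.add("miniperl")            # the link step succeeds
            if miniperl_works:
                files.add(target)            # miniperl ran and wrote the file
            else:
                make("minitest", depth + 1)  # the "helpful" || $(MAKE) minitest
        elif target == "minitest":
            if "lib/build_customize.pl" not in files:
                # Still missing, so this nested make "needs" to rebuild it first.
                make("lib/build_customize.pl", depth + 1)

    try:
        make("lib/build_customize.pl", 0)
    except RunawayRecursion:
        return calls, True
    return calls, False
```

With a working miniperl the rule runs once. With a broken one, the requests alternate between lib/build_customize.pl and minitest until the cap is hit. In real make each level is a new process (and with -j, several), which is why it behaves like a fork bomb rather than a simple error.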
I think that historically things had been done this way as a means to have various little utility commands or command sequences available, without having to clutter the build directory with a shell script for each desired "program", or repeating the same commands in multiple places in the Makefile. Even if you get it right (i.e. avoid the above problems), I feel that it actually makes the build *less* clear, because you have to scan back through the same Makefile, and then work out whether the target requested is stand-alone, or is going to have more side effects. Hence I'd considered these a pain point some time ago, and had been trying to eliminate them.
They can even work directly against correctness. The miniperl build rules used to include this:
bc. $(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -MExporter -e '<?>' || $(MAKE) minitest
The intent is to be "helpful" and automatically run minitest if miniperl fails a basic sanity test. The problem is that minitest then looks like this:
bc. # Can't depend on lib/Config.pm because that might be where miniperl
# is crashing.
minitest: $(MINIPERL_EXE) minitest.prep
	- cd t && (rm -f $(PERL_EXE); $(LNS) ../$(MINIPERL_EXE) $(PERL_EXE)) \
		&& $(RUN_PERL) TEST base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t