Nicholas Clark writes:
As per my grant conditions, here is a report for the July period.
The main theme for this month seems to be "clean up parts of the build".
perl was first developed on a Unix system, back in the times when there dozens of different Unix variants. Hence portability was initially across the different C libraries (and even C compilers - remember, Perl predates the C89 standard, let alone its implementation). Figuring out precisely what the system could(n't) do (correctly) was performed by a shell script, Configure, and it expanded its variables into files needed for the build by using simple shell scripts as templates.
It's still how configuration is done by perl 5, and "figure things out with a shell script" is the standard way that pretty much all open source software builds on *nix systems. There's an awful lot of useful hard-won knowledge distilled into these shell-based configuration systems, and whilst they aren't that pretty, they work, which to the end user is what matters.
The template expansion is necessary because to maintain sanity, it's necessary to maintain a clean distinction between files that are generated as part of the configuration and build, and files that shipped with the package (from a tarball, or from a version control system). There's implicitly a concept of who "owns" the file - the human using a text editor, or the build system.
Makefiles are one of the types of files whose contents need to vary based on things determined by Configure. Hence, it naturally falls out from the above description that the various subdirectories have Makefiles which are generated by templates implemented as shell scripts. This plan doesn't fare so well now that "portability" no longer just means to different *nix systems, but also to systems that don't even have shells.
A particular mess was the Makefile for the utils/ directory. Like the other Makefiles it had been being generated by running a Makefile.SH. Unsurprisingly this doesn't work on Win32, so the solution was to also check the generated Makefile into the repository. Just to add to the fun, here VMS is special, and has rules in its top-level Makefile (DESCRIP.MMS) to generate files in utils/, and completely ignores utils/Makefile.
This particular Makefile has very few configuration controlled changes, so typically the regenerated version was identical to the checked in version. Unfortunately for some configurations it would differ, with the result that configure && make of a clean tree would result in a dirty tree, with (seeming) unchecked in changes. This is wrong - building should never make a clean tree dirty.
The irony is that the Makefile generation doesn't need to be a shell script. By the time that utils/Makefile is needed, there is already a fully serviceable miniperl ready and waiting.
So I bit the bullet and replaced utils/Makefile.SH with utils/Makefile.PL Carefully. Firstly by replacing the shell script with a perl script that generates a byte-for-byte identical utils/Makefile (including boilerplate that claims it was generated by utils/Makefile.SH), which was called from the same place that the *nix Makefile had been calling Makefile.SH
With that working on *nix, I then added rules to the Win32 Makefiles to call utils/Makefile.PL to generate utils/Makefile, and rules to both them and the *nix Makefile to delete the generated file as part of the cleanup targets. With that done, I could check in a commit that deleted utils/Makefile from git, and we no longer had a file in the repository being overwritten as part of the build.
Tied up with maintaining sanity by keeping a clear distinction between "build products" and source code is a requirement to clean up everything. We can't just use git to do this for a variety of reasons
- Perl 5 runs on platforms on which git doesn't
- Perl 5 still needs to build from a tarball (or equivalent) even on platforms where git runs (for example, how Linux distributions manage versioning their upstream - ie us)
- git clean works by removing everything which is not known to git, rather than only things known to be built. Zapping everything this is "unhelpful" when you have work in progress which isn't ready for checking in, but still need to clean.
- git uses Perl 5.
(Circular dependencies are bad. It's why File::Temp wasn't able to switch from ExtUtils::MakeMaker to Module::Build, because Module::Build depends on File::Temp)
So the approach the core takes is to explicitly delete files and directories that were created, as this avoids inadvertently deleting the wrong thing, such as work in progress. But this does mean that the core needs to know the identity of all files that it builds. That can be easier "said" than done. For example...
As part of building XS extensions and dual life modules, their files are copied into the lib/ directory at the top level of the build tree. As lib/ doesn't start off empty, but contains various core-only modules, this makes cleanup complex. Fortunately ExtUtils::MakeMaker can be relied upon to delete any files that it copied there, but unfortunately it doesn't track which directories it needed to create. So this ends up with a section of manually maintained rmdir commands in Makefile.SH, and of course more work with rmdir /s /q commands in the Win32 Makefiles.
It had occurred to me some time ago that effectively this information about "what should we delete" is duplicated in .gitignore files. To ensure that git status on a build of a clean tree looks something like this:
$ git status # On branch blead nothing to commit (working directory clean)
even though the build created lib/Fcntl.pm as a copy of ext/Fcntl/Fcntl.pm (etc). We could have solved that by brute force telling git to ignore all files named *.pm, but that would be a bad plan, because then it wouldn't warn us if we've created a new file but forgotten about it, eg:
$ touch lib/Imposter.pm $ git status # On branch blead # Untracked files: # (use "git add <file>..." to include in what will be committed) # # lib/Imposter.pm nothing added to commit but untracked files present (use "git add" to track)
So we need to keep a list of files copied by the build.
Or do we? I had an insight that I'd previously missed. We can actually infer the list of files that will get copied to lib/ by the build process. After all, ExtUtils::MakeMaker figures out what to copy where, so it's possible to use the same approach to generate a lib/.gitignore that lists the copied files. It's actually relatively simple, given that we know the source directories for potential files (cpan/, dist/, and ext/) and the target (lib/). We use the list of files that ship in lib/ to infer the directories that will exist in lib/ for a fresh checkout. Then, for every file in the source directories, we figure out where it will be copied to. This is a bit of a game - sometimes we can get it from just the filename, sometimes we need to scan the file to find a C
So that's lib/.gitignore now automated.
And it turns out that doing that is already nearly all the work needed to figure out which directories are going to be created by ExtUtils::MakeMaker. Combine that with the existing infrastructure that manages updating the Makefiles (first written to automate handling of the many pod files) and we now have Makefiles that will clean up everything, without long dead rules for files and modules now gone, and without missing files which were recently added. And the existing infrastructure also gives us for free a regression test to ensure that the edited Makefiles and generated lib/.gitignore aren't stale.
And with that, a lot of scut-work is gone.
Another build related mess that I tidied up this month was ExtUtils::Miniperl. If you're building an XS extension and want to link it statically with perl, ExtUtils::MakeMaker achieves this by writing out a new perlmain.c for you, compiling that, and linking it with the your extension and libperl.a (from the installed tree). It generates that new perlmain.c using the module ExtUtils::Miniperl. It's also the case that perlmain.c as used by the core tree to build perl itself is generated by ExtUtils::Miniperl (well, since commit fbcaf61123069fe4 in Nov 2010, when I changed the *nix build to use the module, and eliminated the shell script that used to do it - spot a pattern here?).
However, how ExtUtils::Miniperl used to get built was itself an exercise in layered yaks. ExtUtils::Miniperl is a Perl module to generate C code. It used to be generated at build time by a script minimod.pl, which would read the file miniperlmain.c (checked into git - the file containing the main() used by miniperl), scan the C code for known markers, and then write out the module. So we had Perl code scanning C code to generate Perl code that will generate C code.
This is obviously more complex than it needs to be. But what's the best way to untangle it? Ideally we'd also write out miniperlmain.c using ExtUtils::Miniperl, to avoid needing to have code that scans it (and fragile marker comments). The problem here is that miniperlmain.c is needed to build miniperl, which is needed to run ExtUtils::Miniperl to generate minperlmain.c, so there's a circular dependency. So that's not looking good.
Turns out that there is a simple solution, by exploiting a different feature of the distribution. There are some files in the build needed to build miniperl which are generated by Perl code. These are not updated automatically (they can't be - otherwise you end up with circular build dependencies, which make abhors, and can often confuse a parallel make sufficient to make it fork bomb), so they are really only suitable for files that change rarely. It also used to be the case that the checked in version could get out of sync with the code and data used to generate it. (Humans are involved, so forgetfulness happens - we've fixed this now with a regression test that spots it.). So by codifying miniperlmain.c as a file regenerated by a small script using ExtUtils::Miniperl, it was possible to eliminate the scanning code and an entire layer of code that generates code.
The fun isn't quite over yet. It turns out that there's yet another twist in the tail. As well as all the interaction between miniperlmain.c and ExtUtils::Miniperl, there's yet another copy of perlmain.c in ExtUtils::Embed.
ExtUtils::Embed is a module added to the core back by commit a2c6f8f14c6bd897 in July 1996. It's described as "Utilities for embedding Perl in C/C++ applications", including 4 routines to generate the C code to initialise statically linked extensions. Given that the perl executable is structured as a trivial main() wrapper around an embedded perl interpreter, this means that ExtUtils::Embed is doing pretty much the same job as ExtUtils::Miniperl, and indeed the code was effectively forked off ExtUtils::Miniperl 17 years ago.
The code hasn't diverged much (and fortunately this also means that there weren't bugs fixed in one but not the other) so it was relatively simple to refactor ExtUtils::Miniperl to call routines in ExtUtils::Embed. The result is that there is now only one copy of Perl code to generate the C for an xsinit() function, for all the different use cases.
It always seems to be the small things that can explode the most. As part of some previous refactoring work, I spotted an unnecessary use of the Makefile macro $(LDLIBPTH) in the link line for miniperl. But before removing it, I went for a quick check of revision history to find out who added it. Which was where the fun began.
The macro had been added, removed, and then re-added 10 years ago after a sequence of reports from someone with a regular smokes running on a PowerPC Linux system. Power PC Linux systems not being the most common thing back then (or any time since) it was assumed that the problem might be something PowerPC Linux specific, particularly as no-one else was able to replicate the problem, or figure it out.
Fortunately, thanks to the GCC Compile Farm I currently have access to a Power7 system ( http://gcc.gnu.org/wiki/CompileFarm#Machine_Detailed_List ) However, in some ways this was most useful in proving that it wasn't architecture specific (or really, even Linux specific)
The "fun" is because of an awkward combination of things
- The smoker's "compiler" is actually implemented as a Perl wrapper that calls the real compiler.
- The system's /usr/bin/perl is built to use a shared libperl.so, which means that it needs to find the system installed libperl.so to work (properly)
- The smoker's configuration was building with a shared libperl.so, which means that to test it (uninstalled, obviously) it needs to be forced to find the newly built libperl.so
Most people aren't using a Perl wrapper for a compiler so they don't tick all the boxes. The default isn't to build a shared perl library, so that avoids ticking one box. (Shared libraries mandate telling the C compiler to generate position independent code, which isn't as efficient at runtime.) Which is why no-one hit the problem.
The difficulty is that the tools available to force a binary to load shared objects from a non-default path aren't that flexible, even on Linux. And you can't not do this, as without this you can't even test it before installing. The "obvious" answer is to set the environment variable LD_LIBRARY_PATH to the path of libperl.so in the build tree. Different dynamic linking systems have different names for the variable that does this, hence why they are abstracted in the Makefile macro $(LDLIBPTH).
To avoid possible confusion, note that LD_LIBRARY_PATH only affects the paths for runtime loading of shared libraries - not what the compiler "bakes" into a binary about where it should search for shared libraries. If you run gcc with LD_LIBRARY_PATH set in the environment, it affects the startup of gcc, not what gcc outputs. To change the output, you need to pass -rpath options to the linker, or use a different environment variable LD_LOAD_PATH. So the Makefile setting LD_LIBRARY_PATH in the command line used to link miniperl is a complete red herring.
The frustration is that this approach of setting LD_LIBRARY_PATH to force loading of the newly created library goes subtly wrong - LD_LIBRARY_PATH adds directories to search at the end of the list, not the start. Hence with this, any installed copy of the library is found first, even by the uninstalled perl you've just built and are trying to test. Which isn't good, because it means that you're not testing the thing that you just built. Instead you have the ./perl shim loading up and running the installed libperl.so, which at best crashes early, and at worst gives sufficiently believable test results that you don't suspect anything.
So there's a brute force solution for this problem - use LD_PRELOAD to force the dynamic linker to load ./libperl.so before doing anything else. This does at least mean that the tested ./perl will always be loading ./libperl.so, even there is another libperl.so in the default path that it would search. Obviously as there's only one environment, and only one set of environment variables to control this, so if you set the variables before running the test, every executable that runs during testing is affected. Not so obviously what this actually means is that every dynamically linked executable gets to load and link with ./libperl.so, from /bin/true to /usr/bin/make, but as all of them don't use any symbols defined by libperl.so, it makes no difference.
The problem comes because the regression tests need to test that ExtUtils::MakeMaker can build XS extensions, which means that those tests need to run the C compiler toolchain. And if your C compiler toolchain is actually a Perl program, even if it's an installed Perl program running against the installed /usr/bin/perl, then it too will be forced to load ./libperl.so Which, quite likely, will crash in "interesting" and "exciting" ways. (But almost certainly won't work, unless the two versions match exactly)
And there really isn't a solution to this. I can't see any way to have all of
- Testing the actual binary that will get installed
- Overriding the shared library that it loads to use the newly built library, not the existing installed library
- Without overriding the shared library loaded by any other installed perl binary
Specific to the problem in question which caused all the churn back then, I tried to replicate the problem on the Power Linux system using a wrapper running on perl v5.8.0 (which would have been the version that the bug reporter would be using), testing the then current version of blead. From that, and from reading the various e-mail threads at the time, I concluded that that the ticket doesn't tell the whole story about the make environment and that the toolchain was more non-standard than described in the ticket.
Hence I'm confident that the changes to the Makefile adding/removing LD_LIBRARY_PATH were independent of the reported problems, so I was eventually able to make the following small fix, and the generated Makefile got a little simpler:
diff --git a/Makefile.SH b/Makefile.SH index 15b32fd..9d44c2c 100755 --- a/Makefile.SH +++ b/Makefile.SH @@ -901,5 +901,5 @@ lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl [email protected] -f miniperl.xok - $(LDLIBPTH) $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \ + $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \ $(mini_obj) $(libs) $(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl. Please run make minitest; exit 1' @@ -913,5 +913,5 @@ lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl $(PERL_EXE): $& perlmain$(OBJ_EXT) $(LIBPERL) $(static_ext) ext.libs $(PERLEXPORT) write_buildcustomize.pl [email protected] -f miniperl.xok - $(SHRPENV) $(LDLIBPTH) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs) + $(SHRPENV) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs)
# Microperl. This is just a convenience thing if one happens to
The full explanation is at the end of RT #23212, and most easily viewed as https://rt.perl.org/rt3/Ticket/Attachment/1236123/643477/
The other entertainment of note this month was investigating why builds on HP-UX had started failing one of the regression tests for regular expressions. Bisecting showed that the failure started with commit f1e1b256c5c1773d, which fixed the rules for parsing numeric escapes in regexes. So why did the C code fail? This is a job for a debugger:
$ gdb ./perl Detected 64-bit executable. Invoking /pro/local/bin/gdb64. /usr/lib/pa20_64/dld.sl: Unable to find library 'libiconv.sl.4'. Killed
Bother. That's not very useful. Try on the other machine:
$ gdb ./perl Detected 64-bit executable. Invoking /pro/local/bin/gdb64. GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "hppa64-hp-hpux11.11"... (no debugging symbols found) (gdb) run Starting program: /perl/usr/nick/Perl/perl/perl Error trying to get information about dynamic linker. (gdb)
A chocolate teapot would be about as much use.
So in theory, at this point I start having to figure out HP's own adb. Which seems to be even more user-hostile than dbx on Solaris. This isn't looking like it's going to be fun...
So it was a great relief to discover that I didn't need to fight various HP debuggers (much). In that, the combination of SIGBUS and "the stack seems to be corrupt" triggered something in my head - these are the typical symptoms reported by a debugger when a C program has recursed too deeply and bust the C stack. At which point I could replicate it (down to the same test) by tweaking the stack size on x86_64 Linux. It's nothing directly to do with architecture or OS. The default thread stack size on PA-RISC HP-UX is too small. The regex compiler recurses (in C) for each () group, so 100 nested () groups makes an abnormally large amount of recursion.
I could reproduce it by setting the thread stack size to 86016 on x86_64 (that number will vary with build options), and tune the failure point earlier or later by changing the thread size. The regression test will also fail during parsing even without the associated C code change. Huzzah. Because that means that the solution is pure-Perl, simply changing the test runner to set a larger stack size.
(Of course it wasn't quite that simple. The "big" stack size I chose turned out to be smaller than the default used by gcc on Linux when building with ASAN, so the first change was a regression elsewhere. And the default on OS X is also too small, so it also needs an unconditional stack size increase.)
It's good that perl itself is "stackless", else we'd hit these stack issues with threading whenever Perl code recurses too much.
One last little thing I did was to add sort to the code which generates Errno and Pod::Functions so that the same output is always generated, independent of randomised hash ordering. The alternate orderings of generated code wasn't a bug, so didn't absolutely need fixing, but being able to compare the generated files and have them byte-for-byte identical makes debugging and refactoring a lot easier.
A more detailed breakdown summarised from the weekly reports. In these:
16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/
ID YYYYMMDD.### is an bug number in the old bug system. The RT # is given
afterwards. You can look up the old IDs at https://rt.perl.org/perlbug/
|1.00||Compiled-in POSIX character class inversion lists|
|1.50||Exporter -> dist/Exporter|
|ExtUtils::Embed::canon and -Uusedl|
|4.00||File::Find -> ext/|
|0.75||ID 20000804.001 (RT #3634)|
|1.25||Makefile.SH valgrind targets|
|0.75||RT #119069/RT #119071/RT #119073/RT #119075|
|7.00||Storable blessed big endian failure|
|3.75||VMS-Filespec and known_extensions|
|1.25||deterministic Errno and Pod::Functions|
|3.75||gcc ASAN (merging -DPERL_GLOBAL_STRUCT fixes)|
|0.25||process, scalability, mentoring|
|33.00||reading/responding to list mail|
|0.50||version -> cpan/|
151.50 hours total