Nicholas Clark writes:
Sorry for the delay in this report - I've been ill, been away, been ill while away, and generally disrupted, and I wasn't the only one ill in the house. Fortunately everyone is well again, and the backlog of other tasks is considerably reduced.
Zombie undead global backref destruction panics strike again. I don't know what it is with these critters, but another variant turned up, in almost the same place, and triggered by almost the same code. I think that this is the last one, simply because this was the third and final branch of the backref code, and now it's been diagnosed and fixed.
In this case, the problem is that it's possible for the the last (strong) reference to the target [tsv in the code in Perl_sv_del_backref()] to have become freed before the last thing holding a weak reference. If both survive longer than the backreferences array (the previous cause of problems), then when the referent's reference count drops to 0 and it is freed, it's not able to chase the backreferences, so those backreferences aren't NULLed.
For example, a CV holds a weak reference to its stash. If both the CV and the stash survive longer than the backreferences array, and the CV gets picked for the SvBREAK treatment first, and it turns out that the stash is only being kept alive because of an our variable in the pad of the CV, then midway during CV destruction the stash gets freed, but CvSTASH isn't set to NULL. It ends up pointing to the freed HV. Hence that pointer is chased into Perl_sv_del_backref(), but because it's pointing to a freed HV the relevant magic structure was no longer there to be found, a NULL pointer was assigned to a local variable. Subsequent code panicked because it thought that could never happen, at least not without a bug. Except, as the investigation showed, it could happen quite legitimately in exactly this scenario. During global destruction, all bets are off.
I don't believe that "better" destruction ordering is going to help here - during global destruction there's always going to be the chance that something goes out of order. We've tried to make it foolproof before, and it only resulted in evolutionary pressure on fools. Which made us look foolish for our hubris. :-(
I think that the reason that all these critters are shuffling towards us*now*, despite being in code that's quite long lived, is because since 5.14.0 Dave has re-worked the SV destruction code. Previously it would recurse into data structures, which had the unpleasant side effect of blowing the C stack when it tried to do too much at once. [Crash and burn - not good] Dave has made much of that code iterative now, which avoids the crashing. [Generally this is seen as progress :-)] However, it's changed the destruction order, and I think in some cases that is exposing long-latent bugs elsewhere in the code that destruction calls, particularly during global destruction.
Another signficant activity in February week was diagnosing and resolving ticket #37033. The low bug number reveals that this is quite a long standing bug, related to the parser leaving file handles open in some cases. The background is that the perl interpreter is either passed a filename, or if none is passed defaults to reading from STDIN. If reading from STDIN, the interpreter should not close it, as it did not open it (and implicitly closing STDIN out from underneath a program is bad). But any file handle the interpreter opened (internally, as a side effect of doing its job) should be closed.
The problem was the the check for close-or-not used to be simply "is this filehandle STDIN?"
The troubles arise from the interaction of the definition of STDIN with the POSIX semantics of open. "is this STDIN?" is pretty much "is this file handle open on file descriptor 0?". open gives you the lowest unused file descriptor. So if the user program closes STDIN, then the next file handle that is opened meets the "is this STDIN?" test. And the parser was mistaking this for the situation where it had defaulted to reading the program from STDIN because no filename was passed in, so it was leaving the file handle open. Of course, this is a leak. However, it was also starting to show up as visibly buggy behaviour in code that manipulated STDIN - close STDIN, do some "innocent" calculation, (re)open STDIN, only it's not on file descriptor 0 now - what's wrong?
What changed? Karl's work in improving Unicode semantics means that several ops now need look up certain Unicode properties in some cases. If these aren't cached yet, there's a call back into the parser as part of the loading routines. And if that call into the parser happened while STDIN was closed, oh dear, come close time the parser used to thing that had been defaulting to reading from STDIN, and left the handle open.
The parser now explicitly tracks what it opened, and hence closes everything correctly.
A side effect of tracking all this down was that I started to look at the code in perl.c that handles the parsing of the command line options and the initialisation they trigger, along with the the handling of the "script file name", the "-e" option and the default to STDIN. Some of the cleanup has been committed to blead and will be in 5.16.0. Other parts are in the branches nicholas/RT37033-followup and smoke-me/kick-FAKE_BIT_BUCKET However it remains a work in progress - right now perl needs to open /dev/null as part of -e handling, because of assumptions in the parser. However, I think I can see roughly how to make a small change to the source filter code that would permit -e to avoid needing to open any file handle.
installhtml also forced it way onto my plate again in February. This was mysteriously passing its tests on George Greer's Win32 setup that smokes "smoke-me" branches, and yet failing one test on George Greer's Win32 setup that smokes blead. Same operating system, same code, same everything, so why isn't it the same result? After quite a lot of head scratching, code chasing (you are in a twisty maze of abstraction layers, all alike), it turns out that the problem is that on one VM, the smoker is configured to smoke at a pathname D:\path\to\smoker and on the other VM its configured to smoke at d:\path\to\smoker. That difference is almost irrelevant, because filenames on Win32 are* case preserving, case insensitive, and most things aren't concerned with which name a file has, merely whether they can open (or delete) it.
However, Pod::Html has an explicit white-box test for its caching. That test needs to be sure that the right pathnames are in the cache index. And it turns out that the pathnames written to the cache are canonicalised by File::Spec, whilst the expectation list built by the test is not. And that whilst files and directories on Win32 are case preserving, and hence unchanged on canonicalisation, drive letters are canonicalised to upper case. Yay.
I wonder what useful stuff gets displaced from my brain by this obscure knowledge? I'm sure I used to be able to remember the lyrics to American Pie... :-(
To finish the month, I finished a chunk of work related to a regression introduced by commit 6634bb9d0ed117be. A side effect of the improvements in that change was a small regression in how code such as this would deparse:
say "Perl rules";
It should deparse as C
A more detailed breakdown summarised from the weekly reports. In these:
16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/
BBC is "bleadperl breaks CPAN" - Andreas König's test reports for CPAN modules
|7.25||B::Deparse CORE::say regression|
|Pod::Functions (and then Carp-related pre-req fail)|
|RT #37033 related tidy up|
|RT #37033, RT #111070|
|1.00||SVf and HEKf|
|8.50||another backref panic|
|bisect.pl (usability improvements)|
|0.50||defined and exists|
|3.25||perl.c init functions|
|8.25||process, scalability, mentoring|
|46.25||reading/responding to list mail|
139.50 hours total*generally. My understanding is that NTFS can be case sensitive, and can do hardlinks, but that most Win32 code would have kittens if it were run with these settings. Likewise, *nix can quite happily mount case insensitive file systems, and that's going to be breaking some code too...