January 2013 Archives

Alien::Base Grant - Report #8

From the grant manager: This project is in a phase where a bit of help from the community is required. Get in touch with Joel if you are able to help him with his problem.

Joel Berger wrote:

After a busy Christmas season and being engrossed in preparing for my upcoming thesis defense, I have found it hard to find much time to focus on Perl projects. Still, Alien::Base has been on my mind, and happily it has been on the minds of others too!

In this report I want to focus first on the high points. Far and away my high point has been that in my apparent absence others have taken up the mantle. I have gotten bug reports and pull requests from preaction, giatorta, amannb, mokko, and tobyink, and as always productive conversations with David Mertens. In a tough project like this one has been, it helps to see the excitement of other developers waiting for me to finish it.

Several bugs have been fixed, others have been identified. I hope to have another dev release out soon. Further I plan to release a version of Alien::GSL which depends on Alien::Base to CPAN not long after that.

Now for the bad news. While it seems that Alien::Base is coming to a preliminary form of "completion" I have started to notice a disturbing trend coming from CPANtesters. Let me preface this by saying I think CPANtesters might be the killer app of CPAN; it is spectacular and I have said so before. There is a problem though.

Because I wanted to be sure that testing was done with properly installed modules, I created two phony distributions: Acme::Alien::DontPanic which provides a tiny C library of my own creation, and Acme::Ford::Prefect which relies on it. For this reason, when considering the test results for Alien::Base, the results of these modules must be considered as well.

As I have noted before, certain platforms (Mac OSX) need to know the full path of the library's final install location at build time. The only way I was able to overcome this problem was to have Alien::Base build the library in a temporary location while referencing the final location in the build system. Then, when the module is installed, the build tool's install command is issued. The library's build system therefore bypasses the Perl build system. Let me say this a different way: the built library never lands in a blib directory before moving to its final location.

Now I want you to note that I have taken special care to make sure that Alien::Base refreshes the installation packlist, and all in all there should have been no adverse effects from this plan. It wasn't until I started to get failing tests for the packlist handling on Acme::Ford::Prefect that I noticed the problem. While I was able to prevent those failures from being reported, they pointed to a deeper problem.

This brings us back to CPANtesters. When testing some module, many of the CPANtesters smoke testers don't actually install the dependent modules. What they do is build each dependent module, then add all the relevant blib directories to the @INC array. This is a clever system: it means that a fresh dependency chain is available for each test, and that the testing platform is not affected by installing all these modules. Unfortunately, it's not testing a real install environment. For most people and for most modules this is fine. For Alien::Base as currently constituted, however, it's a very real distinction.

On platforms where this library path information is not problematic, Alien::Base will work correctly, even under CPANtesters' blib scheme. For platforms that suffer from this problem, I believe that Alien::Base will work correctly when installed normally, but not under the blib scheme. Therefore no dependent modules will be able to successfully dynamically link to the provided C library.

Until Alien::Base-based modules can install and test cleanly on each of the three major platforms, I hesitate to call Alien::Base completed. I think I'm rather close to declaring victory on 5 of those 6 goals; however, if CPANtesters testing on Mac is broken, I'm still stuck saying that there is work to be done before any victory may be declared.

Perhaps there is some way that CPANtesters can work with me to avoid this problem; admittedly I haven't talked to them about it yet. If anyone with more knowledge on the subject wants to comment below, email me, or file a GitHub issue, please do.

Original article at Joel Berger [blogs.perl.org].

25 Years On: The Perl Community

Twenty-five years ago, Larry Wall conceived of a way to make his work a little bit easier by combining the UNIX tools he found most useful into something more like a general purpose programming language than the various shells available to him. This modest act started the chain of events that led Perl to be one of the longest-standing F/OSS projects we have today.

An unavoidable side effect of this gift to the world of practical computing was the formation of the conceptual gravity well of Perl. This strange attractor pulled, inevitably and inexorably, so many interesting and clever people into Perl's orbit, forming what we would come to know as the Perl Community.

Almost concurrent with Perl's 25th anniversary, TPF has established a new rôle: the Community Advocate. I am very pleased to be the first of these, and to have the opportunity to help foster the community that has given me livelihood, friends, and much enjoyment over the nearly 20 years I have used Perl.

The Community Advocate rôle is not entirely new, however its scope and goals are. Previously, TPF's community work was focused on liaison to the Perl Mongers groups (the basic unit of the Perl Community) all over the world, through the capable agency of my proto-predecessors. What is different today is the comprehensive mission (from the community perspective) of the Community Advocate, and of the associated Community Advocacy Committee, of which I am chair.

The CA committee charter reads, in part:

The Committee shall support the following specific tasks:

  1. Advocating for TPF in the community, and for the community inside TPF
  2. Supporting Perl Mongers and other perl-related community groups
  3. Community building at YAPC and other perl-related events
  4. Establishing regional community-building efforts
  5. Supporting the community "identity" to foster a sense of belonging
  6. Increasing communications within the world-wide Perl community across national and language lines

To accomplish these goals, we need several things. Principal among them is understanding just what the Perl Community is. This is not so easy. The idea of community may seem straightforward at first glance. But, when you try to pin it down it appears to wiggle this way and that until, if you manage to get it to stand still, it vaporizes in a cloud. This leaves you wondering if you were imagining the whole thing to begin with.

I've done considerable thinking about this, and come to some conclusions which I will recount here in abridged form over the next couple of weeks. I will present an analysis of "community" and what it might be; trace its growth and its "maturity"; and propose a "technology" to help build and strengthen the community so many of us consider our technological and social home. While I will start out generically, I will also focus in on the special case of the Perl community, its attributes, and how they lead to both our great strengths and sometimes frustrating weaknesses; and how both can be employed to strengthen and grow the community that gives rise to them.

I look forward to a dialogue with you, an integral part of the community. This is an interactive project, and it is in service to the community and its members.

To date, the P5CMF has been used to pay out $130,733 in grants for the improvement of Perl 5. We have allocated $35,610 towards additional grant work not yet completed. There remains $116,643 in unallocated grant funds.

For full financial details regarding the P5CMF, please visit this Google Doc.

For information regarding the Perl 5 Core Maintenance Fund, including how it is administered and how to apply for a grant, please view the fund page on The Perl Foundation's website.

Dave Mitchell writes:

The first part of November was spent finishing off the PADRANGE
optimisation and merging it into blead.

Here's the commit message. After it, I'll discuss timings.

[MERGE] add PADRANGE op and $B::overlay

This commit implements three optimisations and one new feature.

The new feature is $B::overlay, which can be set to a hash ref, indexed by op address, that allows you to override the values returned by the various B::*OP methods for a particular op. This specifically allows Deparse to be tricked into seeing a pre-optimisation view of the optree, and so makes adding new optimisations (like the one in this commit) a lot easier.

As regards optimisations: first, a new save type is added: SAVEt_CLEARPADRANGE, which is like SAVEt_CLEARSV but specifies a range of targs to be cleared. The save type, target base and range all fit within a single integer pushed on the save stack.

Second, a pushmark followed by one or more pad[ahs]v ops (and possibly some mixed-in null, list and nextstate ops) will sometimes be replaced by a single padrange op. Like other pad ops, this specifies a targ, but in addition the bottom 7 bits of op_private indicate a target range. pp_padrange has two main actions: with OPpLVAL_INTRO, it pushes a SAVEt_CLEARPADRANGE onto the save stack and turns off SvPADSTALE on all the lexicals; and in non-void context, it pushes all the lexicals onto the stack.

Third, for the specific case of the construct my(...) = @_, the ops to push @_ onto the stack (pushmark/gv[*_]/rv2av) are skipped, and the OPf_SPECIAL flag on the padrange op is set: this tells pp_padrange to push @_ directly.

Note that not all sequences of pad ops are consolidated into a single padrange op; the chief constraints are that:

  • they must form a list (i.e. start with pushmark);
  • the targs must form a contiguous range;
  • the flags of the ops must all be similar; e.g. all INTRO or all not, all void or all not, etc;
  • only a subset of flags are allowed; e.g. we don't optimise with OPpPAD_STATE or OPpMAYBE_LVSUB present.

For the specific case of void/INTRO, we consolidate across nextstate boundaries (keeping only the last nextstate); i.e.

        my ($a,$b); my @c; my (%d,$e,$f)

becomes a single padrange op. Note that the padrange optimisation is particularly efficient for the void/INTRO combination: formerly, my($a,$b,@c,%d); would be compiled as

        pushmark; padsv[$a]; padsv[$b]; padav[@c]; padhv[%d]; list; nextstate

which would have the effect of pushing $a, $b onto the stack, then pushing the (non-existent) elements of @c, then pushing the %d HV; then pp_list would pop all the elements except the last, %d; finally, nextstate would pop %d. Instead, padrange skips all the pushing and popping in void context. Note that this means that there is one user-visible change caused by this optimisation:

        my @a;
        sub f { tie @a, ...; push @a, .... }

Here, @a is tied and already has elements by the time the 'my @a' is executed; formerly, FETCH would be called repeatedly to push the elements of @a onto the stack, then they would all be popped again at the end of the 'my @a' statement; now FETCH is never called.

The optimisation itself is implemented by converting the initial pushmark op into a padrange, and updating its op_next. The skipped ops are not cleared; this makes it easier for S_find_uninit_var() and Deparse.pm to do their stuff. Deparse is implemented by using the new $B::overlay facility to make the padrange op look like a pushmark op again; the rest of the Deparse code just sees the original unoptimised optree and so doesn't require any knowledge of the padrange op.



As for performance, consider the following code:

    sub f { ... }
    for (1..10_000_000) {
        f(1,2,3);
    }

Running that with dumbbench for various bodies of f gives:

  orig  padrange %speedup
------  -------- --------
0.4716  0.4850      -2.8% empty loop: skip call to f()
2.4136  2.5536      -5.4% sub f { }
3.9072  3.7901       3.1% sub f { $_[0] + $_[1] * $_[2] }
4.1850  3.3682      24.3% sub f { my ($x,$y,$z); 1 }
5.514   4.8030      14.8% sub f { my ($x,$y,$z) = @_; }
6.2401  5.4497      14.5% sub f { my ($x,$y,$z) = @_; $x+$y*$z; }

Numbers are average wallclock times in secs, lower is better. (x86_64, gcc -O2, lots of code alignment options supplied to help give consistent results). The times are the complete time to run the program, including startup and loop/function-call overhead.

The first three results are essentially noise, since they don't exercise code paths affected by the optimisation; but they also give you an idea of what consumes the CPU. The third one represents a classic code example where direct access to @_ elements is done for speed. This should be compared with the last example, which uses lexical vars for the same end. The lexical variant is still slower, but the gap has been considerably narrowed (40% slower rather than 60% slower).

Finally, a code example specifically chosen to make this optimisation look good :-) ...

   for my $i (1..10_000_000) {
        my ($a,$b,$c);
        my $d;
        my (@e,%f);
        my $g;
   }

This gets particularly good marks, as all those my's are coalesced into a single padrange op, and all the void context stack pushes and nextstates are skipped. In this case, the loop runs 91% faster :-) [ in the mathematical sense that 100% faster == takes half as long to run ]


The rest of November and the first part of December was mostly spent doing non-TPF stuff. Then later on in December, I looked at the three address-sanitizer bugs that Reini Urban had reported against 5.14.3, and confirmed that none of them were in fact regressions, nor security issues; but I patched maint-5.14 anyway.

I fixed bug #116148: if two successive patterns were executed, the first being utf8 and the second not, and if the first match failed or succeeded purely using re_intuit_start() (this is the 'quick guess' initial stage of the regex engine), then in the second pattern's call to re_intuit_start(), it would be treated as utf8 too. This is a long-standing bug, but only affects a small number of cases (which varies over time based on what optimisations can be handled).

The bug turned out to be that the RF_utf8 flag in PL_reg_flags was set by re_intuit_start() but never unset, in contrast to regexec_flags(), which initially set or reset it as appropriate.

The direct fix was fairly trivial, but I also took the opportunity to eliminate PL_reg_flags altogether. This is one of a number of global (per-interpreter) variables that contain state for the current match, and that ideally need eliminating and replacing with local vars. They are a relic of the original (perl 3 era) Spencer regex package that wasn't designed to be re-entrant, and which kept the state for the current match in a bunch of static vars within regexec.c.


Over the last 2 months I have averaged 4 hours per week.

As of 2012/12/31, since the beginning of the grant:

147.1 weeks
1586.5 total hours
10.8 average hours per week

There are 113 hours left on the grant.

Report for period 2012/11/01 to 2012/12/31 inclusive


Effort (HH::MM):

8:50 diagnosing bugs
26:05 fixing bugs
0:00 reviewing other people's bug fixes
0:00 reviewing ticket histories
0:00 reviewing the ticket queue (triage)
34:55 Total

Numbers of tickets closed:

5 tickets closed that have been worked on
0 tickets closed related to bugs that have been fixed
0 tickets closed that were reviewed but not worked on (triage)
5 Total

Short Detail

19:05 [perl #114536] Optimize assigning to scalars from @_
1:00 [perl #115602] Moose fails in List::MoreUtils::all use-after-free
3:15 [perl #115990] 3 new severe ptr errors in 5.14.3 (non-threaded asan)
4:00 [perl #115992] 5.14.3 use-after-free in t/op/local.t op_lvalue
0:20 [perl #115994] 5.14.3 S_join_exact global-buffer-overflow
7:15 [perl #116148] Pattern utf8ness sticks around globally

Nicholas Clark writes:

As per my grant conditions, here is a report for the December period.

A frustrating month.

Having completed the investigation into the state of hashing described in November's report, and becoming comfortable that it wasn't likely to explode without warning, I turned to dealing with the backlog of everything else. Given that I've only been able to do about 2 weeks' work in the past 2 months, routine traffic on the list means that a lot of "stuff" has accumulated which I've not had time to deal with.

The obvious thing to fix first was the failing smoke testing reports. Smoke testing reports are great as long as they pass. Starting from the known state of "passing", you make a change, and if everything still passes you have reasonable confidence that you didn't break anything. Alternatively, if the tests start to fail, you quickly know that your change has an unexpected problem, and a clue as to where to start investigating.

Failing smoke tests are frustrating because you no longer have this clean distinction between "it passes" and "it fails, hence I introduced a problem". The tests fail, but they were going to fail anyway. So you have to look carefully to work out whether the failures you see are new (and hence you broke something), or the ones that were there before (and hence you probably didn't break anything, but you can't be sure).

The right thing to do is to fix the problems that are causing the smoke tests to fail before trying to change anything else. In this case the problem is with the smoker that George Greer runs, which is building perl using clang's Address Sanitizer, a tool which looks for C programming errors such as uninitialised and out-of-range memory accesses. It's particularly important to investigate (and fix) such reports, as the problems found are potentially nasty security risks. In this case, as the smoker had started finding problems at some point since September's blead release, it can't be a problem in a production version, but it does need to be nailed before v5.18.0 ships, and the sooner the better.

Unfortunately, I couldn't replicate the problem on any machine I had access to on which I'd successfully built perl with clang. So I tried to replicate the problem on dromedary, the rather beefy 8 core server which acts as hot backup to the main git server. What followed was several days of discovering, stalling, and eventually working round bugs in building LLVM and clang, or in building perl with clang, in the hope of getting to the point where I could replicate the problem under a debugger, and hence fix it.

dromedary already had a version of clang installed into /usr/local. However, I'd never been able to get it to complete the build of perl. The perl binary would build successfully, but the build would bomb out whilst building XS extensions with errors such as

    encengine.o: In function `fstat64':
    /usr/include/sys/stat.h:449: multiple definition of `fstat64'
    Encode.o:/usr/include/sys/stat.h:449: first defined here
    encengine.o: In function `fstatat64':
    /usr/include/sys/stat.h:457: multiple definition of `fstatat64'
    Encode.o:/usr/include/sys/stat.h:457: first defined here
    encengine.o: In function `gnu_dev_major':

This time, however, I did manage to get to the bottom of that one. Whilst it's reported as an LLVM bug (http://llvm.org/bugs/show_bug.cgi?id=1699), the underlying problem is actually in the system headers provided by glibc, which specify that certain functions are always to be inlined. If this instruction is followed correctly by the compiler, then a copy of each such function is emitted in every object file that uses it, and this results in the clash seen above at link time.

So how come it builds with gcc? Well, the version of gcc on the system doesn't correctly honour the inlining request, and hence doesn't generate multiple clashing bodies for the functions. In turn, this is why the glibc headers are buggy - no-one was aware of the problem back when they were released, because no compiler was good enough to spot it.

So, how come it only fails in the XS modules? After all, the perl binary is made by linking multiple object files together, and surely they have duplicate function definitions? It turns out that it's a side effect of how the core's build system automatically adds various warnings and strictures when building the core C code, which it does not when building extensions. The flags are only used for the core C code, because as it's code we control, we've progressively tidied it to address the problems the warnings highlighted, and hence keep the build output clean. (Like the smokes, silent is good, because it's the easiest to skim.)

One of these flags is -std=c89, forcing ANSI C89 behaviour, which ./cflags.SH determines is viable when building with clang on Linux. It turns out that this changes the behaviour of the headers, to avoid inlining. (The inline keyword came in with C++ and C99.) At which point the problem of multiple bodies goes away, and the link is clean. So, explicitly add -std=c89 to the standard build flags and the build completes, tests run, and tests pass.

However, sadly, the locally installed version of clang is too old to be useful, as it's 3.0, and Address Sanitizer wasn't merged in until 3.1. Not to worry, let's build 3.1...

It turns out that this isn't that easy either. For starters, LLVM and clang are big - very big. Several gigabytes for the build tree, and another several gigabytes for the installed tree. So this necessitated a clean-up of my home directory, just to make room on the partition for the build. And then the build failed with an assertion deep in LLVM, which Googling suggests is due to a bug in g++ 4.2 - LLVM puts a lot of stress on the C++ compiler used to build it. The bug is fixed in later versions of g++, but as dromedary's OS is old, g++, like the system headers, isn't the most current.

Rather than build my own newer copy of gcc, I decided to build a newer LLVM, in case that fixed it. That meant getting it from source control (http://llvm.org/svn/llvm-project/llvm/trunk), at which point I discovered that there is no subversion client installed on dromedary - presumably none of us has ever needed it. This suggests something about the trends in version control systems used by open source projects. So, more yak shaving as I downloaded and built that, which turned out not to be too hard. That got me a current LLVM, which I could install and use to build perl.

But I still can't replicate the problem.

So I checked out the exact same svn revision of clang and LLVM that George's smoker reports that it is using, built and installed that.

Still no joy.

Wondering whether this was something to do with the age of the header files, I tried on one other machine (a rather newer Ubuntu desktop). Again, the exact same revision as George, and no joy. Similarly svn HEAD can't replicate the problem.

After three days of trying I had learned a lot about how (not) to build LLVM and clang, but failed to actually solve the problem that I was setting out to solve. So I'm no nearer to actually nailing the cause of the "unknown-crash" which his instance of Address Sanitizer cryptically reports for a complex expression deep in the code which compiles regular expressions.

Prodding further, we discovered that Merijn could replicate the report when he built on his laptop, and that after shipping the build tree to dromedary, I could run it there and replicate the report. However, as the build was optimised, it wasn't very useful trying to debug this with gdb. (Actually, it wasn't at all useful trying to use /usr/bin/gdb, as the debug format has moved on. I had to use the newer gdb I'd built the week before from source, as part of my earlier clang yak shaving.)

So, he built again with -g, to enable C-level debugging. The problem went away. Bother. So he built again, with both -O and -g, to keep the optimiser but also add debugging symbols. Debugging optimised code is a pain, but it's better than nothing. However, even this hid the problem. At which point we gave up again, because remotely-driven "adding printf" style debugging ties up two busy people and simply isn't a productive use of time.

I did get somewhat further with another yak shaving exercise. To fix a bug, DosGlob had been updated, but in the process it gained a declaration after a statement in the C code, which broke the build on some platforms. I wondered why we hadn't got an automated test for this, to be able to rapidly test the change on a branch and avoid the problem in the future. At least on FreeBSD the system headers are clean enough to be able to build with gcc -ansi -pedantic, which I thought was good enough to trap this. It turns out that I was wrong - what one needs is gcc -ansi -pedantic-errors. Trying to build blead with this, I discovered that there are only two places that don't build with it, which feels like it should be small enough to fix. One, in perlio.c, is a long-standing problem which generates one of the only warnings from the C code. I'd love to fix it, but it will need a Configure test, or possibly even removing the code in question as part of a major PerlIO implementation cleanup.

The other error was in GDBM_File.xs, and looked trivially simple to fix, by adding a single cast. So I added it, but much to my surprise and frustration, this cast broke the build on C++.

So, an aside - why are we testing on C++? After all, Perl 5 is written in C, not C++. So why the pain of also testing on C++? It's because we support building C++ extensions, which means that Perl's headers must also be valid as C++. Most ANSI C is also valid as C++, but a couple of constructions aren't, and occasionally these slip in. If we don't test for it, we don't notice when it has happened, and if no-one notices before a release, then it's too late to fix. Specifically, one slipped into one of the stable releases of 5.8.x that I made. This isn't a mistake I'd like to repeat. (But the problem was also in the release candidates, so I'm not losing sleep over this. Release candidates exist for this sort of reason - to permit people to test for the things that matter to them, and get problems addressed before they are made permanent in a release.)

So, how to test with C++? The Configure system doesn't have a way to probe for the location of a C++ compiler (and doesn't need one), as it's only interested in the C compiler. So the easiest way to automate testing this is to get the testing environment to name a "C" compiler which is actually a C++ compiler. This does mean that we have to ensure that all the C code is also conformant C++, but that's generally not too onerous, and often the changes made result in cleaner code.

So, how to make C++ compilers happy with GDBM_File.xs? I investigated not adding the cast. This took me down into the rat's nest...

GNU GDBM is stable. Very stable. (But not "specialist biologist word for stable".) The current version is 1.10. 1.9 was released in 2011, the version before that, 1.8.3 was released in 2002, while 1.7.3 was released in 1995, which was between the releases of perl 5.000 and 5.001.

The cast is on the fifth argument to gdbm_open(). The fifth argument is an optional callback function for fatal errors. Only GDBM is this flexible - the other four *DBM libraries that the core wraps don't have this last argument.

The GDBM_File module provides a Perl wrapper to the GNU GDBM. It's been in the core since the start. It also turns out that from the start GDBM_File has attempted to provide Perl access to this fifth argument. The XS code in 5.000 looks like this:

    gdbm_TIEHASH(dbtype, name, read_write, mode, fatal_func = (FATALFUNC)croak)

GDBM_File's typemap file defines FATALFUNC like this:

    FATALFUNC               T_OPAQUEPTR

and in turn the global typemap file defines T_OPAQUEPTR like this:

    unsigned long *         T_OPAQUEPTR

which is a pointer to data, which cannot legally be cast to/from a pointer to a function under ANSI C, which is why gcc in pedantic mode is getting upset. But thinking further, this isn't a very useful interface anyway - the XS code is expecting to pull a C function pointer out of the fifth parameter, not a reference to a Perl function. Searching CPAN suggests that no-one currently uses it. But has it ever worked? So I set out to test it. Inevitably, this proved more "interesting" than expected.

So, to test it, we need to determine when the fatal_func callback is actually called. Digging around the GDBM source code suggests that it's only called for "should never happen" situations, such as low level reads or writes failing on an open file descriptor. So I wrote a test that ties a GDBM file to a hash, figures out the numeric file descriptor that the GDBM library is using to access it, and then closes that file descriptor, forcing some sort of write error and hence use of the callback.

The callback was called. But then it crashed with a SEGV. GDBM_File.xs defaults the callback to calling Perl_croak(), but gdb showed that execution was in a different C function that simply isn't reachable from anything Perl_croak() calls. This shouldn't be possible, yet clearly it happened.

It turns out that the problem is really subtle, and one that I didn't know about. Perl_croak takes a pointer to a string, followed by a variable number of arguments:

    void Perl_croak(const char* pat, ...)

The callback parameter is prototyped as a pointer to a function that takes a string. That's compatible, right? After all, a variable number of arguments includes the possibility of zero extra arguments, which is what we have here.

The problem is that the machine-level calling convention for variable-argument functions can differ from that for fixed arguments. Specifically, on x86_64 (on which I was testing), for a variable-argument function, %rax is set to the number of arguments passed in floating-point registers. I didn't know any of this, and I doubt that most people do. But the upshot is that on x86_64, if the compiler knows that it's calling a function with fixed arguments, it doesn't bother "wasting" an instruction setting %rax to zero, whereas the function prelude for the called function assumes that %rax is set to something meaningful, and computes a jump based on it. If %rax contains out-of-range garbage, that jump can leap into any nearby code. Whoops!

So, this means that likely no-one has ever even tried to rely on the callback function, because it's a booby-trap that likely will only crash. At which point it seemed simplest to remove the complications of an unused feature, and drop the ability to change the function used for the callback, simply always using croak() as the callback.

The observant will notice that there's a second bug here - Perl_croak() takes a *printf-style format string, whereas there is no guarantee that the string GDBM passes to the callback does not contain % characters. We can fix both this bug and the problem with the calling convention by using a small wrapper function

    static void
    croak_string(const char *message) {
        Perl_croak_nocontext("%s", message);
    }
and now we're done.

Sadly not. Failing smoke tests reveal that the prototype for the callback function passed as the fifth argument has changed between GDBM 1.8.3 and 1.9.0, from void (*)() to void(*)(const char *). This distinction doesn't matter to a C compiler - to it, the former is "arguments unspecified", and the latter is "one argument specified", but to a C++ compiler, the () of the former means "zero arguments specified". So if you're compiling with a C++ compiler, the above static function is C++, the types conflict, and it's a compile-time error without a cast.

But which cast to use? Really, I wanted to simply prototype my function as "C" linkage - ie

    static "C" void
    croak_string(const char *message) {
        Perl_croak_nocontext("%s", message);
    }
Sadly this isn't allowed. extern "C" makes sense. static "C" is obviously useless. Except that I've just found a use case for it. Bother!

In the end, I opted for conditional compilation based on the C preprocessor macros defined by GDBM, rather than something more complex such as probing. There are just 8 tarballs of GDBM released over the past 18 years, so it was perfectly possible to verify that this approach works on all of them.

A more detailed breakdown summarised from the weekly reports. In these:

  • 16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
  • RT #... is a bug in https://rt.perl.org/rt3/
  • CPAN #... is a bug in https://rt.cpan.org/Public/
  • BBC is "bleadperl breaks CPAN" - Andreas König's test reports for CPAN modules

    Hours  Activity
     0.25  RT #115910
     3.00  RT #115928
     0.25  Unicode data structures
     3.75  cflags, DosGlob, -std=c89, clang, dromedary
     0.25  gcc -ansi -pedantic-error
     2.75  perl.h, X2P
     0.25  process, scalability, mentoring
    32.50  reading/responding to list mail
     1.00  used only once warning

    81.50 hours total

The Perl Foundation is looking at giving some grants ranging from $500 to $2000 in March 2013.

You don't have to have a large, complex, or lengthy project. You don't even have to be a Perl master or guru. If you have a good idea and the means and ability to accomplish it, we want to hear from you!

Do you have something that could benefit the Perl community but just need that little extra help? Submit a grant proposal by the end of January. Would you like the money, and have the knowledge, but don't know what to propose? Ask around and you will probably get some ideas.

As a general rule, a properly formatted grant proposal is more likely to be approved if it meets the following criteria:

  • It has widespread benefit to the Perl community or a large segment of it.
  • We have reasons to believe that you can accomplish your goals.
  • We can afford it (please respect the limits, or your proposal will be rejected immediately).

To submit a proposal, see the guidelines at http://www.perlfoundation.org/how_to_write_a_proposal and the TPF GC's current rules of operation at http://www.perlfoundation.org/rules_of_operation. Then send your proposal to [email protected]. Your submission should be properly formatted according to our POD template.

Proposals will be made available publicly (on this blog) for public discussion, as was done in the previous rounds. If your proposal should not be made public, please make this clear in your proposal and provide a reason.

Enrique Nell and Joaquin Ferrero reported:

Project status: https://docs.google.com/spreadsheet/ccc?key=0AkmrG_9Q4x15dC1MNWloU0lyUjhGa2NrdTVTOG5WZVE

CPAN distribution: http://search.cpan.org/dist/POD2-ES/

Project host: https://github.com/zipf/perldoc-es

If we hadn't lost so much time building a nuclear shelter in the backyard, our final report would have been ready by Christmas time... We still can't believe the Mayan prophecy was wrong!!

This is our last monthly grant report. It also includes a summary of the tasks completed during this 6-month period.

New files added this month

  • perlre
  • perlgit

Reported source pod bugs


Status of our v5.16 track (currently v5.16.2):

  • Total documents: 169 (109 in core docs)
  • Total words: 947,607 (459,773 in core docs)
  • % translated: 34.14% (63.61% of core docs)
  • % reviewed: 21.35%


In late December, Joaquín and Enrique met in Madrid to work on the translation process, and to discuss issues and ways to improve the tools.


Changes in postprocess.pl

Fixed a couple of bugs in the diff_file routine. Changed the input record separator (properly localized) to \n\n while reading the file, to prevent issues in segments containing line breaks. Also fixed a loop limit that caused the last segment to be missed.

Future work

  • We still have a lot pending on our TO DO list, so we will keep informing the community about our progress, most likely through blog posts announced in Perl Weekly.
  • Provide an English version of the translation process documentation in our github wiki.
  • In our first report we mentioned a couple of bugs related to the lack of internationalization features in podlators: The CPAN page for POD2::ES does not show the description of the translated modules because the POD processor does not find section NAME (it's NOMBRE in Spanish); also, POD links containing references to sections are partially rendered in English: «la sección "Plataformas compatibles" in perlport» (for Spanish, "in" should be "en"). We have analyzed this issue and located the modules that should be fixed, but we are not sure of the best way to fix it (we also tried to find ways to fix it on our side). We should contact the authors of these modules and the translation teams of other languages to discuss this and reach an agreement. An easy fix for the second issue that wouldn't require adding any language information would be changing the link processing routines to generate something similar to: «la sección "Plataformas compatibles" (perlport)», i.e., a language-independent output.
  • Compare the output returned by our terminology extraction tool with the results obtained using Lingua::FreeLing3 and Lingua::YaTeA, and explore the possibilities of bilingual extraction provided by Lingua::NATools. This could be the subject of a future blog post.
  • Translate into English the comments/labels in update_statistics.pl (currently in Spanish).

Summary of Project Tasks Completed During the Grant Period

Files added to the POD2::ES distribution

  • perlbot
  • perlboot
  • perlcheat
  • perlclib
  • perldata
  • perlexperiment
  • perlgit
  • perlglossary
  • perlhacktut
  • perlhist
  • perlmod
  • perlmodinstall
  • perlmroapi
  • perlnewmod
  • perlobj
  • perlootut
  • perlre
  • perltodo
  • perltooc
  • perltoot

The translation percentage increased from 42% to 63%, and the revision percentage increased from 3% to 21% (i.e., we ended up a little behind our estimate for revision, and a little ahead for translation).

However, these figures are a bit fuzzy, since new Perl versions are released often (we started our grant work with v5.14.0 and reached v5.16.2), and updating to a new version changes the total word counts and usually requires additional translation/revision work to align already-translated files with their new versions.

Improvement/addition of tools

The basic post-processing script was improved: it now checks the POD syntax, generates a WYSIWYG HTML file for final checks, creates a diff file in HTML format to give feedback to the translator, keeps track of who translated what in order to update the translators section included in translated documents, and alphabetically sorts the translated entries of perlglossary.pod.

We also added scripts to:

  • configure git
  • get the .pod files in a given source distribution to add them to the translation project
  • check the setup of POD2::ES
  • merge the reviewed segments into the translator's translation memory
  • update the statistics in the project tracking spreadsheet
  • compare pod versions in source and target folders

Bug reports

Talk at the Portuguese Perl Workshop 2012 by Enrique Nell (in Portuguese)


2012 Year End Report

The Perl Foundation is proud to present this report to its members. 2012 was a spectacular year for Perl and The Perl Foundation. TPF has supported the community via grants programs, conferences, and new outreach efforts. The community has, in turn, supported TPF through its generous donations of time and money.

We look forward to continuing our support as we begin Perl's next 25 years.


About TPF

The Perl Foundation - supporting the Perl community since 2000. Find out more at www.perlfoundation.org.

About this Archive

This page is an archive of entries from January 2013 listed from newest to oldest.


