Grant report: Optimizations building upon the new Raku dispatch mechanism
Sat, 04-Dec-2021 by
My latest [grant proposal](https://news.perlfoundation.org/post/grant_proposal_raku_dispatch_optimizations)
was [recently approved](https://news.perlfoundation.org/post/grants_nov_2021_votes).
I had the opportunity to dedicate quality time to Raku work at the start of November, but knew I would be tied up with some other
work in the latter part of the month. Thus I optimistically forged ahead with some grant work while I could,
crossing my fingers for approval. This report covers what got done.
The main completed task was to reorganize and streamline return and stack unwind handling. My work on the
new dispatch mechanism introduced a new callstack layout. This created an opportunity to simplify the way
we handle stack unwinding - that is, removing frames either because we are returning or because of an
exception. Since this happens for every single non-inlined block or routine that we call, savings here
have an impact on all but the most micro of micro-benchmarks. For example, a recursive Fibonacci benchmark
(written recursively to frustrate inlining) showed a 5% improvement from this work. The work also eliminated
some `malloc` calls in favor of callstack allocation in a number of situations, and resulted in overall
simpler and smaller code in MoarVM. Faster and simpler is certainly welcome.
I also did some optimization on frame invocation, primarily by splitting the specialized and unspecialized
callframe setup paths, which allowed for eliminating a number of branches that the C compiler could not eliminate
on its own. This new factoring also revealed an opportunity to fold two `memset` calls into one, which was also a
welcome saving. This was worth a further 3% off the recursive Fibonacci benchmark. (To give a picture of how
Raku compares with Perl in this benchmark, Raku runs it in around two thirds of the time, despite the fact
that it has to cope with the potential upgrade of `Int` to a big integer.)
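The benchmark code itself is not shown in the report; a naive recursive Fibonacci of the kind described looks
something like this (a sketch, with the argument `30` chosen arbitrarily, not the exact benchmark):

```raku
# Naive recursive Fibonacci. The double recursion defeats the
# inliner, so every call exercises the callframe setup and
# stack unwind paths that this work optimized.
sub fib(Int $n) {
    $n < 2 ?? $n !! fib($n - 1) + fib($n - 2)
}
say fib(30);   # 832040
```

Since `fib` takes and returns `Int`, each addition must also allow for the possible upgrade to a big integer,
which is the extra cost Raku carries relative to Perl in this comparison.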
A central goal for the grant as a whole is to make progress on escape analysis. When I worked on this previously,
a particular challenge was the reliance on attribute containers vivifying (being allocated) upon first touch. This
was not so much an optimization as a means to determine if an attribute had been initialized, for the purpose of
running defaults. Unfortunately, however, it greatly complicates the escape analysis of object graphs at creation
time, and makes all attribute access a little more costly. (One could also get occasionally surprised by the fact
that reading an attribute during a constructor would count as initializing it too.) Thus, I started working on a
new approach, based upon container descriptors, which are also the mechanism used in array and hash element
auto-vivification. The work in progress is currently a pull request, which needs further work to analyze why
it causes regressions in a small number of modules; the new approach does, however, already pass the
specification test suite.
I also did some design work for a faster and simpler way to handle `LEAVE` blocks. Today they carry quite some
performance overhead: we are unable to ever inline them, and the need to support them imposes a small, but non-zero,
cost on the exit of every callframe, whether or not it has a `LEAVE` block. The new design I have worked on
should fix all of these issues, and I hope to implement it during December. Even if `LEAVE` is rarely directly
used, it plays an important part in ensuring locks are reliably released, and so is used implicitly in many programs.
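To illustrate why this matters (a hypothetical sketch, not code from the grant work): a `LEAVE` phaser runs
whenever its enclosing block is exited by any route, which is what makes lock release reliable.

```raku
# Hypothetical example: LEAVE runs on any exit from the enclosing
# block - a normal return, an exception, or any other stack unwind.
my $lock = Lock.new;
sub guarded(&work) {
    $lock.lock;
    LEAVE $lock.unlock;   # released even if work() throws
    work();
}
guarded({ say 'holding the lock' });
```

Constructs like `Lock.protect` provide this same guarantee, which is presumably the kind of implicit use the
report has in mind, and why every callframe exit currently has to account for possible `LEAVE` handlers.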
Finally, I also tracked down and fixed a bug in the intersection of dispatch resumption and inlining.
Time spent on the grant: 30 hours 52 minutes
Time remaining on the grant: 169 hours 8 minutes