Raku Dispatch and Compiler Improvements: Grant Report Jonathan Worthington
Tue, 14-Sep-2021 by Matthias Bloch
Jonathan reports a lot of progress on his grant. We would like to thank Jonathan for his work and the sponsors for their support.
Here is his report:
---
# Raku Dispatch and Compiler Improvements Grant Update
Since the [approval](https://news.perlfoundation.org/post/grants_may_2021_votes)
of my [grant](https://news.perlfoundation.org/post/grant_proposal_raku_dispatch_compiler_improvements)
in late June, I have been making a lot of progress with it. The grant allowed
me to dedicate the vast majority of my working time in July and August to Raku
(although I was away for 2 weeks of August on vacation). This report covers
the work done from grant approval up to the end of August.
The key goal of the grant is to bring my work on a new generalized dispatch
mechanism to the point where it can be merged and delivered to Raku users.
In summary, the new dispatch mechanism:
* Delivers greatly improved performance for a number of constructs that
are very slow in Rakudo/MoarVM today, including deferral with `callsame`
and other such functions (thus also aiding code using `wrap`), multiple
dispatch involving `where` clauses or named arguments, method calls on
roles that are punned into classes, invocation of objects that implement
`CALL-ME`, and others.
* Replaces many special-case performance mechanisms with a single, general,
programmable one. This simplifies MoarVM internally, while simultaneously
allowing it to do more optimization.
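As a rough illustration of the kinds of construct listed above, here is a small Raku sketch. It is not taken from the grant work; the class, role, and routine names (`Greeter`, `LoudGreeter`, `sign-of`, `area`, `Pinger`, `Doubler`) are made up for the example, and it only shows the language features, not how they are dispatched internally.

```raku
# Deferral with callsame: the subclass method defers to its parent.
class Greeter {
    method greet($name) { "Hello, $name" }
}
class LoudGreeter is Greeter {
    method greet($name) { callsame().uc }
}

# Multiple dispatch involving a where clause...
multi sign-of(Int $n where * < 0) { 'negative' }
multi sign-of(Int $n)             { 'non-negative' }

# ...and multiple dispatch decided by required named arguments.
multi area(:$side!)             { $side * $side }
multi area(:$width!, :$height!) { $width * $height }

# A method call on a role, which gets punned into a class.
role Pinger {
    method ping() { 'pong' }
}

# An object that implements CALL-ME can be invoked like a routine.
class Doubler {
    method CALL-ME($x) { $x * 2 }
}
my $double = Doubler.new;

say LoudGreeter.new.greet('world');    # HELLO, WORLD
say sign-of(-3);                       # negative
say area(:width(2), :height(3));       # 6
say Pinger.new.ping;                   # pong
say $double(21);                       # 42
```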
Far more details can be found in the presentation I gave about this work at
The Raku Conference 2021 ([slides](https://jnthn.net/papers/2021-trc-dispatch.pdf),
[video](https://www.youtube.com/watch?v=yRFyGDVHl0E)).
At the point the grant got underway, the new dispatch mechanism was looking
promising, but still some distance from being ready to ship. The work so far
under this grant has decisively changed that: the expectation is that it
will be merged shortly after the September monthly releases (of Rakudo and
MoarVM) and thus be delivered to Raku users in the October releases.
Key tasks performed under the grant up to the end of August are as follows:
* Switch all method and subroutine dispatches in both NQP and Raku over to
using the new dispatch mechanism, taking care of cross-language calls
(for example, where the compiler calls bits of Raku code at `BEGIN` time)
* Switch over all implicit calls emitted during compilation to use the new
dispatch mechanism also
* Switch the regex compiler over to emitting its calls using the new dispatch
mechanism
* Switch the boolification mechanism and complex `if`/`unless` object ops,
which previously involved an opaque chunk of C code, over to the new
dispatch mechanism; this eliminated a bunch of code in the optimizer too
* Replace NQP's stringification and numification - which also involved a
bunch of custom logic in MoarVM - with a dispatcher
* Bring the implementation of Raku multiple dispatch using the new dispatch
mechanism to completion, including handling of required named arguments,
typed exceptions on dispatch failure, `Junction` failover, `Proxy` args,
dispatch based on argument unpacking, and `nextcallee` support in complex
dispatch cases
* Add support for `callwith` to the method, wrap, and multiple dispatchers
* Various fixes to `lastcall` handling
* Switch NQP's multiple dispatch over to the new dispatcher
* Implement support for `CALL-ME`, which can be handled far more efficiently
using the new dispatch mechanism (current Rakudo has an intermediate
invocation that leads to slurping and re-flattening arguments, which in turn
frustrates optimization; with the new dispatcher, the `CALL-ME` body can even
be a candidate for inlining)
* Handle coercions using the new dispatch mechanism, again with some
performance wins
* Replace the `findmethod`, `tryfindmethod`, and `can` ops with a
dispatcher-based solution; while the use of `nqp::` ops in modules is
discouraged, these are among the more common ones, so retaining the API
compatibility is good for the module ecosystem
* Implement a dispatcher-based solution for `istype`: if the answer cannot be
given by the type cache, then a dispatcher is now used for the fallback. This
opens the door to a range of future optimizations.
* Implement sink handling in Raku using a dispatcher, which in turn allows us
to avoid a huge number of method calls in the common no-op situation, by
instead using a type guard and mapping it directly to `Nil`
* Eliminate lots of superseded mechanisms in MoarVM: the multiple dispatch
cache, smart coercion ops, the method cache, the legacy argument capture
data structure, the invocation protocol mechanism, and the legacy calling
conventions
* Replace a number of Rakudo extension ops with dispatcher-based solutions
(these are C extensions to MoarVM, which we are seeking to fully eliminate;
while this is not a goal for the new dispatcher work, we are now down to
around 10 of them, putting it in reach in the near future; this is of some
end user interest as it is currently a blocker for making a single executable
that bundles MoarVM, Rakudo, and a program)
* Reinstate type statistics collection when using the new dispatcher, so the
type specializer can start to do its optimization work again
* Start translating dispatch programs built at callsites into sequences of
ops, including guards. This means that, in specialized code, we can very
often avoid interpreting dispatch programs, and instead have JITted guard
sequences (with the guards potentially being eliminated), and also exposes
dispatches resulting in bytecode invocation for further optimization
* Reinstate specialization linking for bytecode invocations (this is where
one piece of specialized code can directly call a specialized form of the
callee without additional type checks); this is restricted so far to
calls that don't have potential resumptions, so doesn't yet work for method
or multi calls, for example
* Reinstate inlining, with the same restrictions as for specialization
linking
* Reinstate OSR (On Stack Replacement, used to switch hot loops into their
optimized form when it is available)
* Design and implement a solution for better handling of megamorphic method
callsites, and make use of it in the NQP method dispatcher
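To give a feel for the general idea of recording a dispatch outcome at a callsite and reusing it behind a guard, here is a deliberately simplified toy model in Raku. It is only a conceptual sketch: MoarVM's real dispatch programs are implemented inside the VM and are far richer than this, and the `ToyCallsite`, `Circle`, and `Square` classes are hypothetical names invented for the example.

```raku
# Toy model only: a "callsite" that caches method lookups behind type guards.
class ToyCallsite {
    has Str $.method-name;
    has @!programs;             # recorded entries: guard type => resolved method
    has Int $.slow-paths = 0;   # how often the full lookup had to run

    method invoke($invocant, |args) {
        # Fast path: try each recorded entry; run its target if the guard holds.
        for @!programs -> $p {
            return $p.value.($invocant, |args) if $invocant.WHAT === $p.key;
        }
        # Slow path: do the full method lookup, then record it for next time.
        $!slow-paths++;
        my $target = $invocant.^find_method($!method-name);
        @!programs.push: $invocant.WHAT => $target;
        $target($invocant, |args);
    }
}

class Circle { has $.r; method area() { pi * $!r ** 2 } }
class Square { has $.s; method area() { $!s ** 2 } }

my $site = ToyCallsite.new(:method-name<area>);
$site.invoke($_) for Square.new(:s(2)), Square.new(:s(3)), Circle.new(:r(1));
say $site.slow-paths;   # 2
```

In this toy, a callsite that only ever sees one type keeps a single entry, while one that sees a very large number of types would accumulate a long list of guards to check.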
A few other improvements were made that are not directly related to the new
dispatch mechanism, but were taken on because the opportunity for improvement
was spotted during performance analysis:
* Rework how action methods are invoked, such that most such invocations are
monomorphic rather than all going through a megamorphic site; this should
allow simple action methods to even be inlined in the future
* Make specializer statistics cleanup much cheaper, meaning the specializer
thread can spend more time doing useful work
The total time worked up to the end of August on the grant is **144 hours
42 minutes**, meaning that 55 hours and 18 minutes remain.