November 2018 Archives

The Grants Committee is accepting grant proposals all the time. We evaluate them every two months and another evaluation period is upon us.

September 2018 Grant Votes

The Grants Committee has concluded the voting of the September 2018 round.

There were two proposals this round, both of which were approved and funded.

Original article was published on November 9, 2018

The overview page now shows all the data displayed in the previous profiler's page and adds a "Start times of threads" chart. The "GC" tab has been updated with sub-tabs to customise graphs using different display modes. The routines list now features a "goto" arrow for smooth and easy navigation.

Read more at: Where did I leave my AT-KEYs?

Where did I leave my AT-KEYs?

Even though it's only been a week and a half since my last report, there's already enough new stuff to report on! Let's dig right in.

shallow focus of lovelocks
Photo by NeONBRAND / Unsplash

Overview Page


Is there terribly much to say about this? It now shows the same data that was already available in the old profiler's overview page. It's the go-to page when you've changed your code in the hopes of making it faster, or changed your version of rakudo in hopes of not having to change your code in order to make it faster.

Here's a screenshot from the other profiler for comparison:


The main addition over the previous version is the "start times of threads" piece at the top left. In multi-threaded programs it shows you when more threads were added, for example if you use start blocks on the default ThreadPoolScheduler.
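As a loose analogy for how `start` blocks cause the ThreadPoolScheduler to grow its pool, Python's thread pool behaves similarly: worker threads are only created as tasks actually arrive. This is just an illustration, not anything from the profiler itself:

```python
from concurrent.futures import ThreadPoolExecutor

# Each submit is roughly analogous to a Perl 6 `start` block: the pool
# only spins up additional worker threads when the existing ones are busy,
# which is the kind of growth the "start times of threads" chart shows.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(lambda n: n * n, n) for n in range(8)]
    results = sorted(f.result() for f in futures)

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```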

The GC Performance section gives you not only the average time spent doing minor and major collections, but also the minimum and maximum time.

The rest is pretty much the same, except the new version puts separators in numbers with more than three digits, which I find much easier on the eyes than eight-digit numbers without any hints to guide the eye.
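The separator trick is easy to replicate; in Python, for instance, the format mini-language does the digit grouping for you (the actual frontend does this in JavaScript, so this is just an illustration):

```python
def group_digits(n: int) -> str:
    """Insert thousands separators: 12345678 -> '12,345,678'."""
    return f"{n:,}"

print(group_digits(12345678))  # 12,345,678
print(group_digits(999))       # 999 (nothing to separate below four digits)
```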

GC Run List


The graphs at the top of the GC tab have changed a bit! There's now the option to show only major, only minor, or all collections in the graph, and there are three different display modes for the "Amounts of data" graphs.

The one shown by default gives bars split into three colors to signify how much of the nursery's content has been freed (green), kept around for another round (orange), or promoted to the old generation (red). That's the mode you can see in the screenshot above.


The second mode is "Combined Chart", which just stacks the amounts in kilobytes on top of each other. That means when more threads get added, the bars grow. In this example screenshot you can barely even see orange or red in the bars, but that's because this program is very light on long-lived allocations.


The third mode is "Split Charts", which has one chart for each "color". Since they all have their own scales, you can more easily see differences from run to run, even if some of the charts appear tiny in the "combined" or "combined relative" charts.
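All three display modes are derived from the same per-run numbers; here's a small Python sketch of the arithmetic (the field names and values are made up for illustration, not the profiler's actual data model):

```python
# Hypothetical per-run GC data, in kilobytes of nursery contents.
runs = [
    {"freed": 3800, "kept": 150, "promoted": 50},
    {"freed": 7600, "kept": 300, "promoted": 100},
]

def combined(run):
    """'Combined Chart' mode: stack the raw kilobyte amounts, so bars
    grow as more threads contribute their nurseries."""
    return run["freed"] + run["kept"] + run["promoted"]

def relative(run):
    """Default mode: each color as a fraction of the whole nursery, so
    every bar has the same height regardless of thread count."""
    total = combined(run)
    return {key: run[key] / total for key in run}

def split_series(runs, key):
    """'Split Charts' mode: one series per color, each free to use its
    own scale so tiny series stay visible."""
    return [run[key] for run in runs]

print(combined(runs[0]))               # 4000
print(relative(runs[0])["freed"])      # 0.95
print(split_series(runs, "promoted"))  # [50, 100]
```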

Routines List

The routines overview – and actually all lists of routines in the program – have a new little clickable icon now. Here it is:


The icon I'm talking about is the little up-right arrow in a little box after a routine's name. When you click on it, the row turns blue. Huh, that doesn't sound very useful? That's because the button actually brings you to the routines list, scrolls to the routine you clicked on, and highlights it. If you're already right there, you won't notice much of a change, of course.

However, it gets more interesting in the callers or callees lists:


Even better, since it actually uses links to actual URLs, the browser's back/forward buttons work with this.

Other useful places you can find this navigation feature are the allocations list and the call graph explorer:



Where are my AT-KEYs at?

If you have a very big profile, a routine you're interested in may be called in many, many places. Here's a profile of "zef list". Loading up the call graph for this routine may just explode my computer:


Note the number of Sites: 27 thousand. Not good.

But what if you're already in the call graph explorer anyway, and you just want to find your way towards functions that call your routine?

Enter the search box:


As you can see, when you input something in the search bar, hand icons will point you towards your destination in the call graph.
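Conceptually, those hints mark every node whose subtree contains a match; a small Python sketch of that idea (the node structure here is invented for illustration):

```python
def hint_nodes(node, query):
    """Collect the names of call-graph nodes whose subtree contains a
    match for the query -- exactly the nodes that deserve a pointing-hand
    icon guiding you towards your destination."""
    hits = set()

    def walk(n):
        found = query in n["name"]
        for child in n.get("children", []):
            found = walk(child) or found
        if found:
            hits.add(n["name"])
        return found  # tells the parent whether this subtree matched

    walk(node)
    return hits

graph = {"name": "MAIN", "children": [
    {"name": "list", "children": [{"name": "AT-KEY"}]},
    {"name": "identity"},
]}
print(sorted(hint_nodes(graph, "AT-KEY")))  # ['AT-KEY', 'MAIN', 'list']
```

Nodes off the path, like "identity" here, get no hint at all.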

I'm looking to add many more different kinds of searches, for example I can imagine it would be interesting to see at a glance "which branches will ever reach non-core code". Searching for files ought to also be interesting.

Another idea I've had is that when you've entered a search term, it should be possible to exclude specific results, for example when there are many routines with the same name but some of them are not the ones you mean. "identity", for instance, is in pretty much every profile, since that's what many "return"s turn into (when neither a decont nor a type check is needed). However, Distributions (which is what zef deals in) also have an "identity" attribute, which covers name, version, author, and so on.

At a much later point, perhaps even after the grant has finished, there could also be search queries that depend on the call tree's shape, for example "all instances where &postcircumfix:{ } is called by &postcircumfix:{ }".

That's it?

Don't worry! I've already got an extra blog post in the works which will be a full report on overall completion of the grant. There'll be a copy of the original list (well, tree) of the "deliverables and inchstones" along with screenshots and short explanations.

I hope you're looking forward to it! I still need to replace the section that says "search functionality is currently missing" with a short and sweet description of what you read in the previous section :)

With that I wish you a good day and a pleasant weekend!
- Timo

Maintaining Perl 5 (Tony Cook): October 2018 Grant Report

This is a monthly report by Tony Cook on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

Approximately 49 tickets were reviewed, and 6 patches were applied.

[Hours]         [Activity]
  2.18          #125760 re-test branch and apply to blead
                #125760 perldelta
 11.90          #126706 get tests working, work on installer with
                #126706 re-work to use @rpath
                #126706 polish, testing
                #126706 fixes, testing, comment with patch
                #126706 look at fixing embed tests, re-work to avoid rpath
                #126706 more re-work, simplify, testing, polish, comment
                with patch
  1.27          #131649 (sec) find fix in blead/5.28, backport to 5.26,
                comment with patch
  1.73          #132147 (sec) review, look for supposed other project
  2.44          #132782 review patches, work on tests, find an existing
                vec() bug, rebuild for debugging
                #132782 debugging, comment
  1.10          #133396 review ticket and code
  1.58          #133423 (sec) check blead, try to forward-port patch, test
                #133423 (sec) testing, comment
  0.35          #133439 retesting, apply to blead
  3.64          #133440 review discussion, research, work on improving the
                error messages
                #133440 more work on improving errors
  0.73          #133442 review, testing
                #133442 push to blead
  0.13          #133494 re-check, apply to blead
  0.75          #133511 review, research and comment
  1.40          #133519 try to reproduce, fail on win7, setup win10
                #133519 try to reproduce on win10
  1.58          #133523 (sec) review regexp code, comment
                #133523 (sec) review regexp code, comment
  2.54          #133535 debugging
                #133535 debugging, comment
  0.85          #133550 research, testing, apply to blead
  0.32          #133567 test, apply to blead
  0.95          #133582 comment with patch
  0.35          #133585 review and comment
  0.12          #133597 (sec) reject
  1.82          #133603 try to reproduce with gcc 8.1.0
                #133603 manage to reproduce, try to debug
  4.54          #133604 push skip, debugging, track down cause, testing
                #133604 work on regression tests, testing, apply to blead
  0.42          #133610 comment
  1.35          #133620 (sec) reproduce, work on bisect
                #133620 (sec) more work on bisect, comment
  0.27          #133630 comment
  0.17          ask khw about two of the issues, some tracking admin
  2.07          bang head against encoding issues trying to apply patches
  0.20          comment on private File::Slurp thread
  0.55          comment svtype thread
  0.50          debugging binmode
  1.37          diagnose :utf8 recv fatal tests on Win32 (binmode doesn’t
                appear to work), encounter some build issues along the way
  2.05          feature sysio_bytes
  2.35          feature sysio_bytes, add some :utf8 fatal tests for recv,
                send, testing
  1.38          feature sysio_bytes, debugging
  1.95          feature sysio_bytes, more testing
  1.00          feature sysio_bytes, rebase, debugging, testing
  1.92          feature sysio_bytes: docs, testing
  1.33          find cause of encoding issues, finish patching
  1.07          more security fix checks
  1.78          polish, perldelta, open RFC 133610
  2.08          reply security email, work on checking security fixes for
  1.25          request CVE IDs
  2.03          respond to security email from sawyerx, try to work out
                broken ranges for security tickets
  0.60          review Jim’s patches
  0.85          security ticket wrangling
  0.42          update tickets with CVE IDs
  1.08          utf8 readline track down problem
  2.02          utf8 readline – debugging
  0.22          utf8-readline testing
 74.55 hours total

Based on hints from the trac ticket for the OS X -Duseshrplib issue,
this has finally been fixed in blead.  [perl #126706]

This is a monthly report by Dave Mitchell on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

I've been almost entirely absent from perl stuff for the last couple of
months - due to doing other things and lack of enthusiasm. Hopefully
things will start picking up.

Did a little bit of work on a couple of tickets:

      0:40 RT #133518 svref_2object regression in 5.28.0
      0:30 RT #133523 Read from an invalid address
      1:10 TOTAL (HH::MM)

 264.0 weeks
3152.5 total hours
  11.9 average hours per week

There are 313 hours left on the grant

The "Overview" tab is now functional but in flux. The "Routines" tab has been improved to include sorting functionality for columns, a minimal view in the "Paths" sub-tab, and a new "Callers" sub-tab. An "Allocations" top-level tab has also been added. Read more at: Full Screen Ahead!

Full Screen Ahead!

Whew, it's been a long time since the last report already! Let's see what's new.

train on bridge surrounded with trees at daytime
Photo by Jack Anstey / Unsplash

Improvements to the Routine Overview

The first tab you'll usually go to is the routine overview. It gives you a list of every routine that was encountered in the program. In a bunch of columns it shows you the name of the routine, along with the file it comes from and the line it's defined on. It tells you how often the routine was entered and how many of these entries were into or inside of jitted code.

The last columns show how much time the routine accounted for. That time is split up twice. Once into inclusive vs exclusive time, which tells you how much total time was spent from entry to exit vs how much time was spent only inside the routine itself, rather than in any routines called by it. The second split is into total time vs time divided by the number of entries. That lets you more easily figure out which routines are slow by themselves vs which routines take a lot of time because they are called very, very often.
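The inclusive/exclusive split can be computed from the call tree's timings alone; here's a rough Python sketch of the arithmetic (invented field names and numbers, not the profiler's actual data model):

```python
def exclusive_time(node):
    """Exclusive time: a routine's inclusive time minus the inclusive
    time of everything it called directly."""
    return node["inclusive"] - sum(c["inclusive"] for c in node.get("callees", []))

# Made-up example frame; the real profiler records these per routine.
frame = {
    "name": "sum-squares",
    "inclusive": 120.0,   # ms from entry to exit, callees included
    "entries": 4,         # how often the routine was entered
    "callees": [{"name": "square", "inclusive": 70.0}],
}

print(exclusive_time(frame))                  # 50.0 ms spent in the routine itself
print(frame["inclusive"] / frame["entries"])  # 30.0 ms inclusive per entry
```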

Usually it's best to start with the routines that take the most time in total, but in my experience, routines that take longer per call often have more obvious optimization opportunities in them.

It is for that reason that being able to sort by different columns is extra important, and that wasn't available in the new tool for quite a while. That changed just yesterday, though! It still looks quite odd, because it's just a bunch of buttons that don't even react to which sorting mode is currently active, but that will change soon enough.


The routines overview of course lets you expand the individual routines to get at more details. The details that were already covered in the previous blog post were Callees, Paths, and Allocations. All three of them have changed a bit, and one more has been added recently.

All that happened in the Callees tab is that it is now also sortable.

The Paths tab has gained a little toggle button called "Minimal" that reduces every cell in the tree to a little square. On top of that, the cells are now colored by the routine's filename: in the normal view the lower border of the cells shows the color, while in the minimal view the cells themselves are colored in.

Here are two screenshots comparing the regular and minimal views:



The Allocations tab now shows the total amount of memory allocated by the given routine for the given type. Of course this doesn't directly correspond to memory usage, since the profiler can't tell you how long the given objects survive.


The new tab that was recently added is a very useful one. It's the Callers tab. It lets you see which routines have called the given routine, how often they called it, and to what percentage the routine got inlined into the callers. Here's a screenshot:


Allocations Tab

There's a whole new top-level tab, too. It's titled "Allocations" and does very much what you would expect.

The tab contains a big table of all types the profiler saw in the program. Every row shows the type name (potentially with a link to the docs website), the size of each object, how much memory was allocated for that type in total, and how often objects of that type were allocated.
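As a tiny sketch of how such a table could be aggregated from raw allocation records (hypothetical types, sizes, and events, not MoarVM's actual numbers):

```python
from collections import Counter

# Hypothetical stream of allocation events (one type name per allocation)
# and made-up per-object sizes.
size_of = {"Int": 24, "Rat": 48}
events = ["Int", "Int", "Rat", "Int"]

counts = Counter(events)                                 # allocations per type
totals = {t: n * size_of[t] for t, n in counts.items()}  # total bytes per type

print(counts["Int"], totals["Int"])  # 3 72
print(totals["Rat"])                 # 48
```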

On the left there is a button that lets you expand the type's row to get extra details:


None of these things are very surprising, but they are useful.

Overview Tab

The overview tab used to just have a placeholder text. However, it now displays at least a little bit of information. Have a look!


It's still very fresh and I expect it to look completely different in a week or two.

What's with the "Full Screen" pun?

This post starts with a silly little pun on "Full Steam Ahead". What does it mean for the profiler tool?

The answer is very simple: it lets you use the whole width of your browser window. Bootstrap – which is the library of CSS styles and primitives I use so that I don't have to spend three fourths of the entire development time fiddling with CSS rules just to make things look okay or even work at all – is convinced that the outermost container element of the site shouldn't be wider than a specific width, probably because things can look kind of "lost" on a page that's not filled side-to-side with stuff. If you have big tables full of data or wide graphs with tiny bars, though, it sure is good to be able to expand the page sideways.

Here's a little animation I made a few weeks ago that shows the fullscreen button in action for one example.

You're my favourite customer!

Stefan 'nine' Seifert has recently been working on replacing the "mast" stage in the Perl 6 compiler.

In the simplest terms, it used to take the abstract syntax tree and rewrite it into a very flat tree consisting of lists of moar instructions. Those were then fed into MoarVM as objects, which were translated into actual moar bytecode, literally as bytes.

The main drawback of the earlier approach was memory use. On top of all the QAST node objects that were the input to the mast stage, a huge batch of new objects was created before it was all passed to moar's bytecode compiler as one set. The more live objects there are, the higher the maximum memory usage gets, and the longer it takes the GC to go through everything to decide what's garbage and what needs to stick around.

The new approach starts with the QAST nodes again, but immediately starts writing out bytes into buffers. That immediately has a whole bunch of implications: every object consists of at least a header that's a couple of bytes big, while buffers (and native arrays) are just a single object with a big consecutive blob of memory they "own". No matter how big the blob grows, the garbage collector only has to look at the object header and can ignore the blob itself! Not only does this save memory by not having as many individual objects, it also saves the garbage collector time!
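The difference is easy to see in miniature; here's a hedged Python analogy (invented opcodes and a made-up three-byte encoding, nothing like MoarVM's real instruction format):

```python
import struct

# Old style: one heap object per instruction; every one of them is a
# separate thing for the GC to visit while they're all alive at once.
class Instruction:
    def __init__(self, opcode, operand):
        self.opcode = opcode
        self.operand = operand

instructions = [Instruction(1, 42), Instruction(2, 7)]

# New style: write the bytes straight into one growing buffer. However
# large the blob gets, the GC only ever sees a single object header.
buf = bytearray()
for ins in instructions:
    buf += struct.pack("<BH", ins.opcode, ins.operand)  # 1-byte op, 2-byte operand

print(len(buf))        # 6 bytes for two instructions
print(buf[0], buf[1])  # 1 42
```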

Stefan told me on IRC that the new profiler tool already helped him a lot in making the new implementation faster.

I now managed to get a profile of the MAST stage using a patched --profile-stage, but the profile is too large even for the QT frontend :/
timotimo++ # moarperf works beautifully with this huge profile :)

timotimo++ # the profiler is indispensable in getting nqp-mbc merge worthy.

This obviously makes yours truly very happy :)

There are still lots of "paper cuts" to be smoothed out from a User Experience standpoint, but I'm very glad that the tool is already making work possible that used to be impossible.

That's all the improvements I have to report for now, but I'll be continuing work on the UI and backend and write up another report (hopefully) soon!

BTW, a big conference on React.js (the tech I use to make the frontend tick) just happened, and the React core dev team unveiled a few new APIs and tricks that I expect will let me make the existing code a lot simpler and write new code faster than before!

Also, during the conference Chris Trevino presented a new charting library based around the "Grammar of Graphics" philosophy(?) that is also the basis of vega and vega-lite, which as I understand it is The Big Cheese among data visualization experts. I've been having some gripes with recharts, so I'll be checking this out soon!

Hope to see y'all in the next one!
- Timo

Jonathan writes:

My performance work in October focused for the most part on escape analysis and scalar replacement. This work remains in a branch, however it has now reached the milestone of performing its first couple of real-world optimizations, eliminating short-lived boxes and wrapper objects across inline boundaries. I also started with some long-planned work on more aggressive optimization of lexical variables, such that they are stored as "locals" where Rakudo can see they will never be accessed from outside of the current scope. This allows for slightly cheaper access, but more significantly much easier analysis in the specializer. This will later combine with the escape analysis work to eliminate many short-lived Scalar containers, which will in turn reduce the number of guards and allocations. I expect to merge both this work and the initial partial escape analysis and scalar replacement work in November.

On the reliability side of things, I tracked down a couple of problems relating to parametric types and precompilation. When we have a type like Array[Int], many modules may declare it, but we need to ensure that - no matter which modules we load - we only end up with one instance of this type, so that they will match and be considered equal. This was, in some cases, not happening. Now it is.

10:08   Continue work on escape analysis and scalar replacement
3:34    Develop more aggressive lexical to local lowering
        optimizations in the Rakudo optimizer (this will help all
        backends, not just MoarVM)
4:28    Debug and fix problems involving parametric types and
        precompilation

Total: 18:10

Total time spent on current grant period: 126:01
Total time remaining on current grant period: 73:59

Jonathan writes:

My main deliverable in September was a significant improvement to the performance of object construction and initialization. I wrote a blog post describing the ways in which this was achieved. I also improved the performance of array assignment, took on a tricky bug that stood in the way of merging a GC performance improvement, and took another small step with the work on escape analysis.

5:33    Lots of performance improvements to object initialization
3:24    Make ASSIGN-POS inlineable, improving array assignment
        performance; fix an optimization ordering problem while
        doing so, which not only helped array assignment
        performance, but had wider benefit
3:03    Hunt down and fix a bug introduced by an optimization to
        generational GC, allowing the optimization to be merged
1:20    Support scalar replacement of native attribute types in
        the upcoming escape analyzer
1:27    Fix problems with how we handled inlinees with multiple
        returns, such that they don't violate SSA form; this
        allows for more precise analysis of such code, which may
        lead to better optimization
1:16    Assorted other debugging/fixing of smaller bugs

Total: 16:03

About TPF

The Perl Foundation - supporting the Perl community since 2000. Find out more at

About this Archive

This page is an archive of entries from November 2018 listed from newest to oldest.


