September 2018 Archives

My grant work in August focused on improving the performance of a number of very commonly used constructs in Perl 6 programs, of note array and hash access, box/unbox operations (which take place inside many of the builtins), and various common operations on Int (the default integer type in Perl 6, which is arbitrary precision). My work also uncovered a problem that could be best addressed with a small language design tweak, and I wrote up a proposal on that (which has since gained broad acceptable, and will be implemented for Perl 6.d).

I also made various improvements to the optimization infrastructure. Most notably, I made it possible to track what is known about the value coming into a guard vs. its state after the guard (guards are the things that ensure a speculative optimization still applies, and deoptimize if not). This most usefully allows for elimination of guards (for example, when program anlaysis - perhaps after inlining - proves a property that was guarded, then we can remove the guard). However, this change also means we properly deal with facts before/after the guard point; earlier we'd not tracked this precisely in a few cases and so had to forego optimization opportunities. Now we can take them.

I also improved GC behavior a bit, making values assigned into long-lived aggregates get promoted sooner and making the optimizer thread join in with GC in a more timely manner.

Finally, there was a range of other smaller optimizations, as well as a good amount of bug hunting/fixing.

1:12    Decrease GC latency during specialization, 
        so that we won't block it so long while optimizing longer frames
13:25   Lots of speedups to hash and array access and storage, 
        giving some notable benchmark wins
5:19    Refactor guards so we can track facts before/after, 
        allowing for redundant guard elimination
1:47    Pass on facts from return types when inlining, 
        allowing further guard elimination
0:23    Generate less decont instructions when compiling calls
0:25    Tweak code-gen for calls to put callee lookup closer to call when possible, 
        shortening lifetime of it
1:11    Research and write up proposal for changing the default scoping of $_ in v6.d
7:42    Improve performance of common box/unbox operations, including to a Perl 6 Int
8:15    Improvement of addition, subtraction, and multiplication of Perl 6 Int, 
        removing indirections and improving JIT for the non-bigint case
9:38    Implement a spesh plugin for return type checks, 
        reducing code size and allowing some speedups
0:25    Fix a DEPRECATED crash when a backtrace misses some frames
0:58    Track down and fix a SEGV when a debug type name was set from 
        multiple threads
1:02    Fix a segfault in basic block elimination's annotation movement
1:10    Improve GC handling of programs placing objects into long-lived aggregates
6:23    Fix various deoptimization bugs
0:58    Correct how log guards are handled at a PHI to avoid incorrect deletions 
        of guards that were depended on beyond the PHI
1:02    Optimize takedispatcher into an nqp::null on inlining where possible
2:32    Investigate a module regression due to serialization and repossession 
        related changes; fix it
3:42    More aggressive optimization of inlines and better size handling
1:52    Assorted small optimizations in Rakudo and with hllize elimination
8:32    Other assorted debugging, fixing, and testing

Work in the latter part of July, falling under the current round of the grant, focused mostly on hunting down and fixing bugs. A memory leak bug involving the thread pool scheduler was fixed; it was only problematic in some programs so far, but would have become far more noticeable with the new 6.d await semantics. Others fixes addressed regressions arising from recent optimization work, although in most cases the cause was more aggressive optimizations uncovering existing issues.

Last but not least, I started work on an important new optimization: partial escape analysis. This will, when working, enable us to elimiante many GC allocations and perform better analysis.

3:23    Fix a thread pool memory leak due to continuation 
1:06    Re-instate Proxy return handling bug for the sake 
        of the module ecosystem
2:41    Complete first-class frame reference refactors to 
        make them better handle inlining
3:48    Hunt down a deoptimization bug; fix it by correcting 
        uninlining instruction motion
1:00    Fix return handlers of void calls
2:04    Fix a couple of GC rooting/barrier issues
5:03    Start design and implementation exploration for 
        partial escape analysis

This is a monthly report by Dave Mitchell on his grant under Perl 5 Core Maintenance Fund. We thank the TPF sponsors to make this grant possible.

I didn't spend all that many hours during August on perl work.

I spent most of my time looking at a bug related to restoring of captures
within regex repeats. During the course of that, I took the opportunity to
simplify and cleanup some of the code in S_regmatch() which deals with
captures, and in particular, make it consistently use macros which produce
debugging output when opening or closing or restoring capture indices.

     18:03 RT #133352 Ancient Regex Regression
      1:30 RT #133429 Time-HiRes/t/itimer.t: intermittent failures
      1:36 RT #133441 no assignment to "my" variable  
     21:09 TOTAL (HH::MM)  

 254.7 weeks
3151.4 total hours
  12.4 average hours per week

There are 315 hours left on the grant

Call for Grant Proposals (September 2018 Round)

The Grants Committee is accepting grant proposals all the time. We evaluate them every two months and another evaluation period is upon us.

The first public release! Code is now hosted in GitHub. Please see the instructions on how to install and run.

The release features a renewed "Routines" tab. Please read Timo's blog post to know how it compares to the previous profiler: The first public release!.

The first public release!

Hello esteemed readers, and thank you for checking in on my progress. Not a full month ago I showed off the GC tab in the previous post, titled "Wow check out this garbage". Near the end of that post I wrote this:

I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.

Well, turns out I got a whole lot of work done since then. Not only is the Routine tab pretty, but there's also a Call Graph explorer beside it. This post has the details, and at the end I'll give you the link to the github repository where you can get the code and try it out for yourself!

Routine Tab

Routine Overview

Now, the simplest part of the Routine tab is its overview function. Every routine that has been called while the profiler was recording shows up here. Here's a screenshot so you can understand what I'm referring to:


For comparison, here's the old profiler's Routine Tab


There's actually a lot of differences. Going from the old to the new, the Interp / Spesh / Jit column has disappeared, and with it the OSR badges. Also, the table headers are no longer clickable (to change the sorting order) and there is no filter search field. Both of these features will make a come-back in the new profiler's UI as well, though!

There are also additions, though: There's now a column labelled "Sites", some of the filename + line texts are clickable (they take you directly to github, though opening the files locally in an editor is a feature on my wish-list), and there's a column of mysterious buttons. On top of that, you can now see not only the exclusive / inclusive times for each routine, but also how much time that is when divided by the number of entries.

I wonder what happens when I click one of these!



Neat, clicking the button expands an extra section below the routine. It has three tabs: Callees, Paths, and Allocations. Let's go through them one by one.


Listed here are all routines that got called by the parent routine (in this case ACCEPTS from the Regex source file). They are ordered by inclusive time, as opposed to the outer list which is ordered by exclusive time. [1]

Since there is now a parent/child relationship between the routines, there's also the number of entries per entry in the entries column. That's simply the entries of the child divided by the entries of the parent. This number can tell you how often another routine is mentioned, or what the probability for the child being called by the parent is.

Paths, and what's this about "Sites"?


The next tab you can open in the expanded view is called "Paths". Here you can see a vaguely tree-shaped table with four rows. These rows actually correspond to the Sites column that I have not explained yet. That's because the number of Sites corresponds directly to the number of rows in this table, or the number of leafs in the tree it displays.

The same Routine can behave in different ways depending on where it was called from. Normally, a difference in arguments passed is the main cause of different behaviour, but it's very likely that each unique location in the program will call the Routine the same way every time. Such a "location" is sometimes called a "Call Site", i.e. the site where a call resides. A Site in the profiler's nomenclature refers to one specific path from the outermost routine of the profile to the given Routine. In the screenshot above, all paths to ACCEPTS go through to-json either once, twice, or three times. And every path goes through str-escape.

The names in the table/tree (trable?) are all clickable. I can tell you right now, that they bring you right over to the Call Graph. More on that later, though.



This is a small one, at least for ACCEPTS. It has one row per type, and splits the number of objects created for each type into before spesh and after spesh/jit. Coincidentally, the ACCEPTS method is already a good example for having the split: BOOTHash is just a lower-level hash class used in some internals. Notably, BOOTHash is used to pass named arguments to methods. Of course, many method invocations don't actually pass named arguments at all, and many methods don't care about named arguments either. Thankfully, spesh is competent at spotting this situation and removes all traces of the hash ever existing. The Scalar on the other hand seems to be used, so it stays even after spesh optimized the code.

There's also a Sites column here. This lets you spot cases where one or two sites differ strikingly from the others.

Call Graph Tab


Here's the new Call Graph tab. It's similar to the old one with one major omission. The new version of the call graph explorer currently lacks a flame graph (or icicle graph in this case). It will return later, of course.

Until then, there's a few improvements to be enjoyed. One of them is that the breadcrumbs navigation now works reliably, whereas in the previous profiler it tended to lose elements near the beginning or in the middle sometimes. On top of that, your browser's back and forward buttons will work between nodes in the call graph as well as the different tabs!

Something that's completely new is the Allocations section at the bottom of the page, shown in the screenshot below:


Here you can see that the Routine in question (it's the body of this for loop allocates Str and Scalar objects. The Str objects seem to be optimized by spesh again.

There's still two unexplained buttons here, though. The one at the bottom is labelled "Load inclusive allocations". Clicking on it reveals a second table of allocations, which is quite a lot bigger:


This view is about everything allocated by anything in the call graph from the current node downwards (towards the leaves, not towards the root. You know, because trees hang from the ceiling, right?). For something so close to the root in a rather deep call graph, you'll get quite a big list, and it's not very helpful in finding where exactly individual types are being allocated.

That's where the second button comes in. It says "Show allocations for all children" on it, and clicking it expands every row in the table of routines:


This way you can drill down from the root towards nodes that interest you.

What's missing?

Here's a little overview of what I'd like to include in the near and medium future:

  • Features the old profiler had
    • An overview page that shows a lot of things at once
    • Sorting and filtering in the routine list, GC table, ...
    • An icicle graph or flame graph in the call graph tab
    • An allocation overview tab
    • An overview over deoptimizations and on-stack-replacements
    • A graph of cleared / retained / promoted bytes in the GC
  • Features that would be new
    • Searching deeply in the call graph
    • Sorting by fields that the old profiler didn't have, like time per entry
    • Filtering the GC view to show only major or only minor collections
    • Callers tab for expanded routines in the overview
    • Perhaps: An "inlining performance" tab for every routine, since inlining is such an important part of Perl 6 performance
  • Features requiring changes to the MoarVM and NQP parts of the profiler
    • Expose the actual size of types in the allocation displays
    • Split the GC's cleared / retained / promoted bytes into managed and unmanaged parts. [2]
    • Split time spent in routines into time spent in interpreted vs speshed/jit modes (could be difficult with OSR?)
    • Record the "earliest call" time for every call graph node

Where can I get it?

I just uploaded all the code to my github. You can find it here. To use it, you will have to run npm install in the source folder to get all the javascript dependencies, and npm run build to compile all the javascript bundles. It will run a watcher process that will update as soon as any sources change, but if you're not working on the moarperf source code, you can just kill it after it has done its thing once.

Next, you'll have to install the dependencies of the Perl 6 program that acts as the back-end. You can do that with zef --depsonly install in the same folder as the META6.json. Please note that I haven't prepared the app for actually being installed, so don't do that yet :)

You can then start the backend with perl6 -Ilib service.p6 /path/to/profile.sql, where profile.sql is a profile generated with a commandline like perl6 --profile --profile-filename=profile.sql my_script.p6. Passing the filename to service.p6 on the commandline is optional, you can also enter it in the web frontend. By default, it is reachable on http port 20000 on localhost, but you can set MOARPERF_HOST and MOARPERF_PORT in the environment using the env or export commands of your shell.

A word of warning, though: It still has a bunch of rough edges. In some parts, loading data isn't implemented cleanly yet, which can lead to big red blocks with frowny faces. A refresh and/or doing things more slowly can help in that case. There's places where "there is no data" looks a lot like "something broke". Little usability problems all around.

I would appreciate a bit of feedback either on the github issue tracker, on #perl6 on the freenode IRC server, on reddit if someone's posted this article in /r/perl6, or via mail to the perl6-users mailing list or to timo at this website's domain name.

Thanks again to The Perl Foundation for funding my grant, and to you for reading my little report.

Have fun with the program!
- Timo

  1. The reasoning behind the different default sorting modes is that in the first step you are likely more interested in routines that are expensive by themselves. Sorting by inclusive time just puts the entry frame first, and then gives you the outermost frames that quite often hardly do any work themselves. When you've found a routine that has a lot of exclusive time, the next step is - at least for me - to look at what routines below that take up the most time in total. That's why I prefer the inner routines list to be sorted by inclusive time. Once the user can change sorting in the UI i'll also offer a way to set the selected sorting as the default, i think. ↩︎

  2. The difference between managed and unmanaged bytes is that the managed bytes are what lands in the actual nursery, whereas unmanaged bytes refers to all kinds of extra data allocated for an Object. For example, an Array, Hash, or String would have a memory buffer somewhere in memory, and a header object pointing to that buffer in the nursery or gen2 memory pools. The managed size doesn't change for objects of the same type, which is why it's fine to put them in the allocations tabs. ↩︎

Maintaining Perl 5 (Tony Cook): August 2018 Grant Report

This is a monthly report by Tony Cook on his grant under Perl 5 Core Maintenance Fund. We thank the TPF sponsors to make this grant possible.

Approximately 44 tickets were reviewed, and 4 patches were applied.

[Hours]         [Activity]
  1.23          #p5p discussion with khw on shared hash problem
  4.73          #132158 testing, debugging, work on a fix, more testing
                #132158 consider other fixes, testing, comment with
  1.60          #132655 (sec) work on test code, produce a fix and a
                patch, comment with patch
  0.72          #132683 re-check patch, test and apply to blead
  0.35          #133002 review, make public, merge into 132609
  1.71          #133204 (sec) research, comment
                #133204 (sec) comment
  0.30          #133250 (sec) review, comment
  0.37          #133292 review and briefly comment
  1.87          #133314 work on test code
                #133314 more testing, apply to blead
  8.73          #133326 debugging, work on fixes
                #133326 more work on fixes, testing
                #133326 debugging, clean up, testing
                #133326 polish, commit message, comment with patch
                #133326 re-test, apply to blead
  0.40          #133331 (sec) review, ask for more information
                #133331 (sec) briefly comment
  2.45          #133334 (sec) debugging
                #133334 (sec) debugging
                #133334 (sec) comment
  0.05          #133335 (sec) brief testing and comment
  0.95          #133345 (sec) testing, comment
  0.95          #133376 review patch, test build, testing, apply to blead
  0.07          #133406 (sec) comment and close
  0.55          #133417 review patches, original discussion, comment about
                the commit messages
                #133417 review, research and comment
  2.29          #133422 review ticket, look over code
                #133422 work on a fix
                #133422 more work on fix, testing
  0.38          #133423 (sec) private IRC discussion with khw
  0.30          #133431 review and briefly comment
  0.55          apm821xx cross build thread, research, comment
  0.47          Briefly review khw’s khw-core branch
  0.95          hacking on feature.h thread, review code, testing, comment
  2.07          reply sawyer email on unicode filenames on Win32
  1.88          review maint-votes, add some commits, cherry pick/test two
  0.37          review security issues, make 133345 public
  0.77          review security queue
  1.53          security queue review/clean up
  0.33          security ticket round up
  1.17          utf8_readline: consider optimizations
  1.38          utf8_readline: croak tests, add another test and find a
  1.45          utf8_readline: debug/fix surrogate bug, more tests
  2.98          utf8_readline: debugging, fix short handling bug, more
                testing, find surrogate handling issue, debugging, croak
                on surrogate bug
  1.87          utf8_readline: debugging, possible fix, testing
  1.72          utf8_readline: debugging, testing, debug a test failure
                and consider possible fixes
  1.87          utf8_readline: more optimizing non-mutating stream
  2.45          utf8_readline: more optimizing, building/testing + #p5p
                discussion of tainted \p{} property names with khw
  2.08          utf8_readline: more tests
  1.27          utf8_readline: polish, testing
  0.58          utf8_readline: rebase, testing, review
  1.83          utf8_readline: simplify some code based (oldish)
                discussions with khw, testing
  1.90          utf8_readline: work on fix for optimized code consuming
                input incorrectly
  1.53          utf8_readline: work on optimizing non-mutating stream
  2.63          utf8_readline: write more tests,clean up some of the
 65.63 hours total

July 2018 Grant Votes

The Grants Committee has concluded the voting of the July 2018 round. My apologies for the delay.

There was one proposal this round, which was not approved.

About TPF

The Perl Foundation - supporting the Perl community since 2000. Find out more at

About this Archive

This page is an archive of entries from September 2018 listed from newest to oldest.

August 2018 is the previous archive.

October 2018 is the next archive.

Find recent content on the main index or look in the archives to find all content.


OpenID accepted here Learn more about OpenID
Powered by Movable Type 6.2.2