Sun, 16-Sep-2018 by Alan Haggai Alavi
The first public release! Code is now hosted on GitHub. Please see the instructions on how to install and run.
The release features a renewed "Routines" tab. Please read Timo's blog post, "The first public release!", to see how it compares to the previous profiler.
Hello esteemed readers, and thank you for checking in on my progress. Not a full month ago I showed off the GC tab in the previous post, titled "Wow check out this garbage". Near the end of that post I wrote this:
I had already started the Routine Overview tab, but it's not quite ready to be shown off. I hope it'll be pretty by the time my next report comes out.
Well, it turns out I got a whole lot of work done since then. Not only is the Routine tab pretty, but there's also a Call Graph explorer beside it. This post has the details, and at the end I'll give you the link to the GitHub repository where you can get the code and try it out for yourself!
Now, the simplest part of the Routine tab is its overview function. Every routine that has been called while the profiler was recording shows up here. Here's a screenshot so you can understand what I'm referring to:
For comparison, here's the old profiler's Routine Tab:
There are actually a lot of differences. Going from the old to the new, the Interp / Spesh / Jit column has disappeared, and with it the OSR badges. Also, the table headers are no longer clickable (to change the sorting order), and there is no filter search field. Both of these features will make a comeback in the new profiler's UI, though!
There are also additions: there's now a column labelled "Sites", some of the filename + line texts are clickable (they take you directly to GitHub, though opening the files locally in an editor is a feature on my wish-list), and there's a column of mysterious buttons. On top of that, you can now see not only the exclusive / inclusive times for each routine, but also how much time that is when divided by the number of entries.
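To make the relationship between these columns concrete, here's a small Python sketch. The data shapes and numbers are invented for illustration (moarperf's actual data model will differ): a routine's exclusive time is its inclusive time minus the inclusive time of its callees, and the per-entry figure just divides by the number of entries.

```python
# Toy sketch (not moarperf's actual data model): how exclusive time and
# per-entry time relate to the inclusive time shown in the Routines tab.

def exclusive_time(node):
    """Exclusive time = inclusive time minus the inclusive time of all callees."""
    return node["inclusive"] - sum(child["inclusive"] for child in node["callees"])

def time_per_entry(node):
    """Average inclusive time for a single call of the routine."""
    return node["inclusive"] / node["entries"]

# A routine that ran for 120us in total over 4 calls,
# 70us of which were spent in two callees.
routine = {
    "name": "to-json",
    "inclusive": 120.0,   # microseconds, including callees
    "entries": 4,
    "callees": [
        {"name": "str-escape", "inclusive": 50.0},
        {"name": "ACCEPTS", "inclusive": 20.0},
    ],
}

print(exclusive_time(routine))  # 50.0 - time spent in to-json itself
print(time_per_entry(routine))  # 30.0 - inclusive microseconds per call
```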
I wonder what happens when I click one of these!
Neat, clicking the button expands an extra section below the routine. It has three tabs: Callees, Paths, and Allocations. Let's go through them one by one.
Listed here are all routines that got called by the parent routine (in this case ACCEPTS from the Regex source file). They are ordered by inclusive time, as opposed to the outer list, which is ordered by exclusive time. [1]
Since there is now a parent/child relationship between the routines, the entries column also shows the number of entries per entry. That's simply the entries of the child divided by the entries of the parent. This number tells you how often the child routine is called per call of the parent, or what the probability is of the parent calling the child.
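In case the ratio isn't obvious, here it is spelled out with made-up numbers:

```python
# Hypothetical numbers: the "entries per entry" shown for a child routine
# is just the child's entry count divided by the parent's entry count.
parent_entries = 16        # how often the parent routine was entered
child_entries = 48         # how often the child was entered from this parent
entries_per_entry = child_entries / parent_entries
print(entries_per_entry)   # 3.0: the parent calls this child three times per call
```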
The next tab you can open in the expanded view is called "Paths". Here you can see a vaguely tree-shaped table with four rows. These rows actually correspond to the Sites column that I have not explained yet: the number of Sites corresponds directly to the number of rows in this table, or the number of leaves in the tree it displays.
The same Routine can behave in different ways depending on where it was called from. Normally, a difference in arguments passed is the main cause of different behaviour, but it's very likely that each unique location in the program will call the Routine the same way every time. Such a "location" is sometimes called a "Call Site", i.e. the site where a call resides. A Site in the profiler's nomenclature refers to one specific path from the outermost routine of the profile to the given Routine. In the screenshot above, all paths to ACCEPTS go through to-json either once, twice, or three times, and every path goes through str-escape.
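The idea of a Site as "one distinct root-to-routine path" can be sketched like this in Python. The call tree below is a made-up miniature, not real profile data, but it mirrors the screenshot's shape: every path to ACCEPTS goes through str-escape, via to-json one or more times.

```python
# A "Site" is one distinct path from the profile's outermost routine down to
# a given routine. This toy {name, callees} tree is invented for illustration.

def paths_to(tree, target, prefix=()):
    """Yield every root-to-target path in a nested call tree."""
    path = prefix + (tree["name"],)
    if tree["name"] == target:
        yield path
    for child in tree.get("callees", []):
        yield from paths_to(child, target, path)

call_tree = {
    "name": "MAIN",
    "callees": [
        {"name": "to-json", "callees": [
            {"name": "str-escape", "callees": [{"name": "ACCEPTS"}]},
            {"name": "to-json", "callees": [
                {"name": "str-escape", "callees": [{"name": "ACCEPTS"}]},
            ]},
        ]},
    ],
}

sites = list(paths_to(call_tree, "ACCEPTS"))
print(len(sites))  # 2 distinct Sites for ACCEPTS in this toy profile
for site in sites:
    print(" -> ".join(site))
```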
The names in the table/tree (trable?) are all clickable. I can tell you right now that they bring you right over to the Call Graph. More on that later, though.
This is a small one, at least for ACCEPTS. It has one row per type, and splits the number of objects created for each type into before spesh and after spesh/jit. Coincidentally, the ACCEPTS method is already a good example of this split: BOOTHash is just a lower-level hash class used in some internals. Notably, BOOTHash is used to pass named arguments to methods. Of course, many method invocations don't actually pass named arguments at all, and many methods don't care about named arguments either. Thankfully, spesh is competent at spotting this situation and removes all traces of the hash ever existing. The Scalar, on the other hand, seems to be used, so it stays even after spesh has optimized the code.
There's also a Sites column here. This lets you spot cases where one or two sites differ strikingly from the others.
Here's the new Call Graph tab. It's similar to the old one, with one major omission: the new version of the call graph explorer currently lacks a flame graph (or icicle graph, in this case). It will return later, of course.
Until then, there are a few improvements to be enjoyed. One of them is that the breadcrumbs navigation now works reliably, whereas the previous profiler sometimes lost elements near the beginning or in the middle. On top of that, your browser's back and forward buttons work between nodes in the call graph as well as between the different tabs!
Something that's completely new is the Allocations section at the bottom of the page, shown in the screenshot below:
Here you can see that the Routine in question (it's the body of this for loop) allocates Str and Scalar objects. The Str objects seem to be optimized away by spesh again.
There's still two unexplained buttons here, though. The one at the bottom is labelled "Load inclusive allocations". Clicking on it reveals a second table of allocations, which is quite a lot bigger:
This view covers everything allocated by anything in the call graph from the current node downwards (towards the leaves, not towards the root - you know, because trees hang from the ceiling, right?). For something so close to the root in a rather deep call graph, you'll get quite a big list, and it's not very helpful in finding where exactly individual types are being allocated.
That's where the second button comes in. It says "Show allocations for all children" on it, and clicking it expands every row in the table of routines:
This way you can drill down from the root towards nodes that interest you.
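Conceptually, "inclusive allocations" is just a merge of allocation counts over a whole subtree. Here's a minimal sketch, with invented node shapes and numbers rather than moarperf's real data model:

```python
# Sketch of what "Load inclusive allocations" computes (data shapes invented):
# sum each type's allocation counts over the current node and all descendants.
from collections import Counter

def inclusive_allocations(node):
    """Merge the allocation counts of a call-graph node and all of its descendants."""
    total = Counter(node.get("allocations", {}))
    for child in node.get("children", []):
        total += inclusive_allocations(child)
    return total

graph = {
    "allocations": {"Str": 10},
    "children": [
        {"allocations": {"Str": 5, "Scalar": 7}},
        {"allocations": {"BOOTHash": 3},
         "children": [{"allocations": {"Scalar": 1}}]},
    ],
}

# Totals over the whole subtree: Str 15, Scalar 8, BOOTHash 3.
print(inclusive_allocations(graph))
```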
Here's a little overview of what I'd like to include in the near and medium future:
I just uploaded all the code to my GitHub. You can find it here. To use it, you will have to run npm install in the source folder to get all the JavaScript dependencies, and npm run build to compile all the JavaScript bundles. The build command runs a watcher process that rebuilds as soon as any sources change, but if you're not working on the moarperf source code, you can just kill it after it has done its thing once.
Next, you'll have to install the dependencies of the Perl 6 program that acts as the back-end. You can do that with zef --depsonly install in the same folder as the META6.json. Please note that I haven't prepared the app for actually being installed, so don't do that yet :)
You can then start the backend with perl6 -Ilib service.p6 /path/to/profile.sql, where profile.sql is a profile generated with a command line like perl6 --profile --profile-filename=profile.sql my_script.p6. Passing the filename to service.p6 on the command line is optional; you can also enter it in the web frontend. By default, the frontend is reachable on HTTP port 20000 on localhost, but you can set MOARPERF_HOST and MOARPERF_PORT in the environment using the env or export commands of your shell.
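Putting the steps above in one place (the script name and profile path are placeholders; the commands are the ones described above):

```shell
# 1. Front-end: fetch JS dependencies and build the bundles (kill the
#    watcher once it finishes if you aren't hacking on moarperf itself).
npm install
npm run build

# 2. Back-end: install the Perl 6 dependencies in the folder with META6.json.
zef --depsonly install

# 3. Record a profile of your script...
perl6 --profile --profile-filename=profile.sql my_script.p6

# 4. ...and serve it (the filename argument is optional; you can also enter
#    it in the web frontend, which listens on http://localhost:20000).
perl6 -Ilib service.p6 /path/to/profile.sql

# Optionally override the listening host and port:
MOARPERF_HOST=0.0.0.0 MOARPERF_PORT=8080 perl6 -Ilib service.p6
```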
A word of warning, though: it still has a bunch of rough edges. In some parts, loading data isn't implemented cleanly yet, which can lead to big red blocks with frowny faces. A refresh and/or doing things more slowly can help in that case. There are places where "there is no data" looks a lot like "something broke". Little usability problems all around.
I would appreciate a bit of feedback, either on the GitHub issue tracker, on #perl6 on the freenode IRC server, on reddit if someone's posted this article in /r/perl6, or via mail to the perl6-users mailing list or to timo at this website's domain name.
Thanks again to The Perl Foundation for funding my grant, and to you for reading my little report.
Have fun with the program!
- Timo
The reasoning behind the different default sorting modes is that in the first step you are likely more interested in routines that are expensive by themselves. Sorting by inclusive time just puts the entry frame first, and then gives you the outermost frames, which quite often hardly do any work themselves. When you've found a routine that has a lot of exclusive time, the next step - at least for me - is to look at which routines below it take up the most time in total. That's why I prefer the inner routines list to be sorted by inclusive time. Once the user can change sorting in the UI, I'll also offer a way to set the selected sorting as the default, I think. ↩︎
The difference between managed and unmanaged bytes is that the managed bytes are what lands in the actual nursery, whereas unmanaged bytes refers to all kinds of extra data allocated for an object. For example, an Array, Hash, or String would have a memory buffer somewhere in memory, and a header object pointing to that buffer in the nursery or gen2 memory pools. The managed size doesn't change for objects of the same type, which is why it's fine to put them in the allocations tabs. ↩︎