Jim Cromie
How much is your project worth? $3000
Memory allocation enhancements in core (sv.c).
Perl's variable namespace model is very flexible; users can:
- create vars, in any package, or in my scope, by naming them
- give them complex values: my $foo = [ 1, { a => 2 }, 3 ];
- share/assign/shallow-copy them: $main::bar = $foo;
- crosslink or self-ref them: $a[2] = [ $a[2], $a[1] ];
- other hairy stuff
This user data is all built on demand from an inventory of sv-parts which is kept on the interpreter's freelists (PL_sv_root, PL_body_roots). These are refilled periodically by S_more_bodies, which gets an arena, slices it into sv-parts, and threads them onto the freelist.
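For illustration, the refill pattern looks roughly like this (a simplified sketch; the real S_more_bodies() in sv.c takes its sizes and counts from the body_details[] table and uses perl's arena bookkeeping):

    #include <stdlib.h>

    /* Simplified sketch of the refill pattern: grab one arena, slice it
     * into fixed-size bodies, and thread them into a singly linked
     * freelist whose head is *root.  Assumes body_size >= sizeof(void*)
     * and arena_size >= body_size; error handling is omitted. */
    static void
    more_bodies_sketch(void **root, size_t body_size, size_t arena_size)
    {
        size_t count = arena_size / body_size;   /* bodies per arena */
        char  *arena = malloc(arena_size);       /* "get an arena"   */
        size_t i;

        /* each body's first word points at the next body */
        for (i = 0; i + 1 < count; i++)
            *(void **)(arena + i * body_size) = arena + (i + 1) * body_size;
        *(void **)(arena + (count - 1) * body_size) = NULL;

        *root = arena;                           /* freelist head */
    }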
This can result in user data spread across memory like a spiderweb in a corner; it's hard to clean the corner without destroying the web. In other words, it makes memory reclaim "hard", and probably ineffective. As a result, I think, perl core has never really seen the need or benefit of reclaiming arenas.
One important workload, however, could benefit: Storable::freeze() uses a ptr-table to track SVs it has %seen, but its PTEs hang off the interpreter until process termination. For a long-running process, this is clearly suboptimal.
First, there's this in perltodo:
    use less 'memory'
      Investigate trade offs to switch out perl's choices on memory usage.
      Particularly perl should be able to give memory back.
This task is incremental - even a little bit of work on it will help.
This is deep core work; benefits accrue to users of 5.14, which is the eventual target. Since the interfaces changed are internal, it may be possible to get it into 5.12.x.
Currently, Storable::freeze() uses ptr-tables to track seen SVs as it freezes them, so that it honors shared linkages. Doing this on large datasets will allocate a huge ptr-table, which, when freed, releases all those PTEs back to the interpreter-global freelist, where they hang uselessly until process death (or interpreter shutdown).
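To illustrate that recycling pattern (a simplified sketch only; the real code lives in sv.c and differs in detail), clearing a ptr-table rethreads every PTE onto an interpreter-global freelist instead of returning the memory to the system:

    #include <stddef.h>

    /* Simplified illustration of the current PTE recycling pattern.
     * 'global_pte_root' stands in for the interpreter-global PTE
     * freelist that outlives any one table. */
    struct pte { struct pte *next; void *oldv; void *newv; };

    static struct pte *global_pte_root;   /* lives until interp shutdown */

    static void
    pte_table_clear_sketch(struct pte **buckets, size_t nbuckets)
    {
        size_t i;
        for (i = 0; i < nbuckets; i++) {
            struct pte *e = buckets[i];
            while (e) {
                struct pte *next = e->next;
                e->next = global_pte_root;   /* back onto the global freelist, */
                global_pte_root = e;         /* not back to the system         */
                e = next;
            }
            buckets[i] = NULL;
        }
    }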
The work proposed below appears to provide a workable mechanism to implement the private arenas that Tim Bunce expressed a want/need for, with Nicholas Clark's comments, here:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2009-12/msg00821.html
By my first reading, Tim wants to coax a set of SV allocations to be taken from separate arenas, to protect them from others. Nick outlined a solution that largely fits with my earlier revision of this grant proposal (Aug 2009), but added a discussion of savestacks, and implied (to me, at any rate) a need for a robust underlying mechanism, prompting this revision of the proposal.
General benefits will likely flow from finding out what nytprof needs, and figuring out how to provide it :-D
Here are the major elements:
The short version:
- adapt the get-arenas signature: change arg2 from sv_type to a (void*) reqid, and track allocs by that reqid
- propagate that to S_more_bodies and its macro wrappers
- add a release-arenas() stub first
With get-arenas(reqid), we can track arenas by their users; with S_more_bodies we can extend that tracking to each of the interpreter's svtype consumers individually. With unique tracking of arena users, we can offer release-arenas(reqid), and since this is an internal sub-system interface, expect callers to use it properly.
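Roughly, the shape of the change is as follows (an illustrative sketch, not the actual patch; names and comments are mine):

    #include <stddef.h>

    /* before: arenas were requested per sv_type */
    void *get_arena_old(size_t arena_size, int sv_type /* svtype */);

    /* after: any pointer-sized token identifies the requester, and each
     * arena is tagged with it.  S_more_bodies() could pass
     * &PL_body_roots[sv_type] as its reqid; other clients pass the
     * address of their own root or object. */
    void *get_arena(size_t arena_size, void *reqid);

    /* new: find and free every arena tagged with reqid; initially a
     * stub that only pretends to free */
    int release_arenas(void *reqid);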
The outer users of S_more_bodies() (disregarding the macro wrappers) keep their current interface; the arenas it provisions for each sv_type are transparently tracked, and can soon be reclaimed.
get-arena/release-arena give a balanced API for clients to manage slabs of memory themselves. The API is minimal, allowing and requiring simply that callers of get-arena(reqid) do the following:
- call release-arenas(reqid) when done with the memory
- know they're not sharing parts of the arenas when done
- don't abuse the reqids of others; ref your own object
- users can create and abandon arenas (be careful!)
With this, users hacking in core can allocate many slabs, of various sizes, using just one reqid, assemble them with pointers into arbitrary structures, and when done, know that they're all cleaned up together. Users may also use multiple reqids to simplify their memory reclaim operations.
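As a sketch of how a client might use the pair (the struct and helper names here are hypothetical; only get_arena() and release_arenas() come from the proposal):

    #include <stddef.h>

    void *get_arena(size_t arena_size, void *reqid);   /* proposed API */
    int   release_arenas(void *reqid);                 /* proposed API */

    /* Hypothetical client: it uses its own address as the reqid, carves
     * nodes out of the arenas it requests, and frees everything in one
     * call when done. */
    struct my_table {
        void *free_nodes;      /* private freelist threaded through arenas */
    };

    static void
    my_table_grow(struct my_table *t)
    {
        /* every arena for this table is tagged with the same reqid (t);
         * no one else may allocate under that reqid */
        void *arena = get_arena(4096, t);
        /* slice the arena into nodes and thread them onto the freelist;
         * elided here -- same pattern as the S_more_bodies sketch above */
        t->free_nodes = arena;
    }

    static void
    my_table_free(struct my_table *t)
    {
        /* no rethreading onto a global freelist: every node lives in an
         * arena tagged with t, so one call reclaims them all */
        release_arenas(t);
    }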
It should also be flexible and efficient enough for use by XS libraries, given their tolerance for newness.
Given the Storable use case, this has potential merit; being parsimonious with PTE mem by default will work for some users.
But for less specific cases, the global PTE freelist probably wins a performance contest; the malloc demand is intrinsically lower when PTEs are reused, freeze() is not the only user of ptr-tables, and only pathological cases would even cause notice.
Nonetheless, it provides a test case for the first use of the new interface, and an alternate ptr-table implementation, possibly providing support for 'use less memory'.
Note that with the stubbed release_arenas, we only pretend to free the private allocations; this may cause problems in make test, but the overall demand for ptr-tables is quite limited (IIRC the big user, t/re/regexp_qr_embed_thr.t, creates ~2000 ptr-tables), and on 1GB machines we may not run out of memory.
This has some probative value for OOM handling also, especially in a setrlimit()d sandbox.
Private arenas in ptr-tables provide a concrete basis to consider other resource-reclaim strategies, narrowly at first, but perhaps also more broadly for other potential users.
When ptr_table_free is called, we know that:
- we start with an empty, private PTE freelist, fill it as needed
- pt-store consumes PTEs from the private PTE freelist
- all PTEs in the table came from our arenas
- all PTEs cleared back to it are from our arenas
- no other users of those PTEs exist
- all our arenas have our reqid
With this, we should be able to just whack the whole table (by finding and freeing the arenas with the reqid), skipping all the rethreading to the global freelist, and immediately releasing the memory back to the system. This sounds possibly useful later.
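Under those invariants, the private-table free path could collapse to something like this (a sketch against the proposed API and perl's ptr-table type, not the current sv.c code):

    /* Sketch only: assumes every PTE in this table came from arenas
     * requested with the table's own address as the reqid.  PTR_TBL_t
     * and Safefree are perl internals; release_arenas() is the proposed
     * call; pTHX/threading details are ignored. */
    void
    ptr_table_free_private(PTR_TBL_t *tbl)
    {
        /* no bucket walking, no rethreading of PTEs onto the
         * interpreter-global PTE freelist */
        release_arenas(tbl);        /* frees all PTE arenas in one pass */
        Safefree(tbl->tbl_ary);     /* the bucket array is a separate alloc */
        Safefree(tbl);
    }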
I've separated this deliverable because private arenas in ptr-tables can be mostly validated without it (using the stub), and because in some respects it's our first new feature, where the previous focus was on refactoring the existing code to accommodate the feature.
The first test of this code will be in perl_destruct:
    release_arenas(&PL_body_roots[$_]) foreach @sv_types;
    release_arenas(&PL_sv_root);
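In C, that pseudocode would look roughly like this (a sketch; PL_body_roots, PL_sv_root and SVt_LAST are existing interpreter variables/constants, release_arenas() is the proposed call, and pTHX/threading details are ignored):

    /* Sketch of the perl_destruct() hook: release the per-svtype body
     * arenas, then the SV-head arenas.  With the stubbed
     * release_arenas() this only exercises the bookkeeping. */
    {
        int t;
        for (t = 0; t < SVt_LAST; t++)
            release_arenas(&PL_body_roots[t]);
        release_arenas(&PL_sv_root);
    }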
Then we call it from ptr_table_clear.
Given the recent p5p traffic 12/20 (link above), I think this path to private arenas helps; it adds support needed beneath the fancy freelist pushing-and-popping briefly described there. What nytprof needs will take further study.
The design thus far does nothing to protect against (or even advise of) reqid trampling between two users; get_arena() implicitly allows callers to start new reservations with a given ID, which allows sharing amongst knowing users. Formal registration would provide at least advisory protection. This could be done with a flag too.
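A speculative sketch of what advisory registration might look like (every name here is illustrative, not part of the proposal's API; a flag argument to get_arena() could serve the same purpose):

    /* Speculative: a tiny advisory reqid registry.  Callers that want
     * protection register their reqid first; knowing users can still
     * share a reqid by skipping registration. */
    #define MAX_REQIDS 64
    static void *reqid_registry[MAX_REQIDS];

    /* returns 1 on success, 0 if another client already holds reqid
     * or the registry is full */
    static int
    register_reqid(void *reqid)
    {
        int i, slot = -1;
        for (i = 0; i < MAX_REQIDS; i++) {
            if (reqid_registry[i] == reqid)
                return 0;                    /* already claimed */
            if (!reqid_registry[i] && slot < 0)
                slot = i;                    /* remember a free slot */
        }
        if (slot < 0)
            return 0;
        reqid_registry[slot] = reqid;
        return 1;
    }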
These have real merit in my estimation, but are rather speculative, and I'm reluctant to call them committable deliverables. I think they help illustrate the potential of the above work.
One way to nudge this rock forward is to plug in a second (private) ptr-table-* function set, addressing the Storable::freeze use case.
I suspect, however, that freelist pushing and popping, along with get_arenas() and release_arenas(), will ultimately be a better tool than this specialized fix for PTEs, but it serves as a point of discussion (a strawman); we don't even have decent terminology yet, let alone a few paths forward.
Storable::thaw() might want to put the perl data it vivifies into a constrained region of memory, as this may improve processor cache performance, especially with modern prefetch systems. So might perl routines such as parsers, data generators, etc.
    # Doing it lexically would be nice;
    get_tight_hash {
        my $var;
        use my_arenas depth => 1, 'xs';
        return { Storable::thaw($packet) };
    };
Here, my_arenas seeks to capture only the SVs in the contained xs scope (the thaw), and those in the {} composition. depth => 1 sounds safest wrt the spiderweb problem; N might be nice if it makes sense (depth => 0 makes me nervous). I also suppose that xs might somehow be different than just depth => 1.
This doesn't attempt to migrate perl data into a container; that would be tantamount to lifting the spiderweb without damaging it, and is out of scope here. But this may shed some light in a dimly lit corner.
The deliverables above are largely self-explanatory, but will also include responding to and resolving issues; they're then largely defined by porters and particularly pumpkings.
Tim Bunce, given his interest in nytprof, will hopefully offer guidance as to what he needs; I'd treat those as immediate goals.
There are no doubt numerous knock-on effects to the rest of core; some of these will be in scope, though I hope not all.
A setrlimit()d sandbox and OOM tests; work this into fresh_perl, maybe wrap it as sandboxed_perl().
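A minimal sketch of the sandboxing idea (the fresh_perl helpers live in t/test.pl and are Perl; this C sketch only shows the resource-limiting shape, and the 64MB limit, function name, and fork/exec details are illustrative):

    #include <sys/types.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run a command (e.g. ./perl -Ilib t/some_oom.t) with its address
     * space capped, so OOM behaviour can be provoked deterministically. */
    static int
    run_sandboxed(char *const argv[])
    {
        int status;
        pid_t pid = fork();

        if (pid == 0) {
            struct rlimit rl;
            rl.rlim_cur = rl.rlim_max = 64 * 1024 * 1024;  /* 64MB cap */
            setrlimit(RLIMIT_AS, &rl);   /* limit total address space */
            execvp(argv[0], argv);
            _exit(127);                  /* exec failed */
        }
        waitpid(pid, &status, 0);
        return status;
    }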
p5p discussion, review, responses, revisions, variations, etc.
1-2 months
I've been hacking on perl for a while:
    [jimc@groucho perl-git]$ git log blead | grep Cromie | wc -l
    102
I've also hacked on pertinent parts of core and ext/ code:
- added arena-sets into the arena allocator
- reworked the body allocator around S_more_bodies
- helped refactor sv_upgrade (Nick did the heavy lifting)
- added struct body_details (says the blamelog)
- extended the B::Concise feature set
- implemented OptreeCheck and tests using it
I believe that the proposed approach is wrong to the point of being counterproductive. Specifically, it won't help NYTProf at all, and it's likely to slow Storable down.
As I've already sent my reasoning to perl5-porters, I'll just give a link to it: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2010-02/msg00014.html
5 days later, and there has still been no reply to it.