Grant update: Persistent Data Structures for Raku, Daniel Sockwell
Tue, 09-Nov-2021 by Moritz Lenz
Daniel Sockwell has provided two updates to his [Grant: Persistent Data Structures for Raku](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku).
The first update was already submitted in October, and not posted due to my own error. Enjoy the thorough description!
---
## Raku Persistent Data Structures Grant Report for September 2021
In September, I spent 28 hours on implementing Raku persistent data structures. That's a bit under
the 10 hours/week I'd estimated I'd spend, but I was still able to make a good start on the
implementation.
The main reason I didn't spend the full 40 hours on the grant – as well as the explanation for why
you're getting this update now instead of closer to the beginning of the month – is that I decided
to devote a large chunk of my Raku time to two other issues – both of which are directly related to
implementing persistent data structures. (Though not _so_ related that I'm counting time spent on
them towards the grant. Which is good, because doing so would mean that I _well_ exceeded my
~40 hours/month budget!)
I'm going to tell you a bit about these two projects and how each relates to Raku persistent data
structures, and then I'll describe the progress I made during the 28 hours I'm actually counting.
### What even is immutability?
The biggest selling point of persistent data structures is, of course, that they are immutable while
also being fairly cheap to copy. But to deliver on that promise, it's important to have a very
clear idea of what we mean by "immutability".
I don't mean in some sort of fancy [philosophical sense](https://en.wikipedia.org/wiki/Heraclitus);
I'm asking the very practical question "When we describe the `List` in `my $list = (1, 2, 3)` as
immutable, what does that mean? What different _behavior_ should I expect when I know that the
`List` is immutable?" (You could say "semantic difference" instead of "different behavior" if you
want to sound a bit more technical.)
Raku has some very clear answers to parts of this question but is also slightly confused (to say
nothing of confusing) on some others. And, as it turns out, some of the less-clear areas of the
question end up being pretty significant for the sort of immutability that gives persistent data
structures so much of their power.
Let's start with an area where Raku is clear: the distinction between immutable **values** and
immutable **bindings**. This is [well documented](https://docs.raku.org/language/containers) (and
isn't what I spent time on; this is still background), but here's a recap:
#### Values
When we describe the list `(1, 2, 3)`, the string `codesections`, or the number `42` as
immutable, we are saying that they are immutable _values_. That is, there's no way to take `(1, 2,
3)` and change the `2` into a `0` while still having the same list we started with – we can create a
new list with those values, but that's not at all the same thing. Contrast that with the `Array` of
`my @a = [1, 2, 3]`: here, the code `@a[1] = 0` literally changes the existing `Array`.
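In code, that contrast looks like this (a quick sketch; the commented-out line dies at runtime):

```raku
my $list = (1, 2, 3);
# $list[1] = 0;               # dies: Cannot modify an immutable List
my $new-list = ($list[0], 0, $list[2]);
say $new-list;                # (1 0 3) – a new List, not a changed one

my @a = [1, 2, 3];
@a[1] = 0;                    # literally changes the existing Array
say @a;                       # [1 0 3]
```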
Strings and numbers in Raku are immutable values in exactly the way `List`s are (and `Array`s
aren't). That is, there's no way to transform a string or a number; all you can do is return a
modified version of the string or number. In contrast, C, C++, Rust, and many other languages allow
you to mutate strings in-place; they essentially treat strings as arrays of characters or codepoints
and let you modify that array in exactly the same way Rakoons modify an `Array`. (For C's `char*`
strings, you're literally dealing with character arrays; for other languages that's more of an
(over)simplification.) I'm pretty sure that no programming language is crazy enough [to have
mutable numbers](http://rigaux.org/language-study/various/mutability-and-sharing/) (math would get
_weird_) but I'd love to learn that I'm wrong about that.
Note that I'm discussing the semantics of immutable values that Raku (the language) provides, not
implementation details of how any particular compiler stores those values. So, while `List`s and
`Str`s provide immutable semantics, there's nothing preventing an implementation from storing
different values in a more complex way. Put differently, when you say `my $a = 'foo' x 500; my $b =
$a ~ ' and bar'`, you _conceptually_ have two separate and immutable strings, but implementations
are free to let those strings share (or even mutate) some of their backing data. And, indeed, MoarVM
uses a really amazing system for [storing
strings](https://github.com/MoarVM/MoarVM/blob/master/docs/sts.asciidoc) that shares memory in a way
that's not that different from the way the data structures I'm building do.
#### Bindings
Entirely separate from the question of mutable versus immutable values, programming languages can
also constrain the mutability of variable bindings. That is: once the programmer has defined a
variable as pointing at a particular value, can they later change it to point at a _different_
value?
One non-Raku example that many programmers may be familiar with is JavaScript's `let` and `const`,
especially when used with arrays or objects. In JavaScript, `const` enforces immutable bindings –
you can't say `const a = [1, 2, 3]; a = [4, 5, 6]`. But it does nothing to make the _values_
immutable – and, in js, arrays are always mutable. This means that you can validly write `const a =
[1, 2, 3]; a[1] = 99` to mutate the value that `a` points to. (This distinction, and the lack of
immutable arrays in JavaScript, leads to [perennial
debates](https://overreacted.io/on-let-vs-const/) about the advisability of using `const` in js.)
Raku gives us access to the semantics of both mutable and immutable bindings, and lets us combine
either with mutable or immutable values. Thus, we can write `my @a = [1, 2, 3]` – a mutable binding
to a mutable value. Or `my $l = (1, 2, 3)` – a mutable binding to an immutable value (we can't
change the value of the `List`, but we *can* make `$l` point to a different value altogether). Or
we can write `constant @a = [1, 2, 3]` – an immutable binding to a mutable value (`@a` will always
point to that exact `Array`, but we can freely mutate that `Array` with code like `@a[1] = 99`).
Or, finally, we can write `constant $l = (1, 2, 3)` – an immutable binding to an immutable value
(`$l` will always point to the same `List`, and that `List` will always contain exactly the values
`1`, `2`, and `3`).
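Here are all four combinations in one runnable sketch (the commented-out lines are the ones that
die):

```raku
my @a = [1, 2, 3];        # mutable binding to a mutable value
@a[1] = 99;               # OK: mutate the Array
@a = [4, 5, 6];           # OK: replace its contents

my $l = (1, 2, 3);        # mutable binding to an immutable value
# $l[1] = 99;             # dies: Cannot modify an immutable List
$l = (4, 5, 6);           # OK: point $l at a different List

constant @b = [1, 2, 3];  # immutable binding to a mutable value
@b[1] = 99;               # OK: same Array, mutated in place

constant $c = (1, 2, 3);  # immutable binding to an immutable value
# $c = (4, 5, 6);         # dies: Cannot assign to an immutable value
say $c;                   # (1 2 3) – always
```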
This last category – an immutable binding to an immutable value – best fits many people's intuitions
about what the terms "immutable" or "constant" mean – that variable always gives me the same value,
no matter what. In addition to being more intuitive, this last category is notable for how much it
can simplify reasoning about code. There's something very clarifying about knowing the value of a
variable without needing to track whether that value has changed; just like the related idea of
using [referentially transparent](https://en.wikipedia.org/wiki/Referential_transparency) functions,
programming with variables that are immutably bound to immutable values can greatly simplify complex
codebases.
#### How immutable values and bindings fit with persistent data structures
The distinction between the mutability of values and bindings matters for the data structures I'm
implementing in two ways.
First, because the data structures I'm creating are immutable values, we'll be able to use them
in that wonderful fourth category – immutable bindings to immutable values – if and only if Raku
lets us bind to them immutably. So it's really important to be clear on what options Raku provides
for immutable bindings.
I've already shown one way that Raku supports immutable bindings: `constant $l = (1, 2, 3)`. But
constants are evaluated at compile time and (even though Raku's definition of "compile time" is
pretty expansive) the options for using constants are inherently limited. So what other options
does Raku provide?
Well, before the beginning of September, I thought that Raku provided three other options for
immutable bindings: so-called [sigilless
variables](https://docs.raku.org/language/variables#index-entry-\_(sigilless_variables)) (like `my
\l = (1, 2, 3)`); function parameters that aren't declared as [`rw` or
`copy`](https://docs.raku.org/type/Signature#index-entry-trait_is_rw) (like `sub fn($l) {…}`); and
lists of variables that are destructured with the `:=` operator (which share the same semantics as
parameters). Now, in mid-October, I **still** think that's the case, but I admit that the situation
is more complex than I first thought and, at a minimum, existing compilers don't fully implement
those semantics.
(Note: some people don't like the term "sigilless variable", since the semantics I just described
mean that they don't really vary. But it's the term the docs use, so I'm going to stick with it for
this post.)
I first realized something odd was going on when a comment in this [docs.raku.org
issue](https://github.com/Raku/doc/issues/3950) pointed out that, while you can't reassign a
sigilless variable, you can sometimes re-[bind](https://docs.raku.org/language/containers#Binding)
it: `my Int \x = 2; x := 4` changes the value bound to `x`. At first I thought that this issue only
applied when certain type constraints were applied, but I soon realized that it applied whenever
any type is mentioned, including `Any` or `Mu`. Moreover, it's not just sigilless variables that
can be rebound – parameters can too: `sub f(Any $a) { $a := 42; $a}; f 0` returns 42. In the last
step to make this pervasive, the `@` and `%` sigils add implicit type constraints, which means that
any parameter with either of those sigils can be rebound. (As mentioned, destructuring with `:=`
shares the semantics of parameter binding, so all of that applies there as well.)
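Condensed into one snippet, the rebinding behavior described above (as observed on Rakudo builds
from before the fix discussed below) looks like this:

```raku
my \untyped = (1, 2, 3);
# untyped := (4, 5, 6);          # compile-time error: can't rebind

my Int \typed = 2;
typed := 4;                      # but mention a type, and rebinding works
say typed;                       # 4

sub f(Any $a) { $a := 42; $a }
say f 0;                         # 42 – parameters can be rebound too
```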
Given that sigilless variables and parameters both allowed rebinding, all of the above led me to
wonder whether Raku is trying to provide immutable bindings at all. It turns out that there isn't a
[Roast](https://github.com/Raku/roast) specification test that's 100% on point. And, when I [asked
on StackOverflow](https://stackoverflow.com/questions/69231506/what-are-the-rules-for-re-binding), I
got mixed opinions about whether the current behavior (allowing rebinding) was intentional or a bug.
If Raku does pervasively allow rebinding, that would be pretty bad news for the usefulness of the
data structures I'm implementing. They'd still have some value – more than enough to be
worthwhile. But it'd mean that we pretty much can never get to the
immutable-bindings-to-immutable-values promised land that can have such a simplifying effect on
code.
Given that impact, I decided to research the issue pretty thoroughly. After doing so, I've
concluded that Raku is designed to provide immutable bindings after all, and I produced a [pull
request](https://github.com/rakudo/rakudo/pull/4536) that both explains my logic and provides a
patch that gives Raku those semantics. That PR hasn't been merged yet (I still need to respond to
at least one change request about what error to throw, and I'd like to see if anyone else weighs
in). But I'm optimistic that it will be and that fixing that bug will make immutable bindings to
immutable values much more practical in Raku.
#### What's on the inside
I mentioned earlier that the values-versus-binding distinction is important for the work I'm doing
in two ways. The second way it matters is because it gives us the right framework to talk about
"shallow" immutability (also sometimes called [interior
mutability](https://stackoverflow.com/questions/69231506/what-are-the-rules-for-re-binding)).
Let's look at a slightly different constant: `constant $l = (1, 2, $)`. What can we say about `$l`?
Well, the variable `$l` is immutably bound to a `List`, and the `List` is comprised of three values,
`1`, `2`, and … a [scalar](https://docs.raku.org/language/containers#Scalar_containers). But a
scalar is a container – specifically, a _mutable_ container. Thus, despite the immutability of `$l`'s
binding and the immutability of `List`s, we can still make changes: `$l[2] = 99` is valid code.
That change is allowed because `List`s are only _shallowly_ immutable. Or, to say the same thing in
different words, `List`s are an immutable type that allows for interior mutability.
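To see that interior mutability concretely (a small sketch; only the third element sits in a
mutable Scalar container):

```raku
constant $l = (1, 2, $);
$l[2] = 99;        # allowed: assigns into the Scalar container
say $l[2];         # 99
# $l[0] = 99;      # dies: 1 is an immutable value, not a container
```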
One of the goals I mentioned in the [initial grant proposal for the persistent data
structures](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku)
is to provide deeply immutable data types. I hope that the discussion so far gives some context to what I
meant by that. But it's a point I'll return to in just a minute.
### Meanwhile, in another thread altogether…
The second related-but-distinct task that has occupied a good deal of my time recently involves
concurrency and parallelism (which are famously [not the same
thing](https://go.dev/blog/waza-talk)). One of the key selling points of immutable data –
especially deeply immutable data – is that it can be more easily shared across threads. A lot of
the headache of dealing with parallel code (locks, mutexes, atomics, cache misses, deadlocks,
livelocks, etc.) comes from trying to prevent different threads from modifying data at the same time.
But different threads will never modify (deeply) immutable data at the same time – because _nothing_
will ever mutate that data at any time. I'm very optimistic that the data structures I'm working on
will have significant application for anyone writing multithreaded Raku.
So I've been slightly concerned that I hadn't yet written any non-trivial parallel and concurrent
code in Raku. Not _that_ concerned – I've written quite a bit of concurrent and parallel Rust code
and concurrent JavaScript (JS is single-threaded, so it's inherently non-parallel). And I'd gotten
to know Raku's concurrency model by watching some excellent
[conference](https://www.youtube.com/watch?v=hGyzsviI48M&t=43s)
[talks](https://www.youtube.com/watch?v=l2fSbOPeSQs&t=1184s) and by reading the
[docs](https://docs.raku.org/language/concurrency)/[other](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/)
[resources](https://stackoverflow.com/questions/65473206/what-concurrency-mechanisms-are-provided-by-raku-and-how-can-they-be-evaluated).
Still, though, I was keeping my eye out for a medium-scale project that would let me confirm my
understanding of Raku's approach to concurrency.
In late September, I found the perfect project. Someone on the r/rakulang subreddit asked if there
was a good way to [recreate a JavaScript exercise in
Raku](https://www.reddit.com/r/rakulang/comments/pvdpd2/recreate_javascript_bouncing_balls_in_raku/).
That exercise involved rendering balls to an HTML5 canvas in a browser – a task that's pretty
challenging with a version of Raku running on MoarVM. (It would obviously be much easier using the
JavaScript backend, but that wouldn't currently be my first choice for someone new to Raku.) So, I
decided to write a simple [Cro](https://cro.services/) server that would let someone write Raku code
to display balls on an HTML canvas.
Specifically, I decided on the following architecture: A Cro server that would listen for commands
from the user's Raku code, generate a set of balls to display, and then send a JSON representation
of those balls to any connected clients via WebSockets. Oh, and then do that again with updated
ball positions 16 milliseconds later – with the goal of maintaining a frame rate of 60 fps. This
last bit would normally be a bit of an odd choice: WebSockets are typically used to communicate
with a remote server, and thus it'd typically be a questionable choice to stream updates 60 times
per second. But the program I had in mind would run locally, which removes network latency from
consideration. And, besides, I figured that this design would provide a good trial by fire for my
understanding of Raku's concurrency model: if my program can stream balls to dozens of
simultaneously connected clients at 60 fps, I must have a pretty decent understanding of how Raku
handles parallelism and concurrency.
You can see the [program I ended up with](https://github.com/codesections/LearnRakuWith) on GitHub.
I'm pretty happy with how it turned out: in not much more than 100 lines of Raku, the program sets
up exactly the architecture described above plus an
[Erlang](https://en.wikipedia.org/wiki/Erlang_(programming_language))-style error recovery strategy
for resetting to a known-good state when user input doesn't produce valid output.
But getting there took longer than I expected it to and, in the middle, had me seriously questioning
whether I actually understood Raku's concurrency model well enough to be implementing new data
structures that'd fit with it. One of the first things anyone learns about programming is that "[it
is not a compiler error. It is never a compiler error.](https://blog.plover.com/2017/11/12/)" In
other words, even if some issue _seems_ to be caused by a bug in the compiler/runtime/etc, it's far
more likely to be caused by a bug in your own code.
And when writing that module, I kept running into some crashes that made me think that I'd either
run into the mythical compiler bug or – more likely – was seriously misunderstanding how to write
concurrent code. After spending quite a bit of time double-checking my code, I determined that, as
expected, I had _not_ run into a compiler bug. No, I'd run into
[two](https://github.com/croservices/cro-websocket/issues/36) of
[them](https://github.com/MoarVM/MoarVM/issues/1565).
In fairness, neither was a _compiler_ bug; they were both issues with the MoarVM runtime. Moreover,
both were fixed extremely quickly, and the second never even made it into production – I only ran
into it because I was using a pre-release build to try out Rakudo's [new dispatch
mechanism](https://6guts.wordpress.com/2021/09/29/the-new-moarvm-dispatch-mechanism-is-here/). The
whole point of running pre-release builds is to help catch edge-case bugs like these, and I'm
certainly not complaining – but it turns out that streaming 60 fps to dozens of clients was a good
stress test for the new dispatch system and not just for my understanding of Raku's concurrency
model.
### Lessons for persistent data structures
Once I realized that I wasn't fundamentally misunderstanding concurrency in Raku, implementing
Learn::Raku::With taught me a couple of lessons with important (though subtle) implications for the
persistent data structures I'm building.
#### "Concurrency control" ≠ "concurrency"
The first lesson I learned is that I'd slightly misunderstood what's going on in code like this
(adapted from [an example](https://docs.raku.org/routine/migrate) in the docs):
```raku
# A simple stock ticker for Apple and Google stock
react {
    whenever $stock-info-for-APPL { say "Apple's stock price is now: $_" }
    whenever $stock-info-for-GOOG { say "Google's stock price is now: $_" }
}
```
My previous (**incorrect**) reading of that example was pretty much just an English version of the
code:
> Whenever we get an update about either Apple or Google's stock, print that update. We don't know
> when these updates could come, but we want to react to them the instant they do, whether that
> means reacting to the Apple one first, reacting to the Google one first, or reacting to both at
> the same time.
That's _so close_ to correct, but it goes wrong in the very last 7 words: we are guaranteed **not**
to react to both updates at the same time. This is a consequence of Raku's
[run-to-completion](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/#user-content-supplies-and-concurrency)
semantics: the code inside a `whenever` block is guaranteed to finish executing (that is, to "run to
completion") before the code in any other `whenever` block can start. This wouldn't matter much for
the simple example above, but here are two modified versions in which it would:
```raku
# Let's store our results in a Map instead of printing them
react {
    my $prices = Map.new;
    whenever $stock-info-for-APPL { $prices = %(|$prices, :APPL($_)) }
    whenever $stock-info-for-GOOG { $prices = %(|$prices, :GOOG($_)) }
}
```
Under the incorrect understanding, we could've been in both `whenever` blocks at the same time and
this would create a race condition – we might lose a price update if two came in at the same time.
However, because the concurrency control Raku provides _prevents_ concurrent access here, this is
actually 100% fine: we'll be in exactly one `whenever` block at a time. Raku's semantics protect us
from any race condition.
Here's an example where Raku's semantics would have a less ideal impact on naive code:
```raku
# Now with buy/sell recommendations!
react {
    whenever $stock-info-for-APPL {
        say "Apple's stock price is now: $_";
        say "Rating: " ~ slow-fn-to-calculate-recommendation('APPL', $_);
    }
    whenever $stock-info-for-GOOG {
        say "Google's stock price is now: $_";
        say "Rating: " ~ slow-fn-to-calculate-recommendation('GOOG', $_);
    }
}
```
This code would _not_ behave as the author might hope, due to those same run-to-completion
semantics. Specifically, while `slow-fn-to-calculate-recommendation` is running, other updates would
be blocked. If you don't want this behavior, Raku offers several ways to schedule work outside of
the run-to-completion guarantee of the `whenever` block; Raku's semantics aren't going to _stop_ you
from doing anything here, but it's important to understand the behavior they provide by default.
(It may be helpful to compare Raku's semantics to JavaScript, which guarantees that _every_ function
will run to completion before any other work begins – a model I've [contrasted with Raku's
before](https://stackoverflow.com/questions/67679309/raku-equivalent-to-javascripts-settimeoutfn-0).)
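For instance, one common way to opt out is to move the slow call into a `start` block and react to
the resulting Promise from a nested `whenever` (a sketch; `slow-fn-to-calculate-recommendation` is
the hypothetical slow function from the example above):

```raku
react {
    whenever $stock-info-for-APPL -> $price {
        say "Apple's stock price is now: $price";
        # The slow calculation runs on another thread; this whenever
        # block finishes immediately, so other updates aren't blocked.
        whenever start slow-fn-to-calculate-recommendation('APPL', $price) {
            say "Rating: $_";
        }
    }
}
```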
So, how does all of this impact persistent data structures? Well, some of the performance
optimizations for persistent data structures involve temporarily suspending the immutability
guarantee when there's provably only one copy of the data (or part of the data). And whether that's
provably the case or not depends crucially on whether other sections of the program can have
concurrent access to the data. So being clear on all of the above is pretty essential for safe
implementations of the data structures I'm working on.
#### Under pressure
I mentioned that implementing Learn::Raku::With helped me learn two lessons, but the second was
really a reminder of a Raku concurrency feature that I already knew about, and a good example of how
important that feature can be. The feature I'm talking about is the way Raku manages backpressure.
Raku's high level concurrency tools are carefully designed with backpressure in mind. Quoting
directly from the [6guts blog
post](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/) that
I've already linked to:
> Another interesting problem for any system processing asynchronous messages is that of
> backpressure. In short, how do we make a source of messages emit them at a rate no greater than
> that of the processing logic? The general principle with Supply is that the sender of a message
> pays the cost of its processing. So, if I have `$source.map(&foo).tap(&bar)`, then whatever emits at
> the source pays the cost of the map of that message along with the processing done by the tap callback.
Backpressure is extremely helpful because it can allow a program to handle arbitrary amounts of
input without requiring a proportionally arbitrary (i.e., infinite) amount of memory. Of course,
given Raku's commitment to flexibility and programmer control, it shouldn't come as any surprise
that Raku also makes it easy to opt out of this automatic backpressure when doing so makes sense.
Learn::Raku::With presents a good example of a situation where it makes sense to opt out of Raku's
default backpressure semantics. As I mentioned above, Learn::Raku::With generates a stream of
frames and transmits those frames to all connected clients. If we applied backpressure to this
system in the normal way, then transmitting the frames to the clients would exert backpressure on
generating the frames – that is, we'd be guaranteed not to generate frames any faster than we can
transmit them.
This would have some benefits – most notably, we'd never need to buffer generated frames and our
memory use would never grow. But, in the Learn::Raku::With context, it'd have one very notable flaw
as well: the normal backpressure system would mean that any time we can't transmit frames to clients
at 60 fps, the actual _speed_ of Learn::Raku::With's balls would get reduced. This obviously isn't
ideal – while 60 fps is a nice goal to shoot for, 45 fps is an absolutely fine frame rate and it'd
be much better to drop 1 out of 4 frames and to display the balls at normal speed in 45 fps than to
keep every frame but to display the balls in slow motion.
Thus, Learn::Raku::With [pairs a Supply with a
Channel](https://github.com/codesections/LearnRakuWith/blob/main/lib/HtmlBalls.rakumod#L70-L78) to
allow it to drop frames instead of generating frames more slowly – but that, of course, means that
it needs to temporarily store at least a few frames.
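I won't reproduce the real code here, but the basic shape of that Supply-plus-Channel pairing looks
something like this (a hypothetical sketch, not the actual implementation; `generate-frame` and
`send-to-clients` stand in for the real logic):

```raku
my Channel $frames .= new;

# Producer: emit a frame every 16ms, regardless of how fast
# clients can consume them.
Supply.interval(0.016).tap: { $frames.send(generate-frame()) };

# Consumer: drain the channel and keep only the newest frame,
# dropping any backlog instead of slowing the producer down.
start loop {
    my $frame = $frames.receive;
    while my $newer = $frames.poll { $frame = $newer }
    send-to-clients($frame);
}
```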
But storing extra frames, in turn, means both that the program will take up more memory and,
crucially, that there will be more work for the garbage collector to do. And that work will take
some time – time that necessarily cannot be spent transmitting those frames to the client. But
since the whole reason we need to store those extra frames was that we're generating them faster
than we can ship them off to the client, it's possible for the GC-pause-induced extra delay to
result in _more_ extra frames. Which can then result in more garbage to collect, and thus more
delay, and thus more frames.
I'm sure you can see where that sort of feedback loop heads, and it's certainly not anywhere
good. In fact, a previous version of Learn::Raku::With could get itself into exactly that sort of
situation pretty easily, consuming ever-more RAM and operating ever more slowly (I think I measured
it as hitting at least 50 GB). The current version is much more resilient, but the risk remains.
All of this is relevant to persistent data structures because one of the main benefits of these data
structures is that the power of structural sharing makes copies vastly less expensive – verging on
free, in fact. This benefit would go a long way towards avoiding the sort of issues I outlined
above. I'd always known that these inexpensive copies are a key benefit of the persistent data
structures I'm building, but I'd previously focused on this as a way to reduce the memory footprint
of Raku programs. After building Learn::Raku::With, however, I recognize that structural sharing
also reduces GC pressure and thus improves the speed at which Raku programs can execute – which can
avoid the sort of death spirals described above. This means that having persistent data structures
will not only make many Raku programs more memory efficient but will also make certain kinds of Raku
programs much more feasible to write without spiraling into a need for ever-increasing amounts of RAM.
## Implementation progress
So, having spent a few thousand words explaining things related to but not directly covered by the
grant, let me say at least a bit about the implementation work itself. In the nature of
implementation work, it's not nearly as interesting as more exploratory work and I don't have very
much to say – I've very much stuck to the plan in my [grant
proposal](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku).
Specifically, I started by implementing the persistent version of the `List` (still open to
bikeshedding on the name!). I have made good progress on implementing a persistent `List` with
the structural sharing features described in my proposal but _without_ the bit-shifting indexing
(I'm instead currently using the simplified indexing scheme that involves converting the key to a
different base and calculating the relevant indices manually). This simplified indexing scheme will
allow me to test the soundness of the list before switching to the bit-shifting index (which will be
key to the performance of the persistent list).
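As a rough illustration of that simplified indexing scheme (a hypothetical sketch, not the actual
implementation), assuming a branching factor of 32:

```raku
# Convert index $i into per-level offsets by repeated division –
# i.e., the digits of $i in base 32, most significant first.
sub index-path(Int $i, Int :$levels = 3, Int :$branch = 32) {
    my @path;
    my $n = $i;
    for ^$levels {
        @path.unshift($n % $branch);
        $n = $n div $branch;
    }
    @path
}
say index-path(1000);  # [0 31 8] – root slot 0, then slot 31, then slot 8
```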
Additionally, I've written a set of tests to confirm the correctness of my implementation so far and
a (much smaller and still preliminary) set of tests that measure the performance of deep copies of
built in `List`s and `Array`s, which should provide a nice baseline to compare the persistent
versions against.
## Conclusion
Over the course of September, I made a good start on implementing persistent data structures for
Raku. This included significant, though limited, progress on actually implementing the persistent
List and much more time-consuming work on auxiliary projects (not counted as part of my grant hours)
that have given me a much better foundation for the remaining work. The downside of this auxiliary
work is that it took up considerable time that I might have devoted to the actual implementation,
both in September and in October. In fact, given the amount of time I've already spent on these
research/side projects in October, it's possible that I won't have much or any grant progress to
report at the end of this month. Nevertheless, I'm glad I invested the effort into those projects
since I believe both left me significantly better equipped to carry out the remainder of the grant.
I look forward to putting these lessons into practice as I continue to implement persistent data
structures for Raku.
---
## Raku Persistent Data Structures Grant Report for October 2021
The expectation I mentioned in my September grant report came true in October: I had significantly
less time to devote to work on this grant in October. This was due in large part to the conceptual
work I previously reported on, which stretched well into October.
Accordingly, I was only able to spend 5 hours on the grant in October; I used that time to continue
work on the persistent List. Specifically, I nearly completed the basic API (without the
bit-shifting optimizations). I was also able to add some initial test coverage. Given that
progress, I am now nearing the point where I'll be able to share the WIP code publicly.
On the conceptual front (i.e., work that's relevant to the persistent data structures but not
relevant enough that I'm counting it towards the hours funded by this grant), I've also made a fair
bit of progress. Most significantly, I was able to complete the work to add fully immutable binding
to Rakudo. Thanks to some helpful feedback from Vadim Belman, I also added significantly more
detailed error messages, which should help explain the nature of these bindings to Rakoons who aren't
as used to programming in an immutable style. As discussed in my last grant report, being able
to immutably bind values to their names will mean that the persistent data structures implemented
for this grant will be truly immutable – and thereby significantly easier to reason about. This
work was merged in [rakudo#4536](https://github.com/rakudo/rakudo/pull/4536) and should be part of
the 2021.11 Rakudo release.
Additionally, I've been putting some thought into how Raku's notion of [value
types](https://docs.raku.org/language/glossary#Value_type) fits with the persistent data structures
I'm implementing. As I mentioned last month, one of the core motivations behind this grant proposal
is to provide deeply immutable types, so it's clear that the types I'm implementing will be value
types and thus will return a [ValueObjAt](https://docs.raku.org/type/ValueObjAt) from their `WHICH`
method. But what should the persistent types do when the user adds a non-value type? I can see
three options:
1. Throw an error/require the user to create a value type to pass in. This is basically the approach taken by Elizabeth Mattijsen's [ValueType](https://modules.raku.org/dist/ValueType:zef:zef:lizmat/) module.
2. Recursively copy each non-value type into an equivalent value type when it's added to the persistent type ("copy-on-read"). This is straightforward for some types (an `Array` becomes a `List`) but might involve more metaprogramming trickery for others (especially user classes).
3. Store the non-value type if we can prove that no other part of the code has access to it, and only make a copy if we need to hand it out to someone (basically copy-on-write, with some slight tweaks).
Each of these has different pros and cons. In particular, the first is the easiest to implement, the most verbose, and gives the user the most control over/awareness of the performance costs. Conversely, the last is the most "magical" and the only one that risks introducing correctness bugs. Additionally, there could be ways to combine different aspects of these three approaches – for example, one combination of 1. and 2. would be to copy simple/built in types but to throw an error when passed a user type with a mutable field. I'm currently leaning towards the copy-on-read option #2 as a reasonable middle ground, but it's definitely something I'd be interested in hearing other thoughts on.
November is already off to a good start, and I'm optimistic that I'll be able to make significant
progress on this grant before the end-of-year business (and exciting things like the [Raku Advent
Calendar](https://github.com/Raku/advent/blob/master/raku-advent-2021/authors.md)!) start to eat
into my coding time.