Grant update: Persistent Data Structures for Raku, Daniel Sockwell
Tue, 09-Nov-2021 by Moritz Lenz
Daniel Sockwell has provided two updates to his [Grant: Persistent Data Structures for Raku](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku).
The first update was already submitted in October, and not posted due to my own error. Enjoy the thorough description!
---
## Raku Persistent Data Structures Grant Report for September 2021
In September, I spent 28 hours on implementing Raku persistent data structures. That's a bit under
the 10 hours/week I'd estimated I'd spend, but I was still able to make a good start on the
implementation.
The main reason I didn't spend the full 40 hours on the grant – as well as the explanation for why
you're getting this update now instead of closer to the beginning of the month – is that I decided
to devote a large chunk of my Raku time to two other issues – both of which are directly related to
implementing persistent data structures. (Though not _so_ related that I'm counting time spent on
them towards the grant. Which is good, because doing so would mean that I _well_ exceeded my
~40 hours/month budget!)
I'm going to tell you a bit about these two projects and how each relates to Raku persistent data
structures, and then I'll describe the progress I made during the 28 hours I'm actually counting.
### What even is immutability?
The biggest selling point of persistent data structures is, of course, that they are immutable while
also being fairly cheap to copy. But to deliver on that promise, it's important to have a very
clear idea of what we mean by "immutability".
I don't mean in some sort of fancy [philosophical sense](https://en.wikipedia.org/wiki/Heraclitus);
I'm asking the very practical question "When we describe the `List` in `my $list = (1, 2, 3)` as
immutable, what does that mean? What different _behavior_ should I expect when I know that the
`List` is immutable?" (You could say "semantic difference" instead of "different behavior" if you
want to sound a bit more technical.)
Raku has some very clear answers to parts of this question but is also slightly confused (to say
nothing of confusing) on some others. And, as it turns out, some of the less-clear areas of the
question end up being pretty significant for the sort of immutability that gives persistent data
structures so much of their power.
Let's start with an area where Raku is clear: the distinction between immutable **values** and
immutable **bindings**. This is [well documented](https://docs.raku.org/language/containers) (and
isn't what I spent time on; this is still background), but here's a recap:
#### Values
When we describe the list `(1, 2, 3)`, the string `codesections`, or the number `42` as
immutable, we are saying that they are immutable _values_. That is, there's no way to take `(1, 2,
3)` and change the `2` into a `0` while still having the same list we started with – we can create a
new list with those values, but that's not at all the same thing. Contrast that with the `Array` of
`my @a = [1, 2, 3]`: here, the code `@a[1] = 0` literally changes the existing `Array`.
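In code, that contrast looks like this (a quick sketch; the commented-out line dies at runtime):

```raku
my $list = (1, 2, 3);
# $list[1] = 0;               # dies: Cannot modify an immutable List
my $new-list = ($list[0], 0, $list[2]);
say $new-list;                # (1 0 3) – a new List, not a changed one

my @a = [1, 2, 3];
@a[1] = 0;                    # literally changes the existing Array
say @a;                       # [1 0 3]
```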
Strings and numbers in Raku are immutable values in exactly the way `List`s are (and `Array`s
aren't). That is, there's no way to transform a string or a number; all you can do is return a
modified version of the string or number. In contrast, C, C++, Rust, and many other languages allow
you to mutate strings in-place; they essentially treat strings as arrays of characters or codepoints
and let you modify that array in exactly the same way Rakoons modify an `Array`. (For C's `char*`
strings, you're literally dealing with character arrays; for other languages that's more of an
(over)simplification.) I'm pretty sure that no programming language is crazy enough [to have
mutable numbers](http://rigaux.org/language-study/various/mutability-and-sharing/) (math would get
_weird_) but I'd love to learn that I'm wrong about that.
Note that I'm discussing the semantics of immutable values that Raku (the language) provides, not
implementation details of how any particular compiler stores those values. So, while `List`s and
`Str`s provide immutable semantics, there's nothing preventing an implementation from storing
different values in a more complex way. Put differently, when you say `my $a = 'foo' x 500; my $b =
$a ~ ' and bar'`, you _conceptually_ have two separate and immutable strings, but implementations
are free to let those strings share (or even mutate) some of their backing data. And, indeed, MoarVM
uses a really amazing system for [storing
strings](https://github.com/MoarVM/MoarVM/blob/master/docs/sts.asciidoc) that shares memory in a way
that's not that different from the way the data structures I'm building do.
#### Bindings
Entirely separate from the question of mutable versus immutable values, programming languages can
also constrain the mutability of variable bindings. That is: once the programmer has defined a
variable as pointing at a particular value, can they later change it to point at a _different_
value?
One non-Raku example that many programmers may be familiar with is JavaScript's `let` and `const`,
especially when used with arrays or objects. In JavaScript, `const` enforces immutable bindings –
you can't say `const a = [1, 2, 3]; a = [4, 5, 6]`. But it does nothing to make the _values_
immutable – and, in js, arrays are always mutable. This means that you can validly write `const a =
[1, 2, 3]; a[1] = 99` to mutate the value that `a` points to. (This distinction, and the lack of
immutable arrays in JavaScript, leads to [perennial
debates](https://overreacted.io/on-let-vs-const/) about the advisability of using `const` in js.)
Raku gives us access to the semantics of both mutable and immutable bindings, and lets us combine
either with mutable or immutable values. Thus, we can write `my @a = [1, 2, 3]` – a mutable binding
to a mutable value. Or `my $l = (1, 2, 3)` – a mutable binding to an immutable value (we can't
change the value of the `List`, but we *can* make `$l` point to a different value altogether). Or
we can write `constant @a = [1, 2, 3]` – an immutable binding to a mutable value (`@a` will always
point to that exact `Array`, but we can freely mutate that `Array` with code like `@a[1] = 99`).
Or, finally, we can write `constant $l = (1, 2, 3)` – an immutable binding to an immutable value
(`$l` will always point to the same `List`, and that `List` will always contain exactly the values
`1`, `2`, and `3`).
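Here are all four combinations in one runnable sketch (the commented-out lines are the ones that
die):

```raku
my @a = [1, 2, 3];        # mutable binding to a mutable value
@a[1] = 99;               # OK: mutate the Array
@a = [4, 5, 6];           # OK: replace its contents

my $l = (1, 2, 3);        # mutable binding to an immutable value
# $l[1] = 99;             # dies: Cannot modify an immutable List
$l = (4, 5, 6);           # OK: point $l at a different List

constant @b = [1, 2, 3];  # immutable binding to a mutable value
@b[1] = 99;               # OK: same Array, mutated in place

constant $c = (1, 2, 3);  # immutable binding to an immutable value
# $c = (4, 5, 6);         # dies: Cannot assign to an immutable value
say $c;                   # (1 2 3) – always
```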
This last category – an immutable binding to an immutable value – best fits many people's intuitions
about what the terms "immutable" or "constant" mean – that variable always gives me the same value,
no matter what. In addition to being more intuitive, this last category is notable for how much it
can simplify reasoning about code. There's something very clarifying about knowing the value of a
variable without needing to track whether that value has changed; just like the related idea of
using [referentially transparent](https://en.wikipedia.org/wiki/Referential_transparency) functions,
programming with variables that are immutably bound to immutable values can greatly simplify complex
codebases.
#### How immutable values and bindings fit with persistent data structures
The distinction between the mutability of values and bindings matters for the data structures I'm
implementing in two ways.
First, because the data structures I'm creating are immutable values, we'll be able to use them
in that wonderful fourth category – immutable bindings to immutable values – if and only if Raku
lets us bind to them immutably. So it's really important to be clear on what options Raku provides
for immutable bindings.
I've already shown one way that Raku supports immutable bindings: `constant $l = (1, 2, 3)`. But
constants are evaluated at compile time and (even though Raku's definition of "compile time" is
pretty expansive) the options for using constants are inherently limited. So what other options
does Raku provide?
Well, before the beginning of September, I thought that Raku provided three other options for
immutable bindings: so-called [sigilless
variables](https://docs.raku.org/language/variables#index-entry-\_(sigilless_variables)) (like `my
\l = (1, 2, 3)`); function parameters that aren't declared as [`rw` or
`copy`](https://docs.raku.org/type/Signature#index-entry-trait_is_rw) (like `sub fn($l) {…}`); and
lists of variables that are destructured with the `:=` operator (which share the same semantics as
parameters). Now, in mid-October, I **still** think that's the case, but I admit that the situation
is more complex than I first thought and, at a minimum, existing compilers don't fully implement
those semantics.
(Note: some people don't like the term "sigilless variable", since the semantics I just described
mean that they don't really vary. But it's the term the docs use, so I'm going to stick with it for
this post.)
I first realized something odd was going on when a comment in this [docs.raku.org
issue](https://github.com/Raku/doc/issues/3950) pointed out that, while you can't reassign a
sigilless variable, you can sometimes re-[bind](https://docs.raku.org/language/containers#Binding)
it: `my Int \x = 2; x := 4` changes the value bound to `x`. At first I thought that this issue only
applied when certain type constraints were applied, but I soon realized that it applied whenever
any type is mentioned, including `Any` or `Mu`. Moreover, it's not just sigilless variables that
can be rebound – parameters can too: `sub f(Any $a) { $a := 42; $a}; f 0` returns 42. In the last
step to make this pervasive, the `@` and `%` sigils add implicit type constraints, which means that
any parameter with either of those sigils can be rebound. (As mentioned, destructuring with `:=`
shares the semantics of parameter binding, so all of that applies there as well.)
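Condensed into one snippet, the rebinding behavior described above (as observed on Rakudo builds
from before the fix discussed below) looks like this:

```raku
my \untyped = (1, 2, 3);
# untyped := (4, 5, 6);          # compile-time error: can't rebind

my Int \typed = 2;
typed := 4;                      # but mention a type, and rebinding works
say typed;                       # 4

sub f(Any $a) { $a := 42; $a }
say f 0;                         # 42 – parameters can be rebound too
```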
Given that sigilless variables and parameters both allowed rebinding, all of the above led me to
wonder whether Raku is trying to provide immutable bindings at all. It turns out that there isn't a
[Roast](https://github.com/Raku/roast) specification test that's 100% on point. And, when I [asked
on StackOverflow](https://stackoverflow.com/questions/69231506/what-are-the-rules-for-re-binding), I
got mixed opinions about whether the current behavior (allowing rebinding) was intentional or a bug.
If Raku does pervasively allow rebinding, that would be pretty bad news for the usefulness of the
data structures I'm implementing. They'd still have some value – more than enough to be
worthwhile. But it'd mean that we pretty much can never get to the
immutable-bindings-to-immutable-values promised land that can have such a simplifying effect on
code.
Given that impact, I decided to research the issue pretty thoroughly. After doing so, I've
concluded that Raku is designed to provide immutable bindings after all, and I produced a [pull
request](https://github.com/rakudo/rakudo/pull/4536) that both explains my logic and provides a
patch that gives Raku those semantics. That PR hasn't been merged yet (I still need to respond to
at least one change request about what error to throw, and I'd like to see if anyone else weighs
in). But I'm optimistic that it will be and that fixing that bug will make immutable bindings to
immutable values much more practical in Raku.
#### What's on the inside
I mentioned earlier that the values-versus-binding distinction is important for the work I'm doing
in two ways. The second way it matters is because it gives us the right framework to talk about
"shallow" immutability (also sometimes called [interior
mutability](https://stackoverflow.com/questions/69231506/what-are-the-rules-for-re-binding)).
Let's look at a slightly different constant: `constant $l = (1, 2, $)`. What can we say about `$l`?
Well, the variable `$l` is immutably bound to a `List`, and the `List` is comprised of three values,
`1`, `2`, and … a [scalar](https://docs.raku.org/language/containers#Scalar_containers). But a
scalar is a container – specifically, a _mutable_ container. Thus, despite the immutability of `$l`'s
binding and the immutability of `List`s, we can still make changes: `$l[2] = 99` is valid code.
That change is allowed because `List`s are only _shallowly_ immutable. Or, to say the same thing in
different words, `List`s are an immutable type that allows for interior mutability.
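To see that interior mutability concretely (a small sketch; only the third element sits in a
mutable Scalar container):

```raku
constant $l = (1, 2, $);
$l[2] = 99;        # allowed: assigns into the Scalar container
say $l[2];         # 99
# $l[0] = 99;      # dies: 1 is an immutable value, not a container
```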
One of the goals I mentioned in the [initial grant proposal for the persistent data
structures](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku)
is to provide deeply immutable data types. I hope that the discussion so far gives some context to what I
meant by that. But it's a point I'll return to in just a minute.
### Meanwhile, in another thread altogether…
The second related-but-distinct task that has occupied a good deal of my time recently involves
concurrency and parallelism (which are famously [not the same
thing](https://go.dev/blog/waza-talk)). One of the key selling points of immutable data –
especially deeply immutable data – is that it can be more easily shared across threads. A lot of
the headache of dealing with parallel code (locks, mutexes, atomics, cache misses, deadlocks,
livelocks, etc.) comes from trying to prevent different threads from modifying data at the same time.
But different threads will never modify (deeply) immutable data at the same time – because _nothing_
will ever mutate that data at any time. I'm very optimistic that the data structures I'm working on
will have significant application for anyone writing multithreaded Raku.
So I've been slightly concerned that I hadn't yet written any non-trivial parallel and concurrent
code in Raku. Not _that_ concerned – I've written quite a bit of concurrent and parallel Rust code
and concurrent JavaScript (JS is single-threaded, so it's inherently non-parallel). And I'd gotten
to know Raku's concurrency model by watching some excellent
[conference](https://www.youtube.com/watch?v=hGyzsviI48M&t=43s)
[talks](https://www.youtube.com/watch?v=l2fSbOPeSQs&t=1184s) and by reading the
[docs](https://docs.raku.org/language/concurrency)/[other](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/)
[resources](https://stackoverflow.com/questions/65473206/what-concurrency-mechanisms-are-provided-by-raku-and-how-can-they-be-evaluated).
Still, though, I was keeping my eye out for a medium-scale project that would let me confirm my
understanding of Raku's approach to concurrency.
In late September, I found the perfect project. Someone on the r/rakulang subreddit asked if there
was a good way to [recreate a JavaScript exercise in
Raku](https://www.reddit.com/r/rakulang/comments/pvdpd2/recreate_javascript_bouncing_balls_in_raku/).
That exercise involved rendering balls to an HTML5 canvas in a browser – a task that's pretty
challenging with a version of Raku running on MoarVM. (It would obviously be much easier using the
JavaScript backend, but that wouldn't currently be my first choice for someone new to Raku.) So, I
decided to write a simple [Cro](https://cro.services/) server that would let someone write Raku code
to display balls on an HTML canvas.
Specifically, I decided on the following architecture: A Cro server that would listen for commands
from the user's Raku code, generate a set of balls to display, and then send a JSON representation
of those balls to any connected clients via WebSockets. Oh, and then do that again with updated
ball positions 16 milliseconds later – with the goal of maintaining a frame rate of 60 fps. This
last bit would normally be a bit of an odd choice: WebSockets are typically used to communicate
with a remote server, and thus it'd typically be a questionable choice to stream updates 60 times
per second. But the program I had in mind would run locally, which removes network latency from
consideration. And, besides, I figured that this design would provide a good trial by fire for my
understanding of Raku's concurrency model: if my program can stream balls to dozens of
simultaneously connected clients at 60 fps, I must have a pretty decent understanding of how Raku
handles parallelism and concurrency.
You can see the [program I ended up with](https://github.com/codesections/LearnRakuWith) on GitHub.
I'm pretty happy with how it turned out: in not much more than 100 lines of Raku, the program sets
up exactly the architecture described above plus an
[Erlang](https://en.wikipedia.org/wiki/Erlang_(programming_language))-style error recovery strategy
for resetting to a known-good state when user input doesn't produce valid output.
But getting there took longer than I expected it to and, in the middle, had me seriously questioning
whether I actually understood Raku's concurrency model well enough to be implementing new data
structures that'd fit with it. One of the first things anyone learns about programming is that "[it
is not a compiler error. It is never a compiler error.](https://blog.plover.com/2017/11/12/)" In
other words, even if some issue _seems_ to be caused by a bug in the compiler/runtime/etc, it's far
more likely to be caused by a bug in your own code.
And when writing that module, I kept running into some crashes that made me think that I'd either
run into the mythical compiler bug or – more likely – was seriously misunderstanding how to write
concurrent code. After spending quite a bit of time double-checking my code, I determined that, as
expected, I had _not_ run into a compiler bug. No, I'd run into
[two](https://github.com/croservices/cro-websocket/issues/36) of
[them](https://github.com/MoarVM/MoarVM/issues/1565).
In fairness, neither was a _compiler_ bug; they were both issues with the MoarVM runtime. Moreover,
both were fixed extremely quickly, and the second never even made it into production – I only ran
into it because I was using a pre-release build to try out Rakudo's [new dispatch
mechanism](https://6guts.wordpress.com/2021/09/29/the-new-moarvm-dispatch-mechanism-is-here/). The
whole point of running pre-release builds is to help catch edge-case bugs like these, and I'm
certainly not complaining – but it turns out that streaming 60 fps to dozens of clients was a good
stress test for the new dispatch system and not just for my understanding of Raku's concurrency
model.
### Lessons for persistent data structures
Once I realized that I wasn't fundamentally misunderstanding concurrency in Raku, implementing
Learn::Raku::With taught me a couple of lessons with important (though subtle) implications for the
persistent data structures I'm building.
#### "Concurrency control" ≠ "concurrency"
The first lesson I learned is that I'd slightly misunderstood what's going on in code like this
(adapted from [an example](https://docs.raku.org/routine/migrate) in the docs):
```raku
# A simple stock ticker for Apple and Google stock
react {
    whenever $stock-info-for-APPL { say "Apple's stock price is now: $_" }
    whenever $stock-info-for-GOOG { say "Google's stock price is now: $_" }
}
```
My previous (**incorrect**) reading of that example was pretty much just an English version of the
code:
> Whenever we get an update about either Apple or Google's stock, print that update. We don't know
> when these updates could come, but we want to react to them the instant they do, whether that
> means reacting to the Apple one first, reacting to the Google one first, or reacting to both at
> the same time.
That's _so close_ to correct, but it goes wrong in the very last 7 words: we are guaranteed **not**
to react to both updates at the same time. This is a consequence of Raku's
[run-to-completion](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/#user-content-supplies-and-concurrency)
semantics: the code inside a `whenever` block is guaranteed to finish executing (that is, to "run to
completion") before the code in any other `whenever` block can start. This wouldn't matter much for
the simple example above, but here are two modified versions in which it would:
```raku
# Let's store our results in a Map instead of printing them
react {
    my $prices = Map.new;
    whenever $stock-info-for-APPL { $prices = %(|$prices, :APPL($_)) }
    whenever $stock-info-for-GOOG { $prices = %(|$prices, :GOOG($_)) }
}
```
Under the incorrect understanding, we could've been in both `whenever` blocks at the same time and
this would create a race condition – we might lose a price update if two came in at the same time.
However, because the concurrency control Raku provides _prevents_ concurrent access here, this is
actually 100% fine: we'll be in exactly one `whenever` block at a time. Raku's semantics protect us
from any race condition.
Here's an example where Raku's semantics would have a less ideal impact on naive code:
```raku
# Now with buy/sell recommendations!
react {
    whenever $stock-info-for-APPL {
        say "Apple's stock price is now: $_";
        say "Rating: " ~ slow-fn-to-calculate-recommendation('APPL', $_);
    }
    whenever $stock-info-for-GOOG {
        say "Google's stock price is now: $_";
        say "Rating: " ~ slow-fn-to-calculate-recommendation('GOOG', $_);
    }
}
```
This code would _not_ behave as the author might hope, due to those same run-to-completion
semantics. Specifically, while `slow-fn-to-calculate-recommendation` is running, other updates would
be blocked. If you don't want this behavior, Raku offers several ways to schedule work outside of
the run-to-completion guarantee of the `whenever` block; Raku's semantics aren't going to _stop_ you
from doing anything here, but it's important to understand the behavior they provide by default.
(It may be helpful to compare Raku's semantics to JavaScript, which guarantees that _every_ function
will run to completion before any other work begins – a model I've [contrasted with Raku's
before](https://stackoverflow.com/questions/67679309/raku-equivalent-to-javascripts-settimeoutfn-0).)
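For instance, one common way to opt out is to move the slow call into a `start` block and react to
the resulting Promise from a nested `whenever` (a sketch; `slow-fn-to-calculate-recommendation` is
the hypothetical slow function from the example above):

```raku
react {
    whenever $stock-info-for-APPL -> $price {
        say "Apple's stock price is now: $price";
        # The slow calculation runs on another thread; this whenever
        # block finishes immediately, so other updates aren't blocked.
        whenever start slow-fn-to-calculate-recommendation('APPL', $price) {
            say "Rating: $_";
        }
    }
}
```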
So, how does all of this impact persistent data structures? Well, some of the performance
optimizations for persistent data structures involve temporarily suspending the immutability
guarantee when there's provably only one copy of the data (or part of the data). And whether that's
provably the case or not depends crucially on whether other sections of the program can have
concurrent access to the data. So being clear on all of the above is pretty essential for safe
implementations of the data structures I'm working on.
#### Under pressure
I mentioned that implementing Learn::Raku::With helped me learn two lessons, but the second was
really a reminder of a Raku concurrency feature that I already knew about, and a good example of how
important that feature can be. The feature I'm talking about is the way Raku manages backpressure.
Raku's high level concurrency tools are carefully designed with backpressure in mind. Quoting
directly from the [6guts blog
post](https://6guts.wordpress.com/2017/11/24/a-unified-and-improved-supply-concurrency-model/) that
I've already linked to:
> Another interesting problem for any system processing asynchronous messages is that of
> backpressure. In short, how do we make a source of messages emit them at a rate no greater than
> that of the processing logic? The general principle with Supply is that the sender of a message
> pays the cost of its processing. So, if I have `$source.map(&foo).tap(&bar)`, then whatever emits at
> the source pays the cost of the map of that message along with the processing done by the tap callback.
Backpressure is extremely helpful because it can allow a program to handle arbitrary amounts of
input without requiring a proportionally arbitrary (i.e., infinite) amount of memory. Of course,
given Raku's commitment to flexibility and programmer control, it shouldn't come as any surprise
that Raku also makes it easy to opt out of this automatic backpressure when doing so makes sense.
Learn::Raku::With presents a good example of a situation where it makes sense to opt out of Raku's
default backpressure semantics. As I mentioned above, Learn::Raku::With generates a stream of
frames and transmits those frames to all connected clients. If we applied backpressure to this
system in the normal way, then transmitting the frames to the clients would exert backpressure on
generating the frames – that is, we'd be guaranteed not to generate frames any faster than we can
transmit them.
This would have some benefits – most notably, we'd never need to buffer generated frames and our
memory use would never grow. But, in the Learn::Raku::With context, it'd have one very notable flaw
as well: the normal backpressure system would mean that any time we can't transmit frames to clients
at 60 fps, the actual _speed_ of Learn::Raku::With's balls would get reduced. This obviously isn't
ideal – while 60 fps is a nice goal to shoot for, 45 fps is an absolutely fine frame rate and it'd
be much better to drop 1 out of 4 frames and to display the balls at normal speed in 45 fps than to
keep every frame but to display the balls in slow motion.
Thus, Learn::Raku::With [pairs a Supply with a
Channel](https://github.com/codesections/LearnRakuWith/blob/main/lib/HtmlBalls.rakumod#L70-L78) to
allow it to drop frames instead of generating frames more slowly – but that, of course, means that
it needs to temporarily store at least a few frames.
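I won't reproduce the real code here, but the basic shape of that Supply-plus-Channel pairing looks
something like this (a hypothetical sketch, not the actual implementation; `generate-frame` and
`send-to-clients` stand in for the real logic):

```raku
my Channel $frames .= new;

# Producer: emit a frame every 16ms, regardless of how fast
# clients can consume them.
Supply.interval(0.016).tap: { $frames.send(generate-frame()) };

# Consumer: drain the channel and keep only the newest frame,
# dropping any backlog instead of slowing the producer down.
start loop {
    my $frame = $frames.receive;
    while my $newer = $frames.poll { $frame = $newer }
    send-to-clients($frame);
}
```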
But storing extra frames, in turn, means both that the program will take up more memory and,
crucially, that there will be more work for the garbage collector to do. And that work will take
some time – time that necessarily cannot be spent transmitting those frames to the client. But
since the whole reason we need to store those extra frames was that we're generating them faster
than we can ship them off to the client, it's possible for the GC-pause-induced extra delay to
result in _more_ extra frames. Which can then result in more garbage to collect, and thus more
delay, and thus more frames.
I'm sure you can see where that sort of feedback loop heads, and it's certainly not anywhere
good. In fact, a previous version of Learn::Raku::With could get itself into exactly that sort of
situation pretty easily, consuming ever-more RAM and operating ever more slowly (I think I measured
it as hitting at least 50 GB). The current version is much more resilient, but the risk remains.
All of this is relevant to persistent data structures because one of the main benefits of these data
structures is that the power of structural sharing makes copies vastly less expensive – verging on
free, in fact. This benefit would go a long way towards avoiding the sort of issues I outlined
above. I'd always known that these inexpensive copies are a key benefit of the persistent data
structures I'm building, but I'd previously focused on this as a way to reduce the memory footprint
of Raku programs. After building Learn::Raku::With, however, I recognize that structural sharing
also reduces GC pressure and thus improves the speed at which Raku programs can execute – which can
avoid the sort of death spirals described above. This means that having persistent data structures
will not only make many Raku programs more memory efficient but will also make certain kinds of Raku
programs much more feasible to write without spiraling into a need for ever-increasing amounts of RAM.
## Implementation progress
So, having spent a few thousand words explaining things related to but not directly covered by the
grant, let me say at least a bit about the implementation work itself. In the nature of
implementation work, it's not nearly as interesting as more exploratory work and I don't have very
much to say – I've very much stuck to the plan in my [grant
proposal](https://news.perlfoundation.org/post/grant_proposal_persistent_data_structures_for_raku).
Specifically, I started by implementing the persistent version of the `List` (still open to
bikeshedding on the name!). I have made good progress on implementing a persistent `List` with
the structural sharing features described in my proposal but _without_ the bit-shifting indexing
(I'm instead currently using the simplified indexing scheme that involves converting the key to a
different base and calculating the relevant indices manually). This simplified indexing scheme will
allow me to test the soundness of the list before switching to the bit-shifting index (which will be
key to the performance of the persistent list).
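As a rough illustration of that simplified indexing scheme (a hypothetical sketch, not the actual
implementation), assuming a branching factor of 32:

```raku
# Convert index $i into per-level offsets by repeated division –
# i.e., the digits of $i in base 32, most significant first.
sub index-path(Int $i, Int :$levels = 3, Int :$branch = 32) {
    my @path;
    my $n = $i;
    for ^$levels {
        @path.unshift($n % $branch);
        $n = $n div $branch;
    }
    @path
}
say index-path(1000);  # [0 31 8] – root slot 0, then slot 31, then slot 8
```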
Additionally, I've written a set of tests to confirm the correctness of my implementation so far and
a (much smaller and still preliminary) set of tests that measure the performance of deep copies of
built in `List`s and `Array`s, which should provide a nice baseline to compare the persistent
versions against.
## Conclusion
Over the course of September, I made a good start on implementing persistent data structures for
Raku. This included significant, though limited, progress on actually implementing the persistent
List and much more time-consuming work on auxiliary projects (not counted as part of my grant hours)
that have given me a much better foundation for the remaining work. The downside of this auxiliary
work is that it took up considerable time that I might have devoted to the actual implementation,
both in September and in October. In fact, given the amount of time I've already spent on these
research/side projects in October, it's possible that I won't have much or any grant progress to
report at the end of this month. Nevertheless, I'm glad I invested the effort into those projects
since I believe both left me significantly better equipped to carry out the remainder of the grant.
I look forward to putting these lessons into practice as I continue to implement persistent data
structures for Raku.
---
## Raku Persistent Data Structures Grant Report for October 2021
The expectation I mentioned in my September grant report came true in October: I had significantly
less time to devote to work on this grant in October. This was due in large part to the conceptual
work I previously reported on, which stretched well into October.
Accordingly, I was only able to spend 5 hours on the grant in October; I used that time to continue
work on the persistent List. Specifically, I nearly completed the basic API (without the
bit-shifting optimizations). I was also able to add some initial test coverage. Given that
progress, I am now nearing the point where I'll be able to share the WIP code publicly.
On the conceptual front (i.e., work that's relevant to the persistent data structures but not
relevant enough that I'm counting it towards the hours funded by this grant), I've also made a fair
bit of progress. Most significantly, I was able to complete the work to add fully immutable binding
to Rakudo. Thanks to some helpful feedback from Vadim Belman, I also added significantly more
detailed error messages, which should help explain the nature of these bindings to Rakoons who aren't
as used to programming in an immutable style. As discussed in my last grant report, being able
to immutably bind values to their names will mean that the persistent data structures implemented
for this grant will be truly immutable – and thereby significantly easier to reason about. This
work was merged in [rakudo#4536](https://github.com/rakudo/rakudo/pull/4536) and should be part of
the 2021.11 Rakudo release.
Additionally, I've been putting some thought into how Raku's notion of [value
types](https://docs.raku.org/language/glossary#Value_type) fits with the persistent data structures
I'm implementing. As I mentioned last month, one of the core motivations behind this grant proposal
is to provide deeply immutable types, so it's clear that the types I'm implementing will be value
types and thus will return a [ValueObjAt](https://docs.raku.org/type/ValueObjAt) from their `WHICH`
method. But what should the persistent types do when the user adds a non-value type? I can see
three options:
1. Throw an error/require the user to create a value type to pass in. This is basically the approach taken by Elizabeth Mattijsen's [ValueType](https://modules.raku.org/dist/ValueType:zef:zef:lizmat/) module.
2. Recursively copy each non-value type into an equivalent value type when it's added to the persistent type ("copy-on-read"). This is straightforward for some types (an `Array` becomes a `List`) but might involve more metaprogramming trickery for others (especially user classes).
3. Store the non-value type if we can prove that no other part of the code has access to it, and only make a copy if we need to hand it out to someone (basically copy-on-write, with some slight tweaks).
Each of these has different pros and cons. In particular, the first is the easiest to implement, the most verbose, and gives the user the most control over/awareness of the performance costs. Conversely, the last is the most "magical" and the only one that risks introducing correctness bugs. Additionally, there could be ways to combine different aspects of these three approaches – for example, one combination of 1. and 2. would be to copy simple/built in types but to throw an error when passed a user type with a mutable field. I'm currently leaning towards the copy-on-read option #2 as a reasonable middle ground, but it's definitely something I'd be interested in hearing other thoughts on.
November is already off to a good start, and I'm optimistic that I'll be able to make significant
progress on this grant before the end-of-year business (and exciting things like the [Raku Advent
Calendar](https://github.com/Raku/advent/blob/master/raku-advent-2021/authors.md)!) start to eat
into my coding time.