Edmund Smith
September 27, 2023
Reading time:
In the first part of this series, we looked at a basic pattern, where two types of objects refer to one another, and claimed that it is hard to model in Rust. Patterns like these are common when modeling things from the real world. In this part we'll follow up in more detail and discuss a wider range of approaches.
We can solve problems like this easily in C++, it seems clear and straightforward how to do so. Let's go back to the C++ example we gave:
struct group; struct user { std::vector<std::shared_ptr> groups; }; struct group { std::optional<std::shared_ptr> leader; };
Why doesn't a Rust analogue of this exist? Look at this snippet of compilable C++ which uses these types:
auto u = std::make_shared(); auto g1 = std::make_shared(); u->groups.push_back(g1); auto & first_group = u->groups[0]; auto g2 = std::make_shared(); u->groups.push_back(g2); first_group->leader = u;
First we take a reference to g1
via the shared_ptr
held inside u
, and call it first_group
. Then we push to the vector of groups inside u
, which might reallocate. Now our reference to first_group
may dangle, and then using it on the next line will crash (or even worse, corrupt memory). Rust will not allow this to happen: mutable references to data are only ever available when no other references exist, so the call to push_back
on line 6, requiring mutable access, would not compile.
But when we interact with an object in a soup, each object is likely to have at least one other handle (something like a reference or a pointer) pointing to it from other objects. That in turn makes it very difficult for Rust to provide a mutable reference to the object, because nothing prevents you from obtaining a second reference to it from one of the other handles.
As we showed in the previous part, you could use std::sync::Arc` as
the closest correlate of C++'s std::shared_ptr
:
use std::sync::Arc; struct User { groups: Vec<Arc<Group>>, } struct Group { leader: Option<Arc<User>>, }
Using Arc
on its own, we can represent immutable trees of objects, provided we construct the nodes in the right order, because making trees doesn't require previously constructed objects to be edited. With care we can do slightly more, because there's the possibility of mutation of an object when only one Arc
pointing to that object exists.
So, to use Arc
, we may have to remove some links between objects, like removing the leader
field in Group
for example. The more links we try to save, the more careful we'll have to be about the order of construction. In the end, we'll still have a largely immutable graph, and no amount of care will allow us to represent everything we might like.
Trees of objects are far more amenable to the sort of safety guarantees that Rust provides. There's one clear owner of a value, and you can only access data through a reference to that value. So we could write:
struct User { groups: Vec<Group> } struct Group { leader: Option<User> }
Now we have expanded our graph into a tree, by duplicating User
s and Group
s as needed. Mutation is allowed, but different copies may skew.
In Rust, a Mutex
permits converting a shared reference to _itself_ into a mutable reference to its _contents_. A Mutex
will block the thread rather than return a mutable reference when other references exist. This also explains why a Rust Mutex
is not reentrant: if it were, we could create dangling references.
struct User { groups: Vec<Arc<Mutex<Group>>>, } struct Group { user: Option<Arc<Mutex<User>>> }
This approach gives a fully mutable collection of objects, with arbitrary links between them. There will be no unchecked memory errors. So why not use this?
Let's imagine traversing our collection. To look at each node's contents, even to read it, we need to obtain its mutex. That means during traversal we'll build up a chain of mutexes that we hold. If we visit the same node twice, we'll get deadlock. If we have two threads traversing the collection and they obtain locks in different orders, we'll get deadlock. What's more, in a soup there is no obvious lock order we can try to maintain.
The final basic approach to this problem is to use *interior mutability* via `RefCell`. If you think of `Mutex` blocking threads to enforce Rust's safety rules, then `RefCell` crashes your program to achieve the same thing. The code might look like this:
struct User { groups: Vec<Rc<RefCell<Group>>>, } struct Group { user: Option<Rc<RefCell<User>>> }
Just like with a Mutex
, with RefCell
we can ask for references to the data wherever we wish, and the code will compile. But if we ask for a mutable reference to an object when there are other references in existence, our program will panic. Like everything well-formed in Rust, RefCell
cannot permit a mutable reference to data to co-exist with other references to that data.
Once again, there will be no unchecked errors. But now we are responsible for ensuring that we never try to create a mutable reference to data that is referenced elsewhere. This will not be checked for us by the compiler. We have again committed ourselves to a difficult contract to follow in our code.
In the next part, we'll look in more detail at persian-rug
: how it works and the trade-offs it offers at present. After that, we'll talk about ways to improve upon its limitations in the future.
15/01/2025
With VirGL, Venus, and vDRM, virglrenderer offers three different approaches to obtain access to accelerated GFX in a virtual machine. Here…
19/12/2024
In the world of deep learning optimization, two powerful tools stand out: torch.compile, PyTorch’s just-in-time (JIT) compiler, and NVIDIA’s…
08/10/2024
Having multiple developers work on pre-merge testing distributes the process and ensures that every contribution is rigorously tested before…
15/08/2024
After rigorous debugging, a new unit testing framework was added to the backend compiler for NVK. This is a walkthrough of the steps taken…
01/08/2024
We're reflecting on the steps taken as we continually seek to improve Linux kernel integration. This will include more detail about the…
27/06/2024
With each board running a mainline-first Linux software stack and tested in a CI loop with the LAVA test framework, the Farm showcased Collabora's…
Comments (0)
Add a Comment