We're hiring!
*

Persian Rug, Part 2 - Other ways to make object soups in Rust

Edmund Smith avatar

Edmund Smith
September 27, 2023

Share this post:

Reading time:

In the first part of this series, we looked at a basic pattern, where two types of objects refer to one another, and claimed that it is hard to model in Rust. Patterns like these are common when modeling things from the real world. In this part we'll follow up in more detail and discuss a wider range of approaches.

Why is this hard?

We can solve problems like this easily in C++, it seems clear and straightforward how to do so. Let's go back to the C++ example we gave:

struct group;
struct user {
    std::vector<std::shared_ptr> groups;
};
struct group {
    std::optional<std::shared_ptr> leader;
};

Why doesn't a Rust analogue of this exist? Look at this snippet of compilable C++ which uses these types:

auto u = std::make_shared();
auto g1 = std::make_shared();
u->groups.push_back(g1);
auto & first_group = u->groups[0];
auto g2 = std::make_shared();
u->groups.push_back(g2);
first_group->leader = u;

First we take a reference to g1 via the shared_ptr held inside u, and call it first_group. Then we push to the vector of groups inside u, which might reallocate. Now our reference to first_group may dangle, and then using it on the next line will crash (or even worse, corrupt memory). Rust will not allow this to happen: mutable references to data are only ever available when no other references exist, so the call to push_back on line 6, requiring mutable access, would not compile.

But when we interact with an object in a soup, each object is likely to have at least one other handle (something like a reference or a pointer) pointing to it from other objects. That in turn makes it very difficult for Rust to provide a mutable reference to the object, because nothing prevents you from obtaining a second reference to it from one of the other handles.

Approaches

Use shared references

As we showed in the previous part, you could use std::sync::Arc` as the closest correlate of C++'s std::shared_ptr:

use std::sync::Arc;
struct User {
    groups: Vec<Arc<Group>>,
}
struct Group {
    leader: Option<Arc<User>>,
}

Using Arc on its own, we can represent immutable trees of objects, provided we construct the nodes in the right order, because making trees doesn't require previously constructed objects to be edited. With care we can do slightly more, because there's the possibility of mutation of an object when only one Arc pointing to that object exists.

Decimating a graph into a frozen DAG


So, to use Arc, we may have to remove some links between objects, like removing the leader field in Group for example. The more links we try to save, the more careful we'll have to be about the order of construction. In the end, we'll still have a largely immutable graph, and no amount of care will allow us to represent everything we might like.

Duplicate your data

Trees of objects are far more amenable to the sort of safety guarantees that Rust provides. There's one clear owner of a value, and you can only access data through a reference to that value. So we could write:

struct User {
    groups: Vec<Group> 
}
struct Group {
    leader: Option<User>
}

Now we have expanded our graph into a tree, by duplicating Users and Groups as needed. Mutation is allowed, but different copies may skew.

Use Mutexes

In Rust, a Mutex permits converting a shared reference to _itself_ into a mutable reference to its _contents_. A Mutex will block the thread rather than return a mutable reference when other references exist. This also explains why a Rust Mutex is not reentrant: if it were, we could create dangling references.

struct User {
    groups: Vec<Arc<Mutex<Group>>>,
}
struct Group {
    user: Option<Arc<Mutex<User>>>
}

This approach gives a fully mutable collection of objects, with arbitrary links between them. There will be no unchecked memory errors. So why not use this?

Let's imagine traversing our collection. To look at each node's contents, even to read it, we need to obtain its mutex. That means during traversal we'll build up a chain of mutexes that we hold. If we visit the same node twice, we'll get deadlock. If we have two threads traversing the collection and they obtain locks in different orders, we'll get deadlock. What's more, in a soup there is no obvious lock order we can try to maintain.

Try interior mutability

The final basic approach to this problem is to use *interior mutability* via `RefCell`. If you think of `Mutex` blocking threads to enforce Rust's safety rules, then `RefCell` crashes your program to achieve the same thing. The code might look like this:


struct User {
    groups: Vec<Rc<RefCell<Group>>>,
}
struct Group {
    user: Option<Rc<RefCell<User>>>
}

Just like with a Mutex, with RefCell we can ask for references to the data wherever we wish, and the code will compile. But if we ask for a mutable reference to an object when there are other references in existence, our program will panic. Like everything well-formed in Rust, RefCell cannot permit a mutable reference to data to co-exist with other references to that data.

Once again, there will be no unchecked errors. But now we are responsible for ensuring that we never try to create a mutable reference to data that is referenced elsewhere. This will not be checked for us by the compiler. We have again committed ourselves to a difficult contract to follow in our code.

Next

In the next part, we'll look in more detail at persian-rug: how it works and the trade-offs it offers at present. After that, we'll talk about ways to improve upon its limitations in the future.

Comments (0)


Add a Comment






Allowed tags: <b><i><br>Add a new comment:


Search the newsroom

Latest Blog Posts

Mesa CI and the power of pre-merge testing

08/10/2024

Having multiple developers work on pre-merge testing distributes the process and ensures that every contribution is rigorously tested before…

A shifty tale about unit testing with Maxwell, NVK's backend compiler

15/08/2024

After rigorous debugging, a new unit testing framework was added to the backend compiler for NVK. This is a walkthrough of the steps taken…

A journey towards reliable testing in the Linux Kernel

01/08/2024

We're reflecting on the steps taken as we continually seek to improve Linux kernel integration. This will include more detail about the…

Building a Board Farm for Embedded World

27/06/2024

With each board running a mainline-first Linux software stack and tested in a CI loop with the LAVA test framework, the Farm showcased Collabora's…

Smart audio filters with WirePlumber 0.5

26/06/2024

WirePlumber 0.5 arrived recently with many new and essential features including the Smart Filter Policy, enabling audio filters to automatically…

The latest on cmtp-responder, a permissively-licensed MTP responder implementation

12/06/2024

Part 3 of the cmtp-responder series with a focus on USB gadgets explores several new elements including a unified build environment with…

Open Since 2005 logo

Our website only uses a strictly necessary session cookie provided by our CMS system. To find out more please follow this link.

Collabora Limited © 2005-2024. All rights reserved. Privacy Notice. Sitemap.