Demystifying the complexity of RAII

#rust #cpp #java #python

The idiom

Whenever I read an article explaining the Resource Acquisition Is Initialization (RAII) idiom, I wonder whether I would understand it if I did not already know it, so here is my attempt to make it easy.

First, let us roughly group mainstream languages by their memory management model.

Manual: C, Assembly
Garbage-collected (GC): Java, Python, JavaScript, Kotlin, C#
Automatic Reference Counting (ARC): Swift, Objective-C
Resource Acquisition Is Initialization (RAII): C++, Rust

There are numerous differences between ARC and tracing garbage collection, but we are interested in one in particular. Unlike a tracing garbage collector, ARC has no background process that deallocates objects asynchronously. Instead, an object is destroyed synchronously, inside the very reference-count decrement that brings its count to zero.
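Rust is not an ARC language, but its standard `Rc` type follows the same counting rule, so it can serve as a stand-in to make this synchronous destruction visible. Below is a minimal sketch; `Tracked` and `arc_demo` are hypothetical names, and the log exists only to record the order of events.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical value that records its own destruction in a shared log.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn arc_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));

    let first = Rc::new(Tracked { name: "obj", log: Rc::clone(&log) });
    let second = Rc::clone(&first); // two handles: strong count is 2
    log.borrow_mut()
        .push(format!("count = {}", Rc::strong_count(&first)));

    drop(first); // count falls to 1: the object stays alive
    log.borrow_mut().push("dropped first handle".to_string());

    drop(second); // count reaches 0: Tracked::drop runs right here
    log.borrow_mut().push("dropped second handle".to_string());

    // Copy the entries out before `log` itself goes out of scope.
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in arc_demo() {
        println!("{line}");
    }
}
```

The `drop obj` entry lands between the two "dropped" entries: the object is destroyed inside the decrement that reaches zero, before the next statement runs, with no collector involved.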

One way to see RAII is as ARC with the excess freedoms removed. If we take ARC as a baseline, RAII differs in the following ways:

The reference count can only be 0 or 1. In other words, every resource can only be owned by a single variable at a time.
A reference can be explicitly moved from one variable to another using features built into the language.
A reference can be borrowed, for example when passing it as a function argument, without increasing the reference count.

Let us take a look at some RAII pseudocode.

// RAII pseudocode

// Create new object and assign it to the variable.
var a0 = A(); // 🟢 OK

// This wouldn't build.
// An object can only be held by one variable.
var a1 = a0; // ❌ ERROR

// Move the object to another variable.
// a0 is uninitialized after that (known at compile time).
var a1 = move a0; // 🟢 OK

// Copy the object into another variable.
// a1 keeps its original value.
var a2 = copy a1; // 🟢 OK

// This function requires ownership of an instance of type A.
fun foo(A) : B;

// Provide the object as a function argument, moving it.
// Just like when we moved the object to initialize a
// variable, this leaves the source variable uninitialized.
var b1 = foo(a1); // ❌ ERROR
var b2 = foo(borrow a1); // ❌ ERROR
var b3 = foo(move a1); // 🟢 OK

// This function does not require a dedicated instance.
// We use & to signal that a borrowed object is enough.
fun bar(A&) : B;

// Provide the object as a function argument, borrowing it.
// Although function bar does not require a dedicated
// instance, it can work with one.
var c1 = bar(a2); // ❌ ERROR
var c2 = bar(borrow a2); // 🟢 OK
var c3 = bar(move a2); // 🟢 OK
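For comparison, here is a rough Rust translation of the pseudocode, assuming hypothetical types `A` and `B`. Rust differs in two visible ways: assignment and by-value arguments move implicitly, so there is no `move` keyword and `foo(a1)` compiles; and an owned value cannot be passed where a borrow is expected, so the counterpart of `bar(move a2)` is rejected. Lines that would not compile are left commented out.

```rust
// Hypothetical types standing in for `A` and `B` from the pseudocode.
// `A` is not `Copy`, so assigning it moves ownership.
#[derive(Clone, Debug)]
struct A {
    value: i32,
}

#[derive(Debug, PartialEq)]
struct B {
    value: i32,
}

// Takes ownership, like `fun foo(A) : B`.
fn foo(a: A) -> B {
    B { value: a.value + 1 }
} // `a` is destroyed here, at the end of foo

// Only borrows, like `fun bar(A&) : B`.
fn bar(a: &A) -> B {
    B { value: a.value * 2 }
}

fn demo() -> (B, B, i32) {
    let a0 = A { value: 10 };

    // Plain assignment is a move; a0 is unusable from here on.
    let a1 = a0;
    // println!("{a0:?}"); // does not compile: use after move

    // Explicit copy, the counterpart of `copy a1`.
    let a2 = a1.clone();

    // let b2 = foo(&a1); // does not compile: foo needs an owned A
    let b3 = foo(a1); // moves a1 into foo

    // let c1 = bar(a2); // does not compile: bar needs a borrow
    let c2 = bar(&a2); // a2 remains usable afterwards

    (b3, c2, a2.value)
}

fn main() {
    let (b, c, kept) = demo();
    println!("{b:?} {c:?}, a2 still holds {kept}");
}
```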

Languages & features

In RAII languages, every line is basically a try-with-resources block.

RAII originated in C++, but do not let that intimidate you. If you code in Java 7 or above, you have already seen certain aspects of this idiom in try-with-resources blocks.

// Java

// Java standard interface
public interface AutoCloseable { 
    void close() throws Exception;
}

try (var res = obtain()) {
    // The resource, and the variable referencing it, both
    // only exist inside this block.
}

The variable only exists inside the block.
The lifetime of the resource is bound to the lifetime of the variable.
Any use of the res variable is effectively a borrow. No matter what happens inside the block (within reason), it will not change the scope of res and, therefore, the lifetime of the resource it holds.
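In Rust, the enclosing block itself does the work of `try`. A minimal sketch, with a hypothetical `Resource` type whose `Drop` implementation plays the role of `AutoCloseable::close`: the compiler inserts the call when `res` goes out of scope, and the hypothetical `use_resource` can only borrow the resource, so it cannot end its lifetime early.

```rust
use std::cell::RefCell;

// Hypothetical resource standing in for whatever `obtain()` returns.
struct Resource<'log> {
    name: &'static str,
    log: &'log RefCell<Vec<String>>,
}

impl<'log> Resource<'log> {
    fn obtain(name: &'static str, log: &'log RefCell<Vec<String>>) -> Self {
        log.borrow_mut().push(format!("open {name}"));
        Resource { name, log }
    }
}

// Drop plays the role of AutoCloseable::close, but the compiler calls it.
impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("close {}", self.name));
    }
}

// Only borrows the resource, so it cannot end the resource's lifetime.
fn use_resource(res: &Resource<'_>) {
    res.log.borrow_mut().push(format!("use {}", res.name));
}

fn scope_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let res = Resource::obtain("res", &log);
        use_resource(&res);
    } // `res` goes out of scope here, so Resource::drop runs
    log.borrow_mut().push("after block".to_string());
    log.into_inner()
}

fn main() {
    for line in scope_demo() {
        println!("{line}");
    }
}
```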

If multiple resources are required, blocks can be nested.

// Java

public static void main(String[] args) throws Exception {
    try (var a = obtain()) {
        try (var b = obtainDependent(a)) {
            try (var c = obtainDependent(b)) {
            }
        }
    }
}

Furthermore, multiple resources can be declared in a single block. Once control exits the block, the resources are closed in reverse order of declaration, mimicking the behaviour of nested blocks.

// Java

public static void main(String[] args) throws Exception {
    try (var a = obtain();
         var b = obtainDependent(a);
         var c = obtainDependent(b)) {
    }
}
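The same dependency chain needs no `try` at all in Rust; sequential `let` statements are enough. The sketch below assumes a hypothetical `Res` type in which a dependent resource borrows its parent. Because of that borrow, the compiler rejects any code that would drop or move a parent while a dependent still refers to it, and the reverse-declaration drop order of locals satisfies that rule automatically.

```rust
use std::cell::RefCell;

// Hypothetical resource that may depend on (borrow) a parent resource.
struct Res<'a> {
    name: &'static str,
    _parent: Option<&'a Res<'a>>,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Res<'a> {
    fn obtain(name: &'static str, log: &'a RefCell<Vec<String>>) -> Self {
        Res { name, _parent: None, log }
    }

    // The dependent borrows `parent`, so the compiler rejects any code that
    // drops or moves `parent` while the dependent is still alive.
    fn obtain_dependent(name: &'static str, parent: &'a Res<'a>) -> Self {
        Res { name, _parent: Some(parent), log: parent.log }
    }
}

impl Drop for Res<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn dependent_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Res::obtain("a", &log);
        let b = Res::obtain_dependent("b", &a);
        let c = Res::obtain_dependent("c", &b);
        log.borrow_mut().push(format!("using {}", c.name));
    } // locals are dropped in reverse order: c, then b, then a
    log.into_inner()
}

fn main() {
    for line in dependent_demo() {
        println!("{line}");
    }
}
```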

In RAII languages, every line of code is basically a try-with-resources block: not just every line of a function body, but also every member variable declaration in a class or a structure. Destruction in reverse order of declaration, mentioned above, is the key to making this possible, because subsequently declared resources can depend on one or more previously declared ones.

// RAII pseudocode

fun foo() {
    var a = obtain(); // Destructed third
    var b = obtainDependent(a); // Destructed second
    var c = obtainDependent(b); // Destructed first
}

// RAII pseudocode

struct Foo {
    ResourceTypeA a; // Destructed third
    ResourceTypeB b; // Destructed second
    ResourceTypeC c; // Destructed first
};

fun bar() {
    var a = obtain();
    var b = obtainDependent(a);
    var c = obtainDependent(b);

    // After this line, the a, b and c variables are empty.
    // Destroying d will trigger the destruction of all Foo
    // member variables in reverse order of declaration.
    var d = Foo(move a, move b, move c);
}

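The variables, and the resources they hold, are destructed in reverse order of declaration, whether they are declared in a function body or as member variables of a class or a structure, just like multiple resources in a single try-with-resources block. One caveat for Rust: local variables are indeed dropped in reverse order of declaration, but struct fields are dropped in declaration order, so the struct comments above describe C++ behaviour.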
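Rust drops local variables in reverse order of declaration but drops struct fields in declaration order, the opposite of C++ members. A minimal sketch that makes both orders observable, assuming a hypothetical `Noisy` type that logs its own destruction and a `Foo` struct mirroring the pseudocode:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical value that logs its own destruction.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn noisy(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

// Mirrors the pseudocode struct, with Noisy in place of the resource types.
#[allow(dead_code)] // fields exist only to show their drop order
struct Foo {
    a: Noisy,
    b: Noisy,
    c: Noisy,
}

fn drop_order_demo() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _x = noisy("local x", &log);
        let _y = noisy("local y", &log);
    } // locals: reverse order of declaration (y, then x)
    {
        let a = noisy("field a", &log);
        let b = noisy("field b", &log);
        let c = noisy("field c", &log);
        // a, b and c are moved into the struct and will not be dropped again.
        let _d = Foo { a, b, c };
    } // fields: declaration order (a, b, then c)
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in drop_order_demo() {
        println!("{line}");
    }
}
```

In practice this rarely bites in Rust, because a field cannot borrow a sibling field in the first place; dependencies between fields have to be expressed through shared pointers or by restructuring the type.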

Python, too, has a similar with ... as statement that accepts multiple resources.

# Python

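# Parenthesized context managers require Python 3.10 or later.
with (obtain() as a,
      obtainDependent(a) as b,
      obtainDependent(b) as c):
    # The resources are held by the a, b, and c variables.
    # When control leaves this block, the resources are
    # released in reverse order (c, b, a). The names stay
    # bound afterwards, but the resources are closed.
    pass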

RAII origins

C++ first appeared in 1985; however, if we look at the design patterns and common practices that developed within the community before C++11, it becomes clear that the community did not stick to the core idiom of the language, and for good reason.

Before C++11, the language lacked a critical feature: there was simply no idiomatic way to move a resource (object) from one variable to another or to pass ownership in a function call. That left C++ engineers of the time with very limited options: pass raw pointers around as if there were no resource management built into the language, or wrap most objects in std::auto_ptr, whose copy operations silently transferred ownership. Falling back to either of those substantially defeats the purpose of the RAII idiom.

Thoughts

I believe it is time to reevaluate the contract we signed when Java and other languages with tracing garbage collection appeared. Is the benefit of sometimes being able to disregard the lifetime of an object worth designing the whole language around that convenience? The more experience I get with Java, C++, and Rust, the harder I find it to justify the language design sacrifices made in Java. To name just a few:

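There are no real constant semantics like those in Rust, C++, or even C: final only prevents a variable from being reassigned, not the object it references from being modified.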
We still care about memory management; the problem has just moved elsewhere. At some point, any application, client or server, starts having performance issues if the behaviour of the garbage collector is not taken into account. Java engineers have to go through a tremendous amount of learning to predict and work around GC behaviour.
The problems above force some classes to be immutable by design, which puts engineers in the awkward position of having to make that choice ahead of time.
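In Rust, mutability is a property of the binding or the borrow rather than of the type, so the choice does not have to be baked into a class up front. A minimal sketch with a hypothetical `Account` type: the same type is read-only behind `&` and writable behind `&mut` on a `mut` binding. Interior-mutability types such as `Cell` and `RefCell` are the explicit, opt-in exception to this rule.

```rust
// Hypothetical type: nothing in its definition decides whether it is mutable.
#[derive(Clone)]
struct Account {
    owner: String,
    balance: i64,
}

// A shared borrow (&) is read-only all the way down: the compiler rejects
// any attempt to modify `account` or the String inside it.
fn describe(account: &Account) -> String {
    format!("{}: {}", account.owner, account.balance)
}

// An exclusive borrow (&mut) is required to modify it.
fn deposit(account: &mut Account, amount: i64) {
    account.balance += amount;
}

fn mutability_demo() -> (String, String) {
    let frozen = Account { owner: "alice".to_string(), balance: 100 };
    // deposit(&mut frozen, 5); // does not compile: `frozen` is not `mut`

    let mut live = frozen.clone();
    deposit(&mut live, 50);

    (describe(&frozen), describe(&live))
}

fn main() {
    let (before, after) = mutability_demo();
    println!("{before} / {after}");
}
```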

These sacrifices may have been a conscious choice. However, we know far more about their consequences today than we did when we agreed to them in the 90s.

Although Swift avoids these problems because it uses ARC instead of a tracing garbage collector, it is still vulnerable to reference cycles. Rust, on the other hand, will not even let us create an ownership cycle with plain ownership, because every value has exactly one owner. The exception is shared, reference-counted pointers such as Rc and Arc, which can form cycles and leak just like in Swift. A strict resource management model helps us design good software by making engineers think about the ownership relationships between objects.
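The Swift weakness can be reproduced in Rust as soon as `Rc` enters the picture. A minimal sketch, with a hypothetical `Node` type: two nodes pointing at each other through strong `Rc` handles are never freed, while the same pair with a `Weak` back edge is freed as soon as the last outside handle goes away.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type FreedLog = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical graph node; Drop records which nodes were actually freed.
struct Node {
    name: &'static str,
    next: RefCell<Option<Rc<Node>>>,   // strong edge: owns its target
    back: RefCell<Option<Weak<Node>>>, // weak edge: does not own
    freed: FreedLog,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.freed.borrow_mut().push(self.name);
    }
}

fn node(name: &'static str, freed: &FreedLog) -> Rc<Node> {
    Rc::new(Node {
        name,
        next: RefCell::new(None),
        back: RefCell::new(None),
        freed: Rc::clone(freed),
    })
}

fn cycle_demo() -> Vec<&'static str> {
    let freed: FreedLog = Rc::new(RefCell::new(Vec::new()));
    {
        // Strong edges in both directions form a cycle.
        let a = node("strong a", &freed);
        let b = node("strong b", &freed);
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        *b.next.borrow_mut() = Some(Rc::clone(&a));
    } // both counts fall from 2 to 1, never to 0: the pair leaks
    {
        // Same shape, but the edge back to `a` is weak.
        let a = node("weak a", &freed);
        let b = node("weak b", &freed);
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        *b.back.borrow_mut() = Some(Rc::downgrade(&a));
    } // a's count reaches 0, and freeing a releases the last handle to b
    let entries = freed.borrow().clone();
    entries
}

fn main() {
    println!("{:?}", cycle_demo());
}
```

Only the nodes from the second scope show up in the log; the first pair stays allocated until the process exits. Leaking like this is not undefined behaviour in Rust, which is exactly why the cycle has to be avoided by design, for example with `Weak`.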

With this in mind, I would like to invite every sceptic to take a serious look at RAII and, more specifically, Rust. As pointed out earlier, this is not only about performance. In fact, performance is entirely secondary: the critical reason to take a closer look at RAII is the language design decisions it enables.