DEV Community

Leapcell

Rust Concurrency: Atomic Explained

Atomic Types and Atomic Operations

Here, "atomic" refers to a machine instruction (or a short sequence of instructions) that the CPU guarantees will execute without being interrupted or interleaved with other operations. On a multi-core CPU, when one core executes an atomic operation on a memory location, the hardware—typically via cache-line locking or the cache-coherence protocol—prevents other cores from observing or modifying that location mid-operation, so the operation appears indivisible.

An atomic operation refers to one or more operations that are indivisible and uninterruptible. In concurrent programming, certain guarantees must be provided at the CPU level to ensure a sequence of operations is atomic. An atomic operation may consist of a single step or multiple steps, but the sequence of these steps must not be disrupted, and their execution must not be interrupted by any other mechanism.

Note: Since atomic operations are supported directly by CPU instructions, they generally perform much better than locks or message passing. Compared to locks, atomic types do not require developers to manage lock acquisition and release and also support operations such as modification and reading, with higher concurrent performance. Almost all programming languages support atomic types.

Atomic types are data types that help developers more easily implement atomic operations. Atomic types are lock-free, but lock-free doesn't mean wait-free: read-modify-write operations are typically implemented with a CAS (or load-linked/store-conditional) loop under the hood, so under heavy contention a thread may still have to retry repeatedly. Nevertheless, they are generally better than locks.

Note: CAS stands for Compare and Swap. It reads a specific memory address using a single instruction and checks whether its value matches a given expected value. If so, it updates the value to a new one.

As a concurrency primitive, atomic operations are the cornerstone for implementing all other concurrency primitives. Almost all programming languages support atomic types and operations. For example, Java provides many atomic types in java.util.concurrent.atomic, Go provides support via the sync/atomic package, and Rust is no exception.

Note: Atomic operations are a CPU-level concept. In programming languages, there is a similar concept called concurrency primitives. These are functions provided by the kernel to be invoked externally, and such functions are not allowed to be interrupted during execution.

Atomic Primitives in Rust

In Rust, atomic types are located in the std::sync::atomic module.

The documentation for this module describes atomic types as follows: Atomic types in Rust provide primitive shared-memory communication between threads and serve as the foundation for building other concurrency types.

The std::sync::atomic module currently offers the following 12 atomic types:

AtomicBool
AtomicI8
AtomicI16
AtomicI32
AtomicI64
AtomicIsize
AtomicPtr
AtomicU8
AtomicU16
AtomicU32
AtomicU64
AtomicUsize

Atomic types are not significantly different from regular types—for example, AtomicBool and bool—except that the former can be safely shared and modified across threads, while the latter cannot be mutated concurrently without a lock.

Take AtomicI32 as an example. It is defined as a struct and includes the following methods related to atomic operations:

pub fn fetch_add(&self, val: i32, order: Ordering) -> i32 - Atomically adds to the value, returning the previous value (fetch_sub is the subtraction counterpart)
pub fn compare_and_swap(&self, current: i32, new: i32, order: Ordering) -> i32 - CAS (deprecated in Rust 1.50, replaced by compare_exchange)
pub fn compare_exchange(&self, current: i32, new: i32, success: Ordering, failure: Ordering) -> Result<i32, i32> - CAS
pub fn load(&self, order: Ordering) -> i32 - Reads the value from the atomic type
pub fn store(&self, val: i32, order: Ordering) - Writes a value to the atomic type
pub fn swap(&self, val: i32, order: Ordering) -> i32 - Swaps values

As you can see, each method takes an Ordering parameter. Ordering is an enum that represents the strength of the memory barrier for that operation and is used to control the memory ordering of atomic operations.

Note: Memory ordering refers to the order in which the CPU accesses memory, which may be affected by:

  • The order of statements in code
  • Compiler optimizations that reorder memory access at compile time (memory reordering)
  • CPU-level caching mechanisms at runtime that may disrupt access order
pub enum Ordering {
    Relaxed,
    Release,
    Acquire,
    AcqRel,
    SeqCst,
}

In Rust, the enum values in Ordering represent:

  • Relaxed – The loosest rule, imposes no restrictions on the compiler or CPU, allowing maximum reordering
  • Release – Sets a memory barrier to ensure all operations before it happen-before this one. Operations after it may be reordered before it (used for writes)
  • Acquire – Sets a memory barrier to ensure all operations after it happen-after this one. Operations before it may be reordered after it. Commonly paired with Release in other threads (used for reads)
  • AcqRel – A combination of Acquire and Release, intended for read-modify-write operations such as fetch_add or compare_exchange: the load half uses Acquire semantics and the store half uses Release. It cannot be used with a plain load or store—those panic when given AcqRel
  • SeqCst (Sequentially Consistent) – A stronger version of AcqRel. No reordering of operations around a SeqCst atomic operation is allowed within a thread. It also guarantees a consistent global ordering across all threads for all SeqCst operations. Though it offers lower performance, it is the safest option.

With the Ordering enum, developers can customize the underlying memory ordering behavior.

Note: What is Memory Ordering? From Wikipedia:
Memory ordering is the order in which a CPU accesses main memory. It can be determined at compile time by the compiler or at runtime by the CPU. It reflects memory operation reordering and out-of-order execution, designed to maximize bus bandwidth usage between different memory components. Most modern processors execute instructions out of order. Therefore, memory barriers are needed to ensure synchronization between threads.

To better understand memory ordering, imagine two threads operating on an AtomicI32. Suppose the initial value is 0. One thread performs a write, updating the value to 10, and the other performs a read. If the write completes before the read, will the reading thread definitely see 10? The answer is not necessarily. Due to compiler reordering, CPU store buffering, and caching, the write may not yet be visible to the reading thread, and surrounding operations may be reordered around it. Memory ordering constraints are needed to guarantee visibility and ordering.

A Release store guarantees that all writes before it become visible to any thread whose Acquire load observes the stored value; the pair establishes a happens-before relationship. For example, when one thread calls store with Ordering::Release and another thread's load with Ordering::Acquire sees the new value, the reader is also guaranteed to see every write the writer made before that store.

Using Atomic in Multithreading

Because all atomic types implement the Sync trait, sharing atomic variables across threads is safe. However, since atomic types themselves do not provide a sharing mechanism, the common approach is to place them inside an atomically reference-counted smart pointer, Arc. Below is a simple spinlock example from the official documentation:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Create a lock using an atomic type and share ownership via Arc
    let spinlock = Arc::new(AtomicUsize::new(1));
    // Increase the reference count
    let spinlock_clone = spinlock.clone();

    let thread = thread::spawn(move || {
        // SeqCst ordering: store (write) operation uses release semantics
        // meaning operations before the store cannot be reordered after it
        spinlock_clone.store(0, Ordering::SeqCst);
    });

    // Use a while loop to wait for the critical section to become available
    // SeqCst ordering: load (read) operation uses acquire semantics
    // meaning operations after the load cannot be reordered before it
    // The write instruction from the thread above ensures that
    // subsequent reads/writes will not be reordered before it
    while spinlock.load(Ordering::SeqCst) != 0 {}

    if let Err(panic) = thread.join() {
        println!("Thread had an error: {:?}", panic);
    }
}

Note: A spinlock refers to a locking mechanism where, when a thread tries to acquire a lock that is already held by another thread, it cannot acquire it immediately. Instead, the thread waits and retries after some time. The term "spin" comes from the fact that the CPU enters a busy-wait loop (as in the while loop above) to wait for the critical section to become available.

Spinlocks reduce the cost of thread blocking and are suitable for scenarios with low contention and very short lock durations. However, in cases of high contention or long critical section execution, the CPU cost of spinning can outweigh the cost of suspending the thread. This could waste CPU cycles and degrade system performance, as spinning threads prevent other threads from getting CPU time.

The example above shows how to implement a spinlock using Ordering::SeqCst for memory ordering. Now let's try implementing a custom spinlock:

use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread;
use std::time::Duration;

struct SpinLock {
    lock: AtomicBool,
}

impl SpinLock {
    pub fn new() -> Self {
        Self {
            lock: AtomicBool::new(false),
        }
    }

    pub fn lock(&self) {
        while self
            .lock
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        // Attempt to acquire lock; if it fails, keep spinning
        {
            // Because CAS is expensive, on failure we simply load the lock status
            // and retry CAS only when we detect the lock has been released
            while self.lock.load(Ordering::Relaxed) {}
        }
    }

    pub fn unlock(&self) {
        // Release the lock
        self.lock.store(false, Ordering::Release);
    }
}

fn main() {
    let spinlock = Arc::new(SpinLock::new());
    let spinlock1 = spinlock.clone();

    let thread = thread::spawn(move || {
        // Child thread acquires lock using compare_exchange
        spinlock1.lock();
        thread::sleep(Duration::from_millis(100));
        println!("do something1!");
        // Child thread releases lock
        spinlock1.unlock();
    });

    thread.join().unwrap();

    // Main thread acquires lock
    spinlock.lock();
    println!("do something2!");
    // Main thread releases lock
    spinlock.unlock();
}

In the above custom spinlock implementation, the lock is essentially a single atomic type: AtomicBool, with an initial value of false.

When calling the lock method to acquire the lock, it uses the atomic operation compare_exchange (CAS). If CAS fails, the thread will spin in a while loop. There's a small performance optimization here: since CAS is relatively expensive, after a failure the thread enters a lightweight loop using a simple load to check the lock status, and retries CAS only once it observes the lock has been released. This pattern is known as test-and-test-and-set (TTAS) and is more efficient under contention.

When calling unlock, it simply sets the AtomicBool back to false using store with Ordering::Release, which publishes all writes made while the lock was held. A thread spinning in the lock method performs compare_exchange with Ordering::Acquire; once it observes false and its CAS succeeds, the Release/Acquire pairing guarantees it also sees everything the previous holder wrote before unlocking, and the thread acquires the lock.

Can Atomic Replace Locks?

Given how powerful atomic types are, can they completely replace traditional locks? The answer is: no.

Here are the reasons:

  • For complex scenarios, using locks is simpler and less error-prone.
  • The std::sync::atomic module only provides atomic types for booleans, integers, and raw pointers (AtomicBool, AtomicIsize, AtomicUsize, AtomicPtr, etc.), while locks can protect data of any type.
  • Some coordination patterns require lock-based primitives such as Mutex, RwLock, and Condvar—for instance, a Condvar can only wait in combination with a Mutex.

Use Cases for Atomic

In practice, although Atomic types may not be frequently used by everyday application developers, they are very commonly used by high-performance library developers and standard library maintainers. Atomic operations are the foundation of concurrency primitives, and beyond that, there are several applicable scenarios:

  • Lock-free data structures
  • Global variables, such as a global auto-increment ID (to be discussed in a later section)
  • Cross-thread counters, e.g., for collecting metrics

The above are just some examples of where Atomic types can be used. In real-world scenarios, it’s up to the developer to evaluate and decide based on actual needs.

Summary

The term comes from the idea of an atom as the smallest indivisible unit. An atomic operation is "an operation or series of operations that cannot be interrupted." Atomic types are data types that help developers implement such atomic operations more easily. Concurrency primitives are kernel-level functions that can be called externally, and their execution must not be interrupted.

Atomic types are lock-free; they internally use a CAS loop and do not require the developer to handle locking and unlocking. They support atomic operations such as modifying and reading values. Since these operations are supported by CPU instructions, they perform much better than locking or message passing.

Atomic operations must be used together with memory ordering (Ordering). This enum allows developers to customize the underlying memory ordering behavior. Because Atomic types offer better performance than locks in many scenarios, they are widely used in Rust—for example, as global variables or shared variables across threads. However, they cannot completely replace locks, because locks are simpler and more broadly applicable.

Atomic operations can be grouped into the following five categories:

  • fetch_add – Performs addition (or subtraction) on the atomic type
  • compare_and_swap and compare_exchange – Compare values and swap if equal
  • load – Reads the value from the atomic type
  • store – Writes a value into the atomic type
  • swap – Swaps values

We are Leapcell, your top choice for hosting Rust projects.