DEV Community

Sachin Tolay

Memory Models Explained: How Threads Really See Memory

Modern processors and compilers aggressively reorder instructions to improve performance → a behavior we explored in detail in my previous article: Instruction Reordering: Your Code Doesn’t Always Run in the Order You Wrote It.
To write correct concurrent code or to understand why it breaks, we need to explore memory models: the formal rules that define how threads see and interact with memory operations.
This article explains:

  • What memory models are and how they work
  • The main types of memory models: Sequential Consistency, Total Store Order (TSO), and relaxed/weak models

What Is a Memory Model?

A memory model is a contract between your program, the compiler, and the CPU that defines:

  • Which memory operations, loads (reads) and stores (writes), can be reordered
  • When the effects of a write become visible to other threads
  • How multiple threads observe reads and writes performed by others

Without a memory model, there’s no way to reason about multithreaded programs → each thread could see operations in any order, leading to unpredictable behavior.

Sequential Consistency: The Intuitive Model

The simplest memory model is Sequential Consistency, defined by Leslie Lamport as:

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order issued by that processor.

It defines a system where:

  • All threads see memory operations (reads and writes) in the same global order.
  • Each thread sees its own operations occur in the same order as written in the program.

In other words, the execution behaves as if there is a single shared timeline, and all operations from all threads are placed on that timeline in a way that respects each thread’s original instruction order.

This model is easy for programmers to reason about because it matches what we typically expect: operations happen one after another, and everyone sees the same thing.

However, enforcing this strict order requires coordination between cores and often prevents performance optimizations like instruction reordering, store buffering, and speculative execution. That’s why modern hardware typically implements weaker memory models that are more relaxed, but harder to reason about.

Example

Consider two threads sharing variables x and y initialized to 0:

// Thread 1
x = 1;
r1 = y;

// Thread 2
y = 1;
r2 = x;

Under sequential consistency, the outcome r1 == 0 and r2 == 0 is impossible: in any single global order that respects each thread's program order, whichever store executes first must be visible to the other thread's later load. At least one thread must see the other's write.
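As a concrete sketch of this guarantee: C++'s std::atomic defaults to memory_order_seq_cst, which requests exactly this sequentially consistent behavior from the compiler and hardware (the function name run_once below is mine, not from the article):

```cpp
#include <atomic>
#include <thread>
#include <utility>

// One run of the two-thread litmus test from above, using std::atomic.
// The default ordering, memory_order_seq_cst, gives sequentially
// consistent semantics.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] { x.store(1); r1 = y.load(); });  // Thread 1
    std::thread t2([&] { y.store(1); r2 = x.load(); });  // Thread 2
    t1.join();
    t2.join();
    return {r1, r2};
}
```

No matter how many times run_once() is called, a conforming implementation will never return (0, 0). Downgrading the stores and loads to memory_order_relaxed removes that guarantee.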

Total Store Order (TSO): Strong but Practical

The x86 architecture (used in Intel and AMD CPUs) follows the Total Store Order (TSO) memory model. It’s stronger and easier to reason about than many weak/relaxed models, while still allowing one key optimization to improve performance. Here’s how it works:

Stores (writes) happen in order

For example, if Thread 1 executes:

x = 1;  
y = 2;

Any other thread that observes these values will always see x = 1 before y = 2.

Loads (reads) happen in order

This means if your code says:

r1 = x;  
r2 = y;  

Then the CPU will load x before y, just as written.

But a later load can be reordered before an earlier store.

This is the one relaxation TSO permits: a later load may execute before an earlier store, as long as they access different variables. For example:

x = 1;     // Store to x  
y = 2;     // Store to y  
r1 = z;    // Load from z (z is not accessed above)

Even though x = 1 and y = 2 come first, the CPU might delay committing those stores while performing the load from z early. As a result, r1 might see an outdated value of z, and other threads might not yet observe the updated values of x or y.
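When this store→load reordering matters, it can be forbidden with a full fence. A minimal sketch, assuming the same x, y, z variables (the function name publish_then_read is illustrative, not from the article): std::atomic_thread_fence(std::memory_order_seq_cst) between the stores and the load, which on x86 compilers typically becomes an mfence or a locked instruction that drains the store buffer.

```cpp
#include <atomic>

std::atomic<int> x{0}, y{0}, z{0};

int publish_then_read() {
    x.store(1, std::memory_order_relaxed);   // Store to x
    y.store(2, std::memory_order_relaxed);   // Store to y

    // Full fence: the stores above must become globally visible before
    // the load below executes. On x86 this is where an mfence would go.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    return z.load(std::memory_order_relaxed);  // Load from z
}
```

Without the fence, the CPU may perform the load from z while x = 1 and y = 2 are still sitting in the store buffer; with it, the load cannot be hoisted above the stores.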

Relaxed/Weak: Performance First, Predictability Later

Modern architectures such as ARM and POWER implement relaxed memory models. These models give the CPU more freedom to reorder instructions for maximum performance, but they also make it harder for programmers to reason about how memory behaves in concurrent programs.

Unlike TSO (which only reorders loads with earlier stores), relaxed models allow:

  • Stores to be reordered with other stores
  • Loads to be reordered with other loads
  • Stores and loads to be reordered with each other, in both directions

That means almost any combination of reordering is allowed → unless the programmer uses explicit memory barriers or synchronization instructions to enforce ordering.

Example

In a relaxed model, this code in Thread 1:

x = 1;
y = 2;

Might be observed by another thread as:

  • y = 2 happening before x = 1
  • Or only one of the stores being visible
  • Or even neither store being visible yet

Similarly, two loads:

r1 = x;
r2 = y;

May execute in reverse order internally, and r1 might see an older value while r2 sees a newer one → depending on what the hardware decides.

Programmers can no longer assume that memory behaves “as written.” Writing correct concurrent code now depends on understanding language-level memory models, atomic operations, and memory fences.
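As a sketch of what that looks like in practice, here is the classic message-passing pattern using C++ acquire/release atomics (the names data and ready are mine): the release store and acquire load create the ordering that relaxed hardware will not provide on its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                    // plain, non-atomic payload
std::atomic<bool> ready{false};  // synchronization flag

void producer() {
    data = 42;                                     // write the payload first...
    ready.store(true, std::memory_order_release);  // ...then publish the flag;
    // release forbids the payload write from moving after the flag store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // spin until published
    // The acquire load synchronizes-with the release store, so the payload
    // write is guaranteed to be visible here, even on ARM or POWER.
    assert(data == 42);
}
```

With memory_order_relaxed on both sides, that assert could legitimately fail on a weakly ordered machine: the flag might become visible before the payload.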

Memory Models Summary

  • Sequential Consistency: no reordering; all threads share one global order. Easiest to reason about, but costly for hardware.
  • Total Store Order (x86): stores stay in order, loads stay in order; only a later load may pass an earlier store.
  • Relaxed/Weak (ARM, POWER): almost any reordering is allowed unless explicit barriers or synchronization are used.

In the next article, we’ll explore how synchronization mechanisms (like locks, atomics, and memory barriers) help us write correct concurrent code → even under relaxed memory models, and dive into how they work under the hood.


If you have any feedback on the content, suggestions for improving the organization, or topics you’d like to see covered next, feel free to share → I’d love to hear your thoughts!
