DEV Community

Sachin Tolay

Memory Models Explained: How Threads Really See Memory

Modern processors and compilers aggressively reorder instructions to improve performance → a behavior we explored in detail in my previous article: Instruction Reordering: Your Code Doesn’t Always Run in the Order You Wrote It.
To write correct concurrent code or to understand why it breaks, we need to explore memory models: the formal rules that define how threads see and interact with memory operations.
This article explains:

  • What memory models are and how they work
  • The main types of memory models: Sequential Consistency, Total Store Order (TSO), and relaxed/weak models

What Is a Memory Model?

A memory model is a contract between your program, the compiler, and the CPU that defines:

  • Which memory operations, loads (reads) and stores (writes), can be reordered
  • When the effects of a write become visible to other threads
  • How multiple threads observe reads and writes performed by others

Without a memory model, there’s no way to reason about multithreaded programs → each thread could see operations in any order, leading to unpredictable behavior.

Sequential Consistency: The Intuitive Model

The simplest memory model is Sequential Consistency, defined by Leslie Lamport as:

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order issued by that processor.

It defines a system where:

  • All threads see memory operations (reads and writes) in the same global order.
  • Each thread sees its own operations occur in the same order as written in the program.

In other words, the execution behaves as if there is a single shared timeline, and all operations from all threads are placed on that timeline in a way that respects each thread’s original instruction order.

This model is easy for programmers to reason about because it matches what we typically expect: operations happen one after another, and everyone sees the same thing.

However, enforcing this strict order requires coordination between cores and often prevents performance optimizations like instruction reordering, store buffering, and speculative execution. That’s why modern hardware typically implements weaker memory models that are more relaxed, but harder to reason about.

Example

Consider two threads sharing variables x and y initialized to 0:

// Thread 1
x = 1;
r1 = y;

// Thread 2
y = 1;
r2 = x;

Under sequential consistency, the outcome r1 == 0 and r2 == 0 is impossible: in any single global order that respects each thread's program order, whichever store executes first must be visible to the other thread's later load. At least one thread must see the other's write.
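As a concrete sketch of this guarantee: C++'s std::atomic defaults to memory_order_seq_cst, which requests exactly this sequentially consistent behavior from the compiler and hardware (the function name run_once below is mine, not from the article):

```cpp
#include <atomic>
#include <thread>
#include <utility>

// One run of the two-thread litmus test from above, using std::atomic.
// The default ordering, memory_order_seq_cst, gives sequentially
// consistent semantics.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] { x.store(1); r1 = y.load(); });  // Thread 1
    std::thread t2([&] { y.store(1); r2 = x.load(); });  // Thread 2
    t1.join();
    t2.join();
    return {r1, r2};
}
```

No matter how many times run_once() is called, a conforming implementation will never return (0, 0). Downgrading the stores and loads to memory_order_relaxed removes that guarantee.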

Total Store Order (TSO): Strong but Practical

The x86 architecture (used in Intel and AMD CPUs) follows the Total Store Order (TSO) memory model. It’s stronger and easier to reason about than many weak/relaxed models, while still allowing one key optimization to improve performance. Here’s how it works:

Stores (writes) happen in order

For example, if Thread 1 executes:

x = 1;  
y = 2;

Any other thread that observes these values will always see x = 1 before y = 2.

Loads (reads) happen in order

This means if your code says:

r1 = x;  
r2 = y;  

Then the CPU will load x before y, just as written.

But a later load can be reordered before an earlier store.

This is the one relaxation TSO permits: a later load may execute before an earlier store, as long as they access different variables. For example:

x = 1;     // Store to x  
y = 2;     // Store to y  
r1 = z;    // Load from z (z is not accessed above)

Even though x = 1 and y = 2 come first, the CPU might delay committing those stores while performing the load from z early. As a result, r1 might see an outdated value of z, and other threads might not yet observe the updated values of x or y.
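When this store→load reordering matters, it can be forbidden with a full fence. A minimal sketch, assuming the same x, y, z variables (the function name publish_then_read is illustrative, not from the article): std::atomic_thread_fence(std::memory_order_seq_cst) between the stores and the load, which on x86 compilers typically becomes an mfence or a locked instruction that drains the store buffer.

```cpp
#include <atomic>

std::atomic<int> x{0}, y{0}, z{0};

int publish_then_read() {
    x.store(1, std::memory_order_relaxed);   // Store to x
    y.store(2, std::memory_order_relaxed);   // Store to y

    // Full fence: the stores above must become globally visible before
    // the load below executes. On x86 this is where an mfence would go.
    std::atomic_thread_fence(std::memory_order_seq_cst);

    return z.load(std::memory_order_relaxed);  // Load from z
}
```

Without the fence, the CPU may perform the load from z while x = 1 and y = 2 are still sitting in the store buffer; with it, the load cannot be hoisted above the stores.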

Relaxed/Weak: Performance First, Predictability Later

Modern architectures such as ARM and POWER implement relaxed memory models. These models give the CPU more freedom to reorder instructions for maximum performance, but they also make it harder for programmers to reason about how memory behaves in concurrent programs.

Unlike TSO (which only reorders loads with earlier stores), relaxed models allow:

  • Stores to be reordered with other stores
  • Loads to be reordered with other loads
  • Stores and loads to be reordered with each other, in both directions

That means almost any combination of reordering is allowed → unless the programmer uses explicit memory barriers or synchronization instructions to enforce ordering.

Example

In a relaxed model, this code in Thread 1:

x = 1;
y = 2;

Might be observed by another thread as:

  • y = 2 happening before x = 1
  • Or only one of the stores being visible
  • Or even neither store being visible yet

Similarly, two loads:

r1 = x;
r2 = y;

May execute in reverse order internally, and r1 might see an older value while r2 sees a newer one → depending on what the hardware decides.

Programmers can no longer assume that memory behaves “as written.” Writing correct concurrent code now depends on understanding language-level memory models, atomic operations, and memory fences.
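As a sketch of what that looks like in practice, here is the classic message-passing pattern using C++ acquire/release atomics (the names data and ready are mine): the release store and acquire load create the ordering that relaxed hardware will not provide on its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;                    // plain, non-atomic payload
std::atomic<bool> ready{false};  // synchronization flag

void producer() {
    data = 42;                                     // write the payload first...
    ready.store(true, std::memory_order_release);  // ...then publish the flag;
    // release forbids the payload write from moving after the flag store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // spin until published
    // The acquire load synchronizes-with the release store, so the payload
    // write is guaranteed to be visible here, even on ARM or POWER.
    assert(data == 42);
}
```

With memory_order_relaxed on both sides, that assert could legitimately fail on a weakly ordered machine: the flag might become visible before the payload.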

Memory Models Summary

  • Sequential Consistency: no reordering; all threads share one global order. Easiest to reason about, but costly for hardware.
  • Total Store Order (x86): stores stay in order, loads stay in order; only a later load may pass an earlier store.
  • Relaxed/Weak (ARM, POWER): almost any reordering is allowed unless explicit barriers or synchronization are used.

In the next article, we’ll explore how synchronization mechanisms (like locks, atomics, and memory barriers) help us write correct concurrent code → even under relaxed memory models, and dive into how they work under the hood.


If you have any feedback on the content, suggestions for improving the organization, or topics you’d like to see covered next, feel free to share → I’d love to hear your thoughts!
