Sachin Tolay
Instruction Reordering: Your Code Doesn’t Always Run in the Order You Wrote It

When writing code, you naturally expect instructions to run one after the other in the exact order they appear. For example:

x = 1;
y = 2;

You’d expect x = 1 to complete before y = 2 starts.

But in reality, modern CPUs and compilers don’t always execute instructions in the exact sequence you wrote them. Instead, they reorder instructions internally to improve performance. While this might sound risky, it’s a core optimization that enables today’s processors to run billions of instructions per second.

To fully appreciate why these reorderings happen, it helps to first understand the parallel execution techniques CPUs use, which I have explained in detail here: Superscalar vs SIMD vs Multicore: Understanding Modern CPU Parallelism.

This article explains:

  • What instruction reordering is.
  • Why CPUs and compilers perform it.
  • How it affects multithreaded programs.
  • And why understanding it is critical for writing correct concurrent code.

What is Instruction Reordering?

Instruction reordering means the order in which instructions are executed can differ from the order they appear in your source code. There are two main types of reordering:

  1. Compiler Reordering — The compiler rearranges instructions as part of the code generation process to produce faster machine code.
  2. CPU Reordering (Out-of-Order Execution) — CPUs execute instructions out of their original order internally to better utilize available execution units and reduce pipeline stalls.

Both types of reordering are done transparently to the programmer in single-threaded programs, so your code behaves as expected. However, when multiple threads interact via shared memory, these reorderings can cause subtle and hard-to-debug bugs.
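
For example, consider two independent writes (a minimal, hypothetical illustration → the actual machine code depends on your compiler and target):

int a = 0, b = 0;

a = 1; // Write 1
b = 2; // Write 2 (independent of Write 1)

// Since neither write depends on the other, the compiler or the CPU may
// effectively perform b = 2 before a = 1. A single-threaded observer
// can never tell the difference.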

Why Do CPUs and Compilers Reorder Instructions?

Both CPUs and compilers reorder instructions primarily to improve performance by making better use of hardware resources and minimizing delays.

Improving CPU Utilization

Modern CPUs have multiple execution units per core (such as ALUs, FPUs, and load/store units) that can operate in parallel. To keep these units busy, the CPU issues and executes multiple independent instructions simultaneously, even if they appear sequentially in your code.

a = b + c; // Instruction 1
x = y + z; // Instruction 2 (independent of Instruction 1)

Here, the CPU can execute both instructions at the same time in different execution units, rather than waiting to finish Instruction 1 before starting Instruction 2. This parallelism boosts throughput.

Hiding Memory Latency

Memory access is slow compared to CPU speeds. When an instruction needs data from memory, the CPU doesn’t just wait idly → it executes other independent instructions that are ready to run.

x = 1;          // Instruction 1
y = slowLoad(); // Instruction 2 (memory access, slower)
z = 2;          // Instruction 3

While Instruction 2 waits for the memory load, the CPU can execute Instruction 3 immediately, avoiding pipeline stalls and improving efficiency.

Compiler Optimizations

Compilers reorder instructions during code optimization to produce faster, more efficient machine code. This includes:

  • Reordering independent instructions to improve scheduling.
  • Moving calculations that don’t change out of loops (loop-invariant code motion).
  • Eliminating repeated computations by reusing previously computed values (common subexpression elimination).

Consider the following code snippet inside a loop:

for (int i = 0; i < 1000; i++) {
  int a = 5 * 2; // same calculation every iteration
  int b = a + i;
  int c = 5 * 2; // repeated calculation
  array[i] = b + c;
}

After optimization, the generated code could look like the following (in practice, the compiler would also constant-fold 5 * 2 down to the constant 10):

int a = 5 * 2; // computed once before the loop
for (int i = 0; i < 1000; i++) {
  int b = a + i;
  int c = a; // reuse computed value
  array[i] = b + c;
}

Why Instruction Reordering Matters in Multithreaded Programs

When multiple threads access shared memory without proper synchronization, instruction reordering can lead to unexpected behaviors.

// Shared variables
int data = 0;
int flag = 0;

// Thread 1
data = 42; // Step 1
flag = 1; // Step 2

// Thread 2
if (flag == 1) { // Step 3
  print(data); // Step 4
}

You’d expect that if Thread 2 sees flag == 1, it must also see data == 42. But if the compiler or CPU reorders flag = 1 before data = 42 in Thread 1, Thread 2 might read flag == 1 and yet still read data == 0. This kind of subtle bug is caused by instruction reordering combined with visibility issues in multithreaded memory.

How CPUs & Compilers Avoid Breaking Single-Threaded Programs

Even though CPUs and compilers reorder instructions to run faster, they make sure your program still behaves as if the instructions ran exactly in the order you wrote them → as long as you’re running a single thread.

They do this by carefully tracking data dependencies between instructions. For example, if one instruction needs the result of another, it won’t be moved before that instruction.
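
For instance (a minimal sketch):

int b = 3, c = 4;
int a = b + c; // must produce a first
int d = a * 2; // reads a, so it can never be moved before the line above
int e = b - c; // independent of a and d → free to execute earlier or in parallel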

CPUs also use special hardware mechanisms, like reorder buffers, to keep track of the original program order and only commit results in that order. So, even if instructions execute out of order internally, the program’s visible behavior stays consistent.

This means you don’t have to worry about your code acting strangely due to instruction reordering in normal, single-threaded programs.

How to Write Correct Multithreaded Programs Despite Reordering

As explained above, instruction reordering can lead to subtle and unpredictable bugs in multithreaded programs. One thread might observe memory updates from another in an unexpected order, breaking the intended logic of your program.

Because CPUs and compilers apply many different optimizations based on context, hardware, and surrounding code, the specific ordering of operations can vary in ways you might not expect. This makes reasoning about shared memory behavior tricky without proper safeguards.

To write correct multithreaded code, you need to use synchronization tools that control visibility and ordering between threads:

  • Locks (mutexes): Prevent simultaneous access to shared data.
  • Atomic operations: Ensure safe, indivisible updates.
  • Memory barriers (fences): Stop certain reorderings from happening across critical instruction boundaries (see the sketch further below).

These tools help establish happens-before relationships → ensuring that the operations in one thread become visible to another in a predictable and controlled manner. To apply these tools correctly, it’s important to understand:

  • What memory models are, and how they define visibility and ordering guarantees
  • How different synchronization mechanisms enforce those guarantees

These advanced topics will be explored in upcoming articles, as they are essential for writing safe and efficient concurrent systems.
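
In the meantime, here is a minimal sketch of the earlier data/flag example, rewritten using C++11 std::atomic with release/acquire ordering (an assumption on my part → the original snippets are language-agnostic pseudocode):

#include <atomic>
#include <cstdio>
#include <thread>

int data = 0;
std::atomic<int> flag{0};

void producer() {
  data = 42;                                // Step 1
  flag.store(1, std::memory_order_release); // Step 2: release → no earlier write
                                            // may be reordered past this store
}

void consumer() {
  while (flag.load(std::memory_order_acquire) != 1) {
    // spin: acquire → no later read may be reordered before this load
  }
  std::printf("%d\n", data); // guaranteed to print 42
}

int main() {
  std::thread t1(producer);
  std::thread t2(consumer);
  t1.join();
  t2.join();
}

The release store pairs with the acquire load to establish a happens-before relationship: once Thread 2 observes flag == 1, it is also guaranteed to observe data == 42.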


If you have any feedback on the content, suggestions for improving the organization, or topics you’d like to see covered next, feel free to share → I’d love to hear your thoughts!
