
Cristian Sifuentes

C# Loops — From `for` and `foreach` to CPU Pipelines and LLM‑Ready Code


Most developers use loops every day.

Very few truly understand what happens below the syntax.

Why does one `for` loop fly while another crawls?
Why can `foreach` be free… or secretly expensive?
Why does the same loop get faster after it runs for a while?
And how can understanding loops help you write LLM‑friendly, performance‑predictable code?

This article is a mental model upgrade — from beginner syntax to processor‑level reality, modern .NET JIT behavior, and how to reason about loops like a scientist.

If you can write `for (int i = 0; i < n; i++)`, you’re ready.


Table of Contents

  1. The Mental Model: What a Loop Really Is
  2. CPU Pipelines & Branch Prediction
  3. Roslyn vs JIT: Who Optimizes Your Loop
  4. for, while, do/while: What Actually Changes
  5. foreach Under the Hood (Arrays vs List<T> vs IEnumerable<T>)
  6. Bounds‑Check Elimination (The Hidden Superpower)
  7. Branch Prediction: Predictable vs Random Data
  8. Span and Zero‑Allocation Iteration
  9. Vectorized Loops (SIMD Taste)
  10. yield return: The Hidden State Machine
  11. World‑Class Loop Heuristics
  12. Why This Matters for LLM‑Assisted Code

1. Mental Model: A Loop Is a Control‑Flow Machine

Every loop is just:

• Check
• Execute
• Jump back

At the CPU level, a loop becomes:

```
L0:
  ; loop body
  compare condition
  branch if true → L0
```
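
In C# terms, Roslyn lowers a `for` loop into exactly this shape. A runnable, hand‑written equivalent (illustrative; the real IL differs in detail):

```csharp
int n = 3;
int i = 0;                 // initializer runs once
goto Check;                // jump straight to the condition

Body:
    Console.WriteLine(i);  // execute the body
    i++;                   // increment
Check:
    if (i < n) goto Body;  // back-edge: jump while the condition holds
```

The condition sits at the bottom so each iteration pays for a single conditional branch, the same shape the JIT prefers.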

That back‑edge branch is one of the most optimized patterns in modern CPUs.

Loops are not slow.
Bad memory access and unpredictable branches are slow.


2. CPU Pipelines & Branch Prediction

Modern CPUs:

• Execute instructions speculatively
• Predict branches before knowing the result
• Flush the pipeline on misprediction (~10–20 cycles)

Why loops are special

Loop back‑edges are extremely predictable:

• Taken many times
• Not taken once (exit)

So the loop branch itself is usually cheap.

What actually hurts

• Cache misses (100+ cycles)
• Pointer chasing
• Allocations inside loops
• Interface dispatch
• Bounds checks that weren’t eliminated

👉 Memory beats syntax every time.
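
To make that list concrete, here is a minimal sketch of the most common offender: allocating inside the loop versus hoisting the buffer out (`Process` is a hypothetical workload):

```csharp
// Slow: a fresh heap allocation every iteration -> GC pressure dominates
for (int i = 0; i < 1_000; i++)
{
    var buffer = new byte[4096];   // 1,000 allocations
    Process(buffer);
}

// Better: allocate once, reuse
var shared = new byte[4096];
for (int i = 0; i < 1_000; i++)
    Process(shared);

static void Process(byte[] buffer) { /* hypothetical workload */ }
```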


3. Roslyn vs JIT — Who Does the Work?

Roslyn (C# compiler)

• Emits IL
• Lowers foreach
• Inserts branches

RyuJIT (runtime)

• Generates machine code
• Removes bounds checks
• Hoists invariants
• Specializes hot loops
• Uses Tiered Compilation + PGO

💡 The same loop may get re‑compiled after warming up.

This is why microbenchmarks need warmup.
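
You can watch tiering happen with nothing more than a `Stopwatch`. A rough sketch (timings vary by machine and runtime version; BenchmarkDotNet does this properly):

```csharp
using System.Diagnostics;
using System.Linq;

var data = Enumerable.Range(0, 1_000_000).ToArray();

for (int run = 0; run < 10; run++)
{
    var sw = Stopwatch.StartNew();
    long sum = SumArray(data);
    sw.Stop();
    // Early runs may execute unoptimized Tier 0 code; later runs hit the
    // optimized version the JIT produced once the loop became hot.
    Console.WriteLine($"run {run}: {sw.Elapsed.TotalMilliseconds:F2} ms (sum={sum})");
}

static long SumArray(int[] arr)
{
    long sum = 0;
    for (int i = 0; i < arr.Length; i++)
        sum += arr[i];
    return sum;
}
```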


4. for, while, do/while — Real Differences

| Loop | Difference |
|------|------------|
| `while` | Condition checked first |
| `do/while` | Body executes at least once |
| `for` | Same machine shape, clearer intent |

Performance difference is usually noise.
Choose based on correctness and readability.
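
Side by side, all three print the same output and compile to essentially the same machine shape:

```csharp
int i = 0;
while (i < 3) { Console.Write(i); i++; }      // condition checked first

int j = 0;
do { Console.Write(j); j++; } while (j < 3);  // body runs at least once

for (int k = 0; k < 3; k++) Console.Write(k); // same shape, clearer intent
```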


5. foreach Under the Hood

Array

```csharp
foreach (var x in array)
```

➡ lowered to a for loop

➡ bounds checks often eliminated

➡ very fast

List<T>

• Uses struct enumerator
• No allocation
• Still very fast

IEnumerable<T>

⚠️ Potential performance cliff:

• Interface dispatch
• Possible allocation
• No bounds‑check elimination

👉 Avoid IEnumerable<T> in hot loops.
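
A minimal sketch of the cliff: the same list iterated directly (struct enumerator, no allocation) versus through `IEnumerable<T>` (boxed enumerator, virtual calls):

```csharp
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

// Fast path: foreach over List<T> uses its struct enumerator directly.
long sum1 = 0;
foreach (var x in list) sum1 += x;

// Slower path: once typed as IEnumerable<int>, GetEnumerator() returns the
// enumerator boxed behind an interface -> allocation + interface dispatch.
IEnumerable<int> seq = list;
long sum2 = 0;
foreach (var x in seq) sum2 += x;
```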


6. Bounds‑Check Elimination (BCE)

This loop:

```csharp
for (int i = 0; i < arr.Length; i++)
  sum += arr[i];
```

can compile to machine code with no bounds check inside the loop, because comparing `i` against `arr.Length` lets the JIT prove every access is in range.

But unusual indexing patterns (offset or multiple indices, bounds the JIT can’t tie to the array) can break BCE.

Rule of thumb (illustrated below):
• Linear access
• Single index
• Cached length
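
A hedged sketch of both sides; whether the second loop keeps its checks depends on the exact JIT version:

```csharp
int[] arr = { 1, 2, 3, 4, 5 };

// BCE-friendly: single index, linear access, bound is arr.Length.
long sum = 0;
for (int i = 0; i < arr.Length; i++)
    sum += arr[i];

// Riskier: offset index and a bound the JIT must connect to the array.
// Older or conservative JITs may keep the check on arr[i + 1].
int n = arr.Length - 1;
long diffs = 0;
for (int i = 0; i < n; i++)
    diffs += arr[i + 1] - arr[i];
```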


7. Branch Prediction: Data Beats Code

Two loops. Same code. Different data.

• 99% predictable → fast
• 50/50 random → slower

The branch predictor learns data patterns, not syntax.

Sometimes:

Sorting data beats rewriting code.
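
The classic demonstration, sketched here: the same loop over the same values, and uncommenting one line (the sort) changes the branch’s predictability:

```csharp
var rng = new Random(42);
int[] data = new int[1_000_000];
for (int i = 0; i < data.Length; i++)
    data[i] = rng.Next(256);

// Array.Sort(data);  // uncomment: the branch below becomes almost perfectly predictable

long sum = 0;
for (int i = 0; i < data.Length; i++)
    if (data[i] < 128)      // ~50/50 on random data -> frequent mispredictions
        sum += data[i];
```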


8. Span: Zero‑Allocation Iteration

```csharp
Span<int> slice = array.AsSpan(1, 3);

foreach (ref var x in slice)
  x++;
```

• No allocations
• Stack‑only
• Cache‑friendly
• Safe

Span is one of the most important performance tools in modern .NET.
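
The same syntax also works over stack memory; a minimal sketch:

```csharp
Span<int> scratch = stackalloc int[8];   // stack-allocated, no GC involvement
for (int i = 0; i < scratch.Length; i++)
    scratch[i] = i * i;

int total = 0;
foreach (var x in scratch)
    total += x;

Console.WriteLine(total); // 140
```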


9. Vectorized Loops (SIMD Taste)

```csharp
Vector<float> v1, v2;   // each holds Vector<float>.Count elements
acc += v1 * v2;         // one SIMD multiply and add per iteration
```

• Uses SIMD when available
• Processes multiple elements per instruction
• Great for numeric workloads

Data layout matters more than loop shape.
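
Fleshed out into a runnable sketch: a minimal dot product (`Vector.Sum` requires .NET 6+; the scalar tail handles lengths that aren’t a multiple of the lane count; assumes `a` and `b` have equal length):

```csharp
using System.Numerics;

static float Dot(float[] a, float[] b)
{
    var acc = Vector<float>.Zero;
    int i = 0;
    int last = a.Length - Vector<float>.Count;

    for (; i <= last; i += Vector<float>.Count)
    {
        var v1 = new Vector<float>(a, i);   // load Count lanes from a
        var v2 = new Vector<float>(b, i);   // load Count lanes from b
        acc += v1 * v2;                     // one SIMD multiply-add
    }

    float sum = Vector.Sum(acc);            // horizontal add of the lanes
    for (; i < a.Length; i++)               // scalar tail
        sum += a[i] * b[i];
    return sum;
}

// Example: Dot(new float[] { 1, 2, 3 }, new float[] { 4, 5, 6 }) == 32
```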


10. yield return: The Hidden State Machine

```csharp
IEnumerable<int> Evens(int max)
{
  for (int i = 0; i <= max; i += 2)
    yield return i;   // the compiler rewrites this method into a state machine
}
```

This creates:

• A compiler‑generated state machine
• Often a heap allocation
• Extra indirections

Great for clarity.
Avoid in ultra‑hot paths.


11. World‑Class Loop Heuristics

✔ Prefer contiguous memory

✔ Avoid allocations inside loops

✔ Avoid interface dispatch in hot paths

✔ Let the JIT eliminate bounds checks

✔ Measure with BenchmarkDotNet (see the sketch below)

✔ Optimize memory before branches

Most performance bugs are memory bugs.
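
A minimal BenchmarkDotNet sketch tying several of these heuristics together (`LoopBenchmarks` is a hypothetical example class; the runner’s warmup phase lets tiered compilation settle before measurement):

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class LoopBenchmarks
{
    private int[] _data = null!;

    [GlobalSetup]
    public void Setup() => _data = Enumerable.Range(0, 100_000).ToArray();

    [Benchmark(Baseline = true)]
    public long SumFor()
    {
        long sum = 0;
        for (int i = 0; i < _data.Length; i++)   // BCE-friendly
            sum += _data[i];
        return sum;
    }

    [Benchmark]
    public long SumViaInterface()
    {
        long sum = 0;
        IEnumerable<int> seq = _data;            // forces interface dispatch
        foreach (var x in seq) sum += x;
        return sum;
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<LoopBenchmarks>();
}
```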


12. Why This Matters for LLM‑Assisted Code

LLMs:

• Generate correct syntax

• Do not understand cache lines

• Do not feel branch misprediction

• Do not see GC pressure

Your mental model is the safety net.

If you understand loops at this level, you can:

• Guide LLMs

• Review generated code intelligently

• Predict performance before profiling

• Write code that scales under real load


Final Thought

A loop is not a construct.

It is a contract between your data, the JIT, and the processor.

Once you understand that, you stop guessing — and start engineering.

Happy looping!
