Mostafa Magdy
Go Doesn’t Protect You From Your Own Memory: The Hidden OOM Bug in Production

Go is fast until your service suddenly gets OOM-killed in production.

Not because of a bug.
Not because of bad infrastructure.

But because of how you process data in memory.


The silent killer: loading everything before doing anything

There’s a pattern that shows up everywhere in Go code:

// 1. Collect everything
var items []Item
for rows.Next() {
    var item Item
    // database/sql scans into individual fields, not a whole struct
    if err := rows.Scan(&item.ID, &item.Name); err != nil {
        return err
    }
    items = append(items, item)
}

// 2. Then process it
for _, item := range items {
    process(item)
}

This is called eager evaluation (or collect-then-process).

It’s simple.
It’s readable.
It works perfectly… until it doesn’t.


What actually happens in production

Let’s say one request loads 2 GB of data.

Now imagine:

Request 1 → +2 GB
Request 2 → +2 GB (total: 4 GB)
Request 3 → 💥 OOM Kill

Your service didn’t hit a bug and “crash”.

It exceeded its memory limit, and the kernel’s OOM killer terminated it.


Why you don’t catch this early

  • Dev data is small
  • Test data is small
  • Everything looks fast and clean

Then a real customer hits your endpoint with real data and suddenly your pod is gone.


The real problem

The issue is not Go.

The issue is materializing the entire dataset in memory before doing any work.

You’re turning a stream into a big in-memory blob.


The better model: stream, don’t store

Instead of:

[data source] → [collect everything] → [process]

Think:

[data source] → [one item] → [transform] → [process] → repeat

Or even simpler:

Think of it like a conveyor belt, not a warehouse.

  • One item flows through
  • Gets processed
  • Gets discarded
  • Next item comes

👉 Memory stays constant
👉 Processing starts immediately
👉 No more “dataset size” problems


A simple example

Even with basic data:

// bad
nums := []int{1, 2, 3, 4, 5}
var result []int

for _, n := range nums {
    if n > 2 {
        result = append(result, n)
    }
}

You’re still collecting before finishing processing.

A lazy approach avoids that pattern entirely.


Real-world example: CSV export

This is where things usually break.

❌ The typical implementation

func exportHandler(w http.ResponseWriter, r *http.Request) {
    rows, _ := db.QueryContext(r.Context(), "SELECT ...")
    defer rows.Close()

    var orders []Order
    for rows.Next() {
        var o Order
        rows.Scan(&o.ID, &o.Name)
        orders = append(orders, o)
    }

    writer := csv.NewWriter(w)
    for _, o := range orders {
        writer.Write([]string{o.ID, o.Name})
    }
    writer.Flush()
}

Problems:

  • Loads everything into memory
  • Delays response until full load
  • Scales poorly with data size

Streaming approach

rows    := sources.DBRows(ctx, db, query, scanOrder)
active  := iterx.Filter(ctx, rows, isActive)
csvRows := iterx.Map(ctx, active, toCSV)

iterx.Drain(ctx, csvRows, func(row []string) error {
    return csvWriter.Write(row)
})

What changed?

  • No slice
  • No full dataset in memory
  • Starts sending response immediately
  • Stops instantly if client disconnects

👉 Peak memory: a few KB


This problem shows up everywhere

File uploads

// bad
content, _ := io.ReadAll(file)
records, _ := csv.NewReader(bytes.NewReader(content)).ReadAll()

→ holds the raw bytes and the fully parsed records in memory at the same time


Streaming version

rows    := sources.CSVRows(ctx, file)
valid   := iterx.Filter(ctx, rows, isValid)
cleaned := iterx.Map(ctx, valid, normalize)

iterx.Drain(ctx, cleaned, process)

👉 Constant memory regardless of file size


Log processing

Processing a 4 GB .jsonl file:

  • ❌ naive: ~12 GB memory (multiple copies)
  • ✅ streaming: ~1 MB

Benchmarks

CSV export (1,000,000 rows)

Approach    Peak Memory    Time to First Byte
Eager       287 MB         After full load
Streaming   3 MB           Immediate

JSONL processing

Approach    Peak Memory    Time
Eager       194 MB         ~909 ms
Streaming   1 MB           ~24 ms

Why it’s faster (not just smaller)

Streaming improves:

  • Memory usage
  • Cache locality
  • Latency (starts immediately)
  • Cancellation (stop early)

What about parallel processing?

You can still scale:

results := parallel.OrderedParallelMap(ctx, rows, enrich, 8)

iterx.Drain(ctx, results, writeOutput)
  • 8 workers
  • Preserves order
  • Context-aware cancellation
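Library call aside, the underlying technique is expressible with plain channels: cap concurrency with a semaphore, and preserve order by queuing a per-item result channel before its worker starts. A rough sketch (orderedParallelMap is illustrative, not vortex’s actual implementation, and omits context handling):

```go
package main

import "fmt"

// orderedParallelMap runs f over in with at most `workers`
// goroutines in flight, emitting results in input order. Order is
// preserved by enqueuing each item's result channel up front.
func orderedParallelMap[I, O any](in <-chan I, f func(I) O, workers int) <-chan O {
	queue := make(chan chan O, workers) // pending results, in order
	sem := make(chan struct{}, workers) // caps concurrent workers
	go func() {
		for v := range in {
			out := make(chan O, 1)
			queue <- out
			sem <- struct{}{}
			go func(v I, out chan O) {
				defer func() { <-sem }()
				out <- f(v)
			}(v, out)
		}
		close(queue)
	}()
	res := make(chan O)
	go func() {
		for out := range queue {
			res <- <-out // next result, in input order
		}
		close(res)
	}()
	return res
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 5; i++ {
			in <- i
		}
		close(in)
	}()
	// Squares computed by up to 3 workers, printed in input order.
	for sq := range orderedParallelMap(in, func(n int) int { return n * n }, 3) {
		fmt.Println(sq)
	}
}
```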

So where does vortex fit in?

Go 1.23 introduced iter.Seq, which enables lazy evaluation.

But building full pipelines with:

  • error handling
  • context propagation
  • cancellation
  • parallelism

…quickly becomes repetitive.


vortex solves this

  • Composable lazy pipelines
  • Zero dependencies
  • Built-in resilience (retry, circuit breaker)
  • Works with DB, CSV, JSONL, streams
go get github.com/MostafaMagdSalama/vortex@latest

When you should NOT use this

Be pragmatic.

Don’t use this approach when:

  • Dataset is small (<10k rows)
  • You’re writing simple scripts
  • Readability matters more than performance

👉 A slice is totally fine in those cases.


The real takeaway

This isn’t about a library.

It’s about how you think about data flow.

Most production issues here come from:

“Load everything → then process”

Instead of:

“Process as data flows”


Final thought

If you’ve ever had:

  • “Works fine locally”
  • “Crashes in production”
  • “Memory spike with large data”

This is probably why.


Start thinking in streams, not slices.

And if you don’t want to build all the plumbing yourself, you can try vortex:

👉 https://github.com/MostafaMagdSalama/vortex
