We had an API that was consistently fast.
~80ms average response time. No complaints.
Then one day, everything slowed down.
Not crashed. Not broken. Just… slow.
And the weird part?
It was caused by one user.
The Symptom: Latency Gradually Creeping Up
At first:
- p95 latency increased slightly
- CPU was fine
- Memory was stable
- No obvious errors
Then:
- Requests started taking 500ms+
- Some hit 2–3 seconds
- But only during certain times
It wasn’t global traffic.
It was something more subtle.
The Clue: One Endpoint, One Pattern
After digging into logs, we noticed:
- Almost all slow requests hit the same endpoint
- Same query pattern
- Same user ID appearing frequently
That user had… a lot of data.
Way more than anyone else.
The Root Cause: “Works Fine” Query That Didn’t Scale
Here’s the query (simplified):
SQL:
SELECT *
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC;
Looks harmless, right?
Except:
- No pagination
- No limit
- That user had 120,000+ rows
Every request:
- Pulled all rows
- Sorted them
- Serialized them into JSON
- Sent them over the network
For one user.
Now imagine multiple requests hitting that at once.
Why It Slowed Down Everyone
Node.js is non-blocking… but not magic.
What actually happened:
- Query time grew with the huge result set
- Serializing the large JSON response blocked the event loop
- The oversized payload inflated network transfer time
- Other requests queued up behind it
One “heavy” request created backpressure for everything else.
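The event-loop blocking is easy to reproduce in isolation. This is a minimal standalone sketch (the row shape is hypothetical, sized to mimic the 120,000-row user): `JSON.stringify` is synchronous, so while it runs, Node.js cannot serve any other request.

```javascript
// Simulate serializing one "heavy" user's full order history.
// JSON.stringify is synchronous: nothing else runs until it returns.
const rows = Array.from({ length: 120000 }, (_, i) => ({
  id: i,
  total: 19.99,
  status: "shipped",
  created_at: "2024-01-01T00:00:00Z",
}));

const start = process.hrtime.bigint();
const body = JSON.stringify(rows); // blocks the event loop for the duration
const ms = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`serialized ${body.length} bytes in ${ms.toFixed(1)}ms`);
```

Run it and you'll see a multi-megabyte payload built in a single synchronous burst. Every concurrent request pays that tax.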
The Fix (That Took 10 Minutes)
1️⃣ Add Pagination (Always)
SQL:
SELECT *
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 50 OFFSET $2;
Or even better: cursor-based pagination.
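Here's a rough sketch of what cursor-based (keyset) pagination could look like for this query. The helper and cursor shape are hypothetical; the column names mirror the query above, with `id` as a tiebreaker for rows sharing a `created_at`.

```javascript
// Build a parameterized keyset-pagination query for a user's orders.
// `cursor` is the { createdAt, id } of the last row the client received.
function buildOrdersPage(userId, cursor, limit = 50) {
  if (cursor) {
    return {
      text: `SELECT id, total, status, created_at
             FROM orders
             WHERE user_id = $1
               AND (created_at, id) < ($2, $3)
             ORDER BY created_at DESC, id DESC
             LIMIT $4`,
      values: [userId, cursor.createdAt, cursor.id, limit],
    };
  }
  // First page: no cursor yet.
  return {
    text: `SELECT id, total, status, created_at
           FROM orders
           WHERE user_id = $1
           ORDER BY created_at DESC, id DESC
           LIMIT $2`,
    values: [userId, limit],
  };
}
```

Unlike OFFSET, the database never has to skip over rows it already returned, so page 1,000 costs the same as page 1.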
2️⃣ Add Proper Index
CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC);
This alone drastically reduced query time.
3️⃣ Reduce Payload Size
Instead of SELECT *:
SELECT id, total, status, created_at
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 50;
Less data → faster everything.
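To see why column trimming matters, compare serialized sizes. The row shapes below are hypothetical stand-ins for a wide `orders` row versus the four-column version:

```javascript
// A "SELECT *" row carries every column; the slim row only what the UI needs.
const fullRow = {
  id: 1, user_id: 42, total: 19.99, status: "shipped",
  created_at: "2024-01-01T00:00:00Z",
  shipping_address: "221B Baker Street, London",
  notes: "leave at the door",
  internal_flags: { audited: true, synced: true },
};
const slimRow = {
  id: 1, total: 19.99, status: "shipped",
  created_at: "2024-01-01T00:00:00Z",
};

// Size one page (50 rows) of each.
const fullBytes = JSON.stringify(Array(50).fill(fullRow)).length;
const slimBytes = JSON.stringify(Array(50).fill(slimRow)).length;
console.log({ fullBytes, slimBytes });
```

Fewer bytes to fetch, serialize, and ship. The savings compound at every layer.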
4️⃣ Optional: Protect the API
We added a soft guard:
if (limit > 100) {
  throw new Error("Limit too high");
}
No more accidental “give me everything.”
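In practice the guard also has to survive missing or garbage input. A hypothetical helper (names are mine, not from our codebase) could look like:

```javascript
// Parse a client-supplied ?limit= value: default when absent or invalid,
// reject values above the cap so no request can ask for "everything".
function parseLimit(raw, { def = 50, max = 100 } = {}) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1) return def; // missing/garbage → sane default
  if (n > max) throw new Error("Limit too high"); // the soft guard above
  return n;
}
```

Returning a default for bad input while throwing on oversized requests keeps honest clients working and noisy ones visible in your error logs.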
The Lesson That Stuck With Me
The API wasn’t fast.
It was fast for the average case.
The moment one user had “edge-case data,” the system showed its real behavior.
The Dangerous Assumption
“It works fine with my test data.”
Test data is small. Clean. Predictable.
Production data is:
- messy
- uneven
- sometimes extreme
Systems fail at the edges — not the average.
Conclusion / Key takeaway
Your backend performance isn’t defined by your average user. It’s defined by your heaviest one.
If one user can slow everyone else down, it’s not a user problem — it’s a system design problem.
What’s the most unexpected “edge case” in your data that caused real performance issues?