Frozen Blood
The API Was “Fast” — Until One User Made It Slow for Everyone

We had an API that was consistently fast.
~80ms average response time. No complaints.

Then one day, everything slowed down.
Not crashed. Not broken. Just… slow.

And the weird part?

It was caused by one user.


The Symptom: Latency Gradually Creeping Up

At first:

  • p95 latency increased slightly
  • CPU was fine
  • Memory was stable
  • No obvious errors

Then:

  • Requests started taking 500ms+
  • Some hit 2–3 seconds
  • But only during certain times

It wasn’t global traffic.
It was something more subtle.


The Clue: One Endpoint, One Pattern

After digging into logs, we noticed:

  • Almost all slow requests hit the same endpoint
  • Same query pattern
  • Same user ID appearing frequently

That user had… a lot of data.

Way more than anyone else.


The Root Cause: “Works Fine” Query That Didn’t Scale

Here’s the query (simplified):

SQL:

SELECT *
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC;

Looks harmless, right?

Except:

  • No pagination
  • No limit
  • That user had 120,000+ rows

Every request:

  • Pulled all rows
  • Sorted them
  • Serialized them into JSON
  • Sent them over the network

For one user.

Now imagine multiple requests hitting that at once.


Why It Slowed Down Everyone

Node.js is non-blocking… but not magic.

What actually happened:

  • The DB query itself took far longer
  • Serializing the huge JSON payload blocked the event loop
  • The larger response increased network transfer time
  • Other requests queued up behind it

One “heavy” request created backpressure for everything else.
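
The event-loop part is easy to reproduce. A minimal sketch — the row count mirrors the article, but the row shape and timings are illustrative assumptions, not the actual production data:

```javascript
// Demo: synchronous JSON serialization runs on the main thread,
// so nothing else in this Node.js process makes progress meanwhile.
const rows = Array.from({ length: 120000 }, (_, i) => ({
  id: i,
  total: 99.99,
  status: "shipped",
  created_at: new Date().toISOString(),
}));

const start = process.hrtime.bigint();
const payload = JSON.stringify(rows); // blocks the event loop
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`Serialized ${payload.length} bytes in ${elapsedMs.toFixed(1)}ms`);
```

Every millisecond spent in that `JSON.stringify` is a millisecond during which every other request on the process waits.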


The Fix (That Took 10 Minutes)

1️⃣ Add Pagination (Always)

SQL:

SELECT *
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 50 OFFSET $2;

Or even better: cursor-based pagination.
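
A sketch of what cursor-based (keyset) pagination could look like here. The helper name is hypothetical, and using created_at alone as the cursor assumes it is effectively unique per user — otherwise you'd add id as a tiebreaker:

```javascript
// Build a parameterized keyset-pagination query for a user's orders.
// `cursor` is the created_at of the last row from the previous page.
function ordersPageQuery(userId, cursor = null, pageSize = 50) {
  if (cursor) {
    return {
      text:
        "SELECT id, total, status, created_at FROM orders " +
        "WHERE user_id = $1 AND created_at < $2 " +
        "ORDER BY created_at DESC LIMIT $3",
      values: [userId, cursor, pageSize],
    };
  }
  // First page: no cursor yet.
  return {
    text:
      "SELECT id, total, status, created_at FROM orders " +
      "WHERE user_id = $1 ORDER BY created_at DESC LIMIT $2",
    values: [userId, pageSize],
  };
}
```

Unlike OFFSET, the cost of fetching page 200 is the same as page 1 — the database seeks straight to the cursor instead of scanning and discarding 10,000 rows.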


2️⃣ Add Proper Index

CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at DESC);

This alone drastically reduced query time.
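
One way to check that the index is actually doing the work (Postgres syntax; the literal user ID is just an example):

```sql
EXPLAIN ANALYZE
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;
-- Look for an index scan on idx_orders_user_created.
-- A Seq Scan followed by a Sort means the index isn't being used.
```

Because the index matches both the filter (user_id) and the sort order (created_at DESC), the database can read the first 50 rows directly — no full scan, no sort step.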


3️⃣ Reduce Payload Size

Instead of SELECT *:

SELECT id, total, status, created_at
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 50;

Less data → faster everything.


4️⃣ Optional: Protect the API

We added a soft guard:

// Reject oversized page requests before they ever hit the DB
if (limit > 100) {
  throw new Error("Limit too high");
}

No more accidental “give me everything.”
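
Rejecting the request works; clamping can be friendlier. A hypothetical helper — the names and the 50/100 defaults are illustrative assumptions, not from the original code:

```javascript
// Sanitize a client-supplied ?limit= value instead of trusting it.
function clampLimit(raw, { fallback = 50, max = 100 } = {}) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isInteger(n) || n < 1) return fallback; // missing or garbage input
  return Math.min(n, max); // cap runaway values
}

clampLimit("500"); // → 100 (capped)
clampLimit("abc"); // → 50 (fallback)
clampLimit("25");  // → 25 (passed through)
```

The design choice: a bad limit degrades to a sane default instead of a 4xx, so sloppy clients still get a usable response — while the DB never sees "give me everything."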


The Lesson That Stuck With Me

The API wasn’t fast.

It was fast for the average case.

The moment one user had “edge-case data,” the system showed its real behavior.


The Dangerous Assumption

“It works fine with my test data.”

Test data is small. Clean. Predictable.

Production data is:

  • messy
  • uneven
  • sometimes extreme

Systems fail at the edges — not the average.


Conclusion / Key takeaway

Your backend performance isn’t defined by your average user. It’s defined by your heaviest one.

If one user can slow everyone else down, it’s not a user problem — it’s a system design problem.

What’s the most unexpected “edge case” in your data that caused real performance issues?
