Launch day.
Traffic was peaking.
Dashboards were green.
Everything looked perfect… for about 10 minutes.
Then—boom.
Our site slowed to a crawl and went completely offline.
🚨 The Villain? A Single SQL Query
Here’s the innocent-looking line that brought our system to its knees:
SELECT * FROM orders WHERE status = 'pending';
In our development setup, this ran instantly.
Why wouldn’t it? We only had about 500 dummy records.
But in production, the story was very different:
- 60,000+ orders
- Each row around 12 KB (including customer info, items, metadata, and logs)
- That’s roughly 720 MB being pulled into memory
And that one query triggered a chain reaction:
- 🧠 Memory pressure: DB tried to load hundreds of MB into RAM
- 🌐 Network choke: ~720 MB transferred across the wire = 30–40s latency
- 🔒 Connection lock: Slow queries kept DB connections occupied
- 💥 Concurrency collapse: Dozens of users triggered the same query = total overload
All because of one tiny SELECT *.
Why SELECT * Is a Production Trap
It feels convenient — but it’s a ticking bomb under load.
Here’s why:
- You fetch far more than needed: images, logs, metadata, all pulled even if you never use them.
- Schemas evolve silently: add a column later, and every SELECT * query gets heavier without warning.
- Index-only scans go out the window: when you ask for every column, the planner has to visit the table heap for each row instead of answering straight from a covering index (see the sketch after this list).
- Memory + bandwidth waste: every unnecessary byte eats CPU cycles and RAM.
- Concurrency death spiral: multiply those inefficiencies by hundreds of simultaneous users and you get a meltdown.
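To make that index point concrete, here's a minimal sketch, assuming PostgreSQL 11+ and our orders table (the index name is made up). A covering index can answer an explicit column list without touching the table at all; SELECT * forces a heap fetch for every row:

-- Covering index: filter on status, carry the needed columns along
CREATE INDEX idx_orders_pending
    ON orders (status)
    INCLUDE (id, customer_id, total_price);

-- This can now run as an index-only scan:
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending';

-- SELECT * can't: the remaining columns live only in the heap.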
How We Fixed It (and What You Should Do Instead)
1. Select only what you need
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending'
LIMIT 50 OFFSET 0;
Always be explicit. Pull only the fields you’ll actually use.
2. Paginate aggressively
Never fetch thousands of rows in a single request.
Use LIMIT + OFFSET or better — keyset pagination for large tables.
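Here's a minimal keyset sketch, assuming id is a monotonically increasing primary key (the 1050 below is a made-up cursor value). Unlike OFFSET, which still walks past every skipped row, keyset pagination jumps straight to where the last page ended:

-- First page
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending'
ORDER BY id
LIMIT 50;

-- Next page: the client sends back the last id it saw
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending'
  AND id > 1050
ORDER BY id
LIMIT 50;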
3. Cache frequent queries
If certain data is read-heavy (e.g., pending orders, popular products), store it in Redis or Memcached.
Done well, this can offload a large share of read traffic (often 70–80% for hot, rarely changing data).
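Redis and Memcached live in your application layer, but the same idea can be sketched database-side with a PostgreSQL materialized view (the view name and refresh cadence here are illustrative, not a prescription):

-- Snapshot the hot query once
CREATE MATERIALIZED VIEW pending_orders_cache AS
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending';

-- Reads hit the snapshot, not the orders table
SELECT id, customer_id, total_price
FROM pending_orders_cache
LIMIT 50;

-- Refresh on whatever schedule your staleness budget allows (cron, pg_cron, etc.)
REFRESH MATERIALIZED VIEW pending_orders_cache;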
4. Inspect with EXPLAIN
Before deploying, always check query plans:
EXPLAIN SELECT ...
You’ll quickly spot missing indexes or full table scans.
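For example, in PostgreSQL (the index name below is made up):

-- Inspect the plan without executing the query
EXPLAIN
SELECT id, customer_id, total_price
FROM orders
WHERE status = 'pending';

-- If the plan says "Seq Scan on orders", the filter isn't using an index.
-- Fix: index the filtered column (or use the covering index sketched earlier),
-- then run EXPLAIN again and look for an Index Scan.
CREATE INDEX idx_orders_status ON orders (status);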
5. Enable slow query logs
Set up slow query logging in MySQL/Postgres.
Anything over 200ms deserves your attention.
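Here's what those knobs look like, assuming MySQL 5.7+ on one side and PostgreSQL on the other (the 200 ms threshold just mirrors the rule of thumb above):

-- MySQL: log anything slower than 200 ms
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.2;  -- seconds; fractions are allowed

-- PostgreSQL: same idea, then reload the config
ALTER SYSTEM SET log_min_duration_statement = '200ms';
SELECT pg_reload_conf();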
6. Test at scale
Your dev DB with 500 records won’t expose real bottlenecks.
Clone anonymized production data or use synthetic generators.
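If you can't clone real data, a generator gets you surprisingly far. A quick PostgreSQL sketch using generate_series (the column list is assumed from our schema; adapt it to yours):

-- Bulk-insert a million synthetic orders to approximate production scale
INSERT INTO orders (customer_id, status, total_price)
SELECT
    (random() * 10000)::int + 1,                                         -- fake customer id
    (ARRAY['pending', 'paid', 'shipped'])[floor(random() * 3)::int + 1], -- random status
    round((random() * 500)::numeric, 2)                                  -- fake total
FROM generate_series(1, 1000000);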
7. Enforce query timeouts
Never allow one rogue query to monopolize resources forever.
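A session-level sketch (PostgreSQL, and MySQL 5.7.8+ where max_execution_time applies to SELECTs; the 5-second budget is an arbitrary example):

-- PostgreSQL: cancel any statement that runs longer than 5 seconds
SET statement_timeout = '5s';

-- MySQL: cap SELECT execution time at 5000 ms
SET SESSION max_execution_time = 5000;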
Pre-Deployment Checklist 🧩
Before any SQL hits production, ask yourself:
- Are you selecting only the columns you need?
- Is pagination in place?
- Are your WHERE/JOIN clauses indexed?
- Have you tested on production-scale data?
- Do you have slow query logs + timeouts enabled?
Bonus tip → Cache hot queries; on read-heavy workloads the drop in DB load can be dramatic.
🧠 Key Takeaway
In production, every byte matters.
That one harmless-looking SELECT * might seem fine locally...
…but at scale, it can choke your entire system.
Build and test as if you already have a million users.
That mindset will save you more than any optimization later.
Have you ever faced a similar “small bug, massive blast radius” moment?
Drop your story below — let’s learn from each other’s war stories. 💬
Top comments (9)
Great post, and a classic "rite of passage" for many developers! We had a nearly identical incident a few years back. A dashboard for internal analytics used a SELECT * on a user_events table. It worked fine for months until the table grew large enough that the query would time out, taking the dashboard down every morning when the first manager logged in.
Haha yes — a rite of passage indeed! 😅
Dashboards are especially sneaky since they start small and “just work”... until the data snowballs.
It’s wild how invisible performance debt can be until it suddenly costs uptime.
Curious — did you end up fixing it by optimizing the query, caching results, or redesigning the dashboard?
been there with a JOIN that wasn't indexed -- took down checkout for 8min during Black Friday. 2,300 orders queued up, boss wasn't happy. now I basically panic-test everything against production-sized data before shipping.
Oof, I can feel that pain. 😅
Unindexed JOINs under heavy traffic are nightmare fuel.
Testing with production-sized data honestly changes everything — it’s the only way to catch those hidden time bombs before they explode.
Glad to hear you made that part of your process!
Quite right! SELECT COUNT(*) is the only query where the asterisk is canonical and most performant, as the engine is uniquely optimized just for row counting.
100% true — great point!
COUNT(*) is the one exception where the engine is optimized internally, especially in PostgreSQL and MySQL. I should've mentioned that nuance in the post; thanks for highlighting it! 🙌
"Particularly in PostgreSQL and MySQL": very true! Each engine has its own optimizations and its own way of working.
Noted. Will avoid this thing. Thanks
Awesome! Glad it helped 🙂
Once you start being intentional with SELECT columns, you’ll notice how much faster your queries get — especially at scale.