⚠️ The SQL Query That Nearly Crashed Our Production Server

Chirag Patel on October 28, 2025

Launch day. Traffic was peaking. Dashboards were green. Everything looked perfect… for about 10 minutes. Then—boom. Our site slowed to a crawl a...
Igor Nosatov

The author brilliantly breaks down a real-world case where one "innocent" SELECT * nearly took down the entire production system. The launch day story with green dashboards and a sudden crash reads like a devops thriller. From euphoria to panic in 10 minutes? Classic!
What really stood out:

  • Gold-tier practical advice: From explicitly selecting only needed columns to pagination, Redis caching, and EXPLAIN — all crystal clear with examples. I can already picture applying this in my own project.
  • Pre-deployment checklist: A must-have for any team. "Test like you already have a million users" — golden rule! 📋
  • Humor and relatability: "Ticking bomb under load" — laughed out loud. And the bonus about dropping DB load by 80%+ with caching? Instant motivation for tomorrow’s refactoring!

This article is the perfect reminder that in production, every byte counts. That one harmless-looking SELECT * might seem fine locally... but at scale, it can choke your entire system.
Thanks for the lesson without the pain (well, almost 😉). Definitely following for more war stories!
Ever had a similar “tiny bug, massive blast radius” moment? Drop your story below! 💬 #database #webdev #productivity
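
For readers skimming the comments, here is a minimal sketch of the advice listed above (explicit columns, a hard limit, EXPLAIN); the orders table and its columns are hypothetical stand-ins, not the article's actual schema:

```sql
-- Hypothetical schema: "orders" and its columns are illustrative, not the article's.
-- The risky pattern: every column of every matching row comes back over the wire.
SELECT *
FROM orders
WHERE customer_id = 42;

-- The safer pattern: name only the columns the page actually renders,
-- and cap the result set so one hot customer can't drag tens of thousands of rows back.
SELECT id, total, created_at
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 50;

-- EXPLAIN (EXPLAIN ANALYZE in PostgreSQL) shows whether an index is used at all.
EXPLAIN
SELECT id, total, created_at
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 50;
```

The LIMIT is what bounds the worst case: however large the table grows, a single request can never pull back more than one page.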

Chirag Patel

Wow — really appreciate the thoughtful breakdown and encouragement 🙌
Love how you phrased it: “devops thriller” — that’s exactly what it felt like in the moment 😅

Totally agree: these kinds of issues feel tiny in dev, invisible in staging, and then become boss-summoning emergencies in production.

And yes — testing like you already have a million users is such an underrated mindset shift. The earlier we think about scale, the fewer panic-patches we end up shipping later.

Thanks again for the kind words — feedback like this fuels writing more engineering war stories!

By the way, I’m curious — what’s the most “small bug, huge blast radius” issue you’ve run into?

Randall

This contains a lot of good advice. I have a few critiques and I think there may be some more good lessons you can still learn from this experience.

  1. An orders table shouldn't contain "customer info, items, metadata, and logs"; that suggests the schema design can benefit from normalization.
  2. Generally speaking, for row-oriented databases, SELECTing specific columns doesn't reduce how much data the database reads into RAM. The database always pulls entire pages into RAM, even if you only select a few columns in one row. However, there are exceptions, and indeed your use case may have triggered one. PostgreSQL for example uses "TOAST (The Oversized-Attribute Storage Technique)" for large columns, and this can lead to some column values not being read into memory, depending on the columns you select (but the pointer to the actual value still gets read into memory).
  3. I think what really killed you here was trying to pull 60k rows across the network. The LIMIT 50 alone probably would have fixed the issue. Limiting the SELECTed columns, while helpful, probably doesn't make as big of a difference. But this can all be tested. You can import the database into a local instance and write some test scripts against it to measure the effects of changes.
  4. About "indexes become less effective", this deserves some elaboration. The major factor here will be whether or not the query is covered by the index. A covered query will allow a fast index-only scan. But a SELECT * can still be covered by an index (if every column in the table is in the index) and allow an index-only scan. So as a binary decision, whether or not you limit the columns selected doesn't improve index performance per-se. To benefit from that optimization you have to make sure you only select columns that are in the index, and understanding "covered queries" is the key to using that optimization effectively.
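
To make point 4 concrete, here is a minimal sketch of a covered query versus an uncovered one; the orders table, column names, and index name are hypothetical, and the INCLUDE clause is PostgreSQL (11+) syntax:

```sql
-- Hypothetical table, columns, and index name. INCLUDE is PostgreSQL 11+ syntax;
-- in MySQL you'd add "total" as a trailing key column instead.
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at)
    INCLUDE (total);

-- Covered: every column the query touches lives in the index,
-- so the planner can choose an index-only scan and skip the heap
-- (PostgreSQL still consults the visibility map, so keep the table vacuumed).
EXPLAIN
SELECT customer_id, created_at, total
FROM orders
WHERE customer_id = 42;

-- Not covered: SELECT * references columns the index doesn't have,
-- so every match also requires a heap fetch (or the planner just scans the table).
EXPLAIN
SELECT *
FROM orders
WHERE customer_id = 42;
```

The trade-off: the wider you make an index to cover queries, the more expensive writes and the index itself become, so it pays to cover only the hot read paths.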
Chirag Patel

Thanks for the solid breakdown — really appreciate it.

  • Agreed on normalization; that was definitely a lesson learned.
  • Good call on row-store behavior — my point was based on TOAST-style cases, but you're right that it's not generally true.
  • Fair point on the network cost being the real bottleneck; limiting rows was the biggest win.
  • Index coverage explanation was helpful — I oversimplified that part.

Overall, thanks — this adds valuable clarity to the nuances I glossed over.

Neurolov AI

Such a relatable post, and a perfect reminder that "works on my machine" doesn’t mean ready for production. The breakdown of root cause and recovery steps is spot on: practical, clear, and packed with lessons every engineer should bookmark.

Chirag Patel

Thank you! 🙌

Exactly — “works on my machine” is one of the most dangerous forms of false confidence in engineering 😅
Local success ≠ production readiness, especially when data volume & traffic scale kick in.

Glad the breakdown felt practical — that’s the goal: real lessons that actually apply to day-to-day systems.

Curious — have you ever had a moment where everything worked locally but production said NOPE? 😄

Hashbyt

Great post, and a classic "rite of passage" for many developers! We had a nearly identical incident a few years back. A dashboard for internal analytics used a SELECT * on a user_events table. It worked fine for months until the table grew large enough that the query would time out, taking the dashboard down every morning when the first manager logged in.

Chirag Patel

Haha yes — a rite of passage indeed! 😅
Dashboards are especially sneaky since they start small and “just work”... until the data snowballs.
It’s wild how invisible performance debt can be until it suddenly costs uptime.
Curious — did you end up fixing it by optimizing the query, caching results, or redesigning the dashboard?

Augusts Bautra

Thanks for the top tier post, king. Kudos for mentioning keyset pagination. OFFSET is so 90s :D

Limiting the potential size of a query is something every developer has to grok. I always ask myself "what if the DB grows to 10 million rows of this?". Limiting, paging, batch-looping and, of course, targeted selecting are the ways. Plus, plentiful and smart indexing.
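
For anyone who hasn't switched yet, a minimal keyset (seek) pagination sketch against the same hypothetical orders table, assuming an indexed id column:

```sql
-- OFFSET pagination: the engine still reads (and throws away) all skipped rows,
-- so page 2,000 is far more expensive than page 1.
SELECT id, total, created_at
FROM orders
ORDER BY id
LIMIT 50 OFFSET 100000;

-- Keyset pagination: remember the last id of the previous page and seek past it.
-- With an index on id this costs roughly the same at any depth.
SELECT id, total, created_at
FROM orders
WHERE id > 100050          -- last id returned on the previous page
ORDER BY id
LIMIT 50;
```

The trade-off is that keyset pagination only supports "next page from here", not jumping to an arbitrary page number, which is usually fine for feeds and infinite scroll.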

Chirag Patel

Thanks! 🙌 And yep — keyset pagination for the win 😄 Thinking in 10M-row scale really changes how we design queries. Smart limits + indexing = life saver.

grant horwood

i obsess over the slow query log for the first month or so a project is running on prod and it's one of the first things i check when performance issues arise.

i did do a writeup on some of the shell awk/sed kludgery i do to get meaningful data out of a crowded slow log:
dev.to/gbhorwood/mysql-using-the-s...
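
For anyone who hasn't turned it on yet, enabling MySQL's slow query log is only a couple of settings (the threshold below is just an example; tune it to your workload and persist the values in my.cnf so they survive a restart):

```sql
-- MySQL: enable the slow query log at runtime (values here are examples).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                    -- log anything slower than 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';   -- optional, can get noisy

-- Find out where the log file lives before grepping/awking it.
SHOW VARIABLES LIKE 'slow_query_log%';
```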

Chirag Patel

That's awesome! 🔥 Slow query log obsession early in production is a superpower.
Thanks for sharing the link — will check it out! 🙌

Alex Chen

been there with a JOIN that wasn't indexed -- took down checkout for 8min during Black Friday. 2,300 orders queued up, boss wasn't happy. now I basically panic-test everything against production-sized data before shipping.
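
A minimal sketch of that failure mode and the fix, with hypothetical orders / order_items tables standing in for the real checkout schema:

```sql
-- Hypothetical checkout schema: order_items.order_id references orders.id.
-- Check the plan first: without an index on the join key, each order forces
-- a scan of order_items, which is exactly what melts down under peak traffic.
EXPLAIN
SELECT o.id, o.total, i.product_id, i.quantity
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 42;

-- Index the foreign key so the join becomes a cheap index lookup per order.
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
```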

Chirag Patel

Oof, I can feel that pain. 😅
Unindexed JOINs under heavy traffic are nightmare fuel.
Testing with production-sized data honestly changes everything — it’s the only way to catch those hidden time bombs before they explode.
Glad to hear you made that part of your process!

bartue-dev

asdas

Chirag Patel

Looks like that might’ve been a typo 😄 but thanks for stopping by!

lulu rahma

jo777. gacor

Chirag Patel

Looks like that might’ve been a typo 😄 but thanks for stopping by!

Pascal CESCATO

Quite right! SELECT COUNT(*) is the only query where the asterisk is canonical and most performant, as the engine is uniquely optimized just for row counting.

Chirag Patel

100% true — great point!
COUNT(*) is the one place where the asterisk is harmless: the engine only counts rows and never materializes the column values (though PostgreSQL and MySQL's InnoDB still have to scan, so it isn't free on huge tables).
I should’ve mentioned that nuance in the post — thanks for highlighting it! 🙌
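
A small illustration of the nuance, with a hypothetical shipped_at column standing in for any nullable column:

```sql
-- COUNT(*) counts rows; the asterisk never materializes column values,
-- so this is not the SELECT * problem from the article.
SELECT COUNT(*) FROM orders;

-- COUNT(column) is different: it counts only rows where that column is NOT NULL.
SELECT COUNT(shipped_at) FROM orders;   -- hypothetical column: counts shipped orders only

-- Caveat: PostgreSQL and InnoDB still scan an index or the table to count,
-- so COUNT(*) on a huge table is cheap per row but not instant.
```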

Pascal CESCATO

Very true, particularly in PostgreSQL and MySQL! Each engine has its own optimizations and its own way of working.

Dipankar Shaw

Noted. Will avoid this thing. Thanks

Chirag Patel

Awesome! Glad it helped 🙂
Once you start being intentional with SELECT columns, you’ll notice how much faster your queries get — especially at scale.