Launch day.
Traffic was peaking.
Dashboards were green.
Everything looked perfect… for about 10 minutes.  
Then—boom.
Our site slowed to a crawl a...
The author brilliantly breaks down a real-world case where one "innocent" SELECT * nearly took down the entire production system. The launch day story with green dashboards and a sudden crash — it’s like a devops thriller. From euphoria to panic in 10 minutes? Classic!
What really stood out:
Gold-tier practical advice: From explicitly selecting only needed columns to pagination, Redis caching, and EXPLAIN — all crystal clear with examples. I can already picture applying this in my own project.
Pre-deployment checklist: A must-have for any team. "Test like you already have a million users" — golden rule! 📋
Humor and relatability: "Ticking bomb under load" — laughed out loud. And the bonus about dropping DB load by 80%+ with caching? Instant motivation for tomorrow’s refactoring!
This article is the perfect reminder that in production, every byte counts. That one harmless-looking SELECT * might seem fine locally... but at scale, it can choke your entire system.
Thanks for the lesson without the pain (well, almost 😉). Definitely following for more war stories!
Ever had a similar “tiny bug, massive blast radius” moment? Drop your story below! 💬 #database #webdev #productivity
Wow — really appreciate the thoughtful breakdown and encouragement 🙌
Love how you phrased it: “devops thriller” — that’s exactly what it felt like in the moment 😅
Totally agree: these kinds of issues feel tiny in dev, invisible in staging, and then become boss-summoning emergencies in production.
And yes — testing like you already have a million users is such an underrated mindset shift. The earlier we think about scale, the fewer panic-patches we end up shipping later.
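For anyone who wants to try that mindset on a real project, the first step is usually just faking the scale locally. A minimal PostgreSQL sketch, assuming a hypothetical orders table like the one in the post:

```sql
-- Seed ~1 million synthetic rows into a local copy so every query gets
-- exercised at realistic scale before launch day does it for you.
INSERT INTO orders (user_id, status, total, created_at)
SELECT
    (random() * 100000)::int,                                   -- spread across many users
    (ARRAY['new', 'paid', 'shipped'])[1 + floor(random() * 3)::int],
    round((random() * 500)::numeric, 2),
    now() - (random() * interval '365 days')
FROM generate_series(1, 1000000);
```

Even a rough dataset like this is enough to make the slow queries show themselves in EXPLAIN long before users do.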
Thanks again for the kind words — feedback like this fuels writing more engineering war stories!
By the way, I’m curious — what’s the most “small bug, huge blast radius” issue you’ve run into?
This contains a lot of good advice. I have a few critiques and I think there may be some more good lessons you can still learn from this experience.
- The orders table shouldn't contain "customer info, items, metadata, and logs"; that suggests the schema design can benefit from normalization.
- SELECTing specific columns doesn't reduce how much data the database reads into RAM. The database always pulls entire pages into RAM, even if you only select a few columns in one row. However, there are exceptions, and indeed your use case may have triggered one. PostgreSQL, for example, uses TOAST (The Oversized-Attribute Storage Technique) for large columns, and this can lead to some column values not being read into memory, depending on the columns you select (but the pointer to the actual value still gets read into memory).
- LIMIT 50 alone probably would have fixed the issue. Limiting the SELECTed columns, while helpful, probably doesn't make as big of a difference. But this can all be tested: you can import the database into a local instance and write some test scripts against it to measure the effects of changes.
- SELECT * can still be covered by an index (if every column in the table is in the index) and allow an index-only scan. So as a binary decision, whether or not you limit the columns selected doesn't improve index performance per se. To benefit from that optimization you have to make sure you only select columns that are in the index, and understanding "covered queries" is the key to using that optimization effectively.

Thanks for the solid breakdown — really appreciate it.
Overall, thanks — this adds valuable clarity to the nuances I glossed over.
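For readers who want to see the covered-query point in action, here's a minimal PostgreSQL sketch (the index and column names are illustrative, not the real schema):

```sql
-- A composite index that covers the query: every selected column lives in
-- the index, so PostgreSQL can answer with an index-only scan and skip the
-- heap pages entirely (once the visibility map is up to date).
CREATE INDEX idx_orders_user_created
    ON orders (user_id, created_at) INCLUDE (status);

EXPLAIN (ANALYZE)
SELECT user_id, created_at, status
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;
-- Look for "Index Only Scan using idx_orders_user_created" in the plan;
-- selecting a column that isn't in the index turns it back into a plain
-- index scan with heap lookups.
```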
Such a relatable post, and a perfect reminder that "works on my machine" doesn't mean ready for production. The breakdown of root cause and recovery steps is spot on: practical, clear, and packed with lessons every engineer should bookmark.
Thank you! 🙌
Exactly — “works on my machine” is one of the most dangerous forms of false confidence in engineering 😅
Local success ≠ production readiness, especially when data volume & traffic scale kick in.
Glad the breakdown felt practical — that’s the goal: real lessons that actually apply on day-to-day systems.
Curious — have you ever had a moment where everything worked locally but production said NOPE? 😄
Great post, and a classic "rite of passage" for many developers! We had a nearly identical incident a few years back. A dashboard for internal analytics used a SELECT * on a user_events table. It worked fine for months until the table grew large enough that the query would time out, taking the dashboard down every morning when the first manager logged in.
Haha yes — a rite of passage indeed! 😅
Dashboards are especially sneaky since they start small and “just work”... until the data snowballs.
It’s wild how invisible performance debt can be until it suddenly costs uptime.
Curious — did you end up fixing it by optimizing the query, caching results, or redesigning the dashboard?
Thanks for the top tier post, king. Kudos for mentioning keyset pagination. OFFSET is so 90s :D
Limiting the potential size of a query is something every developer has to grok. I always ask myself "what if DB grows to 10 million rows of this?". Limiting, paging, batch-looping and, of course, targeted selecting are the ways. Plus, plentiful and smart indexing.
Thanks! 🙌 And yep — keyset pagination for the win 😄 Thinking in 10M-row scale really changes how we design queries. Smart limits + indexing = life saver.
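For anyone who hasn't switched yet, the shape of the change is small (table and column names here are just examples):

```sql
-- OFFSET pagination: the database still walks and discards every skipped row,
-- so page 10,000 costs far more than page 1.
SELECT id, status, created_at
FROM orders
ORDER BY id
LIMIT 50 OFFSET 500000;

-- Keyset pagination: remember the last id served and seek past it.
-- With an index on id this stays fast no matter how deep you page.
SELECT id, status, created_at
FROM orders
WHERE id > 123456            -- last id from the previous page
ORDER BY id
LIMIT 50;
```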
i obsess over the slow query log for the first month or so a project is running on prod and it's one of the first things i check when performance issues arise.
i did do a writeup on some of the shell awk/sed kludgery i do to get meaningful data out of a crowded slow log:
dev.to/gbhorwood/mysql-using-the-s...
That's awesome! 🔥 Slow query log obsession early in production is a superpower.
Thanks for sharing the link — will check it out! 🙌
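For anyone who hasn't turned it on yet, enabling it in MySQL is roughly this (the path and thresholds are just examples; put them in my.cnf if you want them to survive restarts):

```sql
-- Enable the MySQL slow query log at runtime.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;                    -- log anything slower than 1s
SET GLOBAL log_queries_not_using_indexes = 'ON';   -- optional, can get noisy
```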
been there with a JOIN that wasn't indexed -- took down checkout for 8min during Black Friday. 2,300 orders queued up, boss wasn't happy. now I basically panic-test everything against production-sized data before shipping.
Oof, I can feel that pain. 😅
Unindexed JOINs under heavy traffic are nightmare fuel.
Testing with production-sized data honestly changes everything — it’s the only way to catch those hidden time bombs before they explode.
Glad to hear you made that part of your process!
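For others reading along, this is the kind of check that catches it before Black Friday does (orders and order_items here are hypothetical):

```sql
-- EXPLAIN before shipping: a sequential/full scan on the joined table is the
-- red flag that the join column has no index to use.
EXPLAIN
SELECT o.id, oi.product_id, oi.quantity
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.id = 42;

-- The usual fix: index the foreign-key column used in the join.
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
```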
Quite right! SELECT COUNT(*) is the only query where the asterisk is canonical and most performant, as the engine is uniquely optimized just for row counting.
100% true — great point!
COUNT(*) is the one exception where the engine is optimized internally, especially in PostgreSQL and MySQL. I should’ve mentioned that nuance in the post — thanks for highlighting it! 🙌
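One related detail worth spelling out: the asterisk in COUNT(*) means "count rows", not "read every column", which is why it behaves differently from counting a specific column (coupon_code below is a made-up example):

```sql
-- COUNT(*) counts rows; COUNT(column) counts only non-NULL values.
SELECT
    COUNT(*)           AS total_orders,
    COUNT(coupon_code) AS orders_with_coupon   -- rows with NULL coupon_code are skipped
FROM orders;
```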
Particularly in PostgreSQL and MySQL: Very true! Each engine has its own optimizations and its own way of working.
Noted. Will avoid this thing. Thanks
Awesome! Glad it helped 🙂
Once you start being intentional with SELECT columns, you’ll notice how much faster your queries get — especially at scale.
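If it helps, the change usually looks as small as this (column names are just examples):

```sql
-- Before: every column comes back, including wide text/JSON fields the page never shows.
SELECT * FROM orders WHERE user_id = 42;

-- After: only what the page actually renders, with a sane page size.
SELECT id, status, total, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;
```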