surajrkhonde

Posted on Jul 4

The Database Architect Teaches His Nephew — Part 2: How Small Things Quietly Break Big Databases

#architecture #beginners #database #tutorial

Same uncle, same nephew. Part 1 was about designing and guarding data. Part 2 is about the small, easy-to-miss mistakes that quietly destroy a database over time — explained simply, no heavy jargon, just plain talk.

Part 0: The Danger Isn't the Big Mistake — It's the Small One

👦 Nephew: Uncle, in Part 1 we talked about designing things properly from the start. But I keep hearing stories of teams with a "properly designed" database still having disasters. How?

👨‍🦳 Uncle: Because most database disasters aren't caused by one big obvious mistake. They're caused by ten small, boring, easy-to-miss habits, each one harmless on its own, that quietly pile up until one day something breaks — and nobody can point to a single cause, because there wasn't one big cause. There were ten small ones. Today we hunt those down, one at a time.

Part 1: Two People Editing the Same Thing at the Same Time

👦 Nephew: Let's start somewhere real. What's a small thing that breaks databases that most people don't even think about?

👨‍🦳 Uncle: Two people trying to do the same thing, at the exact same moment.

Imagine an online movie ticket booking site. There's exactly one seat left. Two people click "Book" at almost the exact same second. If your system isn't careful, both bookings can succeed — because both requests checked "is this seat free?" at the same instant, both saw "yes," and both went ahead and booked it. Now you've sold one seat to two people, and someone's going to have a very bad evening at the cinema.

👦 Nephew: But isn't that rare? Two people clicking at the exact same second?

👨‍🦳 Uncle: It feels rare when you're testing alone at your desk. It happens constantly the moment you have real traffic — flash sales, popular concerts, the last item in stock. This kind of bug is invisible in testing and only shows up under real pressure, which is exactly why it's dangerous — you won't catch it until it's already caused a problem for a real user.

How we prevent it, in simple terms: we tell the database, "when you're checking and updating something important, don't let anyone else touch it until you're done." It's like a fitting room in a clothing store — while you're inside trying clothes on, the door is locked, and nobody else can walk in and use it until you're out. The database has a version of this lock built in — you just have to remember to actually use it for anything that's checked and then changed, like seat counts, stock counts, or account balances.

A simple rule to remember: anytime your code does "check something, then change it based on what you saw," ask yourself — what happens if two people did this at the exact same moment? If the answer scares you, that's exactly the place you need this kind of protection.
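
Here is a minimal sketch of that protection, using Python's built-in sqlite3 module. The table, the column names (seats, booked_by) and the book_seat function are made up for illustration. Instead of checking first and changing second, it folds the check into the update itself, so the database does both as one indivisible step. Taking an explicit row lock (for example SELECT ... FOR UPDATE in PostgreSQL or MySQL) is the other common approach and is not shown here.

```python
import sqlite3

def book_seat(conn: sqlite3.Connection, seat_id: int, customer: str) -> bool:
    """Try to book a seat; return True only if this call got it."""
    # The "is it free?" check lives in the WHERE clause, so the check and
    # the change happen as one atomic statement inside the database.
    cur = conn.execute(
        "UPDATE seats SET booked_by = ? WHERE id = ? AND booked_by IS NULL",
        (customer, seat_id),
    )
    conn.commit()
    # rowcount is 1 if we claimed the seat, 0 if someone already had it.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (id INTEGER PRIMARY KEY, booked_by TEXT)")
conn.execute("INSERT INTO seats (id, booked_by) VALUES (1, NULL)")
conn.commit()

first = book_seat(conn, 1, "asha")
second = book_seat(conn, 1, "ravi")
print(first, second)  # True False
```

The second caller gets a clear "no" instead of a silent double booking, and the application can show "sorry, that seat just went" right away.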

Part 2: Changing a Table That Already Has Millions of Rows in It

👦 Nephew: What about when we need to change something in a table that already has a lot of real data in it? Like adding a new column?

👨‍🦳 Uncle: This is one of those things that feels completely harmless in a small test database and becomes genuinely dangerous once real data piles up.

Imagine your table has 10 rows. Adding a new column takes less than a second, and nobody even notices. Now imagine that same table has 50 million rows. Depending on the database and the exact change, that same "simple" change can take several minutes, and during those minutes, on some databases, the entire table is locked, meaning nobody can read or write to it. Your whole app effectively freezes for those minutes, for real users, in production.

👦 Nephew: That's terrifying. How do people avoid this?

👨‍🦳 Uncle: By making changes in small, safe steps instead of one big risky jump. Here's the simple version of how it's done properly:

1. Add the new thing without removing the old thing yet. Both exist together for a while.
2. Quietly fill in the new thing in the background, a little at a time, without disturbing anyone using the app.
3. Only once everything's using the new thing safely, remove the old one.

Think of it like renovating a shop while it's still open for business. You don't knock down the whole front wall in one go and hope customers wait outside. You build the new section next to the old one, quietly move things over, and only then take the old section away — customers never notice anything happened.

The simple rule: never make a big, risky, one-shot change to a table that's already serving real users. Break it into small, safe, reversible steps instead.
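
The three steps can be sketched roughly like this, again with Python's sqlite3 module. The users table, the name and display_name columns, and the batch size of 1000 are all assumptions chosen for the example. A real migration would also need the application to write to both columns while the backfill is running.

```python
import sqlite3

def backfill_display_name(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Step 2: copy old values into the new column, one small batch at a time."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = name "
            "WHERE id IN (SELECT id FROM users "
            "             WHERE display_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions, so locks are held only briefly
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i}",) for i in range(2500)])
conn.commit()

# Step 1: add the new column next to the old one; nothing is removed yet.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2: fill it in gradually.
updated = backfill_display_name(conn, batch_size=1000)

# Step 3 (not run here): once all code reads display_name, drop the old column.
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL").fetchone()[0]
print(updated, remaining)  # 2500 0
```

Each batch commits on its own, so if the backfill is interrupted it can simply be run again and will pick up where it left off.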

Part 3: "I Updated It, But I Still See the Old Data" — Not a Bug, a Delay

👦 Nephew: Uncle, sometimes a user updates their profile, refreshes the page, and still sees the old info for a second. Is that a bug?

👨‍🦳 Uncle: Usually not a bug — it's a small delay that's easy to misunderstand if you don't know what's happening underneath.

Many production databases keep one "main" copy that handles all the changes (writes), and a few "helper" copies (replicas) that just handle reading, to spread out the load. When something updates, it updates the main copy first, and the helper copies catch up a tiny bit later — usually a fraction of a second, sometimes a little longer under heavy load.

If a user's very next read happens to land on a helper copy that hasn't caught up yet, they briefly see the old value. It's not lost, it's not corrupted — it just hasn't arrived there yet.

👦 Nephew: So how do we stop users from panicking and thinking their update "didn't work"?

👨‍🦳 Uncle: For anything sensitive to this, like a user immediately seeing their own recent update, we can specifically say "right after this person changes something, show them the result straight from the main copy, not a helper copy, just this once." It's like calling the main office directly for the latest news, instead of asking a branch office that hasn't gotten today's memo yet. You only need to do this for the few moments it actually matters. Most reads can still safely use the helper copies, since they're usually only a blink of an eye behind.
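
One way to picture "just this once" is a tiny router that remembers who wrote recently. This is a sketch under assumptions, not a standard API: the ReadRouter class, its method names, and the 5-second window are all hypothetical, and a real window should be based on the replication lag you actually measure. It also keeps its memory in one process, so a multi-server app would need to store this somewhere shared, such as the user's session.

```python
import time
from typing import Callable, Dict

class ReadRouter:
    """Send a user's reads to the primary for a short while after they write."""

    def __init__(self, pin_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        # Call this whenever the user changes something.
        self._last_write[user_id] = self.clock()

    def choose(self, user_id: str) -> str:
        # Recent writers read from the primary; everyone else uses a replica.
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"

# Demo with a fake clock so the example does not need to sleep.
now = [100.0]
router = ReadRouter(pin_seconds=5.0, clock=lambda: now[0])

before_write = router.choose("u1")   # no recent write
router.record_write("u1")
right_after = router.choose("u1")    # inside the window
other_user = router.choose("u2")     # other users are unaffected
now[0] += 6.0
later = router.choose("u1")          # window has passed
print(before_write, right_after, other_user, later)
# replica primary replica replica
```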

Part 4: The Page That Secretly Runs 60 Small Questions Instead of One

👦 Nephew: We talked about this a bit before — one query turning into a hundred small ones. Is there a version of this that hides somewhere people don't expect?

👨‍🦳 Uncle: Yes, and it hides best in exactly the places nobody's watching closely — admin dashboards, internal reports, "just for the team" pages. Nobody optimizes those because "it's just internal," and that's exactly why they quietly become a problem.

Imagine a dashboard showing 20 products, and for each product, it separately asks the database "how many times was this sold?", "what's the average rating?", "how many are left in stock?" — three extra questions, per product, per page load. Twenty products means 60 extra questions, every single time someone opens that one page. Multiply that by however many people check that dashboard through the day, and you've quietly created a real load on your database — for a page most people considered "not important."

👦 Nephew: So the fix is the same idea as before — ask fewer, smarter questions?

👨‍🦳 Uncle: Exactly the same discipline. Instead of asking the same three small questions once per product, ask one well-planned question that gets the answers for all 20 products at once. And here's the important lesson: "it's just an internal page" is not an excuse to skip this. Internal pages have a habit of quietly growing more popular, more automated, and more frequently refreshed than anyone expected on day one, and by the time someone notices it's slow, it's already been silently straining the database for months.

Simple rule: any page that loops through a list and asks a question per item is a page worth double-checking, no matter how unimportant it seems today.

Part 5: Deleting Something — Softly or For Real?

👦 Nephew: When a user deletes something — their account, a post — do we actually remove it from the database?

👨‍🦳 Uncle: Good question, and there are two honest answers, depending on the situation.

A "soft delete" means we don't actually remove the row — we just mark it as deleted = true (or similar) and quietly hide it from normal views. The data is still physically there, just no longer shown.

A "hard delete" means the row is actually, permanently gone.

👦 Nephew: Why would we ever choose the soft version? Isn't "actually delete it" what the user asked for?

👨‍🦳 Uncle: Soft deletes are genuinely useful for everyday accidents — someone deletes something by mistake, support wants to "undo" it, or you want to understand later why something disappeared. It's a safety net.

But here's the catch nobody warns juniors about early enough: once you start soft-deleting things, every single query that reads from that table now has to remember to say "and don't show me the deleted ones." Forget that, even once, in even one query — and deleted things start quietly reappearing somewhere they shouldn't, like a user's old, deleted post showing up again in a report or a search result. It's an easy thing to forget, and a genuinely embarrassing thing to have happen in front of a user.

And there's a second, more serious catch: sometimes a user genuinely wants their data properly, permanently gone — for real privacy reasons, not just "hidden from the app." A soft delete alone doesn't satisfy that. If someone asks for their information to be truly removed, "hidden but still sitting in our database" isn't good enough — that needs an actual, real deletion process, on a proper schedule, not just a flag flipped to true.

Simple rule: use soft deletes for everyday, reversible "oops" moments. But have a real, separate process for permanently removing data when someone genuinely needs it gone — don't quietly assume the soft flag covers that need too.
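
Both kinds of delete can live side by side. The sketch below, in Python with sqlite3, uses a hypothetical posts table with a deleted flag, plus a view called active_posts so that everyday queries do not each have to remember the filter. The function names are made up for illustration, and a real erasure process would also have to consider copies outside this table, such as backups and logs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        body    TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0   -- 1 means soft-deleted
    );
    -- Normal reads go through this view, so the filter lives in one place.
    CREATE VIEW active_posts AS
        SELECT id, user_id, body FROM posts WHERE deleted = 0;
""")

def soft_delete(conn, post_id):
    """Hide a post but keep the row, so it can be restored."""
    conn.execute("UPDATE posts SET deleted = 1 WHERE id = ?", (post_id,))
    conn.commit()

def restore(conn, post_id):
    """Undo a soft delete."""
    conn.execute("UPDATE posts SET deleted = 0 WHERE id = ?", (post_id,))
    conn.commit()

def erase_user(conn, user_id):
    """Hard delete: physically remove every post by this user."""
    cur = conn.execute("DELETE FROM posts WHERE user_id = ?", (user_id,))
    conn.commit()
    return cur.rowcount

def count(conn, table):
    # table is only ever one of the two fixed names used below
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn.executemany("INSERT INTO posts (id, user_id, body) VALUES (?, ?, ?)",
                 [(1, 7, "hello"), (2, 7, "oops"), (3, 8, "hi")])
conn.commit()

soft_delete(conn, 2)
visible_after_soft = count(conn, "active_posts")   # 2 visible, 3 stored
stored_after_soft = count(conn, "posts")
restore(conn, 2)
visible_after_restore = count(conn, "active_posts")
erased = erase_user(conn, 7)                        # really gone this time
stored_after_erase = count(conn, "posts")
print(visible_after_soft, stored_after_soft, visible_after_restore,
      erased, stored_after_erase)  # 2 3 3 2 1
```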

Part 6: One Database, Many Customers — Making Sure Nobody Sees Someone Else's Data

👦 Nephew: What about apps where many different companies use the same product — like a tool that many businesses sign up for? How do we make sure Company A never accidentally sees Company B's data?

👨‍🦳 Uncle: This is one of the scariest small mistakes to make, because when it goes wrong, it doesn't just annoy one user — it can leak one company's private information to a completely different company, which is a very serious kind of mistake.

The usual setup: every single table has a column like companyId, and every single query is supposed to filter by "only show me rows belonging to this company." That sentence — "every single query is supposed to filter" — is exactly where the danger lives. It only takes one query somewhere in the codebase forgetting that one filter, and suddenly a user from Company A can see rows that actually belong to Company B.

👦 Nephew: That sounds like something that could happen way too easily by accident.

👨‍🦳 Uncle: It can, and it has, at real companies. Which is exactly why serious systems don't rely purely on "remembering to add the filter every time" — they add a safety net at the database level itself, so that even if a developer forgets, the database quietly refuses to hand back rows that don't belong to the requesting company, automatically, no matter which query asked. Think of it like a hotel key card — even if a staff member accidentally tries the wrong room number, the card physically won't open a door that isn't theirs. The protection doesn't depend on the staff member remembering correctly every single time — it's built into the lock itself.

Simple rule: for anything where different customers' data lives in the same database, don't just trust "we'll remember to filter it correctly every time." Put a real safety net at the database level too, so a forgotten filter is a non-event instead of a leak.
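
Real database-level protection of this kind depends on the database. PostgreSQL, for example, has row-level security policies that are enforced even when a query hits the table directly. SQLite, used in the sketch below, has no such feature, so this example only imitates the shape of the idea: each request's connection gets a view that is already limited to one company, and application code reads from that view instead of the table. The invoices table, the my_invoices view and the scope_to_company function are all hypothetical, and unlike a real policy this would not stop code that deliberately queries the base table.

```python
import sqlite3

def scope_to_company(conn: sqlite3.Connection, company_id: int) -> None:
    """Point the my_invoices view at exactly one company's rows."""
    conn.execute("DROP VIEW IF EXISTS my_invoices")
    # Views cannot take bound parameters, so the id is forced to an int
    # before being placed in the SQL text.
    conn.execute(
        "CREATE TEMP VIEW my_invoices AS "
        f"SELECT id, amount FROM invoices WHERE company_id = {int(company_id)}"
    )

def list_invoices(conn: sqlite3.Connection):
    # Notice there is no company filter here to forget.
    return conn.execute("SELECT id, amount FROM my_invoices ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, company_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(1, 1, 100), (2, 1, 250), (3, 2, 999)])
conn.commit()

scope_to_company(conn, 1)
company_a = list_invoices(conn)
scope_to_company(conn, 2)
company_b = list_invoices(conn)
print(company_a, company_b)  # [(1, 100), (2, 250)] [(3, 999)]
```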

Part 7: Watching the Database Before It Becomes a Problem

👦 Nephew: Last one — how do we actually catch these small problems early, before they turn into a real incident?

👨‍🦳 Uncle: By actually looking, regularly, on purpose — not waiting for something to break and only then investigating. Two simple habits go a long way.

First, keep an eye on your slow questions. Most databases can tell you, if you ask (often through a slow query log you switch on), "which questions took the longest to answer recently?" Checking this list every so often, not just during an emergency, lets you catch a query quietly getting slower and slower over weeks, long before it becomes the thing that finally brings a page down during a busy afternoon.

Second — when you're not sure why a specific question is slow, ask the database to explain itself. Most databases have a way to say "don't just answer this question — tell me exactly how you're planning to go about answering it." It'll tell you things like "I'm going to look through every single row one by one" (slow, usually means a missing index) versus "I'm going to jump straight to the right rows" (fast, usually means a helpful index is being used). This turns "the query feels slow, not sure why" into a clear, specific answer you can act on.
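
In SQLite, the "explain yourself" command is EXPLAIN QUERY PLAN, and other databases have their own version (usually some form of EXPLAIN). The sketch below asks for the plan of the same query before and after adding an index. The orders table, the index name and the plan_for helper are invented for the example, and the exact wording of the plan text varies between SQLite versions, so the comments describe it rather than quote it.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    """Return the database's plan for a query as a list of readable steps."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

query = "SELECT * FROM orders WHERE customer_id = ?"

before = plan_for(conn, query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_for(conn, query, (42,))

print(before)  # a SCAN step: every row of orders is examined
print(after)   # a SEARCH step that uses idx_orders_customer
```

Seeing the plan change from a scan to a search is the confirmation that the new index is actually being used for this query.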

👦 Nephew: So it's the same idea as our earlier debugging conversation — measure first, don't just guess?

👨‍🦳 Uncle: Exactly the same discipline, aimed specifically at the database this time. Checking in on your slow questions regularly, and asking the database to explain its own slow answers, turns "something feels off with the database" from a vague, scary feeling into a specific, fixable fact — caught quietly, early, before it ever becomes an incident somebody has to explain to the whole team.

Uncle's Closing Words

👨‍🦳 Uncle: Notice something about everything we discussed today — not one of these was a big, dramatic mistake. Two people clicking at the same moment. A schema change done in one risky step instead of small safe ones. A read landing on a copy that's a second behind. A dashboard quietly asking too many small questions. A forgotten filter in one query out of hundreds. Every single one of these looks completely harmless on its own, on a quiet Tuesday, with no one watching.

👦 Nephew: So the real skill isn't avoiding one big mistake — it's staying alert to a hundred small, boring ones?

👨‍🦳 Uncle: That's the whole lesson. A database rarely breaks because of one dramatic decision. It breaks because ten small, forgettable habits were each allowed to slide, quietly, one at a time, until the day they all mattered at once.

End of chat. Go check your slowest query today — before it decides to introduce itself to you at 2 AM.
