The argument sounds reasonable: fewer lines of code mean fewer bugs. Less code is simpler to review, easier to reason about, and exposes less surface area for defects. All of that is true. It's also incomplete.
The problem starts when backend developers treat production systems like homework assignments. In a single-process app, you control execution. You know the order. Threads might race, but at least they share the same memory and clock.
Once you have APIs talking to databases, webhooks firing at midnight, async jobs on a queue, and three replicas behind a load balancer, the failure modes multiply: connections drop, messages arrive out of order, clocks disagree, and partial failures show up at 3 AM on Tuesdays.
Trimming code doesn't make any of that go away. It just hides the complexity until something breaks.
When Minimal Code Meets Production
Consider what happens when your minimalist masterpiece meets reality:
Your service temporarily loses connection to the database for 30 seconds. Your code has no timeout logic. Requests hang. Users refresh. More requests queue up. Eventually something breaks.
Two instances process the same webhook because you thought "that probably won't happen." No idempotency key, so the charge runs twice. Your balance sheet now has an extra $50,000 in it. Your accountant is confused. Your manager is less confused.
A worker crashes mid-operation. There's no recovery mechanism. The transaction is abandoned halfway, and your data now violates every assumption you made about how it should look.
A retry storm after a downstream blip hammers your API because nothing backs off or deduplicates. Rate limits trip. Legitimate traffic gets dropped. You're debugging an outage caused by code that "handled errors" by logging and returning.
None of these are prevented by writing less. They're prevented by writing the boring safeguards you skipped because they looked redundant.
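For the retry-storm scenario, one of those boring safeguards might look like the sketch below: exponential backoff with full jitter. The function name `retry_with_backoff` and its parameters are hypothetical, and it assumes the transient failure surfaces as a `ConnectionError`. Your client library may raise something else.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call `call()` until it succeeds or attempts run out.

    Waits a random time between 0 and an exponentially growing cap
    ("full jitter") so many clients do not retry in lockstep.
    `sleep` and `rng` are injectable only to make the sketch testable.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

The jitter is the part people delete first. Without it, every client that failed at the same moment retries at the same moment, and the blip becomes the storm.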
What Production Actually Requires
Modern backend systems need safeguards that simple applications never had to think about:
Idempotency. Every operation must be safe to retry. A payment webhook redelivered, a queue message processed twice, a client that retries on timeout—all of these need a way to recognize "already done." Operation IDs, version numbers, dedupe keys. Not glamorous. Required.
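A minimal sketch of the dedupe-key idea, assuming each incoming event carries a unique key. The `IdempotentProcessor` class is hypothetical, and the in-memory dictionary stands in for what would need to be a shared store (for example, a database table with a unique constraint) once you run more than one replica.

```python
import threading


class IdempotentProcessor:
    """Runs a handler at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}  # key -> result of the first successful run
        self._lock = threading.Lock()

    def process(self, key, handler):
        # Holding the lock while the handler runs keeps the sketch simple:
        # a concurrent duplicate waits, then sees the stored result.
        with self._lock:
            if key in self._results:
                return self._results[key]  # already done: replay the result
            result = handler()
            self._results[key] = result
            return result
```

A redelivered webhook then calls `process` with the same key and gets the original result back instead of running the charge a second time.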
Timeouts. Requests to other services need deadlines. Without them, one slow dependency quietly ties up threads and connections until the failure cascades to everything that shares them. Your code will just sit there, waiting, like a phone call that never connects.
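Here is a rough sketch of a deadline around an async call, using `asyncio.wait_for` from the standard library. The wrapper name `call_with_deadline` is hypothetical; the point is only that the caller gets an error after a bounded wait rather than hanging.

```python
import asyncio


async def call_with_deadline(coro, seconds):
    """Await `coro`, but give up after `seconds` instead of waiting forever."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending call at this point.
        # Fail fast so the caller can shed load, retry, or degrade.
        raise TimeoutError(f"dependency did not answer within {seconds}s") from None
```

Picking the number is the hard part. A timeout longer than your caller's own timeout protects nothing.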
Compensation Logic. When a multi-step operation fails partway through, something has to undo the work already committed. You can't abandon a half-finished saga and hope nobody notices. That's more code than assuming success. People skip it anyway.
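A compensation runner can be sketched in a few lines, assuming each step is paired with a function that undoes it. The name `run_saga` is hypothetical, and the sketch assumes the compensations themselves succeed; real ones need to be idempotent and retried, because they can fail too.

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order; undo completed steps on failure."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            # Only steps that finished are eligible for compensation.
            completed.append(compensate)
    except Exception:
        # Undo in reverse order so the most recent step is rolled back first.
        for compensate in reversed(completed):
            compensate()
        raise
```

If the third step of reserve, charge, ship fails, this runs the refund and then the release, and still raises so the caller knows the operation did not happen.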
Conflict Detection. When two writers touch the same record—two API instances, a retry overlapping with the original request—you need version checks, timestamps, or optimistic locking. Pretending conflicts don't exist works until two updates land in the wrong order.
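Optimistic locking is less code than it sounds. The sketch below uses the standard-library `sqlite3` module and a hypothetical `accounts` table with a `version` column; the table, column names, and `update_balance` function are assumptions for illustration.

```python
import sqlite3


class VersionConflict(Exception):
    """Raised when the row changed since the caller read it."""


def update_balance(conn, account_id, new_balance, expected_version):
    """Write only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # Zero rows touched means someone else bumped the version first
    # (or the row does not exist); either way, do not pretend it worked.
    if cur.rowcount == 0:
        raise VersionConflict(
            f"account {account_id} changed since version {expected_version}"
        )
```

The losing writer gets an exception and has to re-read and decide again, which is the behavior you want instead of a silent overwrite.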
Observability. Logging, metrics, and traces that let you reconstruct what happened when something fails. At 3 AM, with an outage in progress and no logs to read, you'll understand why this matters.
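As a starting point, here is a sketch of structured logging with a request ID attached to every event, using only the standard library. The logger name `orders`, the `log_event` helper, and the field names are hypothetical, and the sketch assumes logging handlers are configured elsewhere in the application.

```python
import json
import logging
import time

logger = logging.getLogger("orders")  # hypothetical logger name


def log_event(event, request_id, **fields):
    """Emit one JSON line per event, tagged with the request it belongs to."""
    record = {"ts": time.time(), "event": event, "request_id": request_id, **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line  # returned so callers and tests can inspect what was logged
```

Passing the same `request_id` through every service and job a request touches is what lets you stitch the story back together afterwards.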
You can't delete these and call it simplification. You're just moving complexity from your editor into your on-call rotation.
Less Code vs. Less Noise
Kill redundant abstractions, dead logic, and speculative frameworks. That's good discipline.
Deleting retry wrappers, validation, circuit breakers, or idempotency checks because they "add noise" is a different move.
You're betting stability on dependencies you don't control. When the database hiccups, the partner API times out, or Kubernetes reschedules a pod mid-request, the system doesn't get simpler. It gets wrong.
The Test
If your app runs more than one instance, talks to other services, or processes work asynchronously, these questions will eventually matter:
- If a process dies mid-operation, can the system detect it and recover correctly?
- If a message is delayed several seconds, what actually happens?
- If two workers attempt the same operation at once, is the result deterministic or a coin flip?
If you can't answer all three with specific mechanisms—not vibes, not "we'll fix it in prod"—the codebase isn't simple. It's fragile.
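For the first question, one possible "specific mechanism" is a lease: a worker claims a job for a limited time, and if it dies, the claim expires and another worker can pick the job up. The `LeaseTable` class below is a hypothetical in-memory sketch; across processes the lease would have to live in a shared store, and a slow-but-alive worker can still outlive its lease, so this pairs with idempotency rather than replacing it.

```python
import time


class LeaseTable:
    """Tracks which worker holds each job and until when (in-memory sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._leases = {}    # job_id -> (worker_id, expires_at)

    def acquire(self, job_id, worker_id):
        """Claim the job if it is free, expired, or already ours (a renewal)."""
        now = self._clock()
        holder = self._leases.get(job_id)
        if holder and holder[0] != worker_id and holder[1] > now:
            return False  # someone else holds a live lease
        self._leases[job_id] = (worker_id, now + self._ttl)
        return True

    def release(self, job_id, worker_id):
        """Give the job up early, but only if this worker still holds it."""
        if self._leases.get(job_id, (None,))[0] == worker_id:
            del self._leases[job_id]
```

A healthy worker calls `acquire` again before the TTL runs out to renew its claim. A dead one simply stops, and the job becomes available once the lease expires.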
Write the safeguards. Handle the failure modes. The goal isn't more lines for their own sake; it's making hidden complexity visible before production does it for you.