The argument sounds reasonable: fewer lines of code mean fewer bugs. Simpler to review, easier to reason about, less surface area for defects. Soun...
It looks like the post mainly targets high volume systems and event driven systems.
I agree that simple code should not mean abandoning safeguards, but I wonder if there are people that think that way once scaling the system horizontally is not an option anymore?
That's a fair point. The examples intentionally focus on horizontally scaled and async systems because those failure modes are becoming increasingly normal in everyday backend development.
A lot of modern applications are distributed long before teams consciously think of them that way. Running multiple replicas behind a load balancer, background workers, queues, retries, webhooks, caches, autoscaling — this is becoming standard infrastructure even for relatively small products.
So this wasn't really aimed only at "massive scale" systems. The point was more that once your application runs across multiple processes, instances, or services, some of the "boring" safeguards stop being optional.
The interesting part is that modern tooling makes distribution feel deceptively invisible. You can deploy three replicas to Kubernetes in minutes, but the moment you do that, problems like duplicate execution, retries, partial failures, ordering issues, and race conditions become real whether you explicitly designed for them or not.
That's really the gap the article was trying to highlight.
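To make the duplicate-execution problem concrete, here is a minimal sketch of an idempotent handler. It is illustrative only: `IdempotentProcessor`, `charge`, and the `order-42` key are hypothetical names, and the in-memory dict stands in for a store shared by all replicas (for example a database table with a unique constraint).

```python
import threading


class IdempotentProcessor:
    """Runs a handler at most once per idempotency key.

    Assumption: the dict below is a stand-in for a shared store. With real
    replicas each process would have its own dict, so the store must be
    external for this to work across instances.
    """

    def __init__(self, handler):
        self._handler = handler
        self._results = {}
        self._lock = threading.Lock()

    def process(self, key, payload):
        # Holding the lock for the whole call keeps check-and-write atomic.
        with self._lock:
            if key in self._results:
                return self._results[key]  # duplicate delivery: reuse result
            result = self._handler(payload)
            self._results[key] = result
            return result


charges = []


def charge(payload):
    # Hypothetical side effect that must not happen twice.
    charges.append(payload["amount"])
    return f"charged {payload['amount']}"


processor = IdempotentProcessor(charge)
first = processor.process("order-42", {"amount": 10})
second = processor.process("order-42", {"amount": 10})  # retried delivery
```

The second call returns the stored result without running `charge` again, which is the behavior a retried webhook or redelivered queue message needs.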
It is true that distribution is very easy with today's tools. But that doesn't mean the consequences should be ignored when it happens.
For me, high-volume systems and event-driven systems are two separate things. Most of the time high-volume systems are event-driven, but event-driven systems can be low volume.
The safeguards you mention have a lot of reach. For example, the 30-second database connection loss should never happen. To fix that as soon as possible, monitoring should be set up to bring the database back up. That is not in the scope of the application code; that is operations.
yup, agreed.
I think the key distinction is just that infra/ops and application correctness sit in different layers - both matter, but they solve different parts of the failure story.
And also fair point on event-driven vs high-volume: they’re orthogonal, but they tend to overlap in practice, which is probably why they get mixed in discussions like this.
Makes a lot of sense what you've said here. This is why most teams need to understand the amount of technical debt they are adding when going for so-called "simple solutions". Sometimes simple can look complex or ugly too, as long as issues like race conditions and duplicate requests are solved properly.
Exactly, that’s pretty much the tradeoff.
“Simple” at the surface can still carry hidden complexity if those failure modes aren’t handled explicitly. And yeah, sometimes the correct solution looks a bit ugly precisely because it’s accounting for things like retries, races, and duplication.
The real technical debt usually isn’t in the extra safeguards themselves, it’s in pretending those problems won’t exist.
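As an illustration of why the race-safe version often looks heavier, here is a minimal sketch of an optimistic (compare-and-set) update. `VersionedCounter` and `increment` are hypothetical names, and the internal lock only models the atomic conditional write that a real store would provide.

```python
import threading


class VersionedCounter:
    """A value guarded by a version number (optimistic concurrency)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # models the store's atomic write

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        # The write succeeds only if nobody else wrote since our read.
        with self._lock:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def increment(counter):
    # Retry the read-modify-write until it lands on an unchanged version.
    while True:
        value, version = counter.read()
        if counter.compare_and_set(version, value + 1):
            return


def worker(counter, times):
    for _ in range(times):
        increment(counter)


counter = VersionedCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The "simple" version would be a plain `counter.value += 1`. The extra lines are the version check and the retry loop, which are what stop concurrent writers from overwriting each other.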
Wow, I just started exploring backend more and I wasn't expecting to learn such great real-world topics here. But yeah, I agree. Less code is often good, but we must also write what is required and what makes our codebase more robust. Thanks for this!
uh huh, “less code” is only good when it’s removing noise, not when it’s removing safeguards.
Once you start dealing with retries, failures, and multiple instances, robustness becomes part of the design, not an optional extra. And sometimes that naturally makes the code a bit heavier, even if the system is actually better engineered.
I'm glad it helped. This is one of those things that only really clicks once you’ve seen a few real production failures.
Minimal code reduces syntax. Stable systems survive reality.
Well said. Syntax is cheap. Reality has retries, timeouts, and partial failures while you're asleep.
So true. There's no guarantee that short code is always stable. It's not a matter of length of code but stability itself.
I think readability also comes before the length of code, just in case issues happen. Some people (even myself sometimes) think short code is more readable than long code, but that's not always true.
Thank you for the insight.
Exactly this. Readability > length every time. Those "extra" lines often tell future-you what could go wrong. Appreciate you 🙏
Years ago I remember being very proud of myself for writing a recursive function which performed some complicated data sorting. It replaced maybe 400 lines of code with 30 lines, but unfortunately for devs to make changes to it, they had to spend frustrating time trying to figure out what was going on. Any savings in lines of code was lost in time and anger 😅
Yeah, I’ve definitely seen this too 😄
Less code feels great at first, but if nobody else can safely touch it later, the “win” disappears pretty fast.
I think that you’ve made a really important point. Developers removing, or simply not writing, code to handle edge cases (two users doing the same operation at the same time, messaging delays, operations happening multiple times, as your post said) is a large issue. I see large and concerning similarities between what you talked about in your post and the recent massive movement to rewrite as many things as possible in Rust. Oftentimes developers drop edge-case handling just to rewrite something in Rust, and many times the rewrite is completely AI-generated with no time for human review (like the bun.js rewrite). In your opinion, does the strictness of the rustc compiler outweigh the edge-case handling, often decades of it, in large distributed programs, and does it outweigh it enough to justify total rewrites?
Man, that's a thoughtful take. The Rust angle is interesting.
I think the compiler catches memory bugs, not production failure modes. Rust won't save you from missing idempotency, clock skew, or a database going away for 30 seconds. Those are design problems, not language problems.
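For the "database going away" case, a minimal sketch of retry with exponential backoff looks like this in any language. The names `TransientError`, `call_with_retry`, and `flaky_query` are hypothetical, and the delays are arbitrary defaults rather than recommendations.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for a dropped database connection."""


def call_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


attempts = []
delays = []


def flaky_query():
    # Fails twice, then succeeds, to simulate a short outage.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("connection lost")
    return "rows"


# Passing delays.append as the sleep function records the waits
# instead of actually sleeping, which keeps the example fast.
result = call_with_retry(flaky_query, sleep=delays.append)
```

With these defaults the total wait is only 7.5 seconds. Riding out a 30-second outage would need about seven attempts (0.5 + 1 + 2 + 4 + 8 + 16 = 31.5 seconds of waiting), and retrying is only safe when the operation itself is idempotent.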
Rewrites are tempting because greenfield feels clean. But like you said — you're often throwing out years of edge case fixes that were paid for in pager duty time.
The Bun example is spot on. AI-generated rewrites with no human review? That's just moving the complexity somewhere else, not removing it.
So no — rustc's strictness doesn't justify most rewrites of large distributed systems. Would much rather see teams invest in observability and proper failure handling in whatever language they already have.
Appreciate you adding that layer to the discussion 🙏
Thanks for replying to my comment! When code is being blindly regenerated with AI without human review, the observability and error handling that existed before can be removed during the rewrite.
I think that before a group or team goes ahead and rewrites everything in Rust, they usually first try to incorporate Rust in less dramatic ways than a full rewrite. The Linux kernel is definitely at this stage now, and I hope that if they start to encourage people to rewrite things in Rust, they at least ensure that no edge cases are being removed or simply forgotten. As a Linux user myself, this is very important to me.