DEV Community: Nahwin Rajan

The Real Cost of Technical Debt: How One Shortcut Became a $2M Problem

Nahwin Rajan — Sun, 28 Jun 2026 02:30:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Technical debt costs more than slow deployments. See how one bad architectural decision compounded into $2M in losses — and what founders can do about it.

The shortcut made complete sense at the time.

A fintech startup in Southeast Asia needed to launch their lending product fast. Investors were watching. A competitor had just announced a similar feature. So the engineering team hard-coded the interest rate calculation logic directly into the API layer, with no separate service, no abstraction, and no configuration table. Just: here's the formula, ship it.

It worked. They launched on time. Investors were happy. Growth followed.

Eighteen months later, that single decision had contributed to over $2M in combined losses, remediation costs, and foregone revenue. Not because the formula was wrong. Because of where it lived.

Why Technical Debt Is a Finance Problem, Not Just a Technology Problem

Most founders think of technical debt in terms of developer frustration. Slow deployments. Long sprint cycles. Engineers who look slightly haunted every time someone asks for a new feature. Real enough, but incomplete.

The actual cost of technical debt has four components that rarely all appear in the same conversation.

The first is direct remediation cost: the engineering time to fix what's broken or badly designed. This is the one everyone counts.

The second is velocity tax: the ongoing slowdown in feature delivery caused by navigating a complex, fragile codebase. Every sprint, a portion of your engineering capacity is consumed not by building new things but by managing the consequences of old decisions. Most companies underestimate this by a factor of two or three.

The third is incident cost. When technical debt contributes to a production outage or a data error, the bill includes engineering time to diagnose and fix, customer support volume, potential refunds or credits, and sometimes regulatory exposure depending on your industry.

The fourth, and the one that's hardest to quantify, is opportunity cost. Features you didn't build. Markets you couldn't enter. Partnerships you couldn't execute because the integration would have required touching parts of the system nobody wanted to touch. This is the silent cost. It doesn't show up in any incident report.

The fintech example above hit all four. Hard.

What Actually Happened: The Interest Rate Story

When the team hard-coded the interest rate logic into the API layer, they also, without realising it, embedded it into seven downstream processes that called the same endpoint: loan origination, repayment schedules, early settlement calculations, late payment penalties, regulatory reporting, customer-facing statements, and an internal dashboard used by the credit team.

None of these integrations were documented. They had grown organically as the product expanded.

Fourteen months in, the business needed to change the interest rate model. They were moving from a flat rate to a tiered structure based on borrower risk profile. A product change, not an engineering one: the kind of thing a non-technical founder would reasonably expect to take a week or two.

It took three months. Because the logic was embedded in the API layer, changing it meant auditing every downstream process to understand what it expected, rewriting the calculation in multiple places, and running parallel testing across seven different flows to ensure consistency. Two of those flows had never had automated tests written for them.

During those three months, the product team couldn't launch the new rate model. The business was offering less competitive rates than it could have, losing loan applications to competitors who had already made this move. The direct engineering cost of the remediation was around $180,000 in team time. The foregone revenue from the delayed product launch, modelled against the pipeline of applications that didn't convert, was estimated at over $1.5M.

The compliance team also flagged that two historical regulatory reports had used an inconsistent version of the calculation. Resolving that took external legal review. Add another $90,000.

Total: comfortably over $2M from one architectural shortcut that saved, at the time, perhaps two weeks of engineering effort.
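Adding up only the figures itemised above gives a floor rather than the headline total: the revenue estimate is stated as "over $1.5M", and the remainder of the $2M is not broken out line by line. A minimal tally in Python, using exactly the numbers quoted in this post:

```python
# Costs itemised in the interest-rate story, in USD.
# The revenue figure was stated as "over $1.5M", so treat every value as a floor.
itemised_costs = {
    "remediation_engineering_time": 180_000,
    "foregone_revenue_estimate": 1_500_000,
    "external_legal_review": 90_000,
}

itemised_floor = sum(itemised_costs.values())
print(f"Itemised floor: ${itemised_floor:,}")  # Itemised floor: $1,770,000
```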

The Compounding Mechanism: Why Debt Gets Expensive Over Time

The dangerous thing about technical debt isn't the original shortcut. It's what grows on top of it.

Every system built on a flawed foundation inherits that flaw. Every engineer who joins the team and learns "this is how we do it" normalises the pattern. Every downstream integration that assumes the current architecture is correct becomes something that will need to be untangled when the architecture changes.

This is the compounding mechanism. The interest rate example didn't just cost money to fix; it cost money because fixing it required touching seven systems that had been built on the assumption that the broken thing was correct. The technical debt had spread.

We've seen the same pattern in e-commerce platforms where session management logic was embedded in the front-end rather than the API, which seemed fine until they needed a mobile app and had to rebuild authentication from scratch. And in logistics systems where route optimisation was baked into the database as a stored procedure: invisible to the application layer, unmaintainable, and impossible to test, until a regulatory change required the algorithm to be auditable.

The common thread: a decision made under pressure, never revisited, that quietly became load-bearing infrastructure.

The Decisions Most Likely to Become Expensive

Not all shortcuts are equal. Some technical debt stays cheap forever: a slightly inelegant function that nobody needs to change, a commented-out block that just lives there doing nothing. The debt that compounds is the debt embedded in high-traffic, frequently changed parts of the system.

Business logic in the wrong layer. Calculation logic, pricing rules, eligibility criteria: anything that will change as the business evolves does not belong hard-coded in API endpoints, database triggers, or front-end components. When business rules live in the wrong layer, every business change requires an engineering excavation.
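A minimal sketch of keeping a pricing rule out of the API layer, assuming a tiered rate keyed by borrower risk band like the one in the interest rate story. The band names, the rates, and the `annual_rate_for` helper are hypothetical, and simple interest stands in for whatever formula a real lender uses. The point is structural: every downstream flow asks one function, and changing the model means changing one table.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RateTier:
    risk_band: str
    annual_rate: Decimal  # Decimal("0.14") means 14% per year


# Hypothetical rate table. In a real system this would be loaded from a
# configuration table or pricing service, not hard-coded in a handler.
RATE_TABLE: dict[str, RateTier] = {
    "low": RateTier("low", Decimal("0.10")),
    "medium": RateTier("medium", Decimal("0.14")),
    "high": RateTier("high", Decimal("0.19")),
}


def annual_rate_for(risk_band: str, table: dict[str, RateTier] = RATE_TABLE) -> Decimal:
    """Single source of truth for the rate; origination, statements and reporting all call this."""
    try:
        return table[risk_band].annual_rate
    except KeyError:
        raise ValueError(f"unknown risk band: {risk_band!r}") from None


def simple_interest(principal: Decimal, risk_band: str, years: int,
                    table: dict[str, RateTier] = RATE_TABLE) -> Decimal:
    """Illustrative simple-interest calculation driven by the shared rate table."""
    return principal * annual_rate_for(risk_band, table) * years


print(simple_interest(Decimal("10000"), "medium", 1))  # 1400.00
```

Under this arrangement a flat rate is just a table where every band has the same value, so moving from flat to tiered becomes a data change rather than an audit of every caller.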

No separation between services with different scaling needs. A database that handles both transactional writes and heavy analytics reads will eventually become a bottleneck for both. Separating them later, when traffic is already high, is expensive and risky. Designing for separation early costs almost nothing by comparison.
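One common way to separate the two workloads is to point heavy reporting reads at a read replica. A minimal sketch of the routing decision, where `FakeConnection` stands in for real database connections (it only records queries) and `QueryRouter` is a hypothetical name; it assumes reports can tolerate the replica's replication lag.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnection:
    """Stand-in for a real database connection; it only records what it was asked to run."""
    name: str
    executed: list[str] = field(default_factory=list)

    def execute(self, sql: str) -> str:
        self.executed.append(sql)
        return self.name


class QueryRouter:
    """Writes and transactional reads go to the primary; analytics reads go to a replica."""

    def __init__(self, primary: FakeConnection, analytics_replica: FakeConnection) -> None:
        self.primary = primary
        self.analytics_replica = analytics_replica

    def execute(self, sql: str, *, analytics: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        # A write is never sent to the replica, even if the caller flags it as analytics.
        target = self.analytics_replica if (is_read and analytics) else self.primary
        return target.execute(sql)


router = QueryRouter(FakeConnection("primary"), FakeConnection("replica"))
print(router.execute("SELECT sum(amount) FROM loans", analytics=True))  # replica
print(router.execute("UPDATE loans SET status = 'paid' WHERE id = 1"))  # primary
```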

Authentication and session logic that isn't centralised. This starts as a convenience and ends as a security audit nightmare. When session handling is replicated across multiple parts of the codebase, a vulnerability in one place doesn't mean fixing one thing; it means finding every place the pattern was repeated.

Integrations with no abstraction layer. When your application calls a third-party payment provider, logistics API, or data service directly from multiple places in the codebase, changing that provider later requires a codebase-wide search and replace. An abstraction layer, even a thin one, means the change happens in one place.
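A thin abstraction layer can be as small as one interface plus one adapter per provider. A sketch using `typing.Protocol`; `PaymentGateway`, the two provider classes, and their return values are hypothetical placeholders for real SDK calls, which would live only inside the adapters.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    def charge(self, customer_id: str, amount_cents: int) -> str:
        """Charge the customer and return a transaction reference."""
        ...


class ProviderAGateway:
    """Hypothetical adapter: the only place that would import provider A's SDK."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        return f"provider-a:{customer_id}:{amount_cents}"


class ProviderBGateway:
    """Hypothetical adapter for a replacement provider."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        return f"provider-b:{customer_id}:{amount_cents}"


def checkout(gateway: PaymentGateway, customer_id: str, amount_cents: int) -> str:
    """Application code depends on the interface, never on a provider directly."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(customer_id, amount_cents)


print(checkout(ProviderAGateway(), "cust-1", 2500))  # provider-a:cust-1:2500
```

Switching providers then means constructing a different gateway at one wiring point; `checkout` and everything else that accepts a `PaymentGateway` stays untouched.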

These aren't exotic architectural decisions. They're standard patterns that cost very little to implement correctly at the start and very much to retrofit later.

How to Estimate the Real Cost of Your Own Technical Debt

You don't need a consulting firm to put a number on this. The following is a rough framework your team can run in a day.

Start with velocity tax. Ask your engineering lead: what percentage of each sprint is consumed by work that's a direct consequence of the current architecture? Workarounds, investigations, fixing regressions caused by changes elsewhere in the system. If the honest answer is 30%, that's 30% of your engineering payroll being paid to manage debt rather than create value. For a team of eight engineers at a fully loaded cost of $15,000 per head per month, that's $36,000 per month, or $432,000 per year, in velocity tax alone.
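The same arithmetic as a small function, so you can plug in your own headcount, loaded cost, and the share of each sprint your engineering lead attributes to debt. The parameter names are illustrative; the worked numbers are the ones from the paragraph above.

```python
def velocity_tax(engineers: int, loaded_cost_per_month: float, debt_share: float) -> tuple[float, float]:
    """Return (monthly, annual) engineering spend absorbed by managing debt."""
    if not 0.0 <= debt_share <= 1.0:
        raise ValueError("debt_share must be a fraction between 0 and 1")
    monthly = engineers * loaded_cost_per_month * debt_share
    return monthly, monthly * 12


monthly, annual = velocity_tax(engineers=8, loaded_cost_per_month=15_000, debt_share=0.30)
print(f"${monthly:,.0f} per month, ${annual:,.0f} per year")
# $36,000 per month, $432,000 per year
```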

Then model your incident cost. How many production incidents in the last six months were caused or worsened by known architectural weaknesses? What did each one cost in engineering time, customer support, and any commercial consequences? Average it, annualise it.

Then ask the harder question: what have you not built? Which features are sitting in the backlog because the team says "the current architecture doesn't support it well"? Work with your product team to estimate the revenue impact of those delays. This is where the real number usually lives.

Add those three figures together. That's the annual run rate of your technical debt. Compare it to the estimated cost of remediation, which your engineering lead or an external architect can scope. In most cases, the remediation pays back within twelve to eighteen months.
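The run rate and the payback comparison can be sketched the same way. This assumes, optimistically, that remediation removes the whole run rate as soon as it is finished; in practice the savings ramp up, so treat the result as a best case. The example figures ($400K per year against a $600K programme) are the ones used in the board-pitch answer in the FAQ.

```python
def annualise(total_over_period: float, months: int) -> float:
    """Scale a cost observed over `months` months to a 12-month figure."""
    if months <= 0:
        raise ValueError("months must be positive")
    return total_over_period * 12 / months


def annual_run_rate(velocity_tax: float, incident_cost: float, opportunity_cost: float) -> float:
    """Annual cost of carrying the debt: the sum of the three estimates."""
    return velocity_tax + incident_cost + opportunity_cost


def payback_months(remediation_cost: float, run_rate_per_year: float) -> float:
    """Months until remediation pays for itself, assuming it removes the whole run rate."""
    if run_rate_per_year <= 0:
        raise ValueError("run rate must be positive")
    return remediation_cost * 12 / run_rate_per_year


print(payback_months(600_000, 400_000))  # 18.0
```

For the incident line, `annualise(six_month_incident_total, 6)` turns the last six months of incident costs into a yearly figure before it goes into `annual_run_rate`.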

FAQ

Q: Is all technical debt bad? Should we aim for zero?
A: No, and no. Some technical debt is a rational business decision: taking a shortcut to hit a launch date, knowing you'll clean it up later, is fine if you actually track the debt and pay it back. The problem isn't debt itself; it's untracked, unplanned debt that compounds silently. A healthy engineering team carries some debt intentionally and services it regularly, the same way a healthy business carries some financial debt to fund growth.

Q: How do I convince my board or investors that paying down technical debt is worth the investment?
A: Don't frame it as a technology investment. Frame it as a risk and velocity problem. Show them the velocity tax number: what percentage of engineering capacity is being consumed by debt rather than new product. Show them the incident history. Show them the features sitting in the backlog because the architecture can't support them. Investors understand ROI. "We're spending $400K per year managing architectural consequences, and a $600K remediation programme pays for itself within 18 months" is a business case, not a technology request.

Q: At what stage should a startup start taking technical debt seriously?
A: Earlier than most do. The common belief is that technical debt is a "later problem", something to worry about after product-market fit. The truth is that the decisions made in the first twelve months of building are the ones that become load-bearing. By the time you have product-market fit and real scale, the cost of changing foundational decisions is already high. You don't need to be perfect at the start. But you do need to be intentional about which shortcuts you're taking and why.

Q: Can technical debt cause a startup to fail outright?
A: Yes. Not often directly, but indirectly it happens more than people acknowledge. The mechanism is usually: technical debt slows velocity, slower velocity means the product falls behind competitors, the team becomes demoralised as engineers leave and new hires struggle to get up to speed, and eventually the business can't respond fast enough to market changes. It's a slow failure, not a dramatic one. Which makes it easy to ignore until it's too late.

Q: How often should we review and actively manage technical debt?
A: Build it into your regular rhythm. A lightweight quarterly review (what debt did we take on intentionally this quarter, and what are we planning to pay back next quarter?) keeps it visible without making it a crisis. The worst pattern is ignoring it for eighteen months and then doing a panic audit when something breaks. By that point, the debt has compounded and the options are more limited and more expensive.
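A quarterly review works best when the debt is written down somewhere. A minimal sketch of a debt register and the two review questions; the field names, the "YYYY-Qn" quarter format, and the example entries are assumptions for illustration, and a spreadsheet or an issue-tracker label can do the same job.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    title: str
    taken_on: str         # quarter the shortcut was taken, e.g. "2026-Q1"
    reason: str           # why it was worth taking at the time
    payback_target: str   # quarter the team plans to pay it back
    paid: bool = False


def next_quarter(quarter: str) -> str:
    year, q = quarter.split("-Q")
    return f"{int(year) + 1}-Q1" if q == "4" else f"{year}-Q{int(q) + 1}"


def quarterly_review(register: list[DebtItem], quarter: str) -> dict[str, list[str]]:
    """Answer the review questions; 'YYYY-Qn' strings sort chronologically."""
    upcoming = next_quarter(quarter)
    open_items = [d for d in register if not d.paid]
    return {
        "taken_on_this_quarter": [d.title for d in register if d.taken_on == quarter],
        "planned_for_next_quarter": [d.title for d in open_items if d.payback_target == upcoming],
        "due_or_overdue": [d.title for d in open_items if d.payback_target <= quarter],
    }


register = [
    DebtItem("Rate formula hard-coded in API", "2026-Q1", "launch deadline", "2026-Q3"),
    DebtItem("No tests on statement flow", "2026-Q2", "competitor launch", "2026-Q2"),
]
print(quarterly_review(register, "2026-Q2"))
```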

Technical debt is borrowing from your future engineering capacity to pay for speed today. Like any debt, that's sometimes the right call. The problem is when it's invisible: when nobody in the organisation knows what's owed, to whom, or at what interest rate.

The $2M story above wasn't caused by negligent engineers. It was caused by a decision that was never surfaced, tracked, or revisited. The shortcut got normalised, got built on top of, and got expensive. That's the pattern. And it plays out in some form at almost every company that scales fast without pausing to take stock of what the speed cost them.

The audit is the first step. If you haven't done one, that's where to start.

How to Rewrite Your Software System Without Stopping Your Business

Nahwin Rajan — Sun, 21 Jun 2026 02:30:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Planning a system rewrite? Learn the strategies that let you modernise your software without halting operations, losing data, or burning your team out.

At some point, the question stops being "should we rewrite this?" and becomes "how do we do it without the business dying in the process?"

That's the hard part. A greenfield rewrite sounds clean on a whiteboard. In practice, you're replacing the engine of a plane that's already in the air. Customers are still signing up. Revenue is still flowing. Your team is expected to keep shipping product while simultaneously dismantling and rebuilding the thing that powers it.

Most software rewrites that fail don't fail because of bad engineering. They fail because of bad strategy: no clear boundary between old and new, no plan for the transition period, and no honest accounting of how long it will actually take. This post covers how to do it without stopping your business.

Why the "Big Bang" Rewrite Almost Always Goes Wrong

The instinct is understandable. The old system is a mess. Starting fresh sounds like relief. So the team scopes out a full rewrite, estimates six months, and gets executive sign-off.

Twelve months later, the rewrite isn't done, the old system has continued accumulating bugs that nobody's fixing, and the team is exhausted. This is not a hypothetical; it's the most common rewrite story in the industry. Netscape famously did this with its browser in the late 1990s, spent about three years on it before Netscape 6 shipped in 2000, and nearly destroyed the company. The lesson didn't stick.

The core problem with big bang rewrites is that the old system is a moving target. While your team builds the new one, the business keeps adding requirements to the old one. By the time the new system is "done," it's already behind. And you haven't had a single day of reduced risk in the interim; you've had a year of double the operational surface area and half the engineering attention on each.

The alternative isn't to accept the old system forever. It's to replace it incrementally, in a way that lets you keep operating throughout.

The Strangler Fig Pattern: The Right Mental Model

There's a pattern in software architecture called the Strangler Fig. It's named after a tropical tree that grows around a host tree over decades, gradually replacing it until one day the host is gone and the strangler fig is standing on its own.

Applied to a software rewrite, it means this: you don't replace the old system all at once. You build the new system alongside it, migrate one piece of functionality at a time, and route traffic gradually from old to new. The old system slowly shrinks. The new one grows. At some point, with much less drama than a big bang, the old system handles nothing and can be decommissioned.
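The routing facade at the heart of the pattern can be very small. A sketch, assuming each request is identified by an HTTP-style path; `MIGRATED_PREFIXES` and the two handler functions are hypothetical stand-ins for a real reverse proxy or API gateway configuration.

```python
# Hypothetical: grows by one entry each time a piece of functionality migrates.
MIGRATED_PREFIXES = {"/invoices", "/notifications"}


def legacy_system(path: str) -> str:
    return f"legacy handled {path}"


def new_system(path: str) -> str:
    return f"new handled {path}"


def route(path: str) -> str:
    """Facade in front of both systems: migrated paths go to the new one, everything else to legacy."""
    for prefix in MIGRATED_PREFIXES:
        # Match whole path segments, so "/invoicesarchive" is not treated as "/invoices".
        if path == prefix or path.startswith(prefix + "/"):
            return new_system(path)
    return legacy_system(path)


print(route("/invoices/42"))  # new handled /invoices/42
print(route("/orders/7"))     # legacy handled /orders/7
```

Once `MIGRATED_PREFIXES` covers every path the legacy system used to serve, nothing reaches `legacy_system` any more and it can be switched off.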

This approach works because it forces you to make decisions incrementally. Each migration is a discrete project with a clear scope, a clear test, and a clear rollback plan. You're never in a position where the new system has to be 100% complete before you get any value from it.

It also works because it keeps the business running throughout. Users might not notice anything changing. Revenue keeps flowing. Engineers can still ship features on the new platform as each piece migrates.

How to Sequence the Migration

The sequence matters more than most people realise. Get it wrong and you'll spend the first six months on the hardest, most interdependent parts of the system: the ones that can't be migrated without touching everything else. You'll burn momentum and trust before you've shipped anything.

Start at the edges, not the core. The edges of your system are the parts with the fewest dependencies: background jobs, reporting pipelines, notification services, internal admin tools. These can often be migrated without touching the core application at all. They're lower risk, faster to move, and they give your team early wins that build confidence in the approach.

Identify your seams. A seam is a natural boundary in the system: a place where one part of the software talks to another through a clean interface. These are your migration boundaries. If your payment processing already talks to the rest of the application through a well-defined API, it can be replaced independently. If everything is tangled together with no clear separation, you need to create the seam before you can migrate anything.

Tackle the data layer carefully. This is where rewrites most often go wrong. Moving application logic is relatively forgiving you can test it, run both versions in parallel, compare outputs. Moving data is not forgiving. A mistake in a data migration can mean lost transactions, corrupted records, or a state that can't be easily recovered.

For anything touching financial data, order history, or user accounts, the approach should be: write to both databases in parallel during the transition, validate consistency continuously, and only cut over reads once you're confident the new store is correct. It's slower. It's also the only safe way to do it.

Plan your traffic routing. As each component migrates, you need a way to control which traffic goes to which system. This is typically done with a feature flag or a routing layer at the API gateway level. It lets you send 1% of traffic to the new system, watch it, expand to 10%, watch it, and so on. It also gives you an instant rollback path: if something goes wrong, you flip the flag, not the infrastructure.
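A sketch of the percentage rollout, assuming requests can be keyed by a stable identifier such as a user id. Hashing the id gives each user a fixed bucket from 0 to 99, so raising the percentage only ever adds users to the new system, and setting it to 0 is the rollback. `use_new_system` and the flag name are hypothetical; most feature-flag tools offer the same idea with more controls.

```python
import hashlib


def rollout_bucket(user_id: str, flag_name: str) -> int:
    """Deterministic bucket in [0, 100) so the same user always sees the same system."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_system(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(user_id, flag_name) < rollout_percent


users = [f"user-{i}" for i in range(1_000)]
share = sum(use_new_system(u, "new-invoicing", 10) for u in users) / len(users)
print(f"{share:.1%} of sampled users routed to the new system")
```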

The Staffing Trap Most Companies Fall Into

Here's a decision that will determine whether your rewrite succeeds or fails: do you use the same team that built the old system, or do you bring in people who will build the new one?

The honest answer is: you need both, used carefully.

The engineers who built the old system carry irreplaceable knowledge. They know why certain decisions were made. They know which parts of the system are actually stable and which ones are held together with intent and luck. Without them, the new team will repeat old mistakes or, worse, accidentally break things they didn't know existed.

But those same engineers are often the most resistant to the rewrite, not out of ego, but because they understand the complexity better than anyone. They know how long things will actually take.

The pattern that works: keep your existing senior engineers as architects and domain experts. Let them define the interfaces, review the new system's design, and own the migration sequencing. Bring in additional capacity, either new hires or an external team, to build against those interfaces. This way, knowledge is transferred in the process of building, not lost.

What doesn't work: treating the rewrite as a separate project, staffing it with a parallel team that's never allowed to talk to the engineers who know the system, and calling it done when the new platform passes a test suite written by people who don't fully understand what the old system does.

A Concrete Example: Migrating a Monolithic Order System

A logistics platform we worked with had a classic problem. Their monolithic backend handled everything (order intake, routing, driver assignment, status updates, invoicing) in a single Rails application on a single database. It had been built fast in the early days and worked well until scale hit. At around 50,000 orders per day, the database started struggling. Deployments required full downtime windows. A bug in the invoicing logic once took down order routing.

They couldn't stop. Orders were coming in around the clock.

The migration started with invoicing: the most isolated component, with clear inputs and outputs. We built a new invoicing service, deployed it alongside the monolith, and ran both in parallel for four weeks, comparing outputs on every invoice. When confidence was high, we cut the monolith's invoicing logic to read-only and switched live traffic to the new service. The monolith didn't notice. Customers didn't notice. But the team had their first working piece of the new architecture in production.
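A parallel run boils down to feeding the same input to both implementations and recording every disagreement while the old result stays authoritative. A sketch with two hypothetical invoice-total functions that differ only in where they round, the kind of subtle divergence this comparison is meant to surface; neither is the platform's actual invoicing logic.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def legacy_invoice_total(items: list[tuple[int, Decimal]]) -> Decimal:
    """Hypothetical monolith logic: sum (quantity, unit price) lines, round once at the end."""
    return sum((qty * price for qty, price in items), Decimal("0")).quantize(CENT, ROUND_HALF_UP)


def new_invoice_total(items: list[tuple[int, Decimal]]) -> Decimal:
    """Hypothetical new-service logic: round each line, then sum."""
    return sum(((qty * price).quantize(CENT, ROUND_HALF_UP) for qty, price in items), Decimal("0"))


def shadow_compare(orders, legacy_fn, new_fn):
    """Run both implementations on every order; return (order_id, legacy, new) for each mismatch."""
    mismatches = []
    for order_id, items in orders.items():
        legacy, candidate = legacy_fn(items), new_fn(items)
        if legacy != candidate:
            mismatches.append((order_id, legacy, candidate))
    return mismatches


orders = {
    "A-1": [(2, Decimal("10.00"))],
    "A-2": [(1, Decimal("0.005")), (1, Decimal("0.005"))],
}
print(shadow_compare(orders, legacy_invoice_total, new_invoice_total))
# [('A-2', Decimal('0.01'), Decimal('0.02'))]
```

Cut-over only makes sense once this list stays empty across a representative window of real traffic.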

From there: driver assignment, then status updates, then order routing. Each migration took four to eight weeks. The core order intake, the most complex and most interdependent part, was last. By the time they got there, the team had run this process four times and were genuinely good at it. The final migration was the smoothest of all.

Total timeline: fourteen months. During that entire period, the business never had a planned downtime window. Order volumes grew 60% while the migration was underway. And when it was done, they had a system they could actually operate at scale.

What the Rewrite Will Cost: Honest Numbers

This is where most rewrite plans fall apart: the estimate.

The mistake is calculating only engineering time. A rewrite costs engineering time, yes, but it also costs product velocity during the transition (features you couldn't build because the team was migrating), the operational overhead of running two systems simultaneously, and the management attention required to keep the business aligned through a multi-month architectural change.

A realistic rule: a rewrite of a system your team built over two to three years will take twelve to eighteen months done properly. If someone tells you six months, they're either planning a big bang (risky) or they haven't scoped it honestly.

Budget for the parallel period. Running two systems simultaneously means two infrastructure bills, two monitoring setups, two things that can break at 2am. It's not permanent, but it's not free either.

And protect feature velocity. If you tell the business "we're doing a rewrite, no new features for a year," you will either break the commitment or break the business. The strangler fig approach works in part because it lets you keep shipping features on the new platform as each component migrates. That's not an accident; it's by design.

FAQ

Q: How do we know when a rewrite is actually necessary versus just refactoring?
A: The threshold is structural. If the current architecture makes it practically impossible to do what the business needs (it can't scale to the required load, you can't add a feature without breaking three others, you can't deploy without a downtime window), that's a rewrite signal. If the code is messy but the architecture is sound, refactoring is almost always the better answer. Don't rewrite because the code is embarrassing. Rewrite because the structure is a ceiling.

Q: Should we tell customers we're rewriting the system?
A: Generally, no. Customers care about reliability and uptime, not implementation details. If a migration goes wrong and causes an incident, be transparent about it. But announcing a multi-month rewrite to your users tends to create anxiety without giving them anything actionable. Your key stakeholders, however (investors, large customers with enterprise contracts, anyone with an SLA), should know the roadmap.

Q: What's the biggest risk during a rewrite?
A: Data inconsistency during the transition period. When you're writing to two systems simultaneously, maintaining consistency takes active effort, and a gap in that effort can have real business consequences. The second biggest risk is timeline drift: the rewrite stretches, the old system deteriorates further, the team loses confidence. Both risks are managed the same way: short migration cycles, continuous validation, and a clear definition of "done" for each phase.

Q: How do we handle features that customers request during the rewrite?
A: Triage ruthlessly. Features that can be built on the new platform should be, and that's actually beneficial because it accelerates validation of the new system. Features that would require deep work on the old system should be deferred or descoped unless they're genuinely business-critical. The mistake is adding significant new functionality to the old system mid-migration; you're increasing the surface area of what needs to be replicated.

Q: Can we run a rewrite with the same team that handles production support?
A: You can, but you need to protect the rewrite work from being constantly interrupted by support fires. That means at least a partial split: some engineers dedicated to the migration with protected time, others handling ongoing operations and bug fixes. If your entire team is permanently on-call for the old system, the rewrite will never get the sustained attention it needs.

How to Run a Technical Debt Audit (Guide for Non-Engineer Founders)

Nahwin Rajan — Sun, 14 Jun 2026 02:30:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Learn how to run a technical debt audit as a non-technical founder. Spot the warning signs, ask the right questions, and decide what to fix first.

Your system is slowing down. Deployments take longer than they used to. Engineers keep saying "it's complicated" when you ask why a simple feature is taking three weeks. You're not imagining it: you've accumulated technical debt, and until you know what's in the pile, you can't make good decisions about it.

This guide walks you through running a technical debt audit without needing to read a single line of code. You'll know what to look for, what questions to ask your team, and how to prioritise what gets fixed first.

What Technical Debt Actually Is (And What It Isn't)

Technical debt isn't a sign that your engineers did a bad job. Most of the time, it's the opposite. It means your team moved fast when you needed to, made pragmatic decisions under pressure, and shipped. The problem is that those shortcuts compound.

Think of it like a bank loan. Taking on debt to grow isn't stupid; not paying attention to what you owe is.

The mistake I see most often: founders treat "technical debt" as a single blob. Your team uses the phrase to mean everything from "this API is poorly documented" to "if this database goes down, we lose three days of orders." Those are not the same problem. The audit's job is to separate them.

The Warning Signs That Triggered This Conversation

Before you can audit, you need to recognise the symptoms. These are the patterns that usually push a founder to take this seriously:

Deployment frequency has dropped. You used to ship twice a week. Now it's twice a month, and every release feels like defusing a bomb.

Bug fix time is increasing. A change in one part of the system breaks something completely unrelated. Engineers spend more time on investigation than on the fix itself.

Onboarding new engineers takes months, not weeks. If a new hire can't make a meaningful contribution for their first six weeks because the codebase is "too complex to explain," that's a signal and a cost.

Estimates are consistently wrong. When engineers pad their timelines with large buffers "just in case," they're pricing in the cost of navigating bad architecture.

You're hearing phrases like "we need to rewrite this." When your senior engineers start using that sentence, pay attention. They're telling you the interest payments are getting unsustainable.

The Four Categories of Technical Debt Worth Auditing

Not everything your engineers flag is equally urgent. A useful audit groups debt into four buckets.

Structural debt is the most dangerous. This is where the architecture itself limits what you can build or scale. A monolith whose parts can't be deployed independently. A database schema so tangled that adding a new field requires changing twelve tables. A single point of failure that nobody wants to talk about because the fix is enormous.

Operational debt covers the infrastructure and practices that keep the lights on. Missing monitoring. Deployments that require manual steps. No runbook for when the system goes down at 2am. This category tends to be invisible until it isn't, and then it becomes very expensive very fast.

Code quality debt is what most people picture when they hear "technical debt": messy code, no tests, inconsistent patterns across the codebase. It slows engineers down and raises the cost of every change. It's real, but it's rarely the category that breaks a business.

Knowledge debt is often overlooked entirely. It's what happens when the one engineer who understands how the payment integration works leaves the company. Or when there's no documentation for how to run the data migration scripts. This is a business risk, not just a technical one.

Your audit needs to touch all four. But your priorities should start with structural and operational.

How to Actually Run the Audit

You don't need to do this alone. You need to create the conditions for your engineering team to be honest with you, which means removing the fear that naming problems will be used against them.

Step one: Set the framing. Tell your team this is a fact-finding exercise, not a performance review. You want to understand reality so you can make better decisions. Blame is the enemy of a good audit.

Step two: Run structured interviews. Sit down with your tech lead or senior engineers, individually. Ask these questions, and then stop talking:

If you could change one thing about our system architecture, what would it be?
What part of the codebase do you dread touching?
What would break first if our traffic doubled tomorrow?
What keeps you up at night from a technical standpoint?
What have we built that you're proud of?

The last question matters. It calibrates the conversation and tells you what to protect.

Step three: Request a systems map. Ask your team to draw (literally draw, on a whiteboard or in a tool like Miro) how data flows through your product. From the moment a user takes an action to when it's persisted, what happens? Where are the dependencies? Where are the single points of failure?

If your team can't produce this in an afternoon, that's already an answer: your knowledge debt is severe.

Step four: Look at the operational metrics. You don't need to understand the code to understand the numbers. Ask for: deployment frequency over the last six months, mean time to recovery when things break, percentage of releases that required a rollback, and the ratio of feature work to bug fixes. These tell you whether debt is accelerating.
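If the raw data lives in a spreadsheet, the four numbers are simple to compute. A sketch with hypothetical inputs: one rolled-back flag per release, recovery time per incident in minutes, and ticket counts for feature work and bug fixes over the same window. Running it for each month of the last six shows the trend.

```python
from statistics import mean


def delivery_metrics(rolled_back: list[bool], recovery_minutes: list[float],
                     feature_tickets: int, bug_tickets: int, weeks: int) -> dict[str, float]:
    """Deployment frequency, mean time to recovery, rollback rate and feature-to-bug ratio."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return {
        "deploys_per_week": len(rolled_back) / weeks,
        "mean_time_to_recovery_min": mean(recovery_minutes) if recovery_minutes else 0.0,
        "rollback_rate": sum(rolled_back) / len(rolled_back) if rolled_back else 0.0,
        "feature_to_bug_ratio": feature_tickets / bug_tickets if bug_tickets else float("inf"),
    }


print(delivery_metrics([False, False, True, False], [30, 90], 30, 10, weeks=4))
```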

Step five: Score what you find. For each issue surfaced, ask your team to give you three scores from 1 to 5: how frequently it impacts them (frequency), how bad it is when it does (severity), and how hard it would be to fix (effort). This gives you a rough priority matrix without requiring you to understand the underlying technology.
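A sketch of turning the three scores into a ranked list. The formula here, frequency times severity divided by effort, is one reasonable weighting rather than a standard; adjust it if your team would rather weight severity more heavily. The issue names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DebtIssue:
    name: str
    frequency: int  # 1-5: how often it hurts
    severity: int   # 1-5: how badly it hurts
    effort: int     # 1-5: how hard it is to fix

    def __post_init__(self) -> None:
        for score in (self.frequency, self.severity, self.effort):
            if not 1 <= score <= 5:
                raise ValueError(f"scores must be 1-5, got {score} for {self.name!r}")

    @property
    def priority(self) -> float:
        """High impact and low effort float to the top."""
        return self.frequency * self.severity / self.effort


issues = [
    DebtIssue("Shared database for app and analytics", frequency=5, severity=5, effort=4),
    DebtIssue("Manual deployment steps", frequency=4, severity=3, effort=1),
    DebtIssue("Undocumented migration scripts", frequency=1, severity=4, effort=2),
]
for issue in sorted(issues, key=lambda i: i.priority, reverse=True):
    print(f"{issue.priority:5.2f}  {issue.name}")
```

A purely multiplicative score can bury a severe but rarely triggered risk, so it is worth reading the severity-5 items separately; some of them may belong on the list of fires regardless of rank.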

A Real Example: What This Looks Like in Practice

We worked with a SaaS company in Jakarta (mid-stage, around Series A) that had been growing fast. Their product worked. Customers were paying. But their engineering team had quietly ballooned from four to fourteen people, and velocity had somehow gotten slower.

When we ran the audit, it became clear that structural debt was the issue. They'd built their entire platform on a single PostgreSQL database that was doing everything: application state, analytics, audit logs, background jobs. Every query for every new feature hit the same database. Every analytics report locked rows that the application layer needed.

The code quality was actually fine. Tests existed. Engineers were competent. But the architecture meant that every decision carried a hidden cost, and nobody had surfaced it clearly because the problem had built up gradually, one reasonable shortcut at a time.

The fix wasn't a full rewrite. It was six months of deliberate work: separating analytics reads onto a read replica, moving background jobs to a proper queue, and introducing caching at the right layer. Velocity recovered before the structural work was even complete, because just having a plan restored the team's confidence.

What to Do With the Results

An audit that produces a 40-item list and no decision framework is useless. You need to come out with three things.

A short list of fires. These are issues that represent active risk to the business: a single point of failure, a component with no monitoring, a dependency with known vulnerabilities that hasn't been updated in three years. These go on the roadmap immediately, not in a backlog.

A medium-term refactoring plan. Structural debt that's slowing the team down but isn't breaking things yet. This should be scheduled alongside product work: not deferred indefinitely, and not crammed into a single "tech debt sprint" that gets cancelled.

A prioritised backlog of everything else. Lower-severity code quality issues, documentation gaps, minor operational improvements. These get worked on steadily, not in a big bang.

The honest truth: most companies don't need to rewrite everything. What they need is a clear view of what's actually risky, a decision about what to fix in what order, and the discipline to protect time for it.

FAQ

Q: How long does a technical debt audit take?
A: For a startup at Seed to Series A, a focused audit (interviews, systems mapping, metrics review) can be done in one to two weeks. The output shouldn't be a 60-page report. It should be a prioritised list of risks and a rough remediation plan. If it's taking longer than two weeks, the scope has grown too large to be actionable.

Q: Should I hire an external consultant to run the audit?
A: Sometimes. An external engineering audit is valuable when there's political tension inside the team (engineers don't feel safe being honest with founders), when you're considering an acquisition or significant investment and need a third-party view, or when your team lacks the seniority to recognise structural problems. For most early-stage companies, a well-facilitated internal audit is enough to start.

Q: How do I know if the debt is "bad enough" to address now?
A: Ask one question: is technical debt limiting your ability to respond to the market? If a competitor ships a feature and it would take your team three months to match it because of architectural constraints, that's a business problem, not just a technical one. Debt that's slowing feature velocity at a growth stage is expensive regardless of how much it costs to fix.

Q: My engineers keep saying we need to rewrite the whole thing. Is that true?
A: Rarely. A full rewrite is almost always more expensive and riskier, and it takes longer than engineers initially estimate. The impulse is understandable: starting fresh feels cleaner. But the real question is whether there's a way to improve the system incrementally while keeping the business running. In most cases, targeted refactoring of the highest-debt components is more effective than a ground-up rebuild. That said, sometimes a rewrite is the right call, particularly when the architecture makes it structurally impossible to build what the business needs.

Q: Who should own technical debt remediation: engineering or the founder?
A: Engineering should own the execution. But the prioritisation decision is yours. Debt remediation competes with new feature work for engineering time. Only you can decide how much of that tradeoff is acceptable given your commercial situation. The mistake is leaving it entirely to engineers, who then either ignore it (because product pressure wins) or fixate on it (because it bothers them professionally). The founder's job is to make the tradeoff explicit and protect time for it.

The point of a technical debt audit isn't to produce a comprehensive report that lives in a Notion page nobody reads. It's to get clarity on where the real risk is and to give your engineering team the mandate and the time to address it before it addresses you.

SpectreDev works with funded startups and established SMEs who need that clarity fast, without disrupting the business that's already running. If your team is hitting growth walls and you're not sure what's actually causing it, the audit is the right place to start.

API Design for High-Throughput Systems: Rate Limiting, Versioning, Idempotency

Nahwin Rajan — Sun, 07 Jun 2026 02:30:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Building APIs that hold up under real traffic takes more than fast code. Here's how rate limiting, versioning, and idempotency work, and when they matter most.

An API that works fine at 100 requests per second can become a liability at 10,000. Not because the logic changed, but because the assumptions baked into the design stop holding at scale. Clients retry aggressively. Traffic spikes unpredictably. Downstream services slow down and back-pressure propagates upstream. Payment confirmations arrive twice.

Most of these failure modes are predictable. The patterns that prevent them (rate limiting, versioning, and idempotency) aren't exotic engineering. They're table stakes for any API that handles real traffic. The problem is that most teams implement them as afterthoughts, bolted on when something has already broken in production.

This post is about building them in from the start.

Rate Limiting: Protecting Your System From Yourself and Everyone Else

Rate limiting is often framed as a defence against malicious clients: bots, scrapers, bad actors. That's part of it. But the more important use case is protecting your system from legitimate traffic that exceeds what your infrastructure can actually serve.

A flash sale on a regional e-commerce platform. A push notification that sends 2 million users to the same product page simultaneously. A third-party integration that has a bug causing it to retry in a tight loop. All of these are real traffic patterns, all of them are potentially legitimate, and all of them can take down an unprotected API.

Rate limiting is how you define the contract: here's what this system is designed to handle, and here's what happens when you exceed it.

The three most common algorithms:

Token bucket gives each client a bucket that fills with tokens at a fixed rate. Each request consumes a token. When the bucket is empty, requests are rejected or queued. The bucket has a maximum capacity, which means clients can "save up" tokens for short bursts. That makes it useful for APIs where occasional spikes are normal but sustained high volume is not.
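To make the refill-and-spend mechanics concrete, here's a minimal single-process token bucket in Python. It's a sketch, not production code: it assumes one bucket object per client held in memory, whereas a gateway or a shared store such as Redis would hold this state across instances. Timestamps can be passed explicitly so the behaviour is deterministic.

```python
import time
from typing import Optional


class TokenBucket:
    """One bucket per client: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a new client can burst immediately
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means the request should get a 429."""
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 1 token/second with bursts of up to 3.
bucket = TokenBucket(rate=1, capacity=3, now=0.0)
print([bucket.allow(now=0.0) for _ in range(4)])  # three allowed, the fourth rejected
print(bucket.allow(now=1.0))                       # one token has refilled, so allowed
```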

Leaky bucket processes requests at a fixed output rate regardless of input rate. Excess requests queue (or are dropped). It smooths traffic more aggressively than token bucket and is useful when you need consistent downstream throughput: for example, protecting a database that can't handle burst writes.
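A leaky bucket can be sketched the same way. This version models it as a bounded queue drained at a fixed rate. The class name, the choice to drop rather than block when the queue is full, and `drain()` standing in for the worker that forwards requests downstream are all our assumptions:

```python
from collections import deque
from typing import Deque, List


class LeakyBucket:
    """Bounded queue drained at a fixed rate, whatever the arrival rate."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # requests released downstream per second
        self.capacity = capacity      # max requests waiting; extras are dropped
        self.queue: Deque[str] = deque()
        self.ready: List[str] = []    # released, waiting for the worker to pick up
        self.credit = 0.0             # fractional progress toward the next release
        self.last = 0.0               # timestamp of the previous leak

    def _leak(self, now: float) -> None:
        self.credit += max(0.0, now - self.last) * self.rate
        self.last = now
        while self.queue and self.credit >= 1:
            self.ready.append(self.queue.popleft())
            self.credit -= 1
        if not self.queue:
            self.credit = 0.0  # an idle bucket doesn't save up capacity for a burst

    def offer(self, request_id: str, now: float) -> bool:
        """Queue a request; return False if the bucket is full and it's dropped."""
        self._leak(now)
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request_id)
        return True

    def drain(self, now: float) -> List[str]:
        """Hand the downstream worker whatever has leaked out so far, in arrival order."""
        self._leak(now)
        released, self.ready = self.ready, []
        return released


bucket = LeakyBucket(rate=2, capacity=3)
print([bucket.offer(f"r{n}", now=0.0) for n in range(1, 6)])  # burst of 5: 3 queued, 2 dropped
print(bucket.drain(now=1.0))  # two released after one second at 2/s
print(bucket.drain(now=1.5))  # the third follows half a second later
```

Unlike the token bucket, a burst here never reaches the downstream system faster than `rate`; it waits in the queue or is dropped.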

Fixed window counts requests in a fixed time window (say, 1,000 requests per minute) and resets the count at each window boundary. It's simple to implement, but has an edge case: a client can send 1,000 requests at 11:59:59 and another 1,000 at 12:00:00, effectively pushing 2,000 requests through in about a second without technically violating the rule. Sliding window counters fix this, at a higher implementation cost.
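One common way to implement a sliding window counter (an approximation, not an exact log of every request timestamp) is to weight the previous fixed window's count by how much of it still overlaps the trailing window. A minimal sketch, assuming a single process and explicit timestamps in seconds, replaying the boundary edge case: a full burst at the very end of one window, then another at the very start of the next.

```python
class SlidingWindowCounter:
    """Approximate sliding window: blends the previous fixed window's count with the current one."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_index = 0  # which fixed window we're counting in
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.current_index:
            # Roll forward. If a whole window was skipped, the previous one was empty.
            self.previous_count = self.current_count if index == self.current_index + 1 else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the trailing window that still overlaps the previous fixed window.
        elapsed_in_window = now - index * self.window
        overlap = 1 - elapsed_in_window / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=1000, window_seconds=60)
end_of_window = sum(limiter.allow(59.9) for _ in range(1000))
start_of_next = sum(limiter.allow(60.0) for _ in range(1000))
print(end_of_window, start_of_next)  # a fixed window would have allowed both bursts in full
```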

The choice between them depends on your traffic pattern and what you're protecting. For most external-facing APIs, token bucket with a sliding window variant is a reasonable default.

Where to implement it: as early in the request path as possible. An API gateway (Kong, AWS API Gateway, Nginx with rate limiting modules) handles this before your application code even sees the request. This matters because rate limiting at the application layer still consumes application resources to reject the request. At the gateway layer, you shed load before it reaches your compute.

The response matters too. A rejected request should return HTTP 429 with a Retry-After header telling the client when it can try again. A set of X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers tells well-behaved clients how to pace themselves. Design for good clients, not just bad ones.
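A minimal sketch of assembling that response, assuming the limiter has already decided whether the request is allowed and knows when the current window or bucket resets. The function name and parameters are hypothetical; the X-RateLimit-* names are a widely used convention rather than a formal standard, and some APIs send Reset as seconds remaining instead of a Unix timestamp.

```python
import math
from typing import Dict, Tuple


def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_at: float, now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers); 200 here just means "let the request through".

    reset_at and now are Unix timestamps in seconds.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
    }
    if allowed:
        return 200, headers
    # Retry-After as delay-seconds, rounded up so clients never retry too early.
    headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return 429, headers


status, headers = rate_limit_response(False, limit=1000, remaining=0,
                                      reset_at=1_700_000_030.0, now=1_700_000_000.5)
print(status, headers["Retry-After"])
```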

API Versioning: Making Change Without Breaking Your Consumers

APIs are promises. The moment an external client (a mobile app, a partner integration, a third-party developer) starts depending on your API, changing it carries risk. Versioning is how you manage that risk without freezing your system in amber.

The hard truth: there is no perfect versioning strategy. Every approach involves trade-offs, and the right one depends on how your API is consumed.

URI versioning (/v1/orders, /v2/orders) is the most common and the most visible. The version is explicit in the URL, easy to route at the gateway level, and easy to document. The downside is that it can encourage treating versions as separate products rather than as an evolving contract: teams end up maintaining /v1 and /v2 as parallel codebases, which compounds the maintenance burden quickly.

Header versioning (Accept: application/vnd.spectredev.v2+json) keeps URLs clean and is arguably more semantically correct: the resource identity doesn't change, only the representation does. The trade-off is that it's less visible, harder to test in a browser, and more complex to route at the infrastructure layer. It's the right approach for mature API programs; it's probably over-engineering for most startups.

Query parameter versioning (/orders?version=2) is easy to implement and easy to test, but mixes versioning concerns with resource-addressing concerns. Use it for internal tooling if it makes life easier. Don't use it for public APIs.

The versioning strategy matters less than the discipline around when you version. A change that adds a new optional field to a response is backwards compatible: don't version it. A change that removes a field, renames a field, or changes a field's type is breaking: version it. A change that alters the semantics of an existing field (same name, different meaning) is the most dangerous kind, because it won't cause a client to fail immediately; it'll cause it to fail silently with wrong data.
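Part of that discipline can be automated. The sketch below is a hypothetical helper that assumes response schemas have been flattened to a map of field name to type name. It flags removed, renamed, and retyped fields as breaking and treats purely additive fields as safe. It cannot catch the semantic case; that still needs human review.

```python
from typing import Dict, List

Schema = Dict[str, str]  # flattened response schema: field name -> type name


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the reasons a response-schema change is breaking; empty means additive-only."""
    problems = []
    for field, old_type in old.items():
        if field not in new:
            # A rename looks exactly like a removal from the client's point of view.
            problems.append(f"removed or renamed: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} ({old_type} -> {new[field]})")
    # Fields that only exist in `new` are additive, so they're ignored here.
    return problems


v1 = {"id": "string", "amount": "integer", "currency": "string"}
print(breaking_changes(v1, {**v1, "note": "string"}))  # []: additive, no new version needed
print(breaking_changes(v1, {"id": "string", "amount": "string", "currency_code": "string"}))
```

A check like this fits naturally in CI, comparing the schema on the main branch with the one in a pull request.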

Deprecation is part of the contract. When you release /v2, set a clear deprecation timeline for /v1: six months is common for external APIs, and three months is often enough for internal ones. Send Deprecation and Sunset headers in every /v1 response. Log which clients are still hitting deprecated versions. Reach out to those clients directly before you pull the plug. The teams that handle API versioning well treat it as a communication problem as much as a technical one.
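A minimal sketch of the header and logging side, assuming the formats standardised in RFC 9745 (Deprecation as an @-prefixed Unix timestamp) and RFC 8594 (Sunset as an HTTP-date). Earlier drafts of the Deprecation header used other formats, so check what your clients and tooling expect. The dates, the `on_v1_request` hook, and the in-memory counter are hypothetical; in production the counter would be your logs or metrics pipeline.

```python
from collections import Counter
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime) -> Dict[str, str]:
    """Headers for every response served by a deprecated version.

    Both datetimes must be timezone-aware UTC (format_datetime with usegmt=True
    raises ValueError otherwise).
    """
    return {
        "Deprecation": f"@{int(deprecated_at.timestamp())}",  # RFC 9745: @ + Unix seconds
        "Sunset": format_datetime(sunset_at, usegmt=True),    # RFC 8594: HTTP-date
    }


# Hypothetical timeline: /v1 deprecated on 7 June 2026, switched off six months later.
V1_DEPRECATED_AT = datetime(2026, 6, 7, tzinfo=timezone.utc)
V1_SUNSET_AT = datetime(2026, 12, 7, tzinfo=timezone.utc)
v1_hits_by_client: Counter = Counter()


def on_v1_request(client_id: str) -> Dict[str, str]:
    v1_hits_by_client[client_id] += 1  # tells you who to contact before the sunset date
    return deprecation_headers(V1_DEPRECATED_AT, V1_SUNSET_AT)


print(on_v1_request("partner-a"))
print(v1_hits_by_client.most_common())
```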

Idempotency: The Pattern That Saves You When the Network Lies

Networks are unreliable. Clients time out and retry. Load balancers reroute mid-request. Mobile apps lose connectivity at exactly the wrong moment and come back online assuming the last request failed.

In a read-heavy API, this is mostly fine: fetching the same resource twice is harmless. In a write-heavy API, it's a serious problem. A payment processed twice is a real financial error. An order created twice is a real fulfilment problem. A user created twice is a real data integrity problem.

Idempotency is the property that says: sending the same request multiple times has the same effect as sending it once. Implementing it correctly is one of the most valuable things you can do for an API that handles financial transactions, order management, or any operation where duplicates are costly.

The standard implementation uses an idempotency key: a unique identifier generated by the client and sent with each request, typically as a header (Idempotency-Key: <UUID>). The server stores the key and the result of the first successful processing. On subsequent requests with the same key, it returns the stored result without re-executing the operation.

The storage mechanism is usually a fast key-value store (Redis works well here) with a TTL: keys don't need to live forever, just long enough to cover the client's retry window. 24 hours is a common default for payment APIs; 7 days is more conservative for workflows with longer retry cycles.

A concrete example: a GoPay or OVO disbursement request that times out on the client side. Did the money move or not? Without idempotency, retrying is risky. With an idempotency key, the client retries with the same key, the server checks its store, sees the operation already completed, and returns the original successful response. No double disbursement. The client gets the confirmation it needed.

What to store: at minimum, the idempotency key, the response status code, and the response body. Some implementations also store the request body (or a hash of it) and validate that subsequent requests with the same key have the same body: if a client sends different parameters with the same idempotency key, that's a client bug, and you should return a 422 rather than silently processing the new parameters.
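Putting those pieces together, here's a minimal in-memory sketch in Python. The dictionary with expiry times stands in for Redis (where you'd typically use SET with the NX and EX options), and the `disburse` operation is hypothetical. Two simplifications to note: it stores only successful results, matching the behaviour described above, and it doesn't handle two concurrent requests with the same key, which a real implementation would guard against by atomically reserving the key before executing.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, dict]  # (HTTP status, JSON body)


@dataclass
class StoredResult:
    request_hash: str
    status: int
    body: dict
    expires_at: float


class IdempotencyStore:
    """In-memory stand-in for a TTL key-value store such as Redis."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._data: Dict[str, StoredResult] = {}

    def get(self, key: str, now: float) -> Optional[StoredResult]:
        entry = self._data.get(key)
        if entry is not None and entry.expires_at <= now:
            del self._data[key]  # expired: treat the key as never seen
            return None
        return entry

    def put(self, key: str, request_hash: str, response: Response, now: float) -> None:
        status, body = response
        self._data[key] = StoredResult(request_hash, status, body, now + self.ttl)


def fingerprint(request_body: dict) -> str:
    """Hash a canonical JSON form of the body so key order doesn't matter."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_idempotent(store: IdempotencyStore, key: str, request_body: dict,
                      operation: Callable[[dict], Response],
                      now: Optional[float] = None) -> Response:
    now = time.time() if now is None else now
    request_hash = fingerprint(request_body)
    cached = store.get(key, now)
    if cached is not None:
        if cached.request_hash != request_hash:
            # Same key, different parameters: a client bug, so refuse rather than guess.
            return 422, {"error": "Idempotency-Key reused with a different request body"}
        return cached.status, cached.body  # replay: the operation does not run again
    response = operation(request_body)
    if 200 <= response[0] < 300:
        store.put(key, request_hash, response, now)  # only successful results are stored
    return response


calls = []


def disburse(body: dict) -> Response:  # hypothetical downstream operation
    calls.append(body)
    return 201, {"disbursement_id": "d-1", "amount": body["amount"]}


store = IdempotencyStore()
first = handle_idempotent(store, "key-1", {"amount": 50000, "to": "acct-9"}, disburse, now=0)
retry = handle_idempotent(store, "key-1", {"to": "acct-9", "amount": 50000}, disburse, now=30)
print(first == retry, len(calls))  # the retry replayed the stored result
```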

Idempotency keys should be client-generated. The client owns the key because the client is the one recovering from failure. Server-generated idempotency would require the client to have already received the key, which assumes the first request succeeded, defeating the purpose.
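On the client side, the detail that matters is generating the key once per logical operation and reusing it on every retry. A sketch, where `send` and `TransientError` are hypothetical stand-ins for your HTTP call and its timeout or connection errors:

```python
import uuid
from typing import Callable, Dict


class TransientError(Exception):
    """Hypothetical: what `send` raises on a timeout or dropped connection."""


def post_with_retries(send: Callable[[Dict[str, str]], dict], max_attempts: int = 3) -> dict:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the key ONCE per logical operation, before the first attempt.
    # A fresh key per attempt would make every retry look like a new payment.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # A real client would back off here (and honour Retry-After on a 429).
```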

Stripe's API documentation on idempotency is one of the clearest practical references for this pattern, and worth reading if you're implementing payment-adjacent functionality.

How These Three Patterns Work Together

Rate limiting, versioning, and idempotency are often treated as separate concerns. In a well-designed high-throughput API, they interact.

Rate limiting shapes the load your system accepts. Idempotency handles the safe retry behaviour when requests fail. Versioning ensures that as you improve both of those mechanisms over time, you can do so without breaking existing clients.

A practical scenario: you're running a B2B payments API used by Indonesian SME accounting software integrators, similar to the kind of integrations built on top of platforms like Jurnal or Accurate. Your rate limits are per API key, not per IP, because your clients are businesses making requests on behalf of thousands of end users. Your idempotency implementation covers all POST and PATCH endpoints because those are the ones with real-world side effects. Your versioning is URI-based with a 6-month deprecation cycle because your clients are third-party developers who need predictability.

That's not a complex system. It's a coherent one. Each decision reinforces the others.

One thing not to overlook: documentation. An API with perfect rate limiting, versioning, and idempotency will still fail in production if it's poorly documented, because clients will implement integrations incorrectly, hit rate limits they didn't know existed, and retry without idempotency keys because they didn't know they needed them. The OpenAPI spec is not documentation. It's a schema. Documentation explains the why and the what-happens-when.

A Note on When to Build This Versus When to Buy It

If you're building a public-facing API today, you probably don't need to implement rate limiting or version routing from scratch. API gateways (AWS API Gateway, Kong, Apigee, or the gateway layer of a managed Kubernetes platform) handle the infrastructure concerns and let your application focus on business logic.

What you do need to implement yourself is idempotency, because that's specific to your domain logic and your data model. No gateway can know whether a payment request has already been processed; only your application can.

The mistake we see most often: teams build sophisticated custom rate-limiting middleware in their application framework, when a gateway would have served them at a tenth of the cost, and at the same time have no idempotency implementation at all for their payment endpoints, where the stakes are highest.

Spend your engineering effort where it can't be bought.

FAQ

Q: What HTTP status code should I return when a request is rate limited?

A: HTTP 429 (Too Many Requests). Always include a Retry-After header indicating when the client can next attempt the request, either as a number of seconds or as an HTTP date. Without this, well-behaved clients can't back off intelligently and you'll see retry storms that compound the load problem you were trying to prevent.
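Clients need to handle both forms. A minimal Python sketch using the standard library's `email.utils.parsedate_to_datetime` for the HTTP-date case; the function name is our own:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after: str, now: datetime) -> float:
    """Turn a Retry-After value into seconds to wait; `now` must be timezone-aware."""
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    retry_at = parsedate_to_datetime(value)  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    return max(0.0, (retry_at - now).total_seconds())  # a date in the past means "retry now"


now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_delay_seconds("120", now))                            # 120.0
print(retry_delay_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now))  # 60.0
```

In practice, clients usually add some random jitter on top of this delay so that many of them don't retry at the same instant.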

Q: How do I handle idempotency for operations that involve multiple steps or downstream service calls?

A: This is the hard case. If your operation involves multiple downstream calls (update a record, charge a payment, send a notification), idempotency needs to cover the entire sequence, not just individual steps. The safest pattern is to treat the whole operation as a saga: each step is idempotent individually, and the overall operation can be retried from any point of failure. This requires careful state tracking (typically in your database, not just a cache) and is a significant design investment. For most teams, the first step is making the critical path idempotent and accepting that edge cases in complex sagas require manual reconciliation until you've hit that problem enough times to justify the engineering cost.
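A minimal sketch of the "retry from the point of failure" idea, assuming each step is itself idempotent and completed steps are recorded durably (a plain dictionary stands in for that database table here). It covers forward recovery only; it doesn't implement compensating actions for steps that need to be undone. The step names and the simulated outage are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[], None]]  # (step name, action)


class StepLog:
    """Stand-in for a durable table of (operation key, completed step) rows."""

    def __init__(self) -> None:
        self._done: Dict[str, Set[str]] = {}

    def is_done(self, op_key: str, step: str) -> bool:
        return step in self._done.get(op_key, set())

    def mark_done(self, op_key: str, step: str) -> None:
        self._done.setdefault(op_key, set()).add(step)


def run_resumable(op_key: str, steps: List[Step], log: StepLog) -> None:
    """Run steps in order, skipping any already recorded as complete for op_key.

    A crash between action() and mark_done() means that step runs again on retry,
    which is why every step must be idempotent on its own.
    """
    for name, action in steps:
        if log.is_done(op_key, name):
            continue
        action()
        log.mark_done(op_key, name)


calls: List[str] = []
notification_up = [False]  # hypothetical: the first attempt hits an outage


def notify() -> None:
    if not notification_up[0]:
        notification_up[0] = True
        raise RuntimeError("notification service unavailable")
    calls.append("notify")


steps = [("update_record", lambda: calls.append("update_record")),
         ("charge_payment", lambda: calls.append("charge_payment")),
         ("notify", notify)]
log = StepLog()
try:
    run_resumable("order-123", steps, log)
except RuntimeError:
    pass  # the first attempt fails at the last step
run_resumable("order-123", steps, log)  # the retry resumes at "notify"
print(calls)  # nothing ran twice
```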

Q: Should internal APIs (services talking to each other within our own system) also be versioned?

A: With less formality, yes. If two internal services are deployed independently, a breaking change in one can break the other mid-deployment. Contract testing (tools like Pact) is often a better fit for internal APIs than explicit versioning, because it catches breaking changes before deployment rather than managing them after. For services deployed together or tightly coupled by design, a shared contract in code (a shared types library, a protobuf schema) is usually cleaner than versioning the HTTP surface.

Q: What's the right granularity for rate limits: per IP, per user, or per API key?

A: It depends on who your clients are. Per-IP is appropriate for unauthenticated public endpoints where you don't yet know who the caller is. Per-user limits are right for authenticated user-facing endpoints where you're protecting against individual abuse. Per-API-key limits are right for B2B or developer APIs where the client is an organisation making requests on behalf of many end users; throttling by IP would punish them for traffic that's legitimately spread across many users. Most mature APIs use a combination: unauthenticated requests rate-limited by IP, authenticated requests by API key or user ID, with different limits for different endpoint tiers.
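A sketch of that combination as a key-selection function (the prefix format is our assumption). The returned string is what you'd feed into whichever limiter you use, with per-tier limits looked up separately:

```python
from typing import Optional


def rate_limit_key(client_ip: str, user_id: Optional[str] = None,
                   api_key: Optional[str] = None) -> str:
    """Pick the identity to throttle on, most specific first."""
    if api_key:
        return f"apikey:{api_key}"  # B2B/developer traffic: one budget per organisation
    if user_id:
        return f"user:{user_id}"    # authenticated end users
    return f"ip:{client_ip}"        # unauthenticated: the IP is all we know


print(rate_limit_key("203.0.113.7"))                                 # ip:203.0.113.7
print(rate_limit_key("203.0.113.7", user_id="u-42"))                 # user:u-42
print(rate_limit_key("203.0.113.7", user_id="u-42", api_key="k-1"))  # apikey:k-1
```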

Q: How long should idempotency keys be stored?

A: Long enough to cover your client's retry window with meaningful margin. For payment APIs, 24 hours is the industry norm; Stripe uses it, for example. For longer-running async workflows where a client might retry over days, 7 days is more conservative. There's a storage cost to longer TTLs if you're storing full response bodies at volume, but at most scales it's negligible. Err on the side of longer and trim based on actual storage pressure, not upfront assumptions.

Rate limiting, versioning, and idempotency aren't the most glamorous parts of API design. They won't make it into your launch post. But they're the difference between an API that holds up when traffic gets real and one that becomes a source of production incidents at the worst possible moment. The patterns are well-understood. The implementation cost is manageable. The cost of not doing it is paid in pages, customer refunds, and emergency architecture work at 2am.

Build it in from the start. Your future on-call self will notice.

What Is Database Sharding — and When Does Your Startup Actually Need It

Nahwin Rajan — Sun, 31 May 2026 02:30:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Database sharding explained without the hype. Learn what it actually is, the real cost of implementing it, and whether your startup genuinely needs it yet.

Most startups don't need database sharding. There. That's the most useful thing this post can tell you upfront.

But the question of when you do need it (and what it actually costs to implement) is worth understanding before you hit the wall, not after. By the time sharding becomes urgent, you're usually operating under pressure, and pressure is a terrible time to make irreversible architectural decisions.

Here's what database sharding is, how it works, and the honest framework for deciding whether it belongs in your near-term roadmap.

What Database Sharding Actually Is

A database shard is a horizontal partition of your data. Instead of one database holding all your rows, you split the dataset across multiple database instances, with each instance (a "shard") holding a subset of the data.

The key word is horizontal. Sharding is not the same as replication, where you copy the same data to multiple servers for read scaling or redundancy. In sharding, each record lives in exactly one shard. The total dataset is distributed, not duplicated.

The mechanism that makes this work is the shard key: the field you use to determine which shard a given record belongs to. A common example: shard by user_id. Users 1–1,000,000 go to Shard A. Users 1,000,001–2,000,000 go to Shard B. Your application (or a routing layer) knows which shard to query for a given user.

Simple in concept. Genuinely complex in practice.
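The simple part, at least, is small. The user_id example above reduces to a sorted list of inclusive upper bounds and a binary search. A minimal Python sketch; the class and shard names are hypothetical, and a real routing layer would load the ranges from configuration:

```python
import bisect
from typing import List


class RangeRouter:
    """Routes a numeric shard key to the shard whose range contains it."""

    def __init__(self, upper_bounds: List[int], shards: List[str]):
        # upper_bounds[i] is the inclusive maximum user_id stored on shards[i].
        if len(upper_bounds) != len(shards) or upper_bounds != sorted(upper_bounds):
            raise ValueError("need one ascending upper bound per shard")
        self.upper_bounds = upper_bounds
        self.shards = shards

    def shard_for(self, user_id: int) -> str:
        # bisect_left finds the first range whose upper bound is >= user_id.
        i = bisect.bisect_left(self.upper_bounds, user_id)
        if i == len(self.shards):
            raise KeyError(f"user_id {user_id} is beyond the last range; a new shard is needed")
        return self.shards[i]


router = RangeRouter([1_000_000, 2_000_000], ["shard-a", "shard-b"])
print(router.shard_for(1_000_000), router.shard_for(1_000_001))  # shard-a shard-b
```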

The Problem Sharding Solves, and Why Most Teams Don't Have It Yet

Sharding exists to solve one specific problem: a single database server that can no longer handle your write volume or data volume, and where vertical scaling (bigger hardware) is no longer a viable or cost-effective option.

Read scaling is a different problem with different solutions. If you have heavy read load, read replicas solve that cleanly and at a fraction of the operational complexity. Connection pooling, query optimisation, and caching layers like Redis address most of the database performance issues that startups encounter at early to mid-scale.

A useful heuristic: if your database is struggling, sharding is probably not the first answer. It's usually the last answer after you've exhausted the simpler ones.

The rough sequence most production systems follow before sharding becomes necessary:

First, query optimisation and proper indexing; this alone fixes the majority of "the database is slow" problems we see in audit engagements. Second, a caching layer for frequently read data. Third, read replicas to offload read traffic from the primary. Fourth, connection pooling. Fifth, vertical scaling (a larger instance, more RAM, faster disks). Sixth, and only after all of the above: sharding for write-heavy workloads that have outgrown a single primary.

Most startups are stuck somewhere between step one and step three. Sharding is step six.

The Real Cost of Sharding

This is where the blog posts usually get dishonest. They explain how sharding works, show you the architecture diagrams, and stop there. The operational reality deserves more attention.

Cross-shard queries become painful. If your shard key is user_id and you need to run an analytics query across all users (say, "show me all orders placed in the last 24 hours across all regions"), you now have to query every shard and aggregate the results in your application layer. The alternative is to maintain a separate analytics database that aggregates across shards. Neither is free.
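The query-every-shard approach is usually called scatter-gather: send the same query to every shard in parallel, then combine the partial results in the application. A minimal sketch, with in-memory lists standing in for shard databases and hypothetical order data:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Row = Tuple[int, datetime]  # (order_id, created_at); hypothetical schema
Shard = List[Row]


def count_recent_orders(shard: Shard, since: datetime) -> int:
    # Stand-in for running "count orders where created_at >= since" on one shard.
    return sum(1 for _, created_at in shard if created_at >= since)


def recent_orders_all_shards(shards: Dict[str, Shard], since: datetime) -> int:
    """Scatter the same query to every shard in parallel, then gather and sum."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_counts = pool.map(lambda shard: count_recent_orders(shard, since), shards.values())
        return sum(partial_counts)


now = datetime(2026, 5, 31, 12, 0, tzinfo=timezone.utc)
shards = {
    "shard-a": [(1, now - timedelta(hours=1)), (2, now - timedelta(days=2))],
    "shard-b": [(3, now - timedelta(hours=5))],
    "shard-c": [],
}
print(recent_orders_all_shards(shards, since=now - timedelta(hours=24)))  # 2
```

Counts and sums combine by simple addition. Averages, distinct counts, and top-N queries need more care: for an average, for instance, each shard has to return both its sum and its count.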

Transactions get complicated. In a single database, ACID transactions are straightforward. Across shards, any operation that touches records in two different shards requires a distributed transaction, which is a significantly harder problem to solve correctly. In practice, most teams redesign their data model to avoid cross-shard transactions rather than implement them, which is often the right call, but it constrains how you can model your domain.

Rebalancing is not painless. Your shards will not stay balanced forever. If you shard by user_id range and your power users are all in the upper ID range, one shard gets hammered while the others sit idle: the "hot shard" problem. Fixing it means rebalancing data across shards while the system is live. That's a non-trivial operational exercise.

Schema migrations get harder. Running a migration on one database is already a careful process on a production system. Running coordinated migrations across twelve shards, ensuring they complete consistently, is a different category of problem.

Your application layer has to know about it. Unlike read replicas (which are often transparent to the application), sharding usually requires the application to participate in routing decisions. That's code your team now owns and maintains.

None of this means sharding is the wrong choice when you actually need it. Tokopedia, Gojek, and Traveloka all run sharded databases at scale because they have the traffic and data volumes that genuinely require it. But they also have dedicated platform engineering teams managing that infrastructure. That context matters.

Sharding Strategies: The Four Main Approaches

When sharding is the right call, how you shard matters as much as whether you shard. There are four primary strategies, and each has a different set of trade-offs.

Range-based sharding partitions data by a continuous range of the shard key value: user IDs 1 to 1M on Shard A, 1M+1 to 2M on Shard B, and so on. It's simple to understand and implement, but vulnerable to the hot shard problem if your data isn't evenly distributed across the range.

Hash-based sharding applies a hash function to the shard key and uses the result to determine placement. This distributes data more evenly, which reduces hot shards, but it destroys range locality: you can no longer efficiently query "all users with IDs between X and Y" because those records are now scattered across shards.
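Placement by hash takes only a few lines. Plain hash-modulo-N spreads keys evenly but remaps most of them when you add a shard. Consistent hashing (a common variant, and the approach the payments example later in this post mentions) moves only about 1/N of the keys. A minimal sketch of both; the shard names are hypothetical, the 100 virtual nodes per shard is an arbitrary choice, and SHA-256 is used only as a stable, well-distributed hash:

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(value: str) -> int:
    # Python's built-in hash() for str is randomised per process, so use a fixed digest.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def modulo_shard(key: str, shards: List[str]) -> str:
    """hash(key) mod N: even spread, but changing N remaps most keys."""
    return shards[stable_hash(key) % len(shards)]


class ConsistentHashRing:
    """Each shard owns many points ("virtual nodes") on a ring of hash values."""

    def __init__(self, shards: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, shard) pairs
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no shards")
        # Walk clockwise to the first ring point at or after the key's hash, wrapping at the end.
        i = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


keys = [f"user-{n}" for n in range(10_000)]
three, four = ["s1", "s2", "s3"], ["s1", "s2", "s3", "s4"]
moved_modulo = sum(modulo_shard(k, three) != modulo_shard(k, four) for k in keys)

ring = ConsistentHashRing(three)
before = {k: ring.shard_for(k) for k in keys}
ring.add("s4")
moved_ring = sum(before[k] != ring.shard_for(k) for k in keys)
print(f"keys moved: modulo {moved_modulo / len(keys):.0%}, ring {moved_ring / len(keys):.0%}")
```

Going from three shards to four moves roughly three quarters of the keys under modulo placement and roughly a quarter on the ring, and on the ring every moved key lands on the new shard.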

Directory-based sharding maintains a lookup table that maps shard keys to shards. This is the most flexible approach and allows you to rebalance shards without changing your hashing logic. The trade-off is that the lookup table itself becomes a dependency: a potential bottleneck and single point of failure if not handled carefully.
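A minimal directory sketch. The names are hypothetical, and a dictionary stands in for the replicated, cached lookup store you'd run in production. Rebalancing a hot tenant becomes a table update once its data has been copied, with no change to any hashing logic:

```python
from typing import Dict


class ShardDirectory:
    """Maps each shard key (here a tenant ID) to the shard that holds its data."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def assign(self, tenant_id: str, shard: str) -> None:
        self._table[tenant_id] = shard

    def shard_for(self, tenant_id: str) -> str:
        try:
            return self._table[tenant_id]
        except KeyError:
            raise KeyError(f"no shard assigned for tenant {tenant_id!r}") from None

    def move(self, tenant_id: str, new_shard: str) -> str:
        """Repoint a tenant after its rows have been copied; returns the old shard.

        The copy itself (and writes that arrive during it) is out of scope here.
        """
        old_shard = self.shard_for(tenant_id)
        self._table[tenant_id] = new_shard
        return old_shard


directory = ShardDirectory()
directory.assign("tenant-big", "shard-1")
directory.assign("tenant-small", "shard-1")
directory.move("tenant-big", "shard-2")  # relieve the hot shard
print(directory.shard_for("tenant-big"), directory.shard_for("tenant-small"))  # shard-2 shard-1
```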

Geographic sharding partitions data by region: Southeast Asian users on one cluster, Australian users on another. This is particularly relevant for companies operating across multiple markets with data residency requirements. Indonesia's data localisation regulations under Government Regulation No. 71 of 2019 (PP 71/2019) require certain categories of personal data to be stored on infrastructure physically located in Indonesia. Geographic sharding can be part of how you comply with that, though the regulatory picture is more nuanced than just where the database sits.

When Your Startup Actually Needs to Start Thinking About Sharding

Specific signals matter more than vague thresholds. Here are the concrete ones worth paying attention to.

Your write throughput has exceeded what a single primary can handle even after connection pooling and hardware upgrades. You're seeing consistent replication lag on your read replicas that's impacting user experience. Your largest tables have grown past the point where a single-server B-tree index can serve queries within acceptable latency. Your data volume is approaching the practical storage limits of a single instance and vertical scaling costs have become disproportionate.

In terms of rough order of magnitude: for most well-optimised PostgreSQL or MySQL setups on decent hardware, you can handle tens of thousands of write transactions per second before you genuinely exhaust single-node capacity. Many startups that feel they need sharding are running at a fraction of that and their actual problem is unoptimised queries, missing indexes, or unnecessary write amplification in their application code.

A practical test: before pursuing sharding, run a proper database performance audit. Look at your slow query log. Examine your write patterns. Check whether your schema design is creating unnecessary lock contention. We've worked with teams who were convinced they needed sharding and found, after a structured audit, that three index changes and a query rewrite cut their database load by 60 percent. That bought them 18 months of headroom without touching the architecture.

A Concrete Example: Sharding Decision for a Payments Platform

Consider a fintech startup processing payments across Indonesia: peer-to-peer transfers, bill payments, e-wallet top-ups. They come to us at around 500,000 active users and 200,000 transactions per day, worried about whether their single-node PostgreSQL setup will survive projected growth.

At 200,000 transactions per day, they're writing roughly 2–3 records per transaction (the transaction record, a ledger entry, a notification event). That's 400,000–600,000 writes per day, which averages out to roughly 5–7 writes per second. A well-configured PostgreSQL instance can comfortably handle 5,000–10,000 writes per second. They have two to three orders of magnitude of headroom.
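The back-of-envelope arithmetic is worth re-running with your own numbers. A small Python sketch using only the figures in this example. Keep in mind that these are daily averages, and peak rates are by definition at least as high as the average:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_writes_per_second(transactions_per_day: int, writes_per_transaction: int) -> float:
    return transactions_per_day * writes_per_transaction / SECONDS_PER_DAY


def headroom(node_writes_per_second: float, observed_writes_per_second: float) -> float:
    """How many times the current average load a single node could absorb."""
    return node_writes_per_second / observed_writes_per_second


low = average_writes_per_second(200_000, 2)   # 2 writes per transaction
high = average_writes_per_second(200_000, 3)  # 3 writes per transaction
print(f"average load: {low:.1f}-{high:.1f} writes/s")
# Worst case pairs the low node estimate with the high load; best case the reverse.
print(f"headroom: {headroom(5_000, high):.0f}x-{headroom(10_000, low):.0f}x")
```

With these inputs it reports about 4.6–6.9 writes per second and 720x–2,160x headroom.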

The right conversation isn't about sharding; it's about ensuring their indexes are correct, their connection pooling is configured properly, and they have a read replica absorbing their reporting queries. That architecture will take them past 5 million users without fundamental change.

Now imagine they've grown to 5 million active users and are processing 50 million transactions per day, the kind of volume GoPay was handling in its growth phase. At that scale, write throughput genuinely becomes a single-node constraint, and the case for sharding, probably by user_id with a consistent hash, becomes real and defensible.

The architecture decision should follow the traffic, not anticipate it by three years.

FAQ

Q: What's the difference between sharding and partitioning?

A: Partitioning is typically done within a single database instance: PostgreSQL table partitioning, for example, splits one logical table into physical sub-tables on the same server. It improves query performance and manageability but doesn't distribute load across multiple servers. Sharding distributes data across multiple separate database instances. Partitioning is often a useful step before sharding and can buy you significant headroom on its own.

Q: Can managed databases like Amazon RDS or Google Cloud SQL handle sharding for me?

A: Not automatically, no. RDS and Cloud SQL manage replication, backups, failover, and vertical scaling, but they don't shard your data across instances on your behalf. Amazon Aurora has some features that push in this direction for read scaling, and Google Spanner is a distributed database that handles horizontal scaling transparently, but Spanner is a different product category with different cost and complexity trade-offs. For most startups, managed databases like RDS are the right choice well before sharding is relevant.

Q: Is MongoDB or Cassandra easier to shard than PostgreSQL?

A: MongoDB and Cassandra have sharding (or in Cassandra's case, distributed architecture) built into their core design. PostgreSQL and MySQL require more explicit work to shard, whether through application-level routing, Citus, or tools like Vitess. That said, "easier to shard" shouldn't drive your database choice. The database that fits your data model and query patterns is more important than one that theoretically scales more easily, because most teams never reach the scale where sharding is necessary, regardless of which database they chose.

Q: If we shard now while we're small, won't that make scaling easier later?

A: This is the most common trap. Implementing sharding before you need it adds immediate complexity, slows down development, and optimises for a future scale problem that may never materialise in the form you anticipated. Your shard key choice may turn out to be wrong for your actual access patterns, and changing a shard key on a live system is painful. Build with clean boundaries and a data model that could accommodate sharding later. Don't implement the sharding itself until the signals are there.

Q: We're a non-technical founder. How do we know if our CTO is recommending sharding prematurely?

A: Ask two questions. First: what have we already tried before reaching this conclusion? The answer should include read replicas, caching, query optimisation, and vertical scaling. If sharding is being proposed as a first response to performance issues, that's a flag. Second: what's our current write throughput and how does it compare to the limits of our current setup? If the answer is vague, push for numbers. Real performance problems have measurable symptoms.
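For the second question, PostgreSQL's `pg_stat_database` view keeps cumulative per-database row counters (`tup_inserted`, `tup_updated`, `tup_deleted`). You can sample them twice, for example with `SELECT tup_inserted, tup_updated, tup_deleted FROM pg_stat_database WHERE datname = current_database();`. Divide the difference by the elapsed time, and you get a rough rows-written-per-second figure. Here is a sketch of that calculation, using made-up snapshot values:

```python
from dataclasses import dataclass


@dataclass
class WriteSnapshot:
    """One sample of pg_stat_database's cumulative row counters for a database."""
    taken_at: float  # seconds, from any monotonic clock
    tup_inserted: int
    tup_updated: int
    tup_deleted: int

    @property
    def rows_written(self) -> int:
        return self.tup_inserted + self.tup_updated + self.tup_deleted


def rows_written_per_second(first: WriteSnapshot, second: WriteSnapshot) -> float:
    elapsed = second.taken_at - first.taken_at
    if elapsed <= 0:
        raise ValueError("take the second snapshot after the first")
    return (second.rows_written - first.rows_written) / elapsed


# Hypothetical samples taken 60 seconds apart.
t0 = WriteSnapshot(taken_at=0.0, tup_inserted=1_000_000, tup_updated=400_000, tup_deleted=10_000)
t1 = WriteSnapshot(taken_at=60.0, tup_inserted=1_000_300, tup_updated=400_120, tup_deleted=10_000)
print(f"{rows_written_per_second(t0, t1):.1f} rows written per second")  # 7.0
```

This counts rows, not transactions. The counters also restart after a statistics reset, so take both samples in the same window, ideally during peak hours.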

Getting the database architecture wrong in either direction is expensive: too early and you're carrying operational complexity that slows your team down; too late and you're doing emergency architecture work under production pressure. The honest answer is that most startups reading this are further from needing sharding than they think, and the simpler scaling levers are worth pulling first.

When you do hit genuine write-scale constraints, the decision about how to shard and what to extract into separate data stores is one worth getting external perspective on before committing. The choices you make at that stage are difficult to unwind.

Monolith vs Modular Monolith vs Microservices: The Honest Decision Framework

Nahwin Rajan — Sun, 24 May 2026 02:00:00 +0000

Originally published at spectredev.xyz. Cross-posted here for the Dev.to community.

Choosing between monolith, modular monolith, and microservices? Here's the honest, opinionated framework your startup actually needs. Stop copying Netflix.

Your architecture choice shouldn't be driven by what Netflix or Uber is doing. It should be driven by where you are right now: your team size, your traffic, your deployment maturity, and your runway. The monolith vs microservices debate has a real answer for your situation. It's just not the one most blog posts give you.

Here's the framework we use when helping startups and growth-stage companies make this call.

Why Most Teams Get This Wrong From the Start

The mistake I see most often: founders read about how Airbnb or Grab migrated from a monolith to microservices, and they decide to build microservices from day one because that's what scale looks like.

It isn't. That's what post-scale looks like. There's a difference.

When those companies broke apart their monoliths, they had hundreds of engineers, mature CI/CD pipelines, dedicated platform teams, and years of operational experience with their own domain boundaries. They weren't starting fresh. They were solving a problem that emerged from growth, not anticipating one that might never arrive.

Starting with microservices before you have product-market fit is one of the fastest ways to burn engineering resources on infrastructure instead of product. We've seen it happen. It's painful to watch.

The Monolith: Unfairly Maligned

A monolith isn't a bad architecture. It's a starting point. And for most teams (honestly, for teams of up to 10–15 engineers working on a single product) it's the right one.

In a traditional monolith, all your application code lives in one deployable unit. One codebase, one database, one deployment pipeline. The benefits are real and often underappreciated.

Development velocity is genuinely faster early on. There's no network latency between services, no distributed transaction complexity, no service discovery overhead. You can refactor across the entire codebase in one shot. Debugging is straightforward because everything runs in one process.

The problems come later. As your team grows, the codebase becomes harder to navigate. Different teams start stepping on each other's work. Deployments get slow and risky because every change ships everything. That's when the architecture starts to fight you, not help you.

The real signal you've outgrown a monolith isn't traffic. It's team friction and deployment pain.


The Modular Monolith: The Option Nobody Talks About Enough

Here's the counter-intuitive point: for most startups that think they need microservices, a modular monolith is actually the better answer.

A modular monolith is still a single deployable unit, but the internals are deliberately structured into isolated modules, each with clear boundaries, its own data ownership, and strict rules about how modules interact. Think of it as microservices discipline inside a monolith's deployment model.

The practical result: you get much of the architectural clarity of microservices without the operational overhead. You can enforce team ownership of modules. You can move faster without breaking unrelated parts of the system. And when you eventually decide to extract a service, the module boundary makes it a surgical operation instead of a painful untangling.
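To make the boundary idea concrete, here is a minimal Python sketch with two hypothetical modules, `orders` and `payments`. The orders module depends only on a small payments interface and never touches payments' data directly. Extracting payments later therefore means swapping the implementation behind that interface rather than untangling callers. `RemotePayments` and its `send` transport are illustrative placeholders, not a real client:

```python
from typing import Callable, Protocol


# ---- payments module: the only surface other modules may use ----
class PaymentsAPI(Protocol):
    def charge(self, order_id: str, amount_cents: int) -> str: ...


class InProcessPayments:
    """Today: payments runs inside the monolith and owns its data privately."""

    def __init__(self) -> None:
        self._charges: dict[str, tuple[str, int]] = {}  # no other module reads this

    def charge(self, order_id: str, amount_cents: int) -> str:
        charge_id = f"ch_{len(self._charges) + 1}"
        self._charges[charge_id] = (order_id, amount_cents)
        return charge_id


class RemotePayments:
    """Later: same contract, backed by an extracted service.

    `send` is a hypothetical transport, e.g. a wrapper around an HTTP client.
    """

    def __init__(self, send: Callable[[str, str, dict], dict]) -> None:
        self._send = send

    def charge(self, order_id: str, amount_cents: int) -> str:
        response = self._send("POST", "/charges", {"order_id": order_id, "amount_cents": amount_cents})
        return response["charge_id"]


# ---- orders module: depends on the interface, not the implementation ----
class OrderService:
    def __init__(self, payments: PaymentsAPI) -> None:
        self._payments = payments

    def checkout(self, order_id: str, amount_cents: int) -> str:
        return self._payments.charge(order_id, amount_cents)


def fake_send(method: str, path: str, body: dict) -> dict:
    # Stand-in for the extracted payments service's response.
    return {"charge_id": "ch_remote_1"}


print(OrderService(InProcessPayments()).checkout("order-42", 15_000))      # ch_1
print(OrderService(RemotePayments(fake_send)).checkout("order-43", 9_900))  # ch_remote_1
```

The `OrderService` code is identical in both cases; only the object passed in changes.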

Shopify ran a monolith for years, and the work they did to make it modular (what they called "componentisation") is one of the more honest engineering stories out there. It wasn't glamorous. It was just effective.

A modular monolith is the architecture that earns microservices. You build the discipline first, then you extract services when the operational case is clear.

Microservices: When They Actually Make Sense

Microservices are the right answer for a specific set of conditions. All of them need to be true, not just one.

Your team is large enough that different groups genuinely need independent deployment cycles. You have parts of your system with radically different scaling characteristics: say, a real-time notification service that spikes to millions of events per second while your invoicing service handles hundreds of requests per day. You have the platform engineering capacity to run container orchestration, service meshes, distributed tracing, and on-call rotations for multiple services. Your domain boundaries are well understood because you've built and operated the system long enough to know where they should be.

If you're missing any of those, microservices will slow you down.

The operational surface area is real. In a distributed system, you're now debugging network partitions, handling partial failures, managing schema migrations across service boundaries, and coordinating deployments across multiple repositories. Each of those is a solvable problem. Collectively, they require a team that has the headspace to solve them.

One of SpectreDev's clients, a Series A fintech, came to us after attempting a microservices migration with a team of six engineers. Eighteen months in, they had eight services, three of which couldn't be deployed independently because of undocumented shared state. The team was spending more time on infrastructure incidents than feature work. We spent three months collapsing it back to a modular monolith before rebuilding the extraction incrementally. The irony isn't lost on anyone.


The Decision Framework

Use this as a starting point. It's opinionated and it's supposed to be.

Build a monolith if: you're pre-product-market fit, your team is under 8 engineers, and your primary constraint is development speed.

Refactor toward a modular monolith if: you've found product-market fit, your team is growing, and you're starting to feel the organisational friction of a shared codebase, but you don't yet have the platform maturity for distributed systems.

Extract services from that modular monolith if: a specific module has a genuinely different deployment cadence, a clearly different scaling profile, or a different team ownership model that justifies the operational overhead. Extract one service, operate it well, then decide on the next.

Notice what's not on that list: "because we expect to have 10 million users someday." That's not an architecture decision. That's wishful thinking. Architect for where you are and the next logical growth phase, not for a ceiling you may never approach.

For Indonesian companies specifically, there's an additional layer to consider: talent availability. Engineers comfortable with distributed systems operations (Kubernetes, service mesh, distributed tracing) are a thinner slice of the market in Jakarta than in San Francisco. A modular monolith your current team can operate confidently is worth more than a microservices setup that creates a hiring dependency you can't fill.

Practical Example: How a Regional E-Commerce Startup Should Think About This

Consider a regional e-commerce platform (think something operating across Indonesia, Malaysia, and the Philippines) in the 50,000–200,000 active-user range and growing.

At that scale, the right architecture is almost certainly a modular monolith. You'd want clearly isolated modules for the product catalogue, order management, payments (especially given regional payment methods like GoPay, OVO, and GrabPay that each have their own integration logic), and logistics tracking.

None of those need independent deployments yet. But structuring them as modules means when you hit 2 million users and the payments module is getting hammered during 11.11 flash sales, you can extract it as a standalone service with a clear API contract already in place. The groundwork is done.

The alternative, building microservices for each of those domains at 50,000 users, would add months of infrastructure work before you've even proven the product works in all three markets.


FAQ

Q: Can I start with a monolith and migrate to microservices later without rewriting everything?

A: Yes, and this is actually the intended migration path. The key is maintaining clean module boundaries inside your monolith from early on. If you've built a well-structured modular monolith, extracting a service means defining the API boundary (it probably already exists as a module interface), setting up the deployment infrastructure for that service, and gradually moving traffic. It's still significant work, but it's not a rewrite. A "big bang" monolith with tangled dependencies is the one that requires a painful rewrite.
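"Gradually moving traffic" usually means sending a growing percentage of requests to the extracted service while the monolith still handles the rest. Below is a minimal sketch of one way to do it, assuming string user IDs and a rollout percentage you raise over time. Hashing the user ID keeps each user consistently on one side of the split:

```python
import hashlib


def routes_to_extracted_service(user_id: str, rollout_percent: int) -> bool:
    """Decide per user whether a request goes to the extracted service.

    Hashing the user ID (instead of choosing randomly per request) keeps each
    user on the same side of the split as the percentage grows.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable bucket in 0..99
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_extracted_service(u, 10) for u in users) / len(users)
print(f"{share:.1%} of users routed to the new service at a 10% rollout")
```

Raising the percentage only adds users to the new service, because everyone in the 10% slice is also in the 25% slice. Setting it back to 0 is an instant rollback.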

Q: At what team size should I seriously consider moving to microservices?

A: Team size isn't the only variable, but a rough heuristic: when you have more than 15–20 engineers working on the same codebase and deployment friction is measurably slowing you down, it's worth having the conversation. The more useful indicator is whether you can deploy changes to one part of the system without risking unrelated parts, and whether the answer is "no" consistently enough to hurt you.

Q: Are microservices harder to secure than a monolith?

A: Differently hard, not necessarily harder. In a monolith, your attack surface is more contained but a compromise can affect the whole system. In microservices, you have more network attack surface and need to secure service-to-service communication (mTLS, service accounts, network policies). The security posture depends entirely on your implementation. Neither architecture is inherently more secure.

Q: What about serverless? Is that a fourth option?

A: Serverless functions can be a useful pattern within any of these architectures, but they're not a replacement architecture. You can have serverless functions inside a modular monolith (for async event processing, for example), and you can have them inside a microservices system. Serverless introduces its own complexity around cold starts, stateless design, and vendor lock-in that most teams underestimate. For most startups, it's a tool, not a strategy.

Q: We're a non-technical founder. How do we evaluate whether our current tech team is recommending the right architecture?

A: Ask them to explain the trade-offs, not just the choice. A strong engineer can tell you what you're giving up with their recommended approach, not just what you're gaining. If the pitch for microservices doesn't include an honest discussion of operational overhead, distributed system complexity, and your team's current capability to manage it, that's a flag. The best architecture recommendation for your stage should feel slightly boring. Exciting architecture choices are usually expensive ones.

The right architecture is the one that lets your team ship product, maintain reliability, and adapt as your business changes. That's not microservices for most of you reading this. It's probably a well-structured monolith or modular monolith that gives you the discipline to grow into something more distributed when the evidence actually demands it.

If you're at the stage where these decisions are becoming real, whether you're building from 0 to 1 or hitting the walls of a system you've outgrown, this is exactly what we work through with clients at SpectreDev.

How to Build a Backend That Scales from 100 to 10M Users

Nahwin Rajan — Sun, 17 May 2026 04:32:21 +0000

Your system worked fine. Then it didn't.

Not at 1,000 users — at 1,000 it was still fine, a bit slow maybe. The crash came around 50,000 concurrent requests. Database refused connections. Response times went from 180ms to 11 seconds. The on-call was you. The postmortem was painful.

This isn't a story about bad engineering. Most teams that hit scaling walls wrote reasonable code for the scale they had. The problem is that reasonable code for 100 users has a different shape than reasonable code for 10 million, and nobody warns you about the specific places it breaks in between.

What follows is the sequence of bottlenecks you'll actually hit, roughly in the order you'll hit them. Not theory. The things we've seen break at funded startups, and what actually fixed them.

Start boring. Stay boring as long as you can.

The most expensive advice in early-stage software is "build for scale from day one."

Don't.

Nobody knows what their system actually needs to scale until it needs to scale. Teams that design microservices at MVP stage spend their first year fighting infrastructure instead of building their product. I've watched it happen. It's not a capacity problem — it's a self-inflicted coordination problem.

The right architecture for your first 10,000 users is a monolith: one codebase, one database, one server. A well-tuned PostgreSQL instance on a decent Hetzner or DigitalOcean box can handle more traffic than most founders expect. Gojek didn't launch as a distributed system. Neither did Tokopedia. They started boring, scaled up when they had to, and made the hard architectural decisions with real traffic data instead of guesses.

The skill isn't picking the right architecture upfront. It's recognising when your current one stops working and knowing what to reach for next.

Where systems actually break first: the database

Eighty percent of scaling problems live here. Not the app layer. Not the load balancer. The database.

Most backends start on a single PostgreSQL (or MySQL) instance. That's fine — until queries slow down, connections pile up, and response times spike at peak hours. Before reaching for read replicas or sharding, check these first:

Unindexed columns. Run EXPLAIN ANALYZE on your slowest queries. You'll almost always find a sequential scan on a column with no index. Adding the right index can turn a 4-second query into 40ms. We've seen it on tables with 200 million rows — the query just worked after the index landed.
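`EXPLAIN ANALYZE` needs a running PostgreSQL server, so this self-contained sketch uses SQLite's `EXPLAIN QUERY PLAN` from Python's standard library as a stand-in. The plan wording differs, since PostgreSQL reports `Seq Scan` and `Index Scan`. The before-and-after pattern is the same, though: a filter on an unindexed column scans the whole table, and after `CREATE INDEX` it becomes an index lookup. The table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 5_000, i) for i in range(50_000)),
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def plan(sql: str) -> str:
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text.
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", (42,)).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(QUERY)  # a full table scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = plan(QUERY)   # a search using idx_orders_customer_id
print("before:", before)
print("after: ", after)
```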

N+1 queries. ORMs hide these well. Your endpoint that loads 50 products is probably firing 51 queries: one for the list, one per product for a related model. Find it in query logs. Fix it with eager loading or a JOIN.
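A simple way to see an N+1 pattern is to count the statements that actually reach the database. This sketch uses SQLite and the standard library's `Connection.set_trace_callback` as a stand-in for an ORM's query log. The schema is hypothetical, and the lazy-loading loop imitates what an ORM does behind the scenes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES categories(id)
    );
    """
)
conn.executemany("INSERT INTO categories (id, name) VALUES (?, ?)", [(i, f"category-{i}") for i in range(5)])
conn.executemany("INSERT INTO products (name, category_id) VALUES (?, ?)", [(f"product-{i}", i % 5) for i in range(50)])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # log every statement SQLite executes


def load_n_plus_one():
    # What lazy loading does: one query for the list, then one per product.
    products = conn.execute("SELECT name, category_id FROM products ORDER BY id").fetchall()
    return [
        (name, conn.execute("SELECT name FROM categories WHERE id = ?", (category_id,)).fetchone()[0])
        for name, category_id in products
    ]


def load_with_join():
    # Same data in a single query.
    return conn.execute(
        "SELECT p.name, c.name FROM products p JOIN categories c ON c.id = p.category_id ORDER BY p.id"
    ).fetchall()


def count_selects(fn):
    statements.clear()
    rows = fn()
    return rows, sum(s.lstrip().upper().startswith("SELECT") for s in statements)


slow_rows, slow_queries = count_selects(load_n_plus_one)
fast_rows, fast_queries = count_selects(load_with_join)
print(f"N+1: {slow_queries} queries; JOIN: {fast_queries} query")  # N+1: 51 queries; JOIN: 1 query
```

In Django, the equivalent of the JOIN version is usually `select_related()` for foreign keys and `prefetch_related()` for many-to-many or reverse relations.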

Connection exhaustion. Every API request opening its own database connection doesn't scale. PgBouncer as a connection pooler is a one-afternoon change that has unblocked teams hitting walls at 50k DAU.
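PgBouncer pools connections as a separate proxy in front of PostgreSQL, but the idea itself is simple enough to sketch in-process. This minimal Python version keeps a fixed number of open connections and lends them out, using `sqlite3` connections as stand-ins. It illustrates the concept and is not a substitute for PgBouncer or a driver's own pool:

```python
import queue
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed set of open connections and lend them out per request."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even if the request failed


pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)
connections_used = set()


def handle_request(n: int) -> int:
    with pool.connection() as conn:
        connections_used.add(id(conn))
        return conn.execute("SELECT ?", (n,)).fetchone()[0]


with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(handle_request, range(200)))

print(f"{len(results)} requests served by {len(connections_used)} connections")
```

Twenty concurrent workers share two connections. Extra requests wait briefly for a free connection instead of each opening its own.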

Fix those three things first. You probably just bought yourself three to six months of headroom.

When that's not enough: add a read replica. Route SELECT queries there while writes stay on the primary. For read-heavy applications this takes a large share of the load off the primary, and it's a Monday morning change, not a quarter-long project.
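The routing rule is small enough to sketch. This minimal version assumes you can inspect the SQL text and uses plain strings as stand-ins for the two connections. The rule has to handle one caveat: replicas apply changes with a short delay. Reads that must see a write just made, and `SELECT ... FOR UPDATE`, therefore belong on the primary. In Django, database routers (`db_for_read` and `db_for_write`, registered in `DATABASE_ROUTERS`) are the usual place for this logic:

```python
class ReadWriteRouter:
    """Route plain reads to a replica and everything else to the primary."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def connection_for(self, sql: str, *, needs_fresh_read: bool = False):
        statement = sql.lstrip().upper()
        is_plain_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        # Replicas lag slightly behind the primary, so a read that must observe
        # a write made moments ago (read-your-own-writes) stays on the primary.
        if is_plain_read and not needs_fresh_read:
            return self.replica
        return self.primary


router = ReadWriteRouter(primary="primary-db", replica="replica-db")  # hypothetical names
print(router.connection_for("SELECT * FROM products WHERE id = 1"))                  # replica-db
print(router.connection_for("UPDATE products SET price = 90 WHERE id = 1"))          # primary-db
print(router.connection_for("SELECT balance FROM accounts", needs_fresh_read=True))  # primary-db
```

Anything that isn't recognisably a plain SELECT, such as a `WITH` query, falls through to the primary, which is the safe default.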

Sharding — splitting data across multiple database instances — comes much later, when a single machine genuinely can't store your data or sustain your write volume. Most startups never get there. The ones that do at least know exactly why they're doing it.

Caching: what it solves, and what it doesn't

Redis is often treated as magic. It isn't. It's a trade-off: faster reads at the cost of potential staleness.

It works well when the same data gets read far more often than it changes — user profiles, product listings, pricing tables, configuration values. The cache-aside pattern covers most cases: check Redis first, on miss hit the database, write the result back to Redis with a TTL.
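Below is a minimal sketch of cache-aside, with a small in-memory TTL cache standing in for Redis so it runs without a server. With redis-py, the two cache calls would map to `get(key)` and `set(key, value, ex=ttl_seconds)`, plus serialisation, since Redis stores bytes. `get_product` and `load_from_db` are hypothetical names:

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis: each key expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds: float):
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_product(product_id, cache, load_from_db, ttl_seconds: float = 300):
    """Cache-aside: check the cache, fall back to the database, then populate the cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)
    cache.set(key, value, ttl_seconds)
    return value


# Demo with a controllable clock and a fake database loader.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def load_from_db(product_id):
    db_calls.append(product_id)
    return {"id": product_id, "price": 100}


get_product(1, cache, load_from_db)  # miss: hits the database
get_product(1, cache, load_from_db)  # hit: served from cache
now[0] = 301.0                       # TTL has passed
get_product(1, cache, load_from_db)  # expired: hits the database again
print(f"database calls: {len(db_calls)}")  # database calls: 2
```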

Two things that bite teams in production:

Cache stampede. Your TTL expires on a popular key. Three hundred concurrent requests miss cache simultaneously and pile onto the database. Fix it with mutex locking on cache population (only one request rebuilds the cache, others wait) or by randomising TTLs so popular keys don't all expire at the same moment.

Stale data at the worst time. A promotion goes live, prices change, cache still serves old values. Every cached key needs a TTL appropriate to how often the underlying data actually changes. "Cache forever" always becomes a problem eventually.

One important note: caching buys time. It doesn't fix slow queries or connection problems. Solve those first, then layer caching on top.

Horizontal scaling: when it helps, when it doesn't

Adding more app servers is the straightforward part — once your application is stateless. Sessions can't live in memory on individual servers. They need to live in Redis or the database so any instance can handle any request.

Beyond statelessness: a load balancer distributes traffic across instances, health checks remove dead ones automatically. Round-robin works for most cases.
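Load balancers such as nginx, HAProxy, or a cloud provider's managed balancer implement this for you, and the logic itself is only a few lines. Here is a minimal sketch of round-robin that skips instances a health check has marked down, using hypothetical instance names:

```python
import itertools


class RoundRobinBalancer:
    """Rotate through instances, skipping any a health check has marked down."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._cycle = itertools.cycle(self._instances)

    def mark_down(self, instance):  # called when a health check fails
        self._healthy.discard(instance)

    def mark_up(self, instance):    # called when it passes again
        self._healthy.add(instance)

    def next_instance(self):
        # Look at most one full lap ahead for a healthy instance.
        for _ in range(len(self._instances)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.next_instance() for _ in range(3)])  # ['app-1', 'app-2', 'app-3']
lb.mark_down("app-2")
print([lb.next_instance() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```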

What horizontal scaling doesn't fix is a slow database. Five app servers hitting a slow query just create five times the load on the same bottleneck. This is the trap most teams fall into — they see high CPU on the app server, add another instance, and watch database CPU spike instead.

Fix the database layer first. Then scale the application horizontally.

The mistake almost everyone makes

Microservices.

I've seen this at multiple startups in the last two years. The team reads about how a unicorn operates, decides they should architect the same way, and six months later they have fifteen services, a Kubernetes cluster nobody fully understands, distributed tracing that half-works, and a deployment pipeline that takes 45 minutes.

Microservices solve an organisational problem, not a technical one. They exist so large engineering organisations — 50, 100, 200 people — can ship independently without blocking each other. At 10 to 20 engineers, you don't have that problem. You just gave yourself one.

The inflection point where microservices start making sense: multiple teams, multiple deployment cadences, clear domain ownership, and enough engineers to properly staff each service. Before that, the right answer is usually a modular monolith — clear internal module boundaries, defined interfaces between them, deployed as one unit. Most of the organisational benefit, none of the distributed systems complexity.

What an actual scale-up sequence looks like

A fintech company in Jakarta processes payment webhook notifications for a mid-size e-commerce platform. At launch: single Django app, single PostgreSQL, one EC2 instance.

At around 300,000 daily active users, two things broke simultaneously. Database connections were exhausted during 11am–1pm peak (the lunch scroll). Webhook processing was blocking synchronous API responses, adding 3–8 seconds of latency.

The fix sequence:

PgBouncer for connection pooling → connection exhaustion resolved within 24 hours
Celery + Redis for async webhook processing → API responses back to sub-200ms
PostgreSQL read replica → offloaded 60% of DB reads, primary CPU dropped from 82% to 34%

Same Django monolith throughout. No Kubernetes. No microservices. Six times the headroom, two weeks of engineering work.
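The Celery change is the one that moved API latency, and its shape is simple: the webhook handler enqueues the payload and returns immediately, and a worker does the slow processing. In Celery, that's a function decorated with `@app.task` and called via `.delay(payload)`, with Redis as the broker. To stay runnable without a broker, this sketch uses the standard library's `queue` and a worker thread as stand-ins. The 50 ms of work per webhook is hypothetical:

```python
import queue
import threading
import time

webhook_queue: "queue.Queue[dict]" = queue.Queue()
processed_event_ids = []


def process_webhook(payload: dict) -> None:
    """The slow part: verification, database writes, downstream calls."""
    time.sleep(0.05)  # hypothetical processing time per webhook
    processed_event_ids.append(payload["event_id"])


def worker() -> None:
    # Background consumer, playing the role of a Celery worker.
    while True:
        payload = webhook_queue.get()
        try:
            process_webhook(payload)
        finally:
            webhook_queue.task_done()


def handle_webhook_request(payload: dict) -> tuple[int, str]:
    """HTTP handler: enqueue and acknowledge immediately with 202 Accepted."""
    webhook_queue.put(payload)
    return 202, "accepted"


threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
responses = [handle_webhook_request({"event_id": i}) for i in range(5)]
handler_seconds = time.perf_counter() - start
webhook_queue.join()  # demo only: wait until the worker has caught up
print(f"5 responses in {handler_seconds * 1000:.2f} ms, processed {processed_event_ids}")
```

A production version also needs retries and idempotent processing, because webhook senders may deliver the same event more than once.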

They're at 1.2M DAU now on the same core architecture. The next actual architectural decision is sharding the payments table, which is approaching 800GB. That's a six-month project, carefully sequenced. It's the right problem to be solving at 1.2M DAU — not at 300k.

FAQ

When should I move from a monolith to microservices?
When you have multiple teams that need to deploy independently, clear domain boundaries in your codebase, and at least two engineers who can own each service end-to-end. Most teams under 30 engineers aren't there yet, and the ones that think they are usually regret it six months in.

How much traffic can a single PostgreSQL instance actually handle?
With proper indexing and connection pooling, a well-specced instance (32 cores, 128GB RAM) handles tens of thousands of queries per second. Most teams hit problems in their application code long before the database itself is the ceiling.

My server is struggling. What's the first thing to check?
Run EXPLAIN ANALYZE on your slowest queries. Then check connection counts in pg_stat_activity. Then look at whether you're repeatedly fetching data that rarely changes. In that order — skipping ahead usually wastes a week.

Do we actually need Kubernetes?
Probably not yet. Kubernetes is operationally expensive. Managed container services — AWS ECS, Cloud Run, Fly.io — give you the container deployment benefits without the complexity overhead. Most startups are better served by those until they have a dedicated platform team who wants to own the cluster.

How do you handle sudden traffic spikes?
Queue-based load levelling is the most reliable pattern: spikes hit the queue, workers drain it at a pace the database can sustain. Teams that handle Lebaran or Harbolnas well pre-scale infrastructure, aggressively cache product and pricing data, and have tested their queue depth limits before the event. The ones that don't plan spend the night firefighting.
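Here is a minimal sketch of queue-based load levelling with an explicit depth limit, the number you'd want to have tested before a sale event. The limit, batch size, and names are hypothetical. A real setup would use a durable queue (Redis, SQS, RabbitMQ and similar) rather than an in-process one:

```python
import queue

MAX_QUEUE_DEPTH = 1_000  # hypothetical limit, found by load-testing before the event
order_queue: "queue.Queue[int]" = queue.Queue(maxsize=MAX_QUEUE_DEPTH)


def accept(order_id: int) -> bool:
    """Front door: enqueue if there's room, otherwise shed load immediately."""
    try:
        order_queue.put_nowait(order_id)
        return True
    except queue.Full:
        return False  # e.g. respond 503 with a Retry-After header so clients back off


def drain(batch_size: int) -> list[int]:
    """One worker tick: take at most batch_size items, a pace the database can sustain."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(order_queue.get_nowait())
        except queue.Empty:
            break
    return batch


# A 1,500-request spike arrives faster than the database can absorb it.
accepted = [accept(i) for i in range(1_500)]
batches = []
while not order_queue.empty():
    batches.append(drain(batch_size=200))
print(f"accepted {accepted.count(True)}, shed {accepted.count(False)}, drained in {len(batches)} batches")
# accepted 1000, shed 500, drained in 5 batches
```

The burst fills the queue to its limit and sheds the rest with a fast rejection. The worker then drains the backlog in batches the database can absorb.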

Scaling is a sequence of boring decisions made at the right moment. The teams that get it right aren't the ones who designed for 10M users on day one — they're the ones who knew which bottleneck they were actually solving when each one showed up.

If you're not sure where your system starts breaking, an architecture audit is usually faster than guessing in the dark.

SpectreDev builds high-traffic, reliable backend systems for startups in Indonesia, Australia, and Southeast Asia.