🤒 The Symptoms
You've seen it happen. Maybe you've lived it.
A startup is doing well. The monolith works. Deployments are fast. Customers are happy. Then someone reads a blog post about how Netflix runs 1,000+ microservices. The CTO gets a gleam in their eye. A senior engineer whispers: "We'll never scale with this monolith."
Within months, the team is knee-deep in:
- Kubernetes YAML files that nobody fully understands.
- Service meshes that add 50ms to every call.
- Distributed tracing that still canât find the slow query.
- A dozen broken builds because service A changed its protobuf and service B didnât notice.
Welcome to Microservices Fever: the architectural equivalent of using a flamethrower to light a candle.
In Article 2, we saw how AWS turned the isolation-vs-scale paradox into a superpower using cells. That required thousands of engineers, custom tooling, and a business model that justifies extreme complexity.
Now we look at the Bad side: a startup that copied the pattern without the prerequisites, and paid the price in agility, morale, and money.
🧠 The Core Misunderstanding
The Architecture Paradox, as we defined it, says:
"Every decision that optimises for one quality (e.g., resilience) inevitably harms another (e.g., simplicity)."
Microservices are a solution to organisational scaling problems, specifically Conway's Law:
"Organisations design systems that mirror their communication structure."
If you have 10 teams of 10 engineers each, a monolith forces them to coordinate constantly, which fails. Microservices allow each team to own and deploy its own service independently.
But if you have a single team of 10 engineers total, microservices create the very communication overhead they are supposed to solve. You end up with 10 services, 10 deployment pipelines, and still only 10 people, except now they spend half their time on "plumbing".
🏢 Real-World Example: FastPay, the Startup That Crashed Into the Paradox
The Scenario (Based on a True Story)
FastPay is a 14-month-old fintech startup with:
- 50,000 monthly active users
- 12 engineers (backend, frontend, DevOps, all wearing multiple hats)
- A single, well-structured monolith (Rails + Postgres, deployed on a few EC2 instances)
- Deploy frequency: 8–10 times per day
- P95 latency: 80ms
- Uptime: 99.95%
The monolith is not perfect. Some queries are slow. The database connection pool occasionally exhausts under peak load. But customers aren't complaining, and revenue is growing.
The Fever Strikes
The new CTO (hired from a FAANG company) declares: "We cannot scale this monolith to a million users. We need to decouple now."
A 6-month "modernisation" project begins. The team splits the monolith into 40 microservices:
- user-service, wallet-service, payment-service, transaction-service, ledger-service, notification-service, kyc-service, fraud-service… and 32 more.
They adopt:
- Kubernetes (EKS), because "everyone uses it"
- gRPC for interservice calls, because "REST is slow"
- Istio for service mesh, because "we need observability"
- Kafka for event streaming, because "event-driven is the future"
The Aftermath (6 Months Later)
| Metric | Before (Monolith) | After (Microservices) |
|---|---|---|
| Deploy frequency | 8–10/day | 2–3/week (and often broken) |
| P95 latency | 80ms | 450ms (network hops + serialisation) |
| Time to debug a failure | 15 minutes (one log file) | 3 hours (tracing across 12 services) |
| Engineer satisfaction | 8/10 | 3/10 ("I hate YAML") |
| Monthly cloud bill | $4,000 | $18,000 (control plane + load balancers) |
| Outages | ~1 per quarter (minor) | 3 in one month (two cascading, one lost transaction) |
The Killer Incident
One Friday evening, a misconfigured circuit breaker in the payment-service starts rejecting all requests to fraud-service. The payment-service's retry storm exhausts the connection pool of wallet-service. wallet-service crashes. Transactions fail. Customers see 500 Internal Server Error for 90 minutes.
The team's distributed tracing UI shows a beautiful flame graph of the failure, but it takes them an hour just to figure out which service started the chain reaction.
The monolith would have shown a single stack trace.
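The mechanics of that cascade are worth making concrete. The sketch below (illustrative Ruby, not FastPay's actual code) shows why immediate retries amplify load on a failing dependency, and how capped exponential backoff with full jitter bounds the burst:

```ruby
# Why a retry storm happens: with N callers each retrying R times
# immediately, a failing downstream service sees N * (R + 1) requests
# instead of N, right when it is least able to handle them.
def naive_retry_amplification(callers, retries)
  callers * (retries + 1)
end

# Mitigation sketch: capped exponential backoff with full jitter.
# Each retry waits a random time in [0, min(cap, base * 2**attempt)),
# spreading retries out so the recovering service is not hammered at once.
def backoff_delay(attempt, base: 0.1, cap: 5.0)
  window = [cap, base * (2**attempt)].min
  rand * window
end
```

With 100 callers and 3 immediate retries, fraud-service would see 400 requests instead of 100; jittered backoff (plus a correctly configured circuit breaker) keeps that burst from cascading into a neighbouring service's connection pool.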
✅ Why This Is a "Bad" Example (Not Yet "Worse")
FastPay's situation is bad, but not catastrophic. They didn't lose customer data. They didn't go bankrupt. They learned a painful lesson and eventually merged 30 of the 40 services back into three "macroservices", a pattern now called the modular monolith.
Why is it "bad" and not "worse"? Because:
- They didn't have a single point of failure like the bank's ESB (Article 4 preview).
- They could roll back; most of the damage was operational, not data-corrupting.
- They eventually admitted the mistake and simplified.
But the damage was real:
- 6 months of lost feature development (competitors gained ground).
- Team burnout: two senior engineers quit.
- Technical debt: the macroservices still carry the scars of the microservice experiment.
🔍 The Hidden Assumptions That Failed
Assumption #1: "Microservices make scaling easier"
Reality: Horizontal scaling of a monolith (more instances behind a load balancer) is trivial. You only need microservices when different parts of the system have wildly different scaling requirements (e.g., the login service needs 1000 nodes but the reporting service needs 2). FastPay didn't have that; everything scaled together.
Assumption #2: "We can handle distributed transaction complexity"
Reality: The team had never implemented a saga pattern or idempotency keys correctly. Their first attempt at a cross-service payment flow dropped transactions when a service timed out. They spent 3 weeks adding compensating transactions, which introduced new bugs.
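Idempotency keys are the standard guard against exactly this dropped-or-doubled-transaction failure. A minimal sketch (the class name and in-memory store are illustrative, not FastPay's implementation; production needs a durable, shared store):

```ruby
# The caller sends the same idempotency key on every retry of one logical
# payment. The service performs the side effect once and replays the
# stored result on duplicates, so a timed-out-and-retried request
# cannot double-charge or silently vanish.
class PaymentProcessor
  def initialize
    @results = {} # illustrative: must be durable and shared in production
  end

  def charge(idempotency_key, amount)
    @results.fetch(idempotency_key) do
      result = { status: :charged, amount: amount } # the real side effect
      @results[idempotency_key] = result
    end
  end
end
```

A retried call with the same key returns the original result without re-executing the charge, which is the property the team's first saga attempt was missing.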
Assumption #3: "Our DevOps skills are enough"
Reality: Running 40 services on Kubernetes requires dedicated platform engineers. FastPay's 12 engineers were now spending 30% of their time on cluster management, service mesh configs, and debugging network policies: time they used to spend on customer features.
Assumption #4: "The monolith is the problem"
Reality: The monolith's actual issues were:
- Slow queries → missing indexes (fixed in 2 days)
- Connection pool exhaustion → improper configuration (fixed in 1 day)
- Deployment bottleneck → poor CI pipeline (fixed in 3 days)
None of these required microservices. The team solved the wrong problem because they were seduced by a fashionable pattern.
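The connection-pool fix, in particular, is plain arithmetic. A sketch of the usual rule of thumb, with illustrative numbers rather than FastPay's real configuration:

```ruby
# Each app process needs one DB connection per worker thread, so the
# database's max_connections must cover processes * threads, plus some
# headroom for consoles, migrations, and background jobs.
def required_db_connections(processes:, threads_per_process:, headroom: 5)
  processes * threads_per_process + headroom
end

# e.g. 4 web processes * 5 threads each + 5 spare = 25 connections needed.
# If max_connections (or the app's pool size) is below that, the pool
# exhausts under peak load -- exactly the symptom the monolith showed.
```

A one-day configuration change, versus six months of re-architecture.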
📋 The "Microservices Readiness" Checklist
Before you even consider microservices, ask these questions honestly:
| Question | If "Yes", Proceed Cautiously | If "No", Stay Monolith |
|---|---|---|
| Do you have >50 engineers? | ✅ You likely have team coordination problems that microservices can help with. | ❌ Your team can sit in one room; a monolith with modules is fine. |
| Do different services have wildly different scale/risk profiles? | ✅ e.g., a public API (1,000 req/s) vs. an admin dashboard (1 req/s). | ❌ Everything scales together; a monolith handles it. |
| Do you have a dedicated platform team? | ✅ Someone to build the service mesh, observability, and deployment pipelines. | ❌ Your developers will drown in YAML and networking. |
| Can you tolerate eventual consistency across services? | ✅ Distributed transactions are optional. | ❌ If you need ACID across services, microservices will be painful. |
| Do you have a proven need to deploy services independently? | ✅ e.g., the fraud service changes daily, the ledger changes monthly. | ❌ You deploy everything together anyway, so why split? |
FastPay answered "No" to all five questions, including the first (they had 12 engineers, not 50). They should have stayed with a modular monolith.
🧩 The Modular Monolith: The Underrated Alternative
A modular monolith is not a big ball of mud. It is:
- One deployment unit (single binary/container)
- Multiple bounded contexts (packages/modules with well-defined interfaces)
- In-process calls (fast, no serialisation overhead)
- Single database (ACID transactions across modules, if needed)
- Option to split later: modules can become services by changing a configuration flag
Example: Shopify ran on a modular monolith for years, supporting millions of stores. They only started splitting into services when they hit thousands of engineers.
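In Ruby, a bounded context can be as plain as a module with a deliberately narrow public surface. The names below (Ledger, record, balance) are illustrative, not from any real codebase:

```ruby
# A bounded context as a plain Ruby module. Other contexts may call only
# the public methods; the entry store is private, so the in-process
# contract stays as narrow as a network API would be -- which is what
# makes a later split (or a merge back) cheap.
module Ledger
  Entry = Struct.new(:account, :amount)

  class << self
    # Public interface: record a ledger entry for an account.
    def record(account, amount)
      entries << Entry.new(account, amount)
      nil
    end

    # Public interface: sum of all entries for an account.
    def balance(account)
      entries.select { |e| e.account == account }.sum(&:amount)
    end

    private

    # Internal state, invisible to other contexts.
    def entries
      @entries ||= []
    end
  end
end
```

Calling `Ledger.entries` from outside raises NoMethodError, so the module boundary is enforced by the language, not by convention alone.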
How FastPay Should Have Done It
- Keep the monolith, but refactor into clear modules (payments, users, ledger, notifications) with internal APIs (just Ruby modules, not network calls).
- Fix the actual pain points: database indexes, connection pooling, CI parallelism.
- Add a "service facade": an internal gateway that can route a module's API to a separate service without changing client code. This makes splitting reversible.
- Split one module at a time, when the monolith's size genuinely hurts developer velocity.
This approach would have taken 3 months instead of 6, with zero downtime and no distributed transaction nightmares.
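The "service facade" step can be sketched in a few lines of Ruby. Everything here is illustrative (class name, config shape, the unimplemented remote branch); a real version would wire in an HTTP or gRPC client:

```ruby
# Callers go through the facade; whether a context runs in-process or as
# a remote service is a routing decision hidden behind it, so a split --
# or a merge back -- needs no client changes.
class ServiceFacade
  def initialize(config)
    @config = config # e.g. { payments: :in_process, fraud: :remote }
    @local = {}
  end

  def register_local(name, handler)
    @local[name] = handler
  end

  def call(name, request)
    case @config.fetch(name, :in_process)
    when :in_process
      @local.fetch(name).call(request) # plain method call, no network
    when :remote
      remote_call(name, request)
    end
  end

  private

  def remote_call(name, request)
    # Placeholder: a real facade would make an HTTP/gRPC call here.
    raise NotImplementedError, "wire up a remote client for #{name}"
  end
end
```

Flipping a module from `:in_process` to `:remote` is then a one-line config change, which is what makes the split reversible.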
🛠️ Practical Takeaways for Developers & Architects
For Developers
| Do This | Avoid This |
|---|---|
| ✅ Learn to build clean modular monoliths first: bounded contexts, dependency inversion | ❌ Reaching for gRPC and Kafka before you need them |
| ✅ Measure first: use APM tools to find real bottlenecks | ❌ Assuming "the monolith is slow" without profiling |
| ✅ Practice the "strangler pattern": gradually extract a service while keeping the monolith alive | ❌ Big-bang rewrites (they almost always fail) |
For Architects
| Do This | Avoid This |
|---|---|
| ✅ Create a "service cost calculator": estimate the added complexity (network, serialisation, deployment, monitoring) for each new service | ❌ Adding services because "it's cleaner" (clean is not free) |
| ✅ Design for reversibility: can you merge two services back together without rewriting clients? | ❌ Making irreversible choices (e.g., different databases per service) early |
| ✅ Run a "microservices simulation": ask each team member to estimate time spent on cross-service coordination vs. feature work | ❌ Trusting vendor case studies (Netflix's architecture would destroy a startup) |
| ✅ Document the "anti-goals": explicitly write down "We will not introduce microservices until we have >80 engineers and 3 distinct scaling profiles" | ❌ Leaving the decision vague ("we'll see when we get there") |
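A "service cost calculator" can start as a toy scoring function. The weights below are invented placeholders meant to be calibrated against your own team's data, not an established formula:

```ruby
# Score the recurring overhead a proposed new service would add, so
# "it's cleaner" has to compete with an explicit number. All weights
# are illustrative placeholders -- replace them with measurements.
def service_overhead_score(network_hops:, new_datastores:, pipelines:, oncall_surfaces:)
  network_hops * 2 +     # latency, retries, partial-failure handling
    new_datastores * 5 + # consistency, backups, migrations
    pipelines * 3 +      # CI/CD, versioning, deploy coordination
    oncall_surfaces * 4  # dashboards, alerts, runbooks
end
```

Even a crude score like this forces the "is it worth a service?" conversation to happen before the split, not six months after.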
📝 Article 3 Summary
"Microservices are a scalability solution for **organisations**, not for **code**. A 12-person startup with a slow monolith doesn't need Kubernetes; it needs a better index."
FastPay's fever dream taught a painful lesson: architectural patterns have prerequisites. AWS cells work because AWS has unlimited engineering resources and a business need for extreme isolation. Most of us don't.
The modular monolith is not a failure; it is a strategic choice that preserves options while keeping complexity low. Split into services only when the pain of not splitting exceeds the pain of splitting.
🔜 Next in the Series…
FastPay's story was painful, but they survived. Now imagine the same mistake, but with bank-sized consequences.
Article 4: "The $15 Million Mistake That Killed a Bank (And What It Teaches You)"
Spoiler: It involves a "perfect" centralised system, a hidden single point of failure, and 6 hours of total darkness.
You've seen bad. Next is worse.
Found this useful? Share it with a colleague who's about to propose a "microservice rewrite".
Have your own microservices horror story? Reply; misery loves company.