Ujjawal Tyagi

Posted on Apr 24

Microservices Mistakes I Wish Someone Had Warned Me About

#architecture #microservices #softwareengineering #systemdesign

Every team I've talked to that adopted microservices in the last five years has the same arc: enthusiasm at month one, regret at month nine, sober refactoring at month eighteen. At Xenotix Labs we've shipped 30+ platforms on microservices, and we've made every one of these mistakes at least once. Here are the ones I wish someone had told us about earlier.

Mistake 1: Splitting too early

The loudest signal that you're splitting too early: you can't ship a feature without coordinating four pull requests across three repos. The split made sense on the architecture diagram, but in practice your team's natural unit of work spans several services.

Fix: when in doubt, start as a modular monolith with clear internal boundaries. Split out a service when one of three things is true:

A separate team owns it
It needs to scale independently from the rest of the system
Its deployment cadence is fundamentally different (e.g., a low-risk service ships hourly while the rest of the system ships weekly)

If none of those apply, you're paying the microservices tax for no benefit.
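One way to make the split decision explicit in design reviews is to encode the three criteria above as a checklist. This is a minimal sketch: `ModuleProfile`, `split_reasons`, and `should_split` are hypothetical names, and the honest answer to each question comes from the team, not from code.

```python
from dataclasses import dataclass


@dataclass
class ModuleProfile:
    """Hypothetical description of one module inside a modular monolith."""
    name: str
    owned_by_separate_team: bool
    needs_independent_scaling: bool
    has_different_deploy_cadence: bool


def split_reasons(module: ModuleProfile) -> list:
    """List which of the three split criteria this module meets."""
    reasons = []
    if module.owned_by_separate_team:
        reasons.append("a separate team owns it")
    if module.needs_independent_scaling:
        reasons.append("it needs to scale independently")
    if module.has_different_deploy_cadence:
        reasons.append("its deployment cadence is fundamentally different")
    return reasons


def should_split(module: ModuleProfile) -> bool:
    # No criterion met means paying the microservices tax for no benefit.
    return bool(split_reasons(module))


search = ModuleProfile("search", owned_by_separate_team=False,
                       needs_independent_scaling=True,
                       has_different_deploy_cadence=False)
print(should_split(search), split_reasons(search))
# True ['it needs to scale independently']
```

Returning the reasons, not just a boolean, keeps the justification for each split written down next to the decision.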

Mistake 2: Splitting along the wrong seams

We split a system into user-service, address-service, and subscription-service. Made sense on paper. In practice, every "create subscription" call had to chain through all three. Latency tripled. Failure modes multiplied. A bug fixed in user-service broke address-service two weeks later.

The right seam is usually a workflow boundary, not a data-table boundary. "Customer" was the workflow. We re-merged the three back into a single customer-service and moved on with our lives.

Mistake 3: Sync HTTP everywhere

When every service calls every other service over synchronous HTTP, you've built a distributed monolith. Latency adds up. One slow service blocks the whole chain. The blast radius of an outage in payments-service reaches notifications-service even though they have nothing to do with each other.

Fix: prefer events for cross-service communication. Service A publishes "order-created". Service B consumes it on its own schedule. They don't know about each other; they know about the event shape.

We use RabbitMQ for task-style events and Kafka for high-throughput log-style events. Either way, the principle is the same: services communicate through events, not direct calls.
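To make the decoupling concrete, here is a minimal in-process sketch of the publish/consume pattern. `InMemoryBus` and `Event` are hypothetical stand-ins for a real broker such as RabbitMQ or Kafka: there is no persistence, acknowledgement, or retry, and `drain()` only simulates consumers processing events on their own schedule.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str      # e.g. "order-created"
    version: int   # schema version of the payload
    payload: dict


Handler = Callable[[Event], None]


class InMemoryBus:
    """Toy broker: publishers and consumers share only event types and shapes."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # event type -> list of handlers
        self._pending = deque()                # events not yet delivered

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # The publisher only enqueues; it never calls or waits on consumers.
        self._pending.append(event)

    def drain(self) -> None:
        # Deliver queued events to every consumer subscribed to that event type.
        while self._pending:
            event = self._pending.popleft()
            for handler in self._subscribers.get(event.type, []):
                handler(event)


# Usage: the notifications consumer knows the event shape, not the order service.
bus = InMemoryBus()
notified = []
bus.subscribe("order-created", lambda e: notified.append(e.payload["order_id"]))
bus.publish(Event("order-created", 1, {"order_id": "o-1"}))
bus.drain()
print(notified)  # ['o-1']
```

If the notifications consumer is slow or down, `publish()` still returns immediately; the work simply waits in the queue, which is the isolation that synchronous HTTP chains lack.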

Mistake 4: No idempotency

Distributed systems retry. Networks fail mid-request. Workers die mid-process. If your APIs are not idempotent, retries silently create duplicate orders, double-charge customers, and generate phantom inventory.

Fix: every write API takes a client-generated idempotency_key (a UUID). The server stores the key together with the response. If the same key arrives again, the server returns the stored response instead of repeating the write.

This costs one column and 10 lines of code. It saves you from 2 a.m. incidents for the rest of the company's life.
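A minimal sketch of the idempotency-key pattern, assuming an in-memory dict in place of the database column; `IdempotencyStore` and `create_order` are hypothetical names. It shows the core contract: the first request with a key runs the write and caches the response, and any retry with the same key gets the cached response back without writing again.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class IdempotencyStore:
    """In-memory stand-in for an idempotency table: idempotency_key -> stored response.

    In a real service this would be a database table with a UNIQUE constraint on the
    key, so two concurrent requests with the same key cannot both execute the write.
    """

    def __init__(self) -> None:
        self._responses: Dict[str, Any] = {}
        # One global lock keeps the sketch simple; it serializes all writes.
        self._lock = threading.Lock()

    def execute(self, key: str, write: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._responses:
                # Retry of a request we already handled: replay the cached response.
                return self._responses[key]
            response = write()  # if this raises, nothing is cached and a retry runs again
            self._responses[key] = response
            return response


# Usage: the client generates the key once and reuses it for every retry.
store = IdempotencyStore()
orders = []


def create_order() -> dict:
    order_id = f"order-{len(orders) + 1}"
    orders.append(order_id)
    return {"order_id": order_id, "status": "created"}


key = str(uuid.uuid4())
first = store.execute(key, create_order)
retry = store.execute(key, create_order)  # e.g. the client timed out and retried
print(first == retry, len(orders))  # True 1
```

A production version might also reject a reused key whose request body differs from the original and expire old keys after a retention window.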

Mistake 5: Database per service, taken too literally

The textbook says "each service owns its own database." In practice, this leads to absurdities: now you need to synchronize the customer's address between three databases. You build sync jobs. They lag. Reports are inconsistent.

Fix: a shared database is fine when the data is naturally shared. The rule we use: each service has full ownership over the write path for its tables, but reads can come from a shared analytical replica. Keep transactional writes per-service; let reads scale separately.
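The ownership rule can be enforced in the data-access layer. The sketch below assumes a hypothetical `TABLE_OWNERS` map and hypothetical connection names (`"primary"`, `"analytics-replica"`): writes reach the primary only when the calling service owns the table, while any service may read from the shared replica.

```python
# Hypothetical ownership map: each table has exactly one service allowed to write it.
TABLE_OWNERS = {
    "customers": "customer-service",
    "orders": "order-service",
}


class OwnershipError(PermissionError):
    """Raised when a service tries to write a table it does not own."""


def check_write(service: str, table: str) -> None:
    owner = TABLE_OWNERS.get(table)
    if owner != service:
        raise OwnershipError(f"{service} may not write {table}; owner is {owner}")


def choose_connection(service: str, table: str, is_write: bool) -> str:
    """Route writes to the primary (after an ownership check) and reads to the replica."""
    if is_write:
        check_write(service, table)
        return "primary"
    return "analytics-replica"


print(choose_connection("order-service", "customers", is_write=False))  # analytics-replica
print(choose_connection("customer-service", "customers", is_write=True))  # primary
```

The check is deliberately dumb: a single table-to-owner lookup makes any cross-service write fail loudly in development instead of quietly creating a second writer.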

Mistake 6: No request tracing

A microservices outage looks like this: "orders are slow." Now you have to figure out which of 12 services is the bottleneck. Without distributed tracing, you're guessing.

Fix: every inbound request gets a trace_id. Every downstream call propagates that trace_id. Every log line includes it. Every span is exported (for example via OpenTelemetry) to a tracing backend such as Jaeger, Honeycomb, or Datadog.
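A minimal sketch of trace_id propagation using only the Python standard library, assuming a hypothetical `X-Trace-Id` header (the W3C Trace Context standard uses `traceparent`, and OpenTelemetry SDKs handle this propagation for you). A `contextvars` variable carries the id through the request, `outbound_headers()` forwards it to downstream calls, and a logging filter stamps it on every log line.

```python
import contextvars
import logging
import uuid
from contextlib import contextmanager

# Hypothetical header name; the W3C Trace Context standard uses "traceparent" instead.
TRACE_HEADER = "X-Trace-Id"

# trace_id of the request currently being handled; isolated per thread and asyncio task.
current_trace_id = contextvars.ContextVar("trace_id", default="-")


@contextmanager
def request_scope(headers):
    """Reuse the caller's trace_id if present, otherwise start a new trace."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        yield trace_id
    finally:
        current_trace_id.reset(token)  # don't leak the id into the next request


def outbound_headers():
    """Headers to attach to every downstream call so the trace_id propagates."""
    return {TRACE_HEADER: current_trace_id.get()}


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the current trace_id."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

with request_scope({TRACE_HEADER: "4bf92f3577b34da6"}):
    logger.info("reserving inventory")  # INFO trace=4bf92f3577b34da6 reserving inventory
    headers_for_payments = outbound_headers()
```

Because the id lives in a context variable rather than a global, concurrent requests in the same process each log and forward their own trace_id.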

With tracing, an outage is a 5-minute investigation. Without it, it's a 5-hour war room.

Mistake 7: Versioning by neglect

"We'll figure out versioning when we need it" is how you end up with 14 services that all crash if you change a field.

Fix: from day one, every API and every event has a version. Add fields, never rename. Deprecate slowly. Maintain backward compatibility for at least one full release cycle. Treat your internal APIs with the same discipline as external ones.
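A sketch of what "add fields, never rename" buys you on the consumer side, using a hypothetical `order-created` event whose v2 added an optional `channel` field. Because nothing was renamed, the consumer can upgrade old v1 events by filling in a default, and it reads only the fields it needs while ignoring extras it doesn't know about.

```python
from typing import Any, Dict

LATEST_VERSION = 2


def upcast_order_created(event: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize any supported version of the hypothetical "order-created" event to v2.

    v1 payload: order_id, amount. v2 added an optional "channel" field and renamed
    nothing, so upgrading a v1 event only means filling in a default.
    """
    version = event.get("version", 1)
    payload = dict(event["payload"])  # copy so the caller's event is not mutated
    if version == 1:
        payload.setdefault("channel", "web")  # assumed default for pre-v2 producers
        version = 2
    if version != LATEST_VERSION:
        raise ValueError(f"unsupported order-created version: {version}")
    return {"type": event["type"], "version": version, "payload": payload}


def describe_order(event: Dict[str, Any]) -> str:
    p = upcast_order_created(event)["payload"]
    # Tolerant reader: use only the fields we need and ignore unknown extras.
    return f"{p['order_id']}: {p['amount']} via {p['channel']}"


old = {"type": "order-created", "version": 1, "payload": {"order_id": "o-1", "amount": 499}}
new = {"type": "order-created", "version": 2,
       "payload": {"order_id": "o-2", "amount": 120, "channel": "app", "coupon": "SAVE10"}}
print(describe_order(old))  # o-1: 499 via web
print(describe_order(new))  # o-2: 120 via app
```

An unknown future version fails loudly with a `ValueError` rather than being half-parsed, which makes a missed compatibility step visible during the deprecation window.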

Mistake 8: One service, one database, one team — but no on-call

Microservices distribute the system. They also distribute the responsibility for keeping it up. A service without a clear on-call rotation is a service that goes down on a Sunday and nobody notices until Monday morning.

Fix: every service has a primary owner. The owner is on the on-call rotation for that service. Alerts go to the owner first. The owner's commitment: a P1 alert is acknowledged within 15 minutes, regardless of the time of day.

This is hard. It's also what makes microservices viable as a long-term architecture rather than a long-term liability.

What we'd tell our past selves

Microservices are a tool to scale teams and isolate failure domains. They are not a goal. If you don't have multiple teams, you don't need them. If you have multiple teams but no real isolation needs, you may not need them.

When you do need them: split slowly, split along workflow boundaries, prefer events over sync calls, make everything idempotent, trace every request, version every interface, and put a real owner on every service.

Need help architecting your stack?

Whether it's a greenfield platform or a monolith you're carefully splitting, Xenotix Labs has shipped microservices architectures across D2C commerce, real-time sports, healthtech, edtech, and more. We've made every mistake on this list and learned from each one. Reach out at https://xenotixlabs.com.

DEV Community