<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nahwin Rajan</title>
    <description>The latest articles on DEV Community by Nahwin Rajan (@nrajan).</description>
    <link>https://dev.to/nrajan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935741%2Fc4bd9179-a482-4c88-bb64-fc251c87b2df.png</url>
      <title>DEV Community: Nahwin Rajan</title>
      <link>https://dev.to/nrajan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nrajan"/>
    <language>en</language>
    <item>
      <title>API Design for High-Throughput Systems: Rate Limiting, Versioning, Idempotency</title>
      <dc:creator>Nahwin Rajan</dc:creator>
      <pubDate>Sun, 07 Jun 2026 02:30:00 +0000</pubDate>
      <link>https://dev.to/spectredevxyz/api-design-for-high-throughput-systems-rate-limiting-versioning-idempotency-2dm4</link>
      <guid>https://dev.to/spectredevxyz/api-design-for-high-throughput-systems-rate-limiting-versioning-idempotency-2dm4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/api-design-high-traffic-system" rel="noopener noreferrer"&gt;spectredev.xyz&lt;/a&gt;. Cross-posted here for the Dev.to community.&lt;/p&gt;

&lt;p&gt;Building APIs that hold up under real traffic takes more than fast code. Here's how rate limiting, versioning, and idempotency work and when they matter most.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;An API that works fine at 100 requests per second can become a liability at 10,000. Not because the logic changed, but because the assumptions baked into the design stop holding at scale. Clients retry aggressively. Traffic spikes unpredictably. Downstream services slow down and back-pressure propagates upstream. Payment confirmations arrive twice.&lt;/p&gt;

&lt;p&gt;Most of these failure modes are predictable. The patterns that prevent them (rate limiting, versioning, and idempotency) aren't exotic engineering. They're table stakes for any API that handles real traffic. The problem is that most teams implement them as afterthoughts, bolted on when something has already broken in production.&lt;/p&gt;

&lt;p&gt;This post is about building them in from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  Rate Limiting: Protecting Your System From Yourself and Everyone Else
&lt;/h2&gt;

&lt;p&gt;Rate limiting is often framed as a defence against malicious clients: bots, scrapers, bad actors. That's part of it. But the more important use case is protecting your system from legitimate traffic that exceeds what your infrastructure can actually serve.&lt;/p&gt;

&lt;p&gt;A flash sale on a regional e-commerce platform. A push notification that sends 2 million users to the same product page simultaneously. A third-party integration that has a bug causing it to retry in a tight loop. All of these are real traffic patterns, all of them are potentially legitimate, and all of them can take down an unprotected API.&lt;/p&gt;

&lt;p&gt;Rate limiting is how you define the contract: here's what this system is designed to handle, and here's what happens when you exceed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The three most common algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token bucket&lt;/strong&gt; gives each client a bucket that fills with tokens at a fixed rate. Each request consumes a token. When the bucket is empty, requests are rejected or queued. The bucket has a maximum capacity, which means clients can "save up" for short bursts. That is useful for APIs where occasional spikes are normal but sustained high volume is not.&lt;/p&gt;
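
&lt;p&gt;A minimal sketch of the token bucket described above, assuming a single process and an in-memory bucket per client. The class name, the parameters, and the explicit &lt;code&gt;now&lt;/code&gt; argument are illustrative choices, not part of any particular gateway's API.&lt;/p&gt;

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so a fresh client can burst
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]   # three pass, the fourth is rejected
later = bucket.allow(now=2.0)                       # two seconds later, tokens have refilled
```

&lt;p&gt;Passing the clock in explicitly keeps the sketch deterministic. A shared deployment would typically keep the token count in a store such as Redis rather than in process memory.&lt;/p&gt;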

&lt;p&gt;&lt;strong&gt;Leaky bucket&lt;/strong&gt; processes requests at a fixed output rate regardless of input rate. Excess requests queue (or are dropped). It smooths traffic more aggressively than token bucket and is useful when you need consistent downstream throughput, for example when protecting a database that can't handle burst writes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixed window&lt;/strong&gt; counts requests in a fixed time window (say, 1,000 requests per minute) and resets at the window boundary. Simple to implement, but has an edge case: a client can send 1,000 requests in the last second of one window (11:59:59) and another 1,000 in the first second of the next (12:00:00), effectively pushing 2,000 requests through in about two seconds without technically violating the rule. Sliding window counters fix this but at higher implementation cost.&lt;/p&gt;
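
&lt;p&gt;The boundary problem is easier to see side by side. The sketch below is illustrative only: the function names are hypothetical, and the sliding variant uses the common approximation of weighting the previous window's count by how much of it still overlaps the sliding window.&lt;/p&gt;

```python
def fixed_window_admitted(timestamps, limit, window):
    """Count requests admitted by a fixed-window counter."""
    counts = {}
    admitted = 0
    for t in timestamps:
        idx = int(t // window)
        if limit > counts.get(idx, 0):
            counts[idx] = counts.get(idx, 0) + 1
            admitted += 1
    return admitted


def sliding_window_admitted(timestamps, limit, window):
    """Count requests admitted by a sliding window counter (weighted estimate)."""
    counts = {}
    admitted = 0
    for t in timestamps:
        idx = int(t // window)
        fraction_into_window = (t % window) / window
        previous = counts.get(idx - 1, 0)
        current = counts.get(idx, 0)
        # Weight the previous window by the share of it still inside the sliding window.
        estimated = previous * (1.0 - fraction_into_window) + current
        if limit > estimated:
            counts[idx] = current + 1
            admitted += 1
    return admitted


# 1,000 requests one second before the boundary, 1,000 more right on it.
requests = [59.0] * 1000 + [60.0] * 1000
fixed = fixed_window_admitted(requests, limit=1000, window=60.0)
sliding = sliding_window_admitted(requests, limit=1000, window=60.0)
```

&lt;p&gt;With these inputs the fixed window admits all 2,000 requests, while the sliding counter admits only the first 1,000.&lt;/p&gt;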

&lt;p&gt;The choice between them depends on your traffic pattern and what you're protecting. For most external-facing APIs, token bucket with a sliding window variant is a reasonable default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to implement it:&lt;/strong&gt; as early in the request path as possible. An API gateway (Kong, AWS API Gateway, Nginx with rate limiting modules) handles this before your application code even sees the request. This matters because rate limiting at the application layer still consumes application resources to reject the request. At the gateway layer, you shed load before it reaches your compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The response matters too.&lt;/strong&gt; A rejected request should return HTTP 429 with a &lt;code&gt;Retry-After&lt;/code&gt; header telling the client when it can try again. A set of &lt;code&gt;X-RateLimit-Limit&lt;/code&gt;, &lt;code&gt;X-RateLimit-Remaining&lt;/code&gt;, and &lt;code&gt;X-RateLimit-Reset&lt;/code&gt; headers tells well-behaved clients how to pace themselves. Design for good clients, not just bad ones.&lt;/p&gt;
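
&lt;p&gt;As a rough illustration, the headers above could be assembled like this. The function name is hypothetical, and the sketch assumes &lt;code&gt;X-RateLimit-Reset&lt;/code&gt; carries the number of seconds until the limit resets; some APIs send an epoch timestamp instead, so check what your clients expect.&lt;/p&gt;

```python
import math


def rate_limit_response(allowed: bool, limit: int, remaining: int, reset_in_seconds: float):
    """Return (status, headers) for a request that was checked against a rate limit."""
    wait = str(math.ceil(reset_in_seconds))
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": wait,   # assumption: seconds until reset
    }
    if allowed:
        return 200, headers
    # Rejected: tell the client how long to back off.
    headers["Retry-After"] = wait
    return 429, headers


status, headers = rate_limit_response(False, limit=1000, remaining=0, reset_in_seconds=12.3)
```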

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/how-to-scale-startup-backend" rel="noopener noreferrer"&gt;How to build a backend that scales from 100 to 10 million users&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  API Versioning: Making Change Without Breaking Your Consumers
&lt;/h2&gt;

&lt;p&gt;APIs are promises. The moment an external client (a mobile app, a partner integration, a third-party developer) starts depending on your API, changing it carries risk. Versioning is how you manage that risk without freezing your system in amber.&lt;/p&gt;

&lt;p&gt;The hard truth: there is no perfect versioning strategy. Every approach involves trade-offs, and the right one depends on how your API is consumed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;URI versioning&lt;/strong&gt; (&lt;code&gt;/v1/orders&lt;/code&gt;, &lt;code&gt;/v2/orders&lt;/code&gt;) is the most common and the most visible. The version is explicit in the URL, easy to route at the gateway level, and easy to document. The downside is that it can encourage treating versions as separate products rather than as an evolving contract: teams end up maintaining &lt;code&gt;/v1&lt;/code&gt; and &lt;code&gt;/v2&lt;/code&gt; as parallel codebases, which compounds the maintenance burden quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Header versioning&lt;/strong&gt; (&lt;code&gt;Accept: application/vnd.spectredev.v2+json&lt;/code&gt;) keeps URLs clean and is arguably more semantically correct: the resource identity doesn't change, only the representation does. The trade-off is that it's less visible, harder to test in a browser, and more complex to route at the infrastructure layer. It's the right approach for mature API programs; it's probably over-engineering for most startups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query parameter versioning&lt;/strong&gt; (&lt;code&gt;/orders?version=2&lt;/code&gt;) is easy to implement and easy to test, but mixes versioning concerns with resource-addressing concerns. Use it for internal tooling if it makes life easier. Don't use it for public APIs.&lt;/p&gt;

&lt;p&gt;The versioning strategy matters less than the discipline around when you version. A change that adds a new optional field to a response is backwards compatible, so don't version it. A change that removes a field, renames a field, or changes a field's type is breaking, so version it. A change that alters the semantics of an existing field (same name, different meaning) is the most dangerous kind because it won't cause a client to fail immediately; it'll cause it to fail silently with wrong data.&lt;/p&gt;
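
&lt;p&gt;The first two rules are mechanical enough to check in code. The sketch below is a simplified illustration, assuming a response schema can be summarised as a map of field name to type name and that added fields are optional; &lt;code&gt;classify_change&lt;/code&gt; is a hypothetical helper. It cannot detect the third case, a semantic change, because the name and type stay the same.&lt;/p&gt;

```python
def classify_change(old_fields: dict[str, str], new_fields: dict[str, str]) -> str:
    """Compare {field name: type name} maps for two versions of a response."""
    removed = [name for name in old_fields if name not in new_fields]
    retyped = [
        name for name in old_fields
        if name in new_fields and old_fields[name] != new_fields[name]
    ]
    # A rename shows up as one removed field plus one added field.
    if removed or retyped:
        return "breaking"
    added = [name for name in new_fields if name not in old_fields]
    return "compatible" if added else "unchanged"


v1 = {"id": "string", "total": "integer"}
added_field = classify_change(v1, {"id": "string", "total": "integer", "currency": "string"})
renamed_field = classify_change(v1, {"id": "string", "amount": "integer"})
retyped_field = classify_change(v1, {"id": "string", "total": "string"})
```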

&lt;p&gt;&lt;strong&gt;Deprecation is part of the contract.&lt;/strong&gt; When you release &lt;code&gt;/v2&lt;/code&gt;, set a clear deprecation timeline for &lt;code&gt;/v1&lt;/code&gt;: six months is common for external APIs, three months is often enough for internal ones. Send &lt;code&gt;Deprecation&lt;/code&gt; and &lt;code&gt;Sunset&lt;/code&gt; response headers on every &lt;code&gt;/v1&lt;/code&gt; request. Log which clients are still hitting deprecated versions. Reach out to those clients directly before you pull the plug. The teams that handle API versioning well treat it as a communication problem as much as a technical one.&lt;/p&gt;
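
&lt;p&gt;One way to build those two headers, as a sketch: it assumes the &lt;code&gt;Sunset&lt;/code&gt; value is an HTTP-date (as in RFC 8594) and the &lt;code&gt;Deprecation&lt;/code&gt; value is a structured-field date, an &lt;code&gt;@&lt;/code&gt; followed by Unix seconds (as in RFC 9745). The function name and the example dates are hypothetical; confirm the formats your consumers actually parse before relying on this.&lt;/p&gt;

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime) -> dict[str, str]:
    """Build headers to attach to every response from a deprecated version."""
    return {
        # Assumed format: structured-field date, "@" + Unix seconds.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # Assumed format: HTTP-date in GMT (requires a UTC-aware datetime).
        "Sunset": format_datetime(sunset_at, usegmt=True),
    }


v1_headers = deprecation_headers(
    deprecated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
    sunset_at=datetime(2026, 7, 1, tzinfo=timezone.utc),   # six months later
)
```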




&lt;h2&gt;
  
  
  Idempotency: The Pattern That Saves You When the Network Lies
&lt;/h2&gt;

&lt;p&gt;Networks are unreliable. Clients time out and retry. Load balancers reroute mid-request. Mobile apps lose connectivity at exactly the wrong moment and come back online assuming the last request failed.&lt;/p&gt;

&lt;p&gt;In a read-heavy API, this is mostly fine: fetching the same resource twice is harmless. In a write-heavy API, it's a serious problem. A payment processed twice is a real financial error. An order created twice is a real fulfilment problem. A user created twice is a real data integrity problem.&lt;/p&gt;

&lt;p&gt;Idempotency is the property that says: sending the same request multiple times has the same effect as sending it once. Implementing it correctly is one of the most valuable things you can do for an API that handles financial transactions, order management, or any operation where duplicates are costly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The standard implementation&lt;/strong&gt; uses an idempotency key, a unique identifier generated by the client and sent with each request, typically as a header (&lt;code&gt;Idempotency-Key: &amp;lt;UUID&amp;gt;&lt;/code&gt;). The server stores the key and the result of the first successful processing. On subsequent requests with the same key, it returns the stored result without re-executing the operation.&lt;/p&gt;

&lt;p&gt;The storage mechanism is usually a fast key-value store (Redis works well here) with a TTL: keys don't need to live forever, just long enough to cover the client's retry window. 24 hours is a common default for payment APIs; 7 days is more conservative for workflows with longer retry cycles.&lt;/p&gt;

&lt;p&gt;A concrete example: a GoPay or OVO disbursement request that times out on the client side. Did the money move or not? Without idempotency, retrying is risky. With an idempotency key, the client retries with the same key, the server checks its store, sees the operation already completed, and returns the original successful response. No double disbursement. The client gets the confirmation it needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to store:&lt;/strong&gt; at minimum, the idempotency key, the response status code, and the response body. Some implementations also store the request body and validate that subsequent requests with the same key have the same body. If a client sends different parameters with the same idempotency key, that's a client bug, and you should return a 422 rather than silently processing the new parameters.&lt;/p&gt;
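
&lt;p&gt;Put together, the flow looks roughly like the sketch below. It is an in-memory stand-in for a key-value store such as Redis, with no TTL and no handling of two concurrent requests for the same key; &lt;code&gt;IdempotencyStore&lt;/code&gt;, &lt;code&gt;disburse&lt;/code&gt;, and the payload fields are hypothetical names used for illustration.&lt;/p&gt;

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for a key-value store (no TTL, not concurrency-safe)."""

    def __init__(self):
        self._records = {}

    def execute(self, key: str, body: dict, operation):
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        record = self._records.get(key)
        if record is not None:
            if record["fingerprint"] != fingerprint:
                # Same key, different parameters: treat it as a client bug.
                return 422, {"error": "idempotency key reused with a different body"}
            # Replay the stored result without re-executing the operation.
            return record["status"], record["response"]
        status, response = operation(body)
        if 500 > status:   # do not cache server errors, so the client can retry them
            self._records[key] = {
                "fingerprint": fingerprint,
                "status": status,
                "response": response,
            }
        return status, response


calls = []


def disburse(body):
    calls.append(body)
    return 201, {"disbursement_id": len(calls), "amount": body["amount"]}


store = IdempotencyStore()
first = store.execute("key-1", {"amount": 50000}, disburse)
retry = store.execute("key-1", {"amount": 50000}, disburse)      # replayed, not re-run
conflict = store.execute("key-1", {"amount": 99999}, disburse)   # rejected with 422
```

&lt;p&gt;A production version would also need to decide what happens when a retry arrives while the first request is still in flight, which usually means recording the key before the operation starts.&lt;/p&gt;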

&lt;p&gt;&lt;strong&gt;Idempotency keys should be client-generated.&lt;/strong&gt; The client owns the key because the client is the one recovering from failure. Server-generated idempotency would require the client to have already received the key, which assumes the first request succeeded, defeating the purpose.&lt;/p&gt;

&lt;p&gt;Stripe's API documentation on idempotency is one of the clearest practical references for this pattern, and worth reading if you're implementing payment-adjacent functionality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/what-is-database-sharding-startup" rel="noopener noreferrer"&gt;What is database sharding and when does your startup actually need it&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How These Three Patterns Work Together
&lt;/h2&gt;

&lt;p&gt;Rate limiting, versioning, and idempotency are often treated as separate concerns. In a well-designed high-throughput API, they interact.&lt;/p&gt;

&lt;p&gt;Rate limiting shapes the load your system accepts. Idempotency handles the safe retry behaviour when requests fail. Versioning ensures that as you improve both of those mechanisms over time, you can do so without breaking existing clients.&lt;/p&gt;

&lt;p&gt;A practical scenario: you're running a B2B payments API used by Indonesian SME accounting software integrators, similar to the kind of integrations built on top of platforms like Jurnal or Accurate. Your rate limits are per API key, not per IP, because your clients are businesses making requests on behalf of thousands of end users. Your idempotency implementation covers all &lt;code&gt;POST&lt;/code&gt; and &lt;code&gt;PATCH&lt;/code&gt; endpoints because those are the ones with real-world side effects. Your versioning is URI-based with a 6-month deprecation cycle because your clients are third-party developers who need predictability.&lt;/p&gt;

&lt;p&gt;That's not a complex system. It's a coherent one. Each decision reinforces the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing to not overlook:&lt;/strong&gt; documentation. An API with perfect rate limiting, versioning, and idempotency that is poorly documented will still fail in production because clients will implement integrations incorrectly, hit rate limits they didn't know existed, and retry without idempotency keys because they didn't know they needed them. The OpenAPI spec is not documentation. It's a schema. Documentation explains the &lt;em&gt;why&lt;/em&gt; and the &lt;em&gt;what-happens-when&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/monolith-vs-microservices-startup" rel="noopener noreferrer"&gt;Monolith vs modular monolith vs microservices: the honest decision framework&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  A Note on When to Build This Versus When to Buy It
&lt;/h2&gt;

&lt;p&gt;If you're building a public-facing API today, you probably don't need to implement rate limiting or version routing from scratch. API gateways (AWS API Gateway, Kong, Apigee, or the gateway layer of a managed Kubernetes platform) handle the infrastructure concerns and let your application focus on business logic.&lt;/p&gt;

&lt;p&gt;What you do need to implement yourself is idempotency, because that's specific to your domain logic and your data model. No gateway can know whether a payment request has already been processed; only your application can.&lt;/p&gt;

&lt;p&gt;The mistake we see most often is teams building sophisticated custom rate limiting middleware in their application framework when a gateway would have served them at a tenth of the cost, while simultaneously having no idempotency implementation at all for their payment endpoints, where the stakes are highest.&lt;/p&gt;

&lt;p&gt;Spend your engineering effort where it can't be bought.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What HTTP status code should I return when a request is rate limited?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: HTTP 429 (Too Many Requests). Always include a &lt;code&gt;Retry-After&lt;/code&gt; header indicating when the client can next attempt the request, either as a number of seconds or an HTTP date. Without this, well-behaved clients can't back off intelligently and you'll see retry storms that compound the load problem you were trying to prevent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How do I handle idempotency for operations that involve multiple steps or downstream service calls?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: This is the hard case. If your operation involves multiple downstream calls (update a record, charge a payment, send a notification), idempotency needs to cover the entire sequence, not just individual steps. The safest pattern is to treat the whole operation as a saga: each step is idempotent individually, and the overall operation can be retried from any point of failure. This requires careful state tracking (typically in your database, not just a cache) and is a significant design investment. For most teams, the first step is making the critical path idempotent and accepting that edge cases in complex sagas require manual reconciliation until you've hit that problem enough times to justify the engineering cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Should internal APIs (services talking to each other within our own system) also be versioned?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: With less formality, yes. If two internal services are deployed independently, a breaking change in one can break the other mid-deployment. Contract testing (tools like Pact) is often a better fit for internal APIs than explicit versioning, because it catches breaking changes before deployment rather than managing them after. For services deployed together or tightly coupled by design, a shared contract in code (a shared types library, a protobuf schema) is usually cleaner than versioning the HTTP surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the right granularity for rate limits: per IP, per user, or per API key?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It depends on who your clients are. Per-IP is appropriate for unauthenticated public endpoints where you don't yet know who the caller is. Per-user limits are right for authenticated user-facing endpoints where you're protecting against individual abuse. Per-API-key limits are right for B2B or developer APIs where the client is an organisation making requests on behalf of many end users; throttling by IP would punish them for traffic that's legitimately spread across many users. Most mature APIs use a combination: unauthenticated requests rate-limited by IP, authenticated requests by API key or user ID, with different limits for different endpoint tiers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How long should idempotency keys be stored?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Long enough to cover your client's retry window with meaningful margin. For payment APIs, 24 hours is the industry norm (Stripe uses this, for example). For longer-running async workflows where a client might retry over days, 7 days is more conservative. There's a storage cost to longer TTLs if you're storing full response bodies at volume, but at most scales it's negligible. Err on the side of longer and trim based on actual storage pressure, not upfront assumptions.&lt;/p&gt;




&lt;p&gt;Rate limiting, versioning, and idempotency aren't the most glamorous parts of API design. They won't make it into your launch post. But they're the difference between an API that holds up when traffic gets real and one that becomes a source of production incidents at the worst possible moment. The patterns are well-understood. The implementation cost is manageable. The cost of not doing it is paid in pages, customer refunds, and emergency architecture work at 2am.&lt;/p&gt;

&lt;p&gt;Build it in from the start. Your future on-call self will notice.&lt;/p&gt;

</description>
      <category>api</category>
      <category>backend</category>
      <category>architecture</category>
      <category>webdev</category>
    </item>
    <item>
      <title>What Is Database Sharding — and When Does Your Startup Actually Need It</title>
      <dc:creator>Nahwin Rajan</dc:creator>
      <pubDate>Sun, 31 May 2026 02:30:00 +0000</pubDate>
      <link>https://dev.to/spectredevxyz/what-is-database-sharding-and-when-does-your-startup-actually-need-it-4d0f</link>
      <guid>https://dev.to/spectredevxyz/what-is-database-sharding-and-when-does-your-startup-actually-need-it-4d0f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/what-is-database-sharding-startup" rel="noopener noreferrer"&gt;spectredev.xyz&lt;/a&gt;. Cross-posted here for the Dev.to community.&lt;/p&gt;

&lt;p&gt;Database sharding explained without the hype. Learn what it actually is, the real cost of implementing it, and whether your startup genuinely needs it yet.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Most startups don't need database sharding. There. That's the most useful thing this post can tell you upfront.&lt;/p&gt;

&lt;p&gt;But the question of &lt;em&gt;when&lt;/em&gt; you do need it, and what it actually costs to implement, is worth understanding before you hit the wall, not after. Because by the time sharding becomes urgent, you're usually operating under pressure, and pressure is a terrible time to make irreversible architectural decisions.&lt;/p&gt;

&lt;p&gt;Here's what database sharding is, how it works, and the honest framework for deciding whether it belongs in your near-term roadmap.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Database Sharding Actually Is
&lt;/h2&gt;

&lt;p&gt;A database shard is a horizontal partition of your data. Instead of one database holding all your rows, you split the dataset across multiple database instances, with each instance (a "shard") holding a subset of the data.&lt;/p&gt;

&lt;p&gt;The key word is &lt;em&gt;horizontal&lt;/em&gt;. Sharding is not the same as replication, where you copy the same data to multiple servers for read scaling or redundancy. In sharding, each record lives in exactly one shard. The total dataset is distributed, not duplicated.&lt;/p&gt;

&lt;p&gt;The mechanism that makes this work is the shard key, the field you use to determine which shard a given record belongs to. A common example: shard by &lt;code&gt;user_id&lt;/code&gt;. Users 1–1,000,000 go to Shard A. Users 1,000,001–2,000,000 go to Shard B. Your application (or a routing layer) knows which shard to query for a given user.&lt;/p&gt;

&lt;p&gt;Simple in concept. Genuinely complex in practice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/how-to-scale-startup-backend" rel="noopener noreferrer"&gt;How to build a backend that scales from 100 to 10 million users&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Sharding Solves and Why Most Teams Don't Have It Yet
&lt;/h2&gt;

&lt;p&gt;Sharding exists to solve one specific problem: a single database server that can no longer handle your write volume or data volume, and where vertical scaling (bigger hardware) is no longer a viable or cost-effective option.&lt;/p&gt;

&lt;p&gt;Read scaling is a different problem with different solutions. If you have heavy read load, read replicas solve that cleanly and at a fraction of the operational complexity. Connection pooling, query optimisation, and caching layers like Redis address most database performance issues that startups encounter at early to mid-scale.&lt;/p&gt;

&lt;p&gt;A useful heuristic: if your database is struggling, sharding is probably not the first answer. It's usually the last answer after you've exhausted the simpler ones.&lt;/p&gt;

&lt;p&gt;The rough sequence most production systems follow before sharding becomes necessary:&lt;/p&gt;

&lt;p&gt;First, query optimisation and proper indexing; this alone fixes the majority of "the database is slow" problems we see in audit engagements. Second, a caching layer for frequently-read data. Third, read replicas to offload read traffic from the primary. Fourth, connection pooling. Fifth, vertical scaling (larger instance, more RAM, faster disks). Sixth, and only after all of the above, sharding for write-heavy workloads that have outgrown a single primary.&lt;/p&gt;

&lt;p&gt;Most startups are stuck somewhere between step one and step three. Sharding is step six.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost of Sharding
&lt;/h2&gt;

&lt;p&gt;This is where the blog posts usually get dishonest. They explain how sharding works, show you the architecture diagrams, and stop there. The operational reality deserves more attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-shard queries become painful.&lt;/strong&gt; If your shard key is &lt;code&gt;user_id&lt;/code&gt; and you need to run an analytics query across all users (say, "show me all orders placed in the last 24 hours across all regions"), you now have to query every shard and aggregate the results in your application layer. Either that, or you maintain a separate analytics database that aggregates across shards. Neither is free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transactions get complicated.&lt;/strong&gt; In a single database, ACID transactions are straightforward. Across shards, any operation that touches records in two different shards requires a distributed transaction, which is a significantly harder problem to solve correctly. In practice, most teams redesign their data model to avoid cross-shard transactions rather than implement them, which is often the right call, but it constrains how you can model your domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rebalancing is not painless.&lt;/strong&gt; Your shards will not stay balanced forever. If you shard by &lt;code&gt;user_id&lt;/code&gt; range and your power users are all in the upper ID range, one shard gets hammered while the others sit idle: a "hot shard" problem. Fixing it means rebalancing data across shards while the system is live. That's a non-trivial operational exercise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema migrations get harder.&lt;/strong&gt; Running a migration on one database is already a careful process on a production system. Running coordinated migrations across twelve shards, ensuring they complete consistently, is a different category of problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your application layer has to know about it.&lt;/strong&gt; Unlike read replicas (which are often transparent to the application), sharding usually requires the application to participate in routing decisions. That's code your team now owns and maintains.&lt;/p&gt;

&lt;p&gt;None of this means sharding is the wrong choice when you actually need it. Tokopedia, Gojek, and Traveloka all run sharded databases at scale because they have the traffic and data volumes that genuinely require it. But they also have dedicated platform engineering teams managing that infrastructure. That context matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/monolith-vs-microservices-startup" rel="noopener noreferrer"&gt;Monolith vs modular monolith vs microservices: the honest decision framework&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sharding Strategies: The Four Main Approaches
&lt;/h2&gt;

&lt;p&gt;When sharding is the right call, how you shard matters as much as whether you shard. There are four primary strategies, and each has a different set of trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Range-based sharding&lt;/strong&gt; partitions data by a continuous range of the shard key value: user IDs 1 to 1,000,000 on Shard A, 1,000,001 to 2,000,000 on Shard B, and so on. It's simple to understand and implement, but vulnerable to the hot shard problem if your data isn't evenly distributed across the range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hash-based sharding&lt;/strong&gt; applies a hash function to the shard key and uses the result to determine placement. This distributes data more evenly, which reduces hot shards, but it destroys range locality: you can no longer efficiently query "all users with IDs between X and Y" because those records are now scattered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Directory-based sharding&lt;/strong&gt; maintains a lookup table that maps shard keys to shards. This is the most flexible approach and allows you to rebalance shards without changing your hashing logic. The trade-off is that the lookup table itself becomes a dependency: a bottleneck and a single point of failure if not handled carefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Geographic sharding&lt;/strong&gt; partitions data by region: Southeast Asian users on one cluster, Australian users on another. This is particularly relevant for companies operating across multiple markets with data residency requirements. Indonesia's data localisation regulations under Government Regulation No. 71 of 2019 (PP 71/2019) require certain categories of personal data to be stored on infrastructure physically located in Indonesia. Geographic sharding can be part of how you comply with that, though the regulatory picture is more nuanced than just where the database sits.&lt;/p&gt;
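
&lt;p&gt;The routing logic behind the first three strategies can be sketched in a few lines. This is an illustration under simple assumptions: three hypothetical shards, integer &lt;code&gt;user_id&lt;/code&gt; keys, and an in-memory directory standing in for a real lookup table. None of the names come from a specific sharding tool.&lt;/p&gt;

```python
import bisect
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]   # hypothetical shard names


def range_shard(user_id: int, upper_bounds=(1_000_000, 2_000_000, 3_000_000)) -> str:
    # Pick the first shard whose inclusive upper bound covers user_id.
    return SHARDS[bisect.bisect_left(upper_bounds, user_id)]


def hash_shard(user_id: int) -> str:
    # A stable digest, so every process routes the same key to the same shard.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


class DirectoryRouter:
    """Lookup table from shard key to shard; moving a user is one update."""

    def __init__(self):
        self._directory = {}

    def assign(self, user_id: int, shard: str) -> None:
        self._directory[user_id] = shard

    def shard_for(self, user_id: int) -> str:
        return self._directory[user_id]


router = DirectoryRouter()
router.assign(42, "shard-a")
router.assign(42, "shard-c")   # rebalanced without touching any hash logic
```

&lt;p&gt;Note that plain modulo hashing remaps most keys when the number of shards changes, which is one reason consistent hashing is often preferred once resharding is on the table.&lt;/p&gt;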




&lt;h2&gt;
  
  
  When Your Startup Actually Needs to Start Thinking About Sharding
&lt;/h2&gt;

&lt;p&gt;Specific signals matter more than vague thresholds, but here are the concrete ones worth paying attention to.&lt;/p&gt;

&lt;p&gt;Your write throughput has exceeded what a single primary can handle even after connection pooling and hardware upgrades. You're seeing consistent replication lag on your read replicas that's impacting user experience. Your largest tables have grown past the point where a single-server B-tree index can serve queries within acceptable latency. Your data volume is approaching the practical storage limits of a single instance and vertical scaling costs have become disproportionate.&lt;/p&gt;

&lt;p&gt;In terms of rough order of magnitude: for most well-optimised PostgreSQL or MySQL setups on decent hardware, you can handle tens of thousands of write transactions per second before you genuinely exhaust single-node capacity. Many startups that feel they need sharding are running at a fraction of that, and their actual problem is unoptimised queries, missing indexes, or unnecessary write amplification in their application code.&lt;/p&gt;

&lt;p&gt;A practical test: before pursuing sharding, run a proper database performance audit. Look at your slow query log. Examine your write patterns. Check whether your schema design is creating unnecessary lock contention. We've worked with teams who were convinced they needed sharding and found, after a structured audit, that three index changes and a query rewrite cut their database load by 60 percent. That bought them 18 months of headroom without touching the architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/technical-debt/software-system-rewrite-without-downtime" rel="noopener noreferrer"&gt;How to run a technical debt audit (a guide for non-engineer founders)&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  A Concrete Example: Sharding Decision for a Payments Platform
&lt;/h2&gt;

&lt;p&gt;Consider a fintech startup processing payments across Indonesia: peer-to-peer transfers, bill payments, e-wallet top-ups. They come to us at around 500,000 active users and 200,000 transactions per day, worried about whether their PostgreSQL single-node setup will survive projected growth.&lt;/p&gt;

&lt;p&gt;At 200,000 transactions per day, they're writing roughly 2–3 records per transaction (the transaction record, a ledger entry, a notification event). That's 400,000–600,000 writes per day, which averages to under 10 writes per second. A well-configured PostgreSQL instance can comfortably handle 5,000–10,000 writes per second. They have two to three orders of magnitude of headroom.&lt;/p&gt;
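
&lt;p&gt;The arithmetic above is worth running against your own numbers. A quick sketch, using the figures from this example; the function name is illustrative, and the result is a daily average, so real peaks will be higher.&lt;/p&gt;

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_writes_per_second(transactions_per_day: int, writes_per_transaction: int) -> float:
    return transactions_per_day * writes_per_transaction / SECONDS_PER_DAY


low = average_writes_per_second(200_000, 2)    # about 4.6 writes per second
high = average_writes_per_second(200_000, 3)   # about 6.9 writes per second
headroom = 5_000 / high                        # about 720x against the lower capacity figure
```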

&lt;p&gt;The right conversation isn't sharding; it's ensuring their indexes are correct, their connection pooling is configured properly, and they have a read replica absorbing their reporting queries. That architecture will take them past 5 million users without fundamental change.&lt;/p&gt;

&lt;p&gt;Now imagine they've grown to 5 million active users and are processing 50 million transactions per day, the kind of volume GoPay was handling in its growth phase. At 2–3 writes per transaction, that averages roughly 1,200–1,700 writes per second, and peak load typically runs well above the daily average. At that scale, write throughput genuinely becomes a single-node constraint, and the case for sharding, probably by &lt;code&gt;user_id&lt;/code&gt; with a consistent hash, becomes real and defensible.&lt;/p&gt;

&lt;p&gt;The architecture decision should follow the traffic, not anticipate it by three years.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the difference between sharding and partitioning?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Partitioning is typically done within a single database instance. PostgreSQL table partitioning, for example, splits one logical table into physical sub-tables on the same server. It improves query performance and manageability but doesn't distribute load across multiple servers. Sharding distributes data across multiple separate database instances. Partitioning is often a useful step before sharding and can buy you significant headroom on its own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can managed databases like Amazon RDS or Google Cloud SQL handle sharding for me?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Not automatically, no. RDS and Cloud SQL manage replication, backups, failover, and vertical scaling, but they don't shard your data across instances on your behalf. Amazon Aurora has some features that push in this direction for read scaling, and Google Spanner is a distributed database that handles horizontal scaling transparently, but Spanner is a different product category with different cost and complexity trade-offs. For most startups, managed databases like RDS are the right choice well before sharding is relevant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is MongoDB or Cassandra easier to shard than PostgreSQL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: MongoDB and Cassandra have sharding (or in Cassandra's case, distributed architecture) built into their core design. PostgreSQL and MySQL require more explicit work to shard, whether through application-level routing, Citus, or tools like Vitess. That said, "easier to shard" shouldn't drive your database choice. The database that fits your data model and query patterns is more important than one that theoretically scales more easily, because most teams never reach the scale where sharding is necessary, regardless of which database they chose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: If we shard now while we're small, won't that make scaling easier later?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: This is the most common trap. Implementing sharding before you need it adds immediate complexity, slows down development, and optimises for a future scale problem that may never materialise in the form you anticipated. Your shard key choice may turn out to be wrong for your actual access patterns, and changing a shard key on a live system is painful. Build with clean boundaries and a data model that could accommodate sharding later. Don't implement the sharding itself until the signals are there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: We're a non-technical founder. How do we know if our CTO is recommending sharding prematurely?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Ask two questions. First: what have we already tried before reaching this conclusion? The answer should include read replicas, caching, query optimisation, and vertical scaling. If sharding is being proposed as a first response to performance issues, that's a flag. Second: what's our current write throughput and how does it compare to the limits of our current setup? If the answer is vague, push for numbers. Real performance problems have measurable symptoms.&lt;/p&gt;




&lt;p&gt;Getting the database architecture wrong in either direction is expensive: too early and you're carrying operational complexity that slows your team down; too late and you're doing emergency architecture work under production pressure. The honest answer is that most startups reading this are further from needing sharding than they think, and the simpler scaling levers are worth pulling first.&lt;/p&gt;

&lt;p&gt;When you do hit genuine write-scale constraints, the decision about how to shard and what to extract into separate data stores is one worth getting external perspective on before committing. The choices you make at that stage are difficult to unwind.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>backend</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Monolith vs Modular Monolith vs Microservices: The Honest Decision Framework</title>
      <dc:creator>Nahwin Rajan</dc:creator>
      <pubDate>Sun, 24 May 2026 02:00:00 +0000</pubDate>
      <link>https://dev.to/spectredevxyz/monolith-vs-modular-monolith-vs-microservices-the-honest-decision-framework-5bb5</link>
      <guid>https://dev.to/spectredevxyz/monolith-vs-modular-monolith-vs-microservices-the-honest-decision-framework-5bb5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/monolith-vs-microservices-startup" rel="noopener noreferrer"&gt;spectredev.xyz&lt;/a&gt;. Cross-posted here for the Dev.to community.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Choosing between monolith, modular monolith, and microservices? Here's the honest, opinionated framework your startup actually needs. Stop copying Netflix.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Your architecture choice shouldn't be driven by what Netflix or Uber is doing. It should be driven by where you are right now: your team size, your traffic, your deployment maturity, and your runway. The monolith vs microservices debate has a real answer for your situation. It's just not the one most blog posts give you.&lt;/p&gt;

&lt;p&gt;Here's the framework we use when helping startups and growth-stage companies make this call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Most Teams Get This Wrong From the Start
&lt;/h2&gt;

&lt;p&gt;The mistake I see most often: founders read about how Airbnb or Grab migrated from a monolith to microservices, and they decide to build microservices from day one because that's what scale looks like.&lt;/p&gt;

&lt;p&gt;It isn't. That's what &lt;em&gt;post-scale&lt;/em&gt; looks like. There's a difference.&lt;/p&gt;

&lt;p&gt;When those companies broke apart their monoliths, they had hundreds of engineers, mature CI/CD pipelines, dedicated platform teams, and years of operational experience with their own domain boundaries. They weren't starting fresh. They were solving a problem that emerged from growth, not anticipating one that might never arrive.&lt;/p&gt;

&lt;p&gt;Starting with microservices before you have product-market fit is one of the fastest ways to burn engineering resources on infrastructure instead of product. We've seen it happen. It's painful to watch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Monolith: Unfairly Maligned
&lt;/h2&gt;

&lt;p&gt;A monolith isn't a bad architecture. It's a starting point. And for most teams (honestly, for teams of up to 10–15 engineers with a single product) it's the right one.&lt;/p&gt;

&lt;p&gt;In a traditional monolith, all your application code lives in one deployable unit. One codebase, one database, one deployment pipeline. The benefits are real and often underappreciated.&lt;/p&gt;

&lt;p&gt;Development velocity is genuinely faster early on. There's no network latency between services, no distributed transaction complexity, no service discovery overhead. You can refactor across the entire codebase in one shot. Debugging is straightforward because everything runs in one process.&lt;/p&gt;

&lt;p&gt;The problems come later. As your team grows, the codebase becomes harder to navigate. Different teams start stepping on each other's work. Deployments get slow and risky because every change ships everything. That's when the architecture starts to fight you, not help you.&lt;/p&gt;

&lt;p&gt;The real signal you've outgrown a monolith isn't traffic. It's team friction and deployment pain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/how-to-scale-startup-backend" rel="noopener noreferrer"&gt;How to build a backend that scales from 100 to 10 million users&lt;/a&gt;how-to-scale-startup-backend)&lt;/p&gt;




&lt;h2&gt;
  
  
  The Modular Monolith: The Option Nobody Talks About Enough
&lt;/h2&gt;

&lt;p&gt;Here's the counter-intuitive point: for most startups that think they need microservices, a modular monolith is actually the better answer.&lt;/p&gt;

&lt;p&gt;A modular monolith is still a single deployable unit, but the internals are deliberately structured into isolated modules, each with clear boundaries, its own data ownership, and strict rules about how modules interact. Think of it as microservices discipline inside a monolith's deployment model.&lt;/p&gt;

&lt;p&gt;The practical result: you get much of the architectural clarity of microservices without the operational overhead. You can enforce team ownership of modules. You can move faster without breaking unrelated parts of the system. And when you eventually decide to extract a service, the module boundary makes it a surgical operation instead of a painful untangling.&lt;/p&gt;

&lt;p&gt;Shopify ran a monolith for years, and the work they did to make it modular, which they called "componentisation", is one of the more honest engineering stories out there. It wasn't glamorous. It was just effective.&lt;/p&gt;

&lt;p&gt;A modular monolith is the architecture that earns microservices. You build the discipline first, then you extract services when the operational case is clear.&lt;/p&gt;




&lt;h2&gt;
  
  
  Microservices: When They Actually Make Sense
&lt;/h2&gt;

&lt;p&gt;Microservices are the right answer for a specific set of conditions. All of them need to be true, not just one.&lt;/p&gt;

&lt;p&gt;Your team is large enough that different groups genuinely need independent deployment cycles. You have parts of your system with radically different scaling characteristics: say, a real-time notification service that spikes to millions of events per second while your invoicing service handles hundreds of requests per day. You have the platform engineering capacity to run container orchestration, service meshes, distributed tracing, and on-call rotations for multiple services. Your domain boundaries are well understood because you've built and operated the system long enough to know where they should be.&lt;/p&gt;

&lt;p&gt;If you're missing any of those, microservices will slow you down.&lt;/p&gt;

&lt;p&gt;The operational surface area is real. In a distributed system, you're now debugging network partitions, handling partial failures, managing schema migrations across service boundaries, and coordinating deployments across multiple repositories. Each of those is a solvable problem. Collectively, they require a team that has the headspace to solve them.&lt;/p&gt;

&lt;p&gt;One of SpectreDev's clients, a Series A fintech, came to us after attempting a microservices migration with a team of six engineers. Eighteen months in, they had eight services, three of which couldn't be deployed independently because of undocumented shared state. The team was spending more time on infrastructure incidents than feature work. We spent three months collapsing it back to a modular monolith before rebuilding the extraction incrementally. The irony isn't lost on anyone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/what-is-database-sharding-startup" rel="noopener noreferrer"&gt;What is database sharding and when does your startup actually need it&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;Use this as a starting point. It's opinionated and it's supposed to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a monolith if:&lt;/strong&gt; you're pre-product-market fit, your team is under 8 engineers, and your primary constraint is development speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Refactor toward a modular monolith if:&lt;/strong&gt; you've found product-market fit, your team is growing, and you're starting to feel the organisational friction of a shared codebase, but you don't yet have the platform maturity for distributed systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extract services from that modular monolith if:&lt;/strong&gt; a specific module has a genuinely different deployment cadence, a clearly different scaling profile, or a different team ownership model that justifies the operational overhead. Extract one service, operate it well, then decide on the next.&lt;/p&gt;

&lt;p&gt;Notice what's not on that list: "because we expect to have 10 million users someday." That's not an architecture decision. That's wishful thinking. Architect for where you are and the next logical growth phase, not for a ceiling you may never approach.&lt;/p&gt;

&lt;p&gt;For Indonesian companies specifically, there's an additional layer to consider: talent availability. Engineers comfortable with distributed systems operations (Kubernetes, service mesh, distributed tracing) are a thinner slice of the market in Jakarta than in San Francisco. A modular monolith your current team can operate confidently is worth more than a microservices setup that creates a hiring dependency you can't fill.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example: How a Regional E-Commerce Startup Should Think About This
&lt;/h2&gt;

&lt;p&gt;Consider a regional e-commerce platform (think something operating across Indonesia, Malaysia, and the Philippines) in the 50,000–200,000 active users range and growing.&lt;/p&gt;

&lt;p&gt;At that scale, the right architecture is almost certainly a modular monolith. You'd want clearly isolated modules for the product catalogue, order management, payments (especially given regional payment methods like GoPay, OVO, and GrabPay that each have their own integration logic), and logistics tracking.&lt;/p&gt;

&lt;p&gt;None of those need independent deployments yet. But structuring them as modules means when you hit 2 million users and the payments module is getting hammered during 11.11 flash sales, you can extract it as a standalone service with a clear API contract already in place. The groundwork is done.&lt;/p&gt;

&lt;p&gt;The alternative, building microservices for each of those domains at 50,000 users, would add months of infrastructure work before you've even proven the product works in all three markets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.spectredev.xyz/engineering/scalable-architecture/how-to-scale-startup-backend" rel="noopener noreferrer"&gt;How to build a backend that scales from 100 to 10 million users&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I start with a monolith and migrate to microservices later without rewriting everything?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Yes, and this is actually the intended migration path. The key is maintaining clean module boundaries inside your monolith from early on. If you've built a well-structured modular monolith, extracting a service means defining the API boundary (it probably already exists as a module interface), setting up the deployment infrastructure for that service, and gradually moving traffic. It's still significant work, but it's not a rewrite. A "big bang" monolith with tangled dependencies is the one that requires a painful rewrite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: At what team size should I seriously consider moving to microservices?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Team size isn't the only variable, but a rough heuristic: when you have more than 15–20 engineers working on the same codebase and deployment friction is measurably slowing you down, it's worth having the conversation. The more useful indicator is whether you can deploy changes to one part of the system without risking unrelated parts, and whether the answer is "no" consistently enough to hurt you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Are microservices harder to secure than a monolith?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Differently hard, not necessarily harder. In a monolith, your attack surface is more contained but a compromise can affect the whole system. In microservices, you have more network attack surface and need to secure service-to-service communication (mTLS, service accounts, network policies). The security posture depends entirely on your implementation. Neither architecture is inherently more secure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What about serverless? Is that a fourth option?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Serverless functions can be a useful pattern within any of these architectures, but they're not a replacement architecture. You can have serverless functions inside a modular monolith (for async event processing, for example), and you can have them inside a microservices system. Serverless introduces its own complexity around cold starts, stateless design, and vendor lock-in that most teams underestimate. For most startups, it's a tool, not a strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: We're a non-technical founder. How do we evaluate whether our current tech team is recommending the right architecture?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Ask them to explain the trade-offs, not just the choice. A strong engineer can tell you what you're giving up with their recommended approach, not just what you're gaining. If the pitch for microservices doesn't include an honest discussion of operational overhead, distributed system complexity, and your team's current capability to manage it, that's a flag. The best architecture recommendation for your stage should feel slightly boring. Exciting architecture choices are usually expensive ones.&lt;/p&gt;




&lt;p&gt;The right architecture is the one that lets your team ship product, maintain reliability, and adapt as your business changes. That's not microservices for most of you reading this. It's probably a well-structured monolith or modular monolith that gives you the discipline to grow into something more distributed when the evidence actually demands it.&lt;/p&gt;

&lt;p&gt;If you're at the stage where these decisions are becoming real, whether you're building 0 to 1 or hitting the walls of a system you've outgrown, this is exactly what we work through with clients at SpectreDev.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>microservices</category>
      <category>backend</category>
      <category>softwaredesign</category>
    </item>
    <item>
      <title>How to Build a Backend That Scales from 100 to 10M Users</title>
      <dc:creator>Nahwin Rajan</dc:creator>
      <pubDate>Sun, 17 May 2026 04:32:21 +0000</pubDate>
      <link>https://dev.to/spectredevxyz/how-to-build-a-backend-that-scales-from-100-to-10m-users-3g8k</link>
      <guid>https://dev.to/spectredevxyz/how-to-build-a-backend-that-scales-from-100-to-10m-users-3g8k</guid>
      <description>&lt;p&gt;Your system worked fine. Then it didn't.&lt;/p&gt;

&lt;p&gt;Not at 1,000 users — at 1,000 it was still fine, a bit slow maybe. The crash came around 50,000 concurrent requests. Database refused connections. Response times went from 180ms to 11 seconds. The on-call was you. The postmortem was painful.&lt;/p&gt;

&lt;p&gt;This isn't a story about bad engineering. Most teams that hit scaling walls wrote reasonable code for the scale they had. The problem is that reasonable code for 100 users has a different shape than reasonable code for 10 million, and nobody warns you about the specific places it breaks in between.&lt;/p&gt;

&lt;p&gt;What follows is the sequence of bottlenecks you'll actually hit, roughly in the order you'll hit them. Not theory. The things we've seen break at funded startups, and what actually fixed them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Start boring. Stay boring as long as you can.
&lt;/h2&gt;

&lt;p&gt;The most expensive advice in early-stage software is "build for scale from day one."&lt;/p&gt;

&lt;p&gt;Don't.&lt;/p&gt;

&lt;p&gt;Nobody knows what their system actually needs to scale until it needs to scale. Teams that design microservices at MVP stage spend their first year fighting infrastructure instead of building their product. I've watched it happen. It's not a capacity problem — it's a self-inflicted coordination problem.&lt;/p&gt;

&lt;p&gt;The right architecture for your first 10,000 users is a monolith: one codebase, one database, one server. A well-tuned PostgreSQL instance on a decent Hetzner or DigitalOcean box can handle more traffic than most founders expect. Gojek didn't launch as a distributed system. Neither did Tokopedia. They started boring, scaled up when they had to, and made the hard architectural decisions with real traffic data instead of guesses.&lt;/p&gt;

&lt;p&gt;The skill isn't picking the right architecture upfront. It's recognising when your current one stops working and knowing what to reach for next.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where systems actually break first: the database
&lt;/h2&gt;

&lt;p&gt;Eighty percent of scaling problems live here. Not the app layer. Not the load balancer. The database.&lt;/p&gt;

&lt;p&gt;Most backends start on a single PostgreSQL (or MySQL) instance. That's fine — until queries slow down, connections pile up, and response times spike at peak hours. Before reaching for read replicas or sharding, check these first:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unindexed columns.&lt;/strong&gt; Run &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on your slowest queries. You'll almost always find a sequential scan on a column with no index. Adding the right index can turn a 4-second query into 40ms. We've seen it on tables with 200 million rows — the query just worked after the index landed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N+1 queries.&lt;/strong&gt; ORMs hide these well. Your endpoint that loads 50 products is probably firing 51 queries: one for the list, one per product for a related model. Find it in query logs. Fix it with eager loading or a JOIN.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection exhaustion.&lt;/strong&gt; Every API request opening its own database connection doesn't scale. PgBouncer as a connection pooler is a one-afternoon change that has unblocked teams hitting walls at 50k DAU.&lt;/p&gt;

&lt;p&gt;Fix those three things first. You probably just bought yourself three to six months of headroom.&lt;/p&gt;
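
&lt;p&gt;The N+1 pattern above is easy to reproduce in miniature. The sketch below uses an in-memory SQLite database from Python's standard library purely so it runs anywhere; the &lt;code&gt;products&lt;/code&gt; and &lt;code&gt;brands&lt;/code&gt; tables are made up for illustration, and the same shape applies to PostgreSQL behind an ORM.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        brand_id INTEGER REFERENCES brands(id)
    );
""")
conn.executemany(
    "INSERT INTO brands VALUES (?, ?)",
    [(i, f"brand-{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(i, f"product-{i}", (i % 5) + 1) for i in range(1, 51)],
)


def load_n_plus_one(conn):
    """One query for the list, then one extra query per product."""
    rows = conn.execute(
        "SELECT name, brand_id FROM products ORDER BY id"
    ).fetchall()
    queries = 1
    result = []
    for name, brand_id in rows:
        brand = conn.execute(
            "SELECT name FROM brands WHERE id = ?", (brand_id,)
        ).fetchone()[0]
        queries += 1
        result.append((name, brand))
    return result, queries


def load_with_join(conn):
    """Same data in a single round trip."""
    rows = conn.execute(
        "SELECT p.name, b.name FROM products p "
        "JOIN brands b ON b.id = p.brand_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


slow_rows, slow_queries = load_n_plus_one(conn)
fast_rows, fast_queries = load_with_join(conn)
print(slow_queries, fast_queries)
```

&lt;p&gt;Both functions return the same rows. The difference is the number of round trips, which is what hurts once each one crosses a network to a busy database.&lt;/p&gt;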

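&lt;p&gt;When that's not enough: add a read replica. Route SELECT queries that can tolerate a little replication lag there; writes, and any reads that must see them immediately, stay on the primary. For read-heavy applications this can take a large share of load off the primary, and it's a Monday morning change, not a quarter-long project.&lt;/p&gt;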

&lt;p&gt;Sharding — splitting data across multiple database instances — comes much later, when a single machine genuinely can't store your data or sustain your write volume. Most startups never get there. The ones that do at least know exactly why they're doing it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Caching: what it solves, and what it doesn't
&lt;/h2&gt;

&lt;p&gt;Redis is often treated as magic. It isn't. It's a trade-off: faster reads at the cost of potential staleness.&lt;/p&gt;

&lt;p&gt;It works well when the same data gets read far more often than it changes — user profiles, product listings, pricing tables, configuration values. The cache-aside pattern covers most cases: check Redis first, on miss hit the database, write the result back to Redis with a TTL.&lt;/p&gt;
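
&lt;p&gt;Here is a minimal sketch of the cache-aside flow just described. To keep it runnable without a Redis server, &lt;code&gt;TTLCache&lt;/code&gt; is a small in-memory stand-in we made up; &lt;code&gt;get_profile&lt;/code&gt;, &lt;code&gt;load_from_db&lt;/code&gt;, and the 300-second TTL are likewise illustrative assumptions.&lt;/p&gt;

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: entries expire after ttl seconds."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


def get_profile(user_id, cache, load_from_db, ttl=300):
    """Cache-aside: check the cache, fall back to the database, write back."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl)
    return value


db_calls = []


def load_from_db(user_id):
    # Hypothetical database read; records each call so we can count them.
    db_calls.append(user_id)
    return {"id": user_id, "name": "example"}


cache = TTLCache()
first = get_profile(7, cache, load_from_db)
second = get_profile(7, cache, load_from_db)
print(len(db_calls))
```

&lt;p&gt;The second call never reaches the database. The TTL is the staleness budget: pick it per key type based on how often the underlying data changes.&lt;/p&gt;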

&lt;p&gt;Two things that bite teams in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache stampede.&lt;/strong&gt; Your TTL expires on a popular key. Three hundred concurrent requests miss cache simultaneously and pile onto the database. Fix it with mutex locking on cache population (only one request rebuilds the cache, others wait) or by randomising TTLs so popular keys don't all expire at the same moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stale data at the worst time.&lt;/strong&gt; A promotion goes live, prices change, cache still serves old values. Every cached key needs a TTL appropriate to how often the underlying data actually changes. "Cache forever" always becomes a problem eventually.&lt;/p&gt;

&lt;p&gt;One important note: caching buys time. It doesn't fix slow queries or connection problems. Solve those first, then layer caching on top.&lt;/p&gt;




&lt;h2&gt;
  
  
  Horizontal scaling: when it helps, when it doesn't
&lt;/h2&gt;

&lt;p&gt;Adding more app servers is the straightforward part — once your application is stateless. Sessions can't live in memory on individual servers. They need to live in Redis or the database so any instance can handle any request.&lt;/p&gt;

&lt;p&gt;Beyond statelessness: a load balancer distributes traffic across instances, health checks remove dead ones automatically. Round-robin works for most cases.&lt;/p&gt;

&lt;p&gt;What horizontal scaling doesn't fix is a slow database. Five app servers hitting a slow query just create five times the load on the same bottleneck. This is the trap most teams fall into — they see high CPU on the app server, add another instance, and watch database CPU spike instead.&lt;/p&gt;

&lt;p&gt;Fix the database layer first. Then scale the application horizontally.&lt;/p&gt;




&lt;h2&gt;
  
  
  The mistake almost everyone makes
&lt;/h2&gt;

&lt;p&gt;Microservices.&lt;/p&gt;

&lt;p&gt;I've seen this at multiple startups in the last two years. The team reads about how a unicorn operates, decides they should architect the same way, and six months later they have fifteen services, a Kubernetes cluster nobody fully understands, distributed tracing that half-works, and a deployment pipeline that takes 45 minutes.&lt;/p&gt;

&lt;p&gt;Microservices solve an organisational problem, not a technical one. They exist so large engineering organisations — 50, 100, 200 people — can ship independently without blocking each other. At 10 to 20 engineers, you don't have that problem. You just gave yourself one.&lt;/p&gt;

&lt;p&gt;The inflection point where microservices start making sense: multiple teams, multiple deployment cadences, clear domain ownership, and enough engineers to properly staff each service. Before that, the right answer is usually a modular monolith — clear internal module boundaries, defined interfaces between them, deployed as one unit. Most of the organisational benefit, none of the distributed systems complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an actual scale-up sequence looks like
&lt;/h2&gt;

&lt;p&gt;A fintech company in Jakarta processes payment webhook notifications for a mid-size e-commerce platform. At launch: single Django app, single PostgreSQL, one EC2 instance.&lt;/p&gt;

&lt;p&gt;At around 300,000 daily active users, two things broke simultaneously. Database connections were exhausted during 11am–1pm peak (the lunch scroll). Webhook processing was blocking synchronous API responses, adding 3–8 seconds of latency.&lt;/p&gt;

&lt;p&gt;The fix sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;PgBouncer for connection pooling → connection exhaustion resolved within 24 hours&lt;/li&gt;
&lt;li&gt;Celery + Redis for async webhook processing → API responses back to sub-200ms&lt;/li&gt;
&lt;li&gt;PostgreSQL read replica → offloaded 60% of DB reads, primary CPU dropped from 82% to 34%&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same Django monolith throughout. No Kubernetes. No microservices. Six times the headroom, two weeks of engineering work.&lt;/p&gt;

&lt;p&gt;They're at 1.2M DAU now on the same core architecture. The next actual architectural decision is sharding the payments table, which is approaching 800GB. That's a six-month project, carefully sequenced. It's the right problem to be solving at 1.2M DAU — not at 300k.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When should I move from a monolith to microservices?&lt;/strong&gt;&lt;br&gt;
When you have multiple teams that need to deploy independently, clear domain boundaries in your codebase, and at least two engineers who can own each service end-to-end. Most teams under 30 engineers aren't there yet, and the ones that think they are usually regret it six months in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much traffic can a single PostgreSQL instance actually handle?&lt;/strong&gt;&lt;br&gt;
With proper indexing and connection pooling, a well-specced instance (32 cores, 128GB RAM) handles tens of thousands of queries per second. Most teams hit problems in their application code long before the database itself is the ceiling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My server is struggling. What's the first thing to check?&lt;/strong&gt;&lt;br&gt;
Run &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on your slowest queries. Then check connection counts in &lt;code&gt;pg_stat_activity&lt;/code&gt;. Then look at whether you're repeatedly fetching data that rarely changes. In that order — skipping ahead usually wastes a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do we actually need Kubernetes?&lt;/strong&gt;&lt;br&gt;
Probably not yet. Kubernetes is operationally expensive. Managed container services — AWS ECS, Cloud Run, Fly.io — give you the container deployment benefits without the complexity overhead. Most startups are better served by those until they have a dedicated platform team who wants to own the cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you handle sudden traffic spikes?&lt;/strong&gt;&lt;br&gt;
Queue-based load levelling is the most reliable pattern: spikes hit the queue, workers drain it at a pace the database can sustain. Teams that handle Lebaran or Harbolnas well pre-scale infrastructure, aggressively cache product and pricing data, and have tested their queue depth limits before the event. The ones that don't plan spend the night firefighting.&lt;/p&gt;




&lt;p&gt;Scaling is a sequence of boring decisions made at the right moment. The teams that get it right aren't the ones who designed for 10M users on day one — they're the ones who knew which bottleneck they were actually solving when each one showed up.&lt;/p&gt;

&lt;p&gt;If you're not sure where your system starts breaking, an architecture audit is usually faster than guessing in the dark.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://www.spectredev.xyz" rel="noopener noreferrer"&gt;SpectreDev&lt;/a&gt; builds high-traffic, reliable backend systems for startups in Indonesia, Australia, and Southeast Asia.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>architecture</category>
      <category>scalability</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
