TL;DR — You do not need Kafka, Kubernetes, or 17 microservices to handle serious traffic. A well-structured monolith, PostgreSQL, Redis, PgBouncer, a CDN, and horizontal application scaling can take you to 10 million requests per day with less risk, less cost, and far less operational pain.
Most backend scaling guides start at the wrong point.
They show you the architecture after a company has 200 engineers, multiple platform teams, and years of accumulated infrastructure. Then they present that architecture as if it were the natural starting point.
It was not.
Most high-traffic systems begin the same way: one application, one database, one deployment path, one team trying to ship product fast enough to matter. The systems that survive are not the ones that adopt the most technology. They are the ones that adopt the right technology at the right time.
This is a guide to that path.
Not the hype version.
Not the conference-talk version.
The production version.
What This Guide Covers
This post is an opinionated case study plus guide for building a high-traffic backend using a scalable monolith and a small number of carefully chosen supporting components.
It is optimized for teams that care about:
- Shipping fast without building infrastructure theater
- Scaling predictably under real production traffic
- Keeping debugging and deployment simple
- Avoiding premature microservices
- Reaching 10M requests/day without rewriting the whole backend
Core keywords this guide is built around:
- scalable monolith
- high-traffic backend
- pragmatic architecture
- production scaling
- backend architecture
- Node.js scaling
- PostgreSQL performance
- Redis caching
- PgBouncer
- boring technology
Why Boring Technology Wins
Boring technology is not boring because it lacks power.
It is boring because it is understood.
That matters more than people admit.
A system that uses PostgreSQL, Redis, Nginx, and a monolith is easier to reason about, easier to observe, easier to debug at 2am, and easier to hire for than a system built from six trendy abstractions nobody fully understands.
The highest-leverage backend architecture principle is this:
Prefer the simplest system that survives your current load with margin.
That single rule eliminates most bad architectural decisions.
The Traffic Stages
Every backend goes through roughly four traffic stages: a small monolith, meaningful traffic with easy bottlenecks, horizontal scale, and high volume where graceful degradation matters. The architecture that is reasonable at one stage is often unnecessary or actively harmful at another.
The mistake most teams make is solving Stage 4 problems while still in Stage 1.
That is how you end up with architecture that looks impressive and ships slowly.
The Boring Stack
Here is the baseline stack for a pragmatic high-traffic backend:
| Layer | Choice | Why it stays |
|---|---|---|
| Runtime | Node.js + Fastify | Fast enough, mature, simple developer experience |
| Primary database | PostgreSQL | Transactional, reliable, great indexing, JSON support |
| Cache | Redis | Perfect for read-heavy data, sessions, rate limits, queues |
| Queue | BullMQ | Uses Redis, avoids introducing another broker too early |
| Connection pooler | PgBouncer | Protects Postgres from connection explosion |
| Reverse proxy | Nginx | Stable, battle-tested, high-performance |
| CDN | Cloudflare | Offloads traffic before it reaches origin |
| Monitoring | Prometheus + Grafana | Standard, simple, effective |
| Logs | Structured JSON logs | Searchable, aggregatable, production-friendly |
What is intentionally not here:
- Kafka by default
- Microservices by default
- Kubernetes by default
- Event-driven everything
- Service mesh
- Distributed transactions
- Premature CQRS
None of those are inherently bad. They are just not your starting point.
Stage 1: Build a Monolith That Is Easy to Scale
At low traffic, the correct architecture is usually one codebase, one app process group, one database, one deployment pipeline.
That is not a temporary embarrassment. That is good architecture.
Why a Monolith Is the Right Default
A scalable monolith gives you:
- One deployable unit
- One place to debug business logic
- One database transaction boundary
- No network hops between internal features
- No distributed systems complexity
- No cross-service schema drift
- No internal API versioning burden
At this stage, your job is not to create elegant infrastructure. Your job is to create a stable product with clean boundaries inside a single codebase.
That means modularizing inside the monolith.
Monolith Structure That Ages Well
A good monolith is not a folder full of chaos. It has domain boundaries even though it deploys as one service.
```
src/
  modules/
    auth/
      auth.routes.js
      auth.service.js
      auth.repo.js
    billing/
      billing.routes.js
      billing.service.js
      billing.repo.js
    users/
      users.routes.js
      users.service.js
      users.repo.js
  lib/
    db.js
    cache.js
    queue.js
    logger.js
  app.js
  server.js
```
This gives you the operational simplicity of a monolith and the code organization of a more mature system.
Start with Pooling Immediately
Even at low traffic, do not connect to PostgreSQL casually from every request path. Use pooling from the start.
```js
import Fastify from 'fastify';
import postgres from '@fastify/postgres';

const app = Fastify({ logger: true });

// @fastify/postgres forwards these options to pg.Pool
app.register(postgres, {
  connectionString: process.env.DATABASE_URL,
  max: 10,                       // upper bound on pooled connections
  idleTimeoutMillis: 30000,      // recycle idle connections
  connectionTimeoutMillis: 2000, // fail fast when the pool is exhausted
});
```
The point is not that you need massive pooling on day one.
The point is that you build the habit before traffic arrives.
Stage 1 Database Design: Decisions That Matter Later
Most scaling pain is not caused by traffic alone. It is caused by data model decisions that looked harmless when traffic was small.
Index Foreign Keys Immediately
This is one of the most common missing pieces in production systems.
```sql
CREATE TABLE users (
  id UUID PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE posts (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  slug TEXT NOT NULL UNIQUE,
  status TEXT NOT NULL DEFAULT 'draft',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);
CREATE INDEX idx_posts_published ON posts(status) WHERE status = 'published';
```
That partial index on status = 'published' matters because it reflects the access pattern you are likely to have in production.
Design for Query Shapes, Not Just Entities
A schema is not just a representation of objects. It is a representation of future queries.
Ask:
- What will be filtered often?
- What will be sorted often?
- What joins will appear on critical paths?
- What should be cached?
- What can remain eventually consistent?
- What must stay transactional?
That level of thinking matters more than fashionable architecture diagrams.
Stage 2: Remove the Easy Bottlenecks
Once traffic starts becoming meaningful, most problems are still boring.
They usually fall into one of these buckets:
- Repeated identical reads hitting the database
- Too many application connections reaching Postgres
- Missing indexes
- Static assets reaching the app server unnecessarily
- Expensive synchronous operations done inside requests
This is where good production scaling begins.
Add Redis the Right Way
Redis should enter the architecture as a targeted optimization, not as a random dependency.
Three strong uses at this stage:
- Caching
- Rate limiting
- Session storage / temporary state
Cache-Aside Pattern
The most reliable caching pattern for application data is still cache-aside.
```js
export class Cache {
  constructor(redis, ttl = 300) {
    this.redis = redis;
    this.ttl = ttl;
  }

  async get(key) {
    const raw = await this.redis.get(key);
    return raw ? JSON.parse(raw) : null;
  }

  async set(key, value, ttl = this.ttl) {
    await this.redis.setex(key, ttl, JSON.stringify(value));
  }

  async del(key) {
    await this.redis.del(key);
  }

  async getOrFetch(key, fetchFn, ttl = this.ttl) {
    const cached = await this.get(key);
    if (cached !== null) return cached;
    const fresh = await fetchFn();
    if (fresh !== null && fresh !== undefined) {
      await this.set(key, fresh, ttl);
    }
    return fresh;
  }
}
```
```js
app.get('/posts/:slug', async (req, reply) => {
  const key = `post:slug:${req.params.slug}`;
  const post = await cache.getOrFetch(
    key,
    async () => {
      const result = await app.pg.query(
        'SELECT * FROM posts WHERE slug = $1 AND status = $2',
        [req.params.slug, 'published']
      );
      return result.rows[0] ?? null; // a single row, not the rows array
    },
    600
  );
  if (!post) return reply.code(404).send({ error: 'Not found' });
  return post;
});
```
Cache What Is Expensive and Stable
Good cache candidates:
- Product details
- Blog posts
- User profile summaries
- Public pricing data
- Aggregated dashboard numbers
- Settings that rarely change
Bad cache candidates:
- Highly volatile counters without clear invalidation
- Permission-sensitive data unless keyed safely
- Write-heavy rows with constant mutation
- Anything you cannot invalidate confidently
Caching is not magic. It is a trade: memory and invalidation complexity in exchange for lower latency and lower database load.
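The invalidation side of that trade deserves an explicit pattern: every write path that touches a cached row must also drop its key. Here is a self-contained sketch using a Map-backed `FakeRedis` stand-in and a hypothetical `updatePostStatus` write path (both names are illustrative, not from this post):

```javascript
// Write-path invalidation for the cache-aside pattern above.
// FakeRedis is a Map-backed stand-in so this sketch runs without a server;
// in production the same Cache class would wrap a real Redis client.
class FakeRedis {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.has(key) ? this.store.get(key) : null; }
  async setex(key, ttl, value) { this.store.set(key, value); }
  async del(key) { this.store.delete(key); }
}

class Cache {
  constructor(redis, ttl = 300) { this.redis = redis; this.ttl = ttl; }
  async get(key) {
    const raw = await this.redis.get(key);
    return raw ? JSON.parse(raw) : null;
  }
  async set(key, value, ttl = this.ttl) {
    await this.redis.setex(key, ttl, JSON.stringify(value));
  }
  async del(key) { await this.redis.del(key); }
}

const cache = new Cache(new FakeRedis());

// Hypothetical write path: mutate the row first, then drop the cache entry
// so the next read repopulates from Postgres.
async function updatePostStatus(db, slug, status) {
  await db.write('UPDATE posts SET status = $1 WHERE slug = $2', [status, slug]);
  await cache.del(`post:slug:${slug}`);
}
```

The ordering matters: invalidate after the write commits, so a concurrent read cannot repopulate the cache with the old row and have it survive.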
Protect PostgreSQL with PgBouncer
At moderate traffic, PostgreSQL often stops being limited by raw query power and starts being limited by connection handling.
That is what PgBouncer solves.
Why PgBouncer Matters
Your app may create many logical connections.
Postgres should not have to maintain that many physical ones.
PgBouncer lets you:
- Smooth connection spikes
- Keep Postgres stable
- Scale app instances without linearly scaling DB connections
- Reduce memory pressure at the database layer
Critical config principle:
```ini
pool_mode = transaction
```
That one setting is often the difference between a useful PgBouncer deployment and a misleading one.
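For context, a minimal `pgbouncer.ini` sketch might look like the following. The sizes and addresses are assumptions to illustrate the shape of the config, not recommendations:

```ini
; Illustrative sketch -- tune sizes per workload
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; return server connections to the pool between transactions
pool_mode = transaction
; many cheap client connections funneled into few real Postgres connections
max_client_conn = 1000
default_pool_size = 20
```

One caveat: transaction pooling breaks session-level features such as named prepared statements, session advisory locks, and `SET` session state, so the application must avoid relying on them.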
Stage 3: Horizontal Scale Without Losing Simplicity
Now the backend starts looking more serious.
Not because you adopted microservices.
Because you removed single points of failure.
This is still a boring architecture.
It is also enough for a high-traffic backend.
Add Read Replicas When Read Load Dominates
The right time to add Postgres read replicas is when your write volume is manageable but reads are crowding the primary.
That is a common pattern for:
- SaaS dashboards
- CMS-backed sites
- APIs with heavy lookup traffic
- B2B platforms with read-heavy admin views
Split Reads and Writes Explicitly
```js
import pg from 'pg';

const writePool = new pg.Pool({
  connectionString: process.env.DATABASE_PRIMARY_URL,
  max: 5,
});

const readPool = new pg.Pool({
  connectionString: process.env.DATABASE_REPLICA_URL,
  max: 20,
});

export const db = {
  write(sql, params) {
    return writePool.query(sql, params);
  },
  read(sql, params) {
    return readPool.query(sql, params);
  },
};
```
This is deliberately explicit.
You want engineers to know when they are reading from a replica versus writing to a primary.
Understand Replica Lag
Read replicas are not free throughput.
They introduce a real tradeoff: replica lag.
That means a request can:
- Write to primary
- Immediately read from replica
- Not see its own write yet
So do not route consistency-sensitive reads blindly to replicas.
Examples of reads that should still hit primary:
- Immediately after creating a resource
- Checkout confirmation flows
- Billing updates
- Permission changes
- Authentication-adjacent state
This is where many scaling guides get too hand-wavy. Replica lag is not theoretical. You must design for it.
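One common design for this is read-your-writes stickiness: after a user writes, route that user's reads to the primary for a short window. Below is a per-process sketch; in production the marker would live in Redis or a signed cookie so it survives load balancing, and `STICKY_MS` and the router names are illustrative assumptions:

```javascript
// Route a user's reads to the primary for a short window after they write,
// so they always see their own writes despite replica lag.
// The Map here is per-process and purely illustrative; behind a load
// balancer, track this in Redis or a signed cookie instead.
const STICKY_MS = 2000; // assumed to exceed typical replica lag
const lastWriteAt = new Map(); // userId -> timestamp of last write

function noteWrite(userId, now = Date.now()) {
  lastWriteAt.set(userId, now);
}

function shouldReadPrimary(userId, now = Date.now()) {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < STICKY_MS;
}

// Hypothetical router over the explicit read/write pools shown earlier:
function routedRead(db, userId, sql, params) {
  return shouldReadPrimary(userId)
    ? db.write(sql, params) // primary: inside the consistency-sensitive window
    : db.read(sql, params); // replica: normal read traffic
}
```

The window is a heuristic, not a guarantee; for reads that must be strictly consistent (billing, permissions), route to the primary unconditionally as listed above.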
Add a CDN Earlier Than Most Teams Do
A CDN is not just for images.
It is a traffic absorber.
A CDN should sit in front of your system as early as possible because it gives you:
- Lower latency globally
- Reduced origin load
- Edge caching for static assets
- Basic DDoS mitigation
- TLS termination
- Better burst handling
Cloudflare alone can eliminate a surprising amount of backend work before the request ever reaches your application.
What Should Be Cached at the Edge
Good CDN candidates:
- JS, CSS, fonts, images
- Static marketing pages
- Public docs pages
- Public blog content with short revalidation windows
- Some anonymous API responses if safe
Bad CDN candidates:
- Authenticated user dashboards
- Personalized responses
- Permission-sensitive resources
- Anything with ambiguous cache headers
Add Rate Limiting Before You Think You Need It
A production system gets stressed not only by success but by abuse, bugs, retries, crawlers, scripts, and bursty clients.
Rate limiting is not just security. It is stability.
```js
import { RateLimiterRedis } from 'rate-limiter-flexible';

const limiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'rate_limit',
  points: 100,       // allowed requests...
  duration: 60,      // ...per 60-second window
  blockDuration: 60, // block for 60s once exhausted
});

export async function rateLimit(req, reply) {
  try {
    const key = req.user?.id || req.ip;
    await limiter.consume(key);
  } catch {
    return reply.code(429).send({
      error: 'Too many requests',
    });
  }
}
```
The crucial point here is that the limiter state lives in Redis, not memory.
If the limiter is in memory per instance, it becomes inconsistent behind a load balancer.
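The failure mode is easy to demonstrate. The toy fixed-window counter below is illustrative only (it is not the rate-limiter-flexible API); it shows how two app instances with private counters silently double the effective limit:

```javascript
// Why in-memory rate limiting breaks behind a load balancer: each instance
// keeps its own counter, so N instances multiply the effective limit by N.
function makeMemoryLimiter(limit) {
  let used = 0;
  return () => (used < limit ? (used++, true) : false);
}

const limit = 100;
const instanceA = makeMemoryLimiter(limit); // app instance 1
const instanceB = makeMemoryLimiter(limit); // app instance 2

// One client whose requests are load-balanced across both instances:
let allowed = 0;
for (let i = 0; i < 300; i++) {
  const limiter = i % 2 === 0 ? instanceA : instanceB;
  if (limiter()) allowed++;
}
// 'allowed' ends up at 200 -- double the intended limit of 100 --
// which is exactly why the shared counter must live in Redis.
```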
Stage 4: Make the System Degrade Gracefully
At 1M to 10M requests/day, you stop thinking only about scale and start thinking about failure shape.
The question changes from:
“Can the system handle this?”
to:
“How does the system behave when it cannot?”
That is a more mature question.
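One concrete answer is to bound slow dependencies and degrade instead of erroring. A minimal sketch, assuming a hypothetical `fetchRecommendations` call that is nice to have but not required for the response:

```javascript
// Degrade instead of fail: bound a slow dependency with a timeout and
// fall back to a partial response. The names here (fetchRecommendations,
// FALLBACK) are illustrative assumptions, not from this post.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// If recommendations take too long, serve the page without them.
const FALLBACK = { items: [], degraded: true };

async function getRecommendations(fetchRecommendations) {
  return withTimeout(fetchRecommendations(), 100, FALLBACK);
}
```

The `degraded` flag lets callers (and your metrics) distinguish a full response from a fallback one, which is useful when alerting on how often the system is degrading.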
Move Non-Critical Work Out of the Request Path
Anything not required to complete the user-visible response should leave the request path.
That includes:
- Emails
- Webhook delivery
- Image processing
- Video transcoding
- Search indexing
- Report generation
- Analytics fanout
- Notification dispatch
This is where a Redis-backed queue like BullMQ is exactly right.
Background Jobs with BullMQ
```js
import { Queue, Worker } from 'bullmq';

const connection = {
  host: process.env.REDIS_HOST,
  port: 6379,
};

export const emailQueue = new Queue('email', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 },
  },
});

export const emailWorker = new Worker(
  'email',
  async (job) => {
    const { to, subject, template, data } = job.data;
    await sendEmail({ to, subject, template, data }); // the app's own mail helper
  },
  {
    connection,
    concurrency: 5,
  }
);
```
```js
app.post('/users/:id/welcome-email', async (req, reply) => {
  const result = await db.read(
    'SELECT id, email, name FROM users WHERE id = $1',
    [req.params.id]
  );
  const user = result.rows[0]; // first row, not the rows array
  if (!user) return reply.code(404).send({ error: 'User not found' });
  await emailQueue.add('welcome', {
    to: user.email,
    subject: 'Welcome aboard',
    template: 'welcome',
    data: { name: user.name },
  });
  return reply.code(202).send({ message: 'Email queued' });
});
```
This is a major step in backend architecture maturity.
Not because queues are fashionable. Because they protect request latency and isolate failure.
Build a Caching Hierarchy
At 10M requests/day, caching must exist at more than one layer.
Each layer exists for a different reason:
- CDN handles global traffic and static/public content
- Nginx can absorb repeated identical upstream requests
- Redis handles application-level object caching
- Read replicas absorb remaining read pressure
- Primary is reserved for writes and consistency-sensitive reads
If everything reaches the primary database, the rest of your scaling story is mostly fiction.
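The Nginx layer of that hierarchy is often implemented as micro-caching: very short TTLs on public upstream responses. A hedged sketch, where `app_upstream`, the cache path, and the TTLs are all illustrative assumptions:

```nginx
# Illustrative micro-caching sketch -- paths, names, and sizes are assumptions.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m
                 max_size=1g inactive=10m use_temp_path=off;

server {
    location /api/public/ {
        proxy_cache app_cache;
        # a short TTL is enough to absorb bursts of identical requests
        proxy_cache_valid 200 10s;
        # serve a stale copy rather than an error when the upstream struggles
        proxy_cache_use_stale error timeout updating;
        proxy_pass http://app_upstream;
    }
}
```

Even a 10-second TTL means the origin sees at most one request per URL per 10 seconds for that path, regardless of burst size.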
Observability: The Part Everyone Delays Too Long
A backend is not scalable because it can survive load once in a benchmark.
It is scalable when you can understand what is happening under production load quickly enough to act.
That means metrics, logs, health checks, and dashboards.
Minimum Metrics You Need
| Metric | Why it matters |
|---|---|
| Request rate | Traffic shape and load changes |
| p95 latency | Real user experience under load |
| Error rate | Detects systemic failure quickly |
| DB query duration | Tells you when DB is the bottleneck |
| Cache hit rate | Shows whether Redis is doing useful work |
| Queue depth | Reveals background work backlog |
| CPU / memory | Capacity planning and saturation signals |
| Replica lag | Prevents stale-read surprises |
Example Prometheus Metrics
```js
import client from 'prom-client';

client.collectDefaultMetrics();

export const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

export const dbQueryDuration = new client.Histogram({
  name: 'db_query_duration_seconds',
  help: 'Database query duration in seconds',
  labelNames: ['operation', 'table'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
});
```
Practical Thresholds
| Signal | Warning | Critical |
|---|---|---|
| p95 request latency | > 200ms | > 500ms |
| p95 DB query latency | > 50ms | > 200ms |
| Cache hit rate | < 85% | < 70% |
| 5xx error rate | > 0.5% | > 2% |
| Queue backlog growth | Sustained 5 min | Sustained 15 min |
If you do not know these numbers, you do not yet know whether your backend is healthy.
When Not to Use Microservices
This section matters because too many teams still treat microservices as a rite of passage.
They are not.
Microservices are a tax. Sometimes a necessary one, often an unnecessary one.
Do Not Break the Monolith If:
- One team can still understand the codebase
- Deployments are still manageable
- The database is not your deployment bottleneck
- Most modules do not need independent scaling
- Internal network calls would replace in-process calls for no clear gain
- Your operational maturity is still limited
- Your debugging pipeline is still weak
- You do not have platform engineers
Consider Extracting a Service Only If:
- One subsystem has drastically different scaling needs
- One subsystem needs a different runtime or storage model
- Deployment coupling is causing continuous friction
- Team boundaries are stable and long-lived
- You already have strong observability and operational discipline
The goal is not ideological purity.
The goal is operational sanity.
What 10M Requests/Day Actually Means
This number sounds dramatic, but it helps to reduce it.
10M requests/day is about:
- 115 requests per second on average
- Often 300–600 req/s at peak
- Higher during bursts, launches, crawls, or regional concentration
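Those figures are plain arithmetic, worth checking once (the 3-5x peak multiplier is an assumption that reproduces the range above):

```javascript
// Back-of-envelope check of the numbers above.
const perDay = 10_000_000;
const secondsPerDay = 86_400;
const avgRps = perDay / secondsPerDay;   // ≈ 115.7, i.e. ~115 req/s average

// Peak traffic is commonly a small multiple of the average; a 3-5x
// assumption reproduces the 300-600 req/s range quoted above.
const peakLow = Math.round(avgRps * 3);  // ≈ 347 req/s
const peakHigh = Math.round(avgRps * 5); // ≈ 579 req/s
```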
That is large enough to be real.
It is also absolutely within reach of a well-built monolith plus:
- horizontal app scaling
- Redis caching
- PgBouncer
- read replicas
- CDN offload
- async job processing
The scary part is not the request count.
The scary part is waste.
A bad query, missing index, unbounded N+1 pattern, or synchronous email send can collapse a backend long before traffic itself becomes the true problem.
Production Checklist
Database
- [ ] PostgreSQL primary configured correctly
- [ ] Read replicas added when read traffic justifies them
- [ ] PgBouncer deployed in transaction mode
- [ ] All foreign keys indexed
- [ ] Slow query logging enabled
- [ ] Backup and restore tested, not just configured
- [ ] `pg_stat_statements` enabled
Caching
- [ ] Redis in place for cache, sessions, or rate limiting
- [ ] Cache key naming convention documented
- [ ] Invalidation exists on all write paths
- [ ] Cache hit ratio monitored
- [ ] No sensitive cross-user cache leakage
App Layer
- [ ] Health checks at `/health`
- [ ] Graceful shutdown implemented
- [ ] Request timeouts enforced
- [ ] Rate limiting in Redis, not memory
- [ ] Background tasks moved off request path
- [ ] Structured logs everywhere
Infrastructure
- [ ] Nginx or equivalent reverse proxy configured
- [ ] CDN in front of public traffic
- [ ] HTTPS enforced
- [ ] Compression enabled
- [ ] Static assets cached aggressively
Observability
- [ ] Prometheus metrics exposed
- [ ] Grafana dashboards for latency, errors, cache hit rate, queue depth
- [ ] Alerts configured before incidents happen
- [ ] Logs searchable centrally
- [ ] Replica lag visible if using replicas
The Real Scaling Mindset
The most useful scaling principle is not “build for hyperscale.”
It is this:
Remove the current bottleneck without introducing unnecessary permanent complexity.
That is how serious systems are actually built.
A good backend architecture is not the one with the most boxes in the diagram.
It is the one that stays understandable while traffic grows.
A pragmatic architecture wins because it keeps your team fast, your failure modes legible, and your operating costs reasonable. A scalable monolith wins because most systems need better boundaries, better caching, and better query discipline long before they need service decomposition.
If you want to reach 10M requests/day, do not start by asking whether you need Kafka.
Start by asking:
- Are my queries indexed?
- Is my cache hit rate healthy?
- Am I blocking requests on background work?
- Can Postgres survive connection spikes?
- Can my system degrade gracefully under load?
- Can I explain every major component in one sentence?
If the answer to those questions is yes, you are much closer than you think.
Real scale rarely demands flashy architecture first. It demands disciplined engineering first.




