TL;DR — You do not need Kafka, Kubernetes, or 17 microservices to handle serious traffic. A well-structured monolith, PostgreSQL, Redis, PgBouncer, a CDN, and horizontal application scaling can take you to 10 million requests per day with less risk, less cost, and far less operational pain.
Most backend scaling guides start at the wrong point.
They show you the architecture after a company has 200 engineers, multiple platform teams, and years of accumulated infrastructure. Then they present that architecture as if it were the natural starting point.
It was not.
Most high-traffic systems begin the same way: one application, one database, one deployment path, one team trying to ship product fast enough to matter. The systems that survive are not the ones that adopt the most technology. They are the ones that adopt the right technology at the right time.
This is a guide to that path.
Not the hype version.
Not the conference-talk version.
The production version.
What This Guide Covers
This post is an opinionated case study plus guide for building a high-traffic backend using a scalable monolith and a small number of carefully chosen supporting components.
It is optimized for teams that care about:
- Shipping fast without building infrastructure theater
- Scaling predictably under real production traffic
- Keeping debugging and deployment simple
- Avoiding premature microservices
- Reaching 10M requests/day without rewriting the whole backend
Core keywords this guide is built around:
- scalable monolith
- high-traffic backend
- pragmatic architecture
- production scaling
- backend architecture
- Node.js scaling
- PostgreSQL performance
- Redis caching
- PgBouncer
- boring technology
Why Boring Technology Wins
Boring technology is not boring because it lacks power.
It is boring because it is understood.
That matters more than people admit.
A system that uses PostgreSQL, Redis, Nginx, and a monolith is easier to reason about, easier to observe, easier to debug at 2am, and easier to hire for than a system built from six trendy abstractions nobody fully understands.
The highest-leverage backend architecture principle is this:
Prefer the simplest system that survives your current load with margin.
That single rule eliminates most bad architectural decisions.
The Traffic Stages
Every backend goes through roughly four traffic stages: a small monolith, meaningful traffic with easy bottlenecks, horizontal scale, and high volume where graceful degradation matters. The architecture that is reasonable at one stage is often unnecessary or actively harmful at another.
The mistake most teams make is solving Stage 4 problems while still in Stage 1.
That is how you end up with architecture that looks impressive and ships slowly.
The Boring Stack
Here is the baseline stack for a pragmatic high-traffic backend:
| Layer | Choice | Why it stays |
|---|---|---|
| Runtime | Node.js + Fastify | Fast enough, mature, simple developer experience |
| Primary database | PostgreSQL | Transactional, reliable, great indexing, JSON support |
| Cache | Redis | Perfect for read-heavy data, sessions, rate limits, queues |
| Queue | BullMQ | Uses Redis, avoids introducing another broker too early |
| Connection pooler | PgBouncer | Protects Postgres from connection explosion |
| Reverse proxy | Nginx | Stable, battle-tested, high-performance |
| CDN | Cloudflare | Offloads traffic before it reaches origin |
| Monitoring | Prometheus + Grafana | Standard, simple, effective |
| Logs | Structured JSON logs | Searchable, aggregatable, production-friendly |
What is intentionally not here:
- Kafka by default
- Microservices by default
- Kubernetes by default
- Event-driven everything
- Service mesh
- Distributed transactions
- Premature CQRS
None of those are inherently bad. They are just not your starting point.
Stage 1: Build a Monolith That Is Easy to Scale
At low traffic, the correct architecture is usually one codebase, one app process group, one database, one deployment pipeline.
That is not a temporary embarrassment. That is good architecture.
Why a Monolith Is the Right Default
A scalable monolith gives you:
- One deployable unit
- One place to debug business logic
- One database transaction boundary
- No network hops between internal features
- No distributed systems complexity
- No cross-service schema drift
- No internal API versioning burden
At this stage, your job is not to create elegant infrastructure. Your job is to create a stable product with clean boundaries inside a single codebase.
That means modularizing inside the monolith.
Monolith Structure That Ages Well
A good monolith is not a folder full of chaos. It has domain boundaries even though it deploys as one service.
```
src/
  modules/
    auth/
      auth.routes.js
      auth.service.js
      auth.repo.js
    billing/
      billing.routes.js
      billing.service.js
      billing.repo.js
    users/
      users.routes.js
      users.service.js
      users.repo.js
  lib/
    db.js
    cache.js
    queue.js
    logger.js
  app.js
  server.js
```
This gives you the operational simplicity of a monolith and the code organization of a more mature system.
Start with Pooling Immediately
Even at low traffic, do not connect to PostgreSQL casually from every request path. Use pooling from the start.
```js
import Fastify from 'fastify';
import postgres from '@fastify/postgres';

const app = Fastify({ logger: true });

// @fastify/postgres forwards these options to pg.Pool
app.register(postgres, {
  connectionString: process.env.DATABASE_URL,
  max: 10,                       // upper bound on pooled connections
  idleTimeoutMillis: 30000,      // recycle idle connections
  connectionTimeoutMillis: 2000, // fail fast when the pool is exhausted
});
```
The point is not that you need massive pooling on day one.
The point is that you build the habit before traffic arrives.
Stage 1 Database Design: Decisions That Matter Later
Most scaling pain is not caused by traffic alone. It is caused by data model decisions that looked harmless when traffic was small.
Index Foreign Keys Immediately
This is one of the most common missing pieces in production systems.
```sql
CREATE TABLE users (
  id UUID PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE posts (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  slug TEXT NOT NULL UNIQUE,
  status TEXT NOT NULL DEFAULT 'draft',
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);
CREATE INDEX idx_posts_published ON posts(status) WHERE status = 'published';
```
That partial index on status = 'published' matters because it reflects the access pattern you are likely to have in production.
Design for Query Shapes, Not Just Entities
A schema is not just a representation of objects. It is a representation of future queries.
Ask:
- What will be filtered often?
- What will be sorted often?
- What joins will appear on critical paths?
- What should be cached?
- What can remain eventually consistent?
- What must stay transactional?
That level of thinking matters more than fashionable architecture diagrams.
Stage 2: Remove the Easy Bottlenecks
Once traffic starts becoming meaningful, most problems are still boring.
They usually fall into one of these buckets:
- Repeated identical reads hitting the database
- Too many application connections reaching Postgres
- Missing indexes
- Static assets reaching the app server unnecessarily
- Expensive synchronous operations done inside requests
This is where good production scaling begins.
Add Redis the Right Way
Redis should enter the architecture as a targeted optimization, not as a random dependency.
Three strong uses at this stage:
- Caching
- Rate limiting
- Session storage / temporary state
Cache-Aside Pattern
The most reliable caching pattern for application data is still cache-aside.
```js
export class Cache {
  constructor(redis, ttl = 300) {
    this.redis = redis;
    this.ttl = ttl;
  }

  async get(key) {
    const raw = await this.redis.get(key);
    return raw ? JSON.parse(raw) : null;
  }

  async set(key, value, ttl = this.ttl) {
    await this.redis.setex(key, ttl, JSON.stringify(value));
  }

  async del(key) {
    await this.redis.del(key);
  }

  async getOrFetch(key, fetchFn, ttl = this.ttl) {
    const cached = await this.get(key);
    if (cached !== null) return cached;
    const fresh = await fetchFn();
    if (fresh !== null && fresh !== undefined) {
      await this.set(key, fresh, ttl);
    }
    return fresh;
  }
}
```
```js
app.get('/posts/:slug', async (req, reply) => {
  const key = `post:slug:${req.params.slug}`;
  const post = await cache.getOrFetch(
    key,
    async () => {
      const result = await app.pg.query(
        'SELECT * FROM posts WHERE slug = $1 AND status = $2',
        [req.params.slug, 'published']
      );
      return result.rows[0] ?? null; // a single row, not the rows array
    },
    600
  );
  if (!post) return reply.code(404).send({ error: 'Not found' });
  return post;
});
```
Cache What Is Expensive and Stable
Good cache candidates:
- Product details
- Blog posts
- User profile summaries
- Public pricing data
- Aggregated dashboard numbers
- Settings that rarely change
Bad cache candidates:
- Highly volatile counters without clear invalidation
- Permission-sensitive data unless keyed safely
- Write-heavy rows with constant mutation
- Anything you cannot invalidate confidently
Caching is not magic. It is a trade: memory and invalidation complexity in exchange for lower latency and lower database load.
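The invalidation side of that trade deserves an explicit pattern: every write path that touches a cached row must also drop its key. Here is a self-contained sketch using a Map-backed `FakeRedis` stand-in and a hypothetical `updatePostStatus` write path (both names are illustrative, not from this post):

```javascript
// Write-path invalidation for the cache-aside pattern above.
// FakeRedis is a Map-backed stand-in so this sketch runs without a server;
// in production the same Cache class would wrap a real Redis client.
class FakeRedis {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.has(key) ? this.store.get(key) : null; }
  async setex(key, ttl, value) { this.store.set(key, value); }
  async del(key) { this.store.delete(key); }
}

class Cache {
  constructor(redis, ttl = 300) { this.redis = redis; this.ttl = ttl; }
  async get(key) {
    const raw = await this.redis.get(key);
    return raw ? JSON.parse(raw) : null;
  }
  async set(key, value, ttl = this.ttl) {
    await this.redis.setex(key, ttl, JSON.stringify(value));
  }
  async del(key) { await this.redis.del(key); }
}

const cache = new Cache(new FakeRedis());

// Hypothetical write path: mutate the row first, then drop the cache entry
// so the next read repopulates from Postgres.
async function updatePostStatus(db, slug, status) {
  await db.write('UPDATE posts SET status = $1 WHERE slug = $2', [status, slug]);
  await cache.del(`post:slug:${slug}`);
}
```

The ordering matters: invalidate after the write commits, so a concurrent read cannot repopulate the cache with the old row and have it survive.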
Protect PostgreSQL with PgBouncer
At moderate traffic, PostgreSQL often stops being limited by raw query power and starts being limited by connection handling.
That is what PgBouncer solves.
Why PgBouncer Matters
Your app may create many logical connections.
Postgres should not have to maintain that many physical ones.
PgBouncer lets you:
- Smooth connection spikes
- Keep Postgres stable
- Scale app instances without linearly scaling DB connections
- Reduce memory pressure at the database layer
Critical config principle:
```ini
pool_mode = transaction
```
That one setting is often the difference between a useful PgBouncer deployment and a misleading one.
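For context, a minimal `pgbouncer.ini` sketch might look like the following. The sizes and addresses are assumptions to illustrate the shape of the config, not recommendations:

```ini
; Illustrative sketch -- tune sizes per workload
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; return server connections to the pool between transactions
pool_mode = transaction
; many cheap client connections funneled into few real Postgres connections
max_client_conn = 1000
default_pool_size = 20
```

One caveat: transaction pooling breaks session-level features such as named prepared statements, session advisory locks, and `SET` session state, so the application must avoid relying on them.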
Stage 3: Horizontal Scale Without Losing Simplicity
Now the backend starts looking more serious.
Not because you adopted microservices.
Because you removed single points of failure.
This is still a boring architecture.
It is also enough for a high-traffic backend.
Add Read Replicas When Read Load Dominates
The right time to add Postgres read replicas is when your write volume is manageable but reads are crowding the primary.
That is a common pattern for:
- SaaS dashboards
- CMS-backed sites
- APIs with heavy lookup traffic
- B2B platforms with read-heavy admin views
Split Reads and Writes Explicitly
```js
import pg from 'pg';

const writePool = new pg.Pool({
  connectionString: process.env.DATABASE_PRIMARY_URL,
  max: 5,
});

const readPool = new pg.Pool({
  connectionString: process.env.DATABASE_REPLICA_URL,
  max: 20,
});

export const db = {
  write(sql, params) {
    return writePool.query(sql, params);
  },
  read(sql, params) {
    return readPool.query(sql, params);
  },
};
```
This is deliberately explicit.
You want engineers to know when they are reading from a replica versus writing to a primary.
Understand Replica Lag
Read replicas are not free throughput.
They introduce a real tradeoff: replica lag.
That means a request can:
- Write to primary
- Immediately read from replica
- Not see its own write yet
So do not route consistency-sensitive reads blindly to replicas.
Examples of reads that should still hit primary:
- Immediately after creating a resource
- Checkout confirmation flows
- Billing updates
- Permission changes
- Authentication-adjacent state
This is where many scaling guides get too hand-wavy. Replica lag is not theoretical. You must design for it.
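One common design for this is read-your-writes stickiness: after a user writes, route that user's reads to the primary for a short window. Below is a per-process sketch; in production the marker would live in Redis or a signed cookie so it survives load balancing, and `STICKY_MS` and the router names are illustrative assumptions:

```javascript
// Route a user's reads to the primary for a short window after they write,
// so they always see their own writes despite replica lag.
// The Map here is per-process and purely illustrative; behind a load
// balancer, track this in Redis or a signed cookie instead.
const STICKY_MS = 2000; // assumed to exceed typical replica lag
const lastWriteAt = new Map(); // userId -> timestamp of last write

function noteWrite(userId, now = Date.now()) {
  lastWriteAt.set(userId, now);
}

function shouldReadPrimary(userId, now = Date.now()) {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < STICKY_MS;
}

// Hypothetical router over the explicit read/write pools shown earlier:
function routedRead(db, userId, sql, params) {
  return shouldReadPrimary(userId)
    ? db.write(sql, params) // primary: inside the consistency-sensitive window
    : db.read(sql, params); // replica: normal read traffic
}
```

The window is a heuristic, not a guarantee; for reads that must be strictly consistent (billing, permissions), route to the primary unconditionally as listed above.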
Add a CDN Earlier Than Most Teams Do
A CDN is not just for images.
It is a traffic absorber.
A CDN should sit in front of your system as early as possible because it gives you:
- Lower latency globally
- Reduced origin load
- Edge caching for static assets
- Basic DDoS mitigation
- TLS termination
- Better burst handling
Cloudflare alone can eliminate a surprising amount of backend work before the request ever reaches your application.
What Should Be Cached at the Edge
Good CDN candidates:
- JS, CSS, fonts, images
- Static marketing pages
- Public docs pages
- Public blog content with short revalidation windows
- Some anonymous API responses if safe
Bad CDN candidates:
- Authenticated user dashboards
- Personalized responses
- Permission-sensitive resources
- Anything with ambiguous cache headers
Add Rate Limiting Before You Think You Need It
A production system gets stressed not only by success but by abuse, bugs, retries, crawlers, scripts, and bursty clients.
Rate limiting is not just security. It is stability.
```js
import { RateLimiterRedis } from 'rate-limiter-flexible';

const limiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'rate_limit',
  points: 100,       // allowed requests...
  duration: 60,      // ...per 60-second window
  blockDuration: 60, // block for 60s once exhausted
});

export async function rateLimit(req, reply) {
  try {
    const key = req.user?.id || req.ip;
    await limiter.consume(key);
  } catch {
    return reply.code(429).send({
      error: 'Too many requests',
    });
  }
}
```
The crucial point here is that the limiter state lives in Redis, not memory.
If the limiter is in memory per instance, it becomes inconsistent behind a load balancer.
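The failure mode is easy to demonstrate. The toy fixed-window counter below is illustrative only (it is not the rate-limiter-flexible API); it shows how two app instances with private counters silently double the effective limit:

```javascript
// Why in-memory rate limiting breaks behind a load balancer: each instance
// keeps its own counter, so N instances multiply the effective limit by N.
function makeMemoryLimiter(limit) {
  let used = 0;
  return () => (used < limit ? (used++, true) : false);
}

const limit = 100;
const instanceA = makeMemoryLimiter(limit); // app instance 1
const instanceB = makeMemoryLimiter(limit); // app instance 2

// One client whose requests are load-balanced across both instances:
let allowed = 0;
for (let i = 0; i < 300; i++) {
  const limiter = i % 2 === 0 ? instanceA : instanceB;
  if (limiter()) allowed++;
}
// 'allowed' ends up at 200 -- double the intended limit of 100 --
// which is exactly why the shared counter must live in Redis.
```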
Stage 4: Make the System Degrade Gracefully
At 1M to 10M requests/day, you stop thinking only about scale and start thinking about failure shape.
The question changes from:
“Can the system handle this?”
to:
“How does the system behave when it cannot?”
That is a more mature question.
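One concrete answer is to bound slow dependencies and degrade instead of erroring. A minimal sketch, assuming a hypothetical `fetchRecommendations` call that is nice to have but not required for the response:

```javascript
// Degrade instead of fail: bound a slow dependency with a timeout and
// fall back to a partial response. The names here (fetchRecommendations,
// FALLBACK) are illustrative assumptions, not from this post.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// If recommendations take too long, serve the page without them.
const FALLBACK = { items: [], degraded: true };

async function getRecommendations(fetchRecommendations) {
  return withTimeout(fetchRecommendations(), 100, FALLBACK);
}
```

The `degraded` flag lets callers (and your metrics) distinguish a full response from a fallback one, which is useful when alerting on how often the system is degrading.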
Move Non-Critical Work Out of the Request Path
Anything not required to complete the user-visible response should leave the request path.
That includes:
- Emails
- Webhook delivery
- Image processing
- Video transcoding
- Search indexing
- Report generation
- Analytics fanout
- Notification dispatch
This is where a Redis-backed queue like BullMQ is exactly right.
Background Jobs with BullMQ
```js
import { Queue, Worker } from 'bullmq';

const connection = {
  host: process.env.REDIS_HOST,
  port: 6379,
};

export const emailQueue = new Queue('email', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 2000 },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 },
  },
});

export const emailWorker = new Worker(
  'email',
  async (job) => {
    const { to, subject, template, data } = job.data;
    await sendEmail({ to, subject, template, data }); // the app's own mail helper
  },
  {
    connection,
    concurrency: 5,
  }
);
```
```js
app.post('/users/:id/welcome-email', async (req, reply) => {
  const result = await db.read(
    'SELECT id, email, name FROM users WHERE id = $1',
    [req.params.id]
  );
  const user = result.rows[0]; // first row, not the rows array
  if (!user) return reply.code(404).send({ error: 'User not found' });
  await emailQueue.add('welcome', {
    to: user.email,
    subject: 'Welcome aboard',
    template: 'welcome',
    data: { name: user.name },
  });
  return reply.code(202).send({ message: 'Email queued' });
});
```
This is a major step in backend architecture maturity.
Not because queues are fashionable. Because they protect request latency and isolate failure.
Build a Caching Hierarchy
At 10M requests/day, caching must exist at more than one layer.
Each layer exists for a different reason:
- CDN handles global traffic and static/public content
- Nginx can absorb repeated identical upstream requests
- Redis handles application-level object caching
- Read replicas absorb remaining read pressure
- Primary is reserved for writes and consistency-sensitive reads
If everything reaches the primary database, the rest of your scaling story is mostly fiction.
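The Nginx layer of that hierarchy is often implemented as micro-caching: very short TTLs on public upstream responses. A hedged sketch, where `app_upstream`, the cache path, and the TTLs are all illustrative assumptions:

```nginx
# Illustrative micro-caching sketch -- paths, names, and sizes are assumptions.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m
                 max_size=1g inactive=10m use_temp_path=off;

server {
    location /api/public/ {
        proxy_cache app_cache;
        # a short TTL is enough to absorb bursts of identical requests
        proxy_cache_valid 200 10s;
        # serve a stale copy rather than an error when the upstream struggles
        proxy_cache_use_stale error timeout updating;
        proxy_pass http://app_upstream;
    }
}
```

Even a 10-second TTL means the origin sees at most one request per URL per 10 seconds for that path, regardless of burst size.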
Observability: The Part Everyone Delays Too Long
A backend is not scalable because it can survive load once in a benchmark.
It is scalable when you can understand what is happening under production load quickly enough to act.
That means metrics, logs, health checks, and dashboards.
Minimum Metrics You Need
| Metric | Why it matters |
|---|---|
| Request rate | Traffic shape and load changes |
| p95 latency | Real user experience under load |
| Error rate | Detects systemic failure quickly |
| DB query duration | Tells you when DB is the bottleneck |
| Cache hit rate | Shows whether Redis is doing useful work |
| Queue depth | Reveals background work backlog |
| CPU / memory | Capacity planning and saturation signals |
| Replica lag | Prevents stale-read surprises |
Example Prometheus Metrics
```js
import client from 'prom-client';

client.collectDefaultMetrics();

export const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

export const dbQueryDuration = new client.Histogram({
  name: 'db_query_duration_seconds',
  help: 'Database query duration in seconds',
  labelNames: ['operation', 'table'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
});
```
Practical Thresholds
| Signal | Warning | Critical |
|---|---|---|
| p95 request latency | > 200ms | > 500ms |
| p95 DB query latency | > 50ms | > 200ms |
| Cache hit rate | < 85% | < 70% |
| 5xx error rate | > 0.5% | > 2% |
| Queue backlog growth | Sustained 5 min | Sustained 15 min |
If you do not know these numbers, you do not yet know whether your backend is healthy.
When Not to Use Microservices
This section matters because too many teams still treat microservices as a rite of passage.
They are not.
Microservices are a tax. Sometimes a necessary one, often an unnecessary one.
Do Not Break the Monolith If:
- One team can still understand the codebase
- Deployments are still manageable
- The database is not your deployment bottleneck
- Most modules do not need independent scaling
- Internal network calls would replace in-process calls for no clear gain
- Your operational maturity is still limited
- Your debugging pipeline is still weak
- You do not have platform engineers
Consider Extracting a Service Only If:
- One subsystem has drastically different scaling needs
- One subsystem needs a different runtime or storage model
- Deployment coupling is causing continuous friction
- Team boundaries are stable and long-lived
- You already have strong observability and operational discipline
The goal is not ideological purity.
The goal is operational sanity.
What 10M Requests/Day Actually Means
This number sounds dramatic, but it helps to reduce it.
10M requests/day is about:
- 115 requests per second on average
- Often 300–600 req/s at peak
- Higher during bursts, launches, crawls, or regional concentration
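Those figures are plain arithmetic, worth checking once (the 3-5x peak multiplier is an assumption that reproduces the range above):

```javascript
// Back-of-envelope check of the numbers above.
const perDay = 10_000_000;
const secondsPerDay = 86_400;
const avgRps = perDay / secondsPerDay;   // ≈ 115.7, i.e. ~115 req/s average

// Peak traffic is commonly a small multiple of the average; a 3-5x
// assumption reproduces the 300-600 req/s range quoted above.
const peakLow = Math.round(avgRps * 3);  // ≈ 347 req/s
const peakHigh = Math.round(avgRps * 5); // ≈ 579 req/s
```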
That is large enough to be real.
It is also absolutely within reach of a well-built monolith plus:
- horizontal app scaling
- Redis caching
- PgBouncer
- read replicas
- CDN offload
- async job processing
The scary part is not the request count.
The scary part is waste.
A bad query, missing index, unbounded N+1 pattern, or synchronous email send can collapse a backend long before traffic itself becomes the true problem.
Production Checklist
Database
- [ ] PostgreSQL primary configured correctly
- [ ] Read replicas added when read traffic justifies them
- [ ] PgBouncer deployed in transaction mode
- [ ] All foreign keys indexed
- [ ] Slow query logging enabled
- [ ] Backup and restore tested, not just configured
- [ ] `pg_stat_statements` enabled
Caching
- [ ] Redis in place for cache, sessions, or rate limiting
- [ ] Cache key naming convention documented
- [ ] Invalidation exists on all write paths
- [ ] Cache hit ratio monitored
- [ ] No sensitive cross-user cache leakage
App Layer
- [ ] Health checks at `/health`
- [ ] Graceful shutdown implemented
- [ ] Request timeouts enforced
- [ ] Rate limiting in Redis, not memory
- [ ] Background tasks moved off request path
- [ ] Structured logs everywhere
Infrastructure
- [ ] Nginx or equivalent reverse proxy configured
- [ ] CDN in front of public traffic
- [ ] HTTPS enforced
- [ ] Compression enabled
- [ ] Static assets cached aggressively
Observability
- [ ] Prometheus metrics exposed
- [ ] Grafana dashboards for latency, errors, cache hit rate, queue depth
- [ ] Alerts configured before incidents happen
- [ ] Logs searchable centrally
- [ ] Replica lag visible if using replicas
The Real Scaling Mindset
The most useful scaling principle is not “build for hyperscale.”
It is this:
Remove the current bottleneck without introducing unnecessary permanent complexity.
That is how serious systems are actually built.
A good backend architecture is not the one with the most boxes in the diagram.
It is the one that stays understandable while traffic grows.
A pragmatic architecture wins because it keeps your team fast, your failure modes legible, and your operating costs reasonable. A scalable monolith wins because most systems need better boundaries, better caching, and better query discipline long before they need service decomposition.
If you want to reach 10M requests/day, do not start by asking whether you need Kafka.
Start by asking:
- Are my queries indexed?
- Is my cache hit rate healthy?
- Am I blocking requests on background work?
- Can Postgres survive connection spikes?
- Can my system degrade gracefully under load?
- Can I explain every major component in one sentence?
If the answer to those questions is yes, you are much closer than you think.
Real scale rarely demands flashy architecture first. It demands disciplined engineering first.




