The Architecture Decisions That Actually Mattered: Building a Production-Ready Multi-Service Backend

U C H E N N A — Sun, 31 May 2026 23:26:21 +0000

What most system design articles skip is the part where you explain why the boring choice was the right one

I built a platform that runs social media giveaway events, a gift card marketplace, and telecom gift vending — all in one system, all sharing the same PostgreSQL database and Redis instance, all running in production on infrastructure that costs €27 a month.

This article is not about the product. It is about the architectural decisions I made while building the backend, why I made them, and what the actual capacity ceiling looks like before anything needs to change.

The full architecture document (20 sections, every design decision documented) is linked at the end. This is the version you can read in ten minutes.

1. Three services, not ten — and not one

The backend is a NestJS monorepo with three independently deployable applications:

Giveaway API (port 5000): events, participants, host wallet, auth, admin
Giftcard API (port 5002): cards, merchants, escrow, redemptions, merchant wallet
Job Processor (port 5001 / 5003): background jobs and the Socket.IO gateway, no REST API

This is not a microservice architecture. It is a modular monolith with a deployment boundary.

The mistake most engineers make is treating "microservices" as the goal rather than as a tool. I had two bounded domains with genuinely different data ownership and access patterns. Splitting them gave me independent deployability and the ability to scale each service's connection pool separately. Splitting further, into separate auth, notification and payment services, would have added coordination overhead with no operational gain at this scale.

The monorepo means all three apps share one build pipeline, one migration runner, one set of gRPC contract types, and one test suite. Changes to shared infrastructure (JWT guards, Redis config, payment client) deploy everywhere in one merge. You get the organisational clarity of separate services without the dependency management nightmare of separate repositories.

2. gRPC between services, not HTTP

The Giveaway API and Giftcard API call each other constantly. The Giveaway side calls Giftcard to allocate prize escrows, issue gift card instances, and refund expired holds. The Giftcard side calls Giveaway to validate winner PINs, credit wallet refunds, and log payment audit records.

The standard answer is REST. I used gRPC instead, for three concrete reasons.

Typed contracts enforced at compile time. The request and response shapes live in .proto files and are compiled into TypeScript interfaces that both the caller and the handler must satisfy. If you rename a field in the Giftcard service's response, every place in the Giveaway service that reads the old name fails to compile. With REST and hand-written JSON types, the same mismatch only surfaces at runtime, often in production.
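To make the compile-time check concrete, here is a sketch of the kind of interfaces a .proto-to-TypeScript generator emits, and how both sides bind to them. The names (AllocateCardsRequest, GiftcardService, reservePrizes and the escrow id format) are hypothetical stand-ins, not the project's actual contracts.

```typescript
// Hypothetical message shapes, roughly what a code generator emits from a
// .proto service definition. Names are illustrative only.
interface AllocateCardsRequest {
  eventId: string;
  cardProductId: string;
  quantity: number;
}

interface AllocateCardsResponse {
  escrowIds: string[];
}

// Both the Giftcard handler and the Giveaway caller are typed against this.
interface GiftcardService {
  allocateCards(req: AllocateCardsRequest): Promise<AllocateCardsResponse>;
}

// Handler side: must return exactly the AllocateCardsResponse shape.
const giftcardHandler: GiftcardService = {
  async allocateCards(req) {
    const escrowIds: string[] = [];
    for (let i = 0; i < req.quantity; i++) {
      escrowIds.push(`${req.eventId}-escrow-${i + 1}`);
    }
    return { escrowIds };
  },
};

// Caller side: renaming `escrowIds` in the interface turns this line into a
// compile error instead of a runtime `undefined`.
async function reservePrizes(svc: GiftcardService, eventId: string): Promise<number> {
  const res = await svc.allocateCards({ eventId, cardProductId: "card-1", quantity: 3 });
  return res.escrowIds.length;
}
```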

Two-way calls without auth overhead. Both services call each other over an internal Docker network, so there are no auth headers, no CORS configuration and no URL routing to maintain. Each call is just a typed method call over gRPC's HTTP/2 transport, which the runtime handles.

Every gRPC call is persisted through Bull, not just retried in memory. This is the part most architectures get wrong. A custom Proxy (createQueuedGrpcService) intercepts every gRPC method call before it reaches the wire. Instead of calling the stub directly, it enqueues a GRPC_CALL Bull job — serialising the service name, method name, and request payload into Redis. The Job Processor picks up the job and executes the actual gRPC stub:

API calls giftcardService.allocateCards(payload)
    → Proxy intercepts the call
    → Enqueues GRPC_CALL job to Bull (persisted to Redis)
    → Awaits job.finished() with a deadline timeout
    → Job Processor picks up job, executes real gRPC stub
    → Result returned to the waiting caller

The caller still awaits the response as if it were a direct call, so from the outside nothing changes. But the transport goes through Redis. A queued call therefore survives an API process restart: the waiting HTTP request is lost, but the job still executes. Bull's retry policy handles failures at the execution layer. If the Job Processor is temporarily down, the call sits durably in Redis until it comes back, although the caller itself stops waiting once its deadline passes.
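A minimal sketch of the proxy idea follows, assuming an in-memory queue in place of Bull so it runs on its own. InMemoryQueue, drain() and withDeadline are hypothetical. In the real system, add() would persist a GRPC_CALL job to Redis, and the Job Processor, not drain(), would execute the stub.

```typescript
interface GrpcCallJob { service: string; method: string; payload: unknown; }
type Stub = Record<string, (payload: any) => Promise<unknown>>;

// Stand-in for a Bull queue: add() stores the job and returns a handle whose
// finished() settles when a worker completes it (Bull keeps this in Redis).
class InMemoryQueue {
  private pending: { data: GrpcCallJob; resolve: (v: unknown) => void; reject: (e: unknown) => void }[] = [];

  add(data: GrpcCallJob): { finished(): Promise<unknown> } {
    let resolve!: (v: unknown) => void;
    let reject!: (e: unknown) => void;
    const done = new Promise<unknown>((res, rej) => { resolve = res; reject = rej; });
    this.pending.push({ data, resolve, reject });
    return { finished: () => done };
  }

  // Stand-in for the Job Processor: drains jobs and runs the real stub.
  async drain(stubs: Record<string, Stub>): Promise<void> {
    while (this.pending.length > 0) {
      const job = this.pending.shift()!;
      try {
        const fn = stubs[job.data.service]?.[job.data.method];
        if (!fn) throw new Error(`no stub for ${job.data.service}.${job.data.method}`);
        job.resolve(await fn(job.data.payload));
      } catch (err) {
        job.reject(err);
      }
    }
  }
}

// Reject if the job has not finished within `ms` milliseconds.
function withDeadline<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`deadline of ${ms}ms exceeded`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Every method call on the returned object becomes an enqueued job.
function createQueuedGrpcService<T extends object>(service: string, queue: InMemoryQueue, deadlineMs: number): T {
  return new Proxy({} as T, {
    get(_target, method) {
      // Do not look like a thenable (or expose symbols), or `await svc` breaks.
      if (typeof method !== "string" || method === "then") return undefined;
      return (payload: unknown) => {
        const job = queue.add({ service, method, payload });
        return withDeadline(job.finished(), deadlineMs);
      };
    },
  });
}
```

The Proxy keeps call sites identical to a plain gRPC client, so swapping the transport does not ripple through the services.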

On top of this, a circuit breaker sits at the proxy level — before any job is enqueued. After five consecutive failures, the circuit opens and the proxy immediately rejects calls without touching Redis or the network. This prevents a struggling downstream service from filling the Bull queue with jobs that will all fail.
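The breaker can be sketched as a small class that wraps the enqueue step. The five-failure threshold comes from the text. The cool-down period (resetAfterMs), the single half-open probe and the injectable clock are assumptions added to make the sketch complete and testable.

```typescript
class CircuitOpenError extends Error {}

// Opens after `threshold` consecutive failures; while open, calls are rejected
// immediately, before anything is enqueued.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 5,
    private readonly resetAfterMs = 30000, // assumed cool-down
    private readonly now: () => number = () => Date.now(),
  ) {}

  async exec<T>(call: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Open: fail fast without touching Redis or the network.
        throw new CircuitOpenError("circuit open");
      }
      // Cool-down elapsed: half-open. Let calls through, but a single
      // further failure reopens the circuit. (Concurrent probes are not
      // limited in this sketch.)
      this.openedAt = null;
      this.failures = this.threshold - 1;
    }
    try {
      const result = await call();
      this.failures = 0; // any success resets the consecutive count
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```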

3. Redis is doing four different jobs simultaneously

This is the part of the architecture that surprises most engineers when they see it.

The same single Redis instance handles all of the following, concurrently:

Bull queue backend. Six job queues — email, social verification, notifications, analytics, event processing, WebSocket — all backed by Redis lists and sorted sets. AOF persistence is enabled, so queued jobs survive a Redis restart.

Socket.IO pub/sub adapter. The WebSocket gateway runs in a separate process from the API. When the Job Processor needs to emit giveaway.winner to an event room, it publishes to a Redis channel. Every gateway instance subscribed to that channel propagates the event to connected clients. This is what makes horizontal scaling of the WebSocket layer possible without code changes: add more gateways and they all share rooms via Redis.

TTL-based application cache. A CacheService wraps Redis with a getOrSet(key, factory, ttl) pattern. Merchant wallet balances, bank lists, admin analytics aggregates, service fee rates — all cached with appropriate TTLs and invalidated on write. If Redis is unavailable, the cache degrades gracefully to always calling the database rather than throwing.
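The getOrSet(key, factory, ttl) pattern with graceful degradation might look like the sketch below. KeyValueStore is a hypothetical minimal interface standing in for the Redis client, so the example runs without a server. Values are JSON-serialised, which is an assumption about how entries are stored.

```typescript
// Hypothetical minimal store interface; in production, a Redis client
// doing GET / SET with an expiry / DEL.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

class CacheService {
  constructor(private readonly store: KeyValueStore) {}

  // Return the cached value, or compute it with `factory` and cache it.
  async getOrSet<T>(key: string, factory: () => Promise<T>, ttlSeconds: number): Promise<T> {
    try {
      const hit = await this.store.get(key);
      if (hit !== null) return JSON.parse(hit) as T;
    } catch {
      return factory(); // store unavailable (or bad entry): go to the database
    }
    const value = await factory();
    try {
      await this.store.set(key, JSON.stringify(value), ttlSeconds);
    } catch {
      // Failing to populate the cache must not fail the request.
    }
    return value;
  }

  // Call after a write so the next read recomputes from the database.
  // Best effort: if the delete fails, the entry lives until its TTL expires.
  async invalidate(key: string): Promise<void> {
    try { await this.store.del(key); } catch { /* best effort */ }
  }
}
```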

Atomic concurrency control. The real-time winner selection algorithm uses a Lua script that Redis executes atomically, so no other command can run between the script's GET and SET. During a live event, hundreds of participants may submit prize attempts at the same moment. The script increments a budget counter only if it is below the current time-slot threshold. This prevents over-awarding regardless of concurrent load, without any application-level locking.

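-- Assumed call shape: EVAL <script> 1 <budgetKey> <slotLimit> [ttlSeconds]
local k = KEYS[1]
local lim = tonumber(ARGV[1])
local exp = ARGV[2] and tonumber(ARGV[2])

-- GET returns false for a missing key: treat it as nothing awarded yet
local v = redis.call('GET', k)
if not v then v = 0 else v = tonumber(v) end
if v < lim then
  v = v + 1
  redis.call('SET', k, v)
  -- optional TTL so the counter key eventually expires
  if exp and exp > 0 then redis.call('EXPIRE', k, exp) end
  return v
else
  return -1 -- slot budget exhausted: no prize for this attempt
end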
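For unit-testing the callers without a Redis server, a TypeScript reference model of the script's check-and-increment semantics can help. It is a sketch: a Map stands in for Redis, single-threaded execution stands in for Redis's atomic script execution, and the TTL is ignored. The function name and key format are hypothetical.

```typescript
// Returns the new count, or -1 if the slot's limit has already been reached.
function tryIncrementBelowLimit(store: Map<string, number>, key: string, limit: number): number {
  const current = store.get(key) ?? 0; // missing key counts as zero
  if (current < limit) {
    store.set(key, current + 1);
    return current + 1;
  }
  return -1;
}

// Ten simultaneous attempts against a slot budget of three:
const budget = new Map<string, number>();
const outcomes = Array.from({ length: 10 }, () => tryIncrementBelowLimit(budget, "prize-budget:slot-1", 3));
// outcomes → [1, 2, 3, -1, -1, -1, -1, -1, -1, -1]
```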

One Redis instance. Four distinct, production-critical responsibilities. No Kafka, no separate cache cluster, no dedicated counter service.

4. The escrow state machine is where financial correctness lives

Gift card prizes work as follows: a host allocates cards to a prize pool, funds are held in escrow, winners claim their card, and the merchant receives the net settlement when the card is physically redeemed at a branch.

RESERVED → SETTLED   (redeemed at branch POS)
    │
    └─→ RELEASED  (event cancelled / prize unassigned / expiry)
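The diagram's transitions can be enforced with a small table-driven guard. The state names come from the diagram, but the guard itself is an illustration. In the database, the same rule is usually expressed as a conditional update that only touches rows still in RESERVED, so two concurrent settlements cannot both succeed.

```typescript
type EscrowState = "RESERVED" | "SETTLED" | "RELEASED";

// SETTLED and RELEASED are terminal: they have no outgoing transitions.
const allowedTransitions: Record<EscrowState, EscrowState[]> = {
  RESERVED: ["SETTLED", "RELEASED"],
  SETTLED: [],
  RELEASED: [],
};

function transition(from: EscrowState, to: EscrowState): EscrowState {
  if (allowedTransitions[from].indexOf(to) === -1) {
    throw new Error(`illegal escrow transition ${from} -> ${to}`);
  }
  return to;
}
```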

Every state transition is irreversible. Every financial operation carries an idempotency key. The settlement calculation strips out the payment provider's deposit fee before applying the platform service rate — because the provider already took that fee at topup time, and charging a service fee on money the platform never held would be a systematic double-charge on merchants.
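Here is the fee rule as worked arithmetic in integer minor units. The function name, the basis-point representation and the round-down rule are assumptions for illustration, since the article does not specify rounding. The sketch also assumes the merchant receives the net base minus the platform fee.

```typescript
// Amounts are integer minor units to avoid floating-point drift.
interface Settlement { netBase: number; platformFee: number; merchantPayout: number; }

function computeSettlement(cardValue: number, providerDepositFee: number, serviceRateBps: number): Settlement {
  // The provider kept its deposit fee at top-up, so the platform only ever
  // held (cardValue - providerDepositFee). The service rate applies to that.
  const netBase = cardValue - providerDepositFee;
  // Round the platform fee down; the remainder stays with the merchant.
  const platformFee = Math.floor((netBase * serviceRateBps) / 10000);
  return { netBase, platformFee, merchantPayout: netBase - platformFee };
}

// A 10,000-unit card, a 150-unit deposit fee and a 5% (500 bps) service rate:
const example = computeSettlement(10000, 150, 500);
// example → { netBase: 9850, platformFee: 492, merchantPayout: 9358 }
```

Charging 5% on the gross 10,000 would have produced a fee of 500. The 8-unit difference is exactly the service fee on the 150 the platform never held.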

The most important design decision here is the ordering of operations at settlement. Two payment transfers must happen: one to the merchant's payout account and one to the platform's fee account. If the database write were committed first and a transfer then failed, a compensating transaction would be needed to undo it. Instead, both transfers run before any database write. A failure on either leg leaves the database untouched, and the client can retry safely. Both transfer references are idempotent, so a retry after a partial success does not pay anyone twice.
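The ordering can be sketched as follows. PaymentClient and SettlementDb are hypothetical stand-ins for the payment provider and the database layer. The `${escrowId}:merchant` reference scheme is an assumed way of making references deterministic, so that a retry reuses them.

```typescript
interface PaymentClient {
  // The provider is assumed to treat a repeated reference as a no-op.
  transfer(reference: string, account: string, amount: number): Promise<void>;
}

interface SettlementDb {
  markSettled(escrowId: string): Promise<void>;
}

async function settleEscrow(
  pay: PaymentClient,
  db: SettlementDb,
  escrowId: string,
  merchantAmount: number,
  feeAmount: number,
): Promise<void> {
  // Deterministic references: a retry sends the same ones, so a leg that
  // already succeeded is not paid twice.
  await pay.transfer(`${escrowId}:merchant`, "merchant-payout", merchantAmount);
  await pay.transfer(`${escrowId}:fee`, "platform-fee", feeAmount);
  // Only reached when both legs succeeded; a throw above leaves the DB untouched.
  await db.markSettled(escrowId);
}
```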

Finalization, where prize escrows are first created via cross-service gRPC calls, uses an inline saga compensation pattern. If the wallet deduction fails after one or more escrows have been reserved, the catch block immediately refunds each escrow before re-throwing the error. Promise.allSettled ensures a failed refund on one prize does not block the others.
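A sketch of that inline compensation follows. The FinalizeDeps methods are hypothetical stand-ins for the cross-service gRPC calls and the wallet debit. Rejected refunds are ignored here for brevity, whereas production code would report them, for example to Sentry.

```typescript
interface FinalizeDeps {
  reserveEscrow(prizeId: string): Promise<string>; // returns an escrow id
  refundEscrow(escrowId: string): Promise<void>;
  deductWallet(hostId: string, amount: number): Promise<void>;
}

async function finalizeEvent(deps: FinalizeDeps, hostId: string, prizeIds: string[], total: number): Promise<string[]> {
  const reserved: string[] = [];
  try {
    for (const prizeId of prizeIds) {
      reserved.push(await deps.reserveEscrow(prizeId));
    }
    await deps.deductWallet(hostId, total);
    return reserved;
  } catch (err) {
    // Compensate: refund every escrow reserved so far. allSettled waits for
    // all refunds, so one failed refund does not stop the others.
    await Promise.allSettled(reserved.map((id) => deps.refundEscrow(id)));
    throw err; // surface the original failure to the caller
  }
}
```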

5. The job processor runs in two separate production modes

The job processor binary starts in one of three modes controlled by JOB_MODE:

JOB_MODE=worker — email sending, event draws, social API verification, analytics
JOB_MODE=gateway — Socket.IO WebSocket server only
JOB_MODE=all — both, for development
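Resolving JOB_MODE into what a process should start can be sketched as a lookup table. The three values come from the list above. StartupPlan, planFor and the fail-fast behaviour on a missing or unknown value are assumptions.

```typescript
type JobMode = "worker" | "gateway" | "all";
interface StartupPlan { startWorkers: boolean; startGateway: boolean; }

const plans: Record<JobMode, StartupPlan> = {
  worker: { startWorkers: true, startGateway: false }, // queue consumers only
  gateway: { startWorkers: false, startGateway: true }, // Socket.IO server only
  all: { startWorkers: true, startGateway: true }, // development
};

function planFor(mode: string | undefined): StartupPlan {
  // Fail fast rather than silently guessing a mode in production.
  if (mode !== "worker" && mode !== "gateway" && mode !== "all") {
    throw new Error(`JOB_MODE must be worker, gateway or all (got "${mode}")`);
  }
  return plans[mode];
}
```

A bootstrap would call planFor(process.env.JOB_MODE) once at startup and register only the modules the plan asks for.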

These scale on completely different axes. Workers scale when the job queue backlog grows. Gateways scale when the number of concurrent WebSocket connections grows. A worker crash means queued jobs drain slower — the data is safe in Redis. A gateway crash means connected clients briefly disconnect and reconnect via Socket.IO's built-in reconnection. These are independent failure modes with no coupling.

The API processes never touch Socket.IO directly. When the Giveaway API needs to broadcast a winner announcement to everyone in an event room, it enqueues a job. The gateway dequeues it and emits. This keeps the HTTP event loop free from the overhead of maintaining persistent connections.

Scaling either process is one command:

docker compose up --scale worker-prod=4 -d
docker compose up --scale gateway-prod=3 -d

Bull distributes jobs across all worker instances automatically. The Redis adapter keeps Socket.IO rooms synchronised across all gateway instances automatically. No Kubernetes required.

6. Observability: infrastructure errors go to Sentry, not to users

The observability design makes a deliberate distinction between two categories of errors.

User-facing errors (validation failures, not-found, unauthorised) are returned as structured HTTP responses. The user sees a clear message. The application handles them with ErrorInterceptor and ErrorFilter.

Infrastructure errors (Nomba transfer failed, gRPC circuit open, Bull job exhausted retries, Redis connection dropped) are captured to Sentry in production and never propagate to the user as HTTP 500s with internal stack traces. These are engineering signals, not user messages.

Every infrastructure boundary has an explicit Sentry capture:

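import { OnQueueFailed } from '@nestjs/bull';
import { Job } from 'bull';
import * as Sentry from '@sentry/node';
// QueueName is the project's own enum of queue names.

@OnQueueFailed()
onFailed(job: Job, error: Error) {
  // attemptsMade already counts the attempt that just failed, so this is
  // true only after the final retry; earlier failures are expected noise.
  const isLastAttempt = job.attemptsMade >= (job.opts.attempts ?? 1);
  if (!isLastAttempt) return;

  Sentry.withScope((scope) => {
    scope.setTag('queue', QueueName.EVENT_PROCESSING);
    scope.setExtra('correlationId', job.data?.correlationId);
    Sentry.captureException(error);
  });
}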

Sentry capture is disabled in development and staging — development generates constant infrastructure noise (Redis not yet connected, gRPC services starting up) that would drown the production error stream.

Every HTTP request carries an x-correlation-id header that is propagated into every Bull job enqueued during that request. When a social verification job fails ninety seconds after the original HTTP request, the Sentry event includes the correlation ID — making it possible to trace the failure to the originating user action without a distributed tracing platform.
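Propagation can be sketched with two small functions. One picks up (or mints) the id per request, and the other stamps it into every job payload. JobQueue, the generator argument and the function names are hypothetical. Node's AsyncLocalStorage is a common way to avoid passing the id by hand, but this sketch passes it explicitly to stay self-contained.

```typescript
interface JobQueue { add(name: string, data: Record<string, unknown>): void; }

// Node lower-cases incoming header names, hence the lower-case lookup.
function resolveCorrelationId(
  headers: Record<string, string | string[] | undefined>,
  generate: () => string,
): string {
  const raw = headers["x-correlation-id"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  // Reuse the caller's id if present, otherwise mint one for this request.
  return value && value.length > 0 ? value : generate();
}

// Every enqueue made while handling the request stamps the same id into the
// job payload, where onFailed() later reads job.data.correlationId.
function enqueueWithCorrelation(queue: JobQueue, correlationId: string, name: string, data: Record<string, unknown>): void {
  queue.add(name, { ...data, correlationId });
}
```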

7. The honest capacity ceiling

With one VPS (8 vCPU, 16 GB RAM), one PostgreSQL VPS, and default TypeORM connection pools, this is what the system handles:

Scenario	Capacity
Concurrent browsing / dashboard reads	300–500 req/s
Simultaneous event registrations	~100–200
Concurrent auth operations	~50–100
Concurrent WebSocket connections	5,000–10,000
Total comfortable active users	1,000–3,000
Total registered users (10–20% active)	15,000–30,000

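The first wall is the TypeORM connection pool, which defaults to 10 connections per app (the node-postgres pool default). Three apps × 10 = 30 connections total. Raising this to 25 per app is a one-line config change. It lifts the total to 75 connections, 2.5 times as many queries in flight, which is still under PostgreSQL's default max_connections of 100: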

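// In each app's TypeORM (postgres) connection options;
// `extra` is passed straight through to the node-postgres Pool.
extra: {
  max: 25,                  // pool ceiling per app instance (default 10)
  min: 2,
  idleTimeoutMillis: 30000, // close connections idle for 30 s
}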
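Every running instance opens its own pool, so scaling with --scale multiplies connections. A back-of-the-envelope check makes the budget visible. The deployment names are hypothetical, and the scaled example assumes that every worker and gateway instance opens the same 25-connection pool.

```typescript
interface Deployment { name: string; instances: number; poolMax: number; }

// Worst case: every instance fills its pool at once.
function totalConnections(deployments: Deployment[]): number {
  return deployments.reduce((sum, d) => sum + d.instances * d.poolMax, 0);
}

const stage1: Deployment[] = [
  { name: "giveaway-api", instances: 1, poolMax: 25 },
  { name: "giftcard-api", instances: 1, poolMax: 25 },
  { name: "job-processor", instances: 1, poolMax: 25 },
];
// totalConnections(stage1) → 75: fits under PostgreSQL's default of 100.

const scaled: Deployment[] = [
  ...stage1.slice(0, 2),
  { name: "worker", instances: 4, poolMax: 25 },  // assumption: opens a pool
  { name: "gateway", instances: 3, poolMax: 25 }, // assumption: opens a pool
];
// totalConnections(scaled) → 225: exceeds 100 unless max_connections is
// raised, pools are shrunk, or a connection pooler sits in front.
```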

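The WebSocket ceiling is per gateway instance. Each additional gateway instance added via --scale raises it roughly linearly, because the Redis adapter keeps rooms synchronised across instances. One caveat: when Socket.IO runs on multiple nodes, it requires sticky sessions at the load balancer unless clients are restricted to the WebSocket transport. Traefik therefore needs sticky sessions enabled for the gateway service.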

8. The scaling roadmap

The architecture is designed so that each scaling action is independent and requires no code changes:

Stage 1 (now): Single VPS, tune connection pools, run 1 gateway + 1 worker. Handles 15,000–30,000 registered users.

Stage 2: Add more workers (--scale worker-prod=3). Scales queue throughput. No infrastructure change.

Stage 3: Move Redis to a dedicated VPS when adding a second app VPS. All instances point at the same Redis URL. Bull queues and Socket.IO rooms synchronise automatically.

Stage 4: Add a second app VPS. Traefik load-balances across both. No code changes.

Stage 5 (Kubernetes): Only justified when managing 3+ VPS nodes manually becomes operationally expensive. The current Docker Compose setup maps nearly 1:1 to Kubernetes Deployments when that time comes — stateless containers, externalised config, health checks already defined.

The full architecture document covers every decision documented here in depth, plus: the database schema separation design, the full gRPC contract structure, the presigned URL file upload pattern, the drift-corrected countdown timer implementation, the complete financial flow for every money movement in the system, and the detailed scaling migration procedures.

Read the full architecture document → https://blog.ucheofor.xyz/post/537tg0

Built with NestJS 11, TypeScript, PostgreSQL 15, Redis 7, gRPC, Bull, Socket.IO, Traefik, and Docker on Hetzner Cloud.

DEV Community: U C H E N N A