From HTTP Chaos to Kafka: How We Fixed Inter-Service Communication in a NestJS Microservices Platform
A technical deep-dive into replacing synchronous HTTP calls with Kafka-based async messaging — covering architecture decisions, NestJS implementation, Redis caching, and BullMQ for background processing.
The Problem: Synchronous HTTP in a Distributed System
On a production NestJS microservices platform I was working on, services were communicating entirely over synchronous HTTP. On paper, this looks fine. In production, under load, it fell apart fast.
Here's what we were dealing with:
- Cascading failures — if Service B was slow, Service A timed out, and the user felt it
- Retry storms — failed requests were retried, multiplying the load on already struggling services
- Tight coupling — every service needed to know the exact endpoint, auth, and contract of every other service it called
- Resource exhaustion — Node.js never blocks its event loop on I/O, but every in-flight HTTP call still held a socket, memory, and request state while waiting on a downstream response
A simplified version of what the architecture looked like:
User Request
      │
      ▼
[API Gateway]
      │ HTTP POST /order
      ▼
[Order Service]
      │ HTTP POST /inventory/check   ← blocking
      ▼
[Inventory Service]
      │ HTTP POST /notification/send ← blocking
      ▼
[Notification Service]
Every step in this chain was synchronous. If Notification Service had a spike in latency, it bubbled all the way up to the user's response time. If it went down, orders failed.
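In code terms, the handler amounted to a chain of awaited HTTP calls. The sketch below is illustrative rather than the production code: `post` stands in for whatever HTTP client the service used (axios, NestJS `HttpService`), and the URLs are placeholders.

```typescript
// Simplified sketch of the old synchronous chain. Each await holds the
// user's request open until the downstream service answers.
type PostFn = (url: string, body: unknown) => Promise<unknown>;

export async function createOrderSync(
  order: { id: string; items: string[] },
  post: PostFn,
): Promise<{ orderId: string }> {
  // Hop 1: the user waits for the inventory check
  await post('/inventory/check', { items: order.items });
  // Hop 2: the user also waits for the notification send
  await post('/notification/send', { orderId: order.id });
  return { orderId: order.id };
}

// User-facing latency is additive: every hop's latency (and every retry)
// lands directly on the response time.
export const chainLatency = (hopMs: number[]): number =>
  hopMs.reduce((total, hop) => total + hop, 0);
```

With hops of 120 ms, 300 ms, and 80 ms, the user waits the full 500 ms; with the event-driven design below, they wait only for the first write.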
The Solution: Kafka-Based Async Messaging
Apache Kafka gave us a way to decouple producers from consumers completely. Services no longer call each other — they emit events, and other services react to those events independently.
The new architecture:
User Request
      │
      ▼
[API Gateway]
      │ HTTP POST /order
      ▼
[Order Service] ──── kafka: order.created ────▶ [Inventory Service]
      │         ──── kafka: order.created ────▶ [Notification Service]
      │         ──── kafka: order.created ────▶ [Analytics Service]
      │
      ▼ (immediate response — no waiting)
User gets 202 Accepted
The Order Service emits a single event. Every downstream service consumes it independently, at its own pace, with its own retry logic. The user gets an immediate response.
NestJS Implementation
Setting Up the Kafka Module
We used @nestjs/microservices with the Kafka transport. Here's how we configured it:
// main.ts
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  app.connectMicroservice<MicroserviceOptions>({
    transport: Transport.KAFKA,
    options: {
      client: {
        brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'],
        clientId: 'order-service',
      },
      consumer: {
        groupId: 'order-service-consumer',
      },
    },
  });

  await app.startAllMicroservices();
  await app.listen(3000);
}
bootstrap();
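The producer below injects a client under the `'KAFKA_SERVICE'` token, which needs a matching registration. A minimal sketch of that registration, assuming it lives in the Order Service's root module (the module layout and broker fallback are assumptions, not the article's exact code):

```typescript
// app.module.ts: registers the ClientKafka injected as 'KAFKA_SERVICE'
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';

@Module({
  imports: [
    ClientsModule.register([
      {
        name: 'KAFKA_SERVICE',
        transport: Transport.KAFKA,
        options: {
          client: {
            clientId: 'order-service',
            brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'],
          },
        },
      },
    ]),
  ],
})
export class AppModule {}
```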
Producing Events (Order Service)
// order.service.ts
import { Injectable, Inject } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { Order } from './order.entity';
import { CreateOrderDto } from './dto/create-order.dto';

@Injectable()
export class OrderService {
  constructor(
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
    @InjectRepository(Order)
    private readonly orderRepository: Repository<Order>,
  ) {}

  async createOrder(createOrderDto: CreateOrderDto) {
    const order = await this.orderRepository.save(createOrderDto);

    // Fire and forget — no waiting for downstream services
    this.kafkaClient.emit('order.created', {
      orderId: order.id,
      userId: order.userId,
      items: order.items,
      totalAmount: order.totalAmount,
      createdAt: new Date().toISOString(),
    });

    return { orderId: order.id, status: 'processing' };
  }
}
Consuming Events (Inventory Service)
// inventory.controller.ts
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { InventoryService } from './inventory.service';
import { OrderCreatedEvent } from './events/order-created.event';

@Controller()
export class InventoryController {
  constructor(private readonly inventoryService: InventoryService) {}

  @EventPattern('order.created')
  async handleOrderCreated(@Payload() data: OrderCreatedEvent) {
    await this.inventoryService.reserveStock(data.orderId, data.items);
  }
}
Each service has its own consumer group, so every service receives every order.created event independently.
Adding Redis Caching to Reduce Repeat Load
Kafka solved the async problem. But we still had expensive database reads happening repeatedly for the same data — product catalogue, user profiles, configuration.
We introduced Redis as a caching layer using @nestjs/cache-manager:
// product.service.ts
import { Injectable, Inject, NotFoundException } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { Product } from './product.entity';

@Injectable()
export class ProductService {
  constructor(
    @Inject(CACHE_MANAGER) private cacheManager: Cache,
    @InjectRepository(Product)
    private productRepository: Repository<Product>,
  ) {}

  async getProduct(productId: string): Promise<Product> {
    const cacheKey = `product:${productId}`;

    // Check cache first
    const cached = await this.cacheManager.get<Product>(cacheKey);
    if (cached) return cached;

    // Miss — fetch from DB and cache result
    const product = await this.productRepository.findOne({
      where: { id: productId },
    });
    if (!product) {
      throw new NotFoundException(`Product ${productId} not found`);
    }

    // TTL is in milliseconds in cache-manager v5 (earlier versions took seconds)
    await this.cacheManager.set(cacheKey, product, 300_000); // 5 min TTL
    return product;
  }

  async invalidateProduct(productId: string): Promise<void> {
    await this.cacheManager.del(`product:${productId}`);
  }
}
Cache invalidation was handled by listening to Kafka events. When a product was updated, the product service emitted a product.updated event, and any consumer that cached that product would invalidate its entry.
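A consumer for that flow looked roughly like the following sketch. The controller name, event shape, and key helper are assumptions for illustration; note that in older NestJS versions `CACHE_MANAGER` is exported from `@nestjs/common` rather than `@nestjs/cache-manager`.

```typescript
// Any service that caches product data registers a handler like this:
// on product.updated, drop the matching cache entry so the next read
// falls through to the database.
import { Controller, Inject } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import { EventPattern, Payload } from '@nestjs/microservices';

// Shared key scheme: must match what ProductService writes
export const productCacheKey = (productId: string) => `product:${productId}`;

@Controller()
export class ProductCacheController {
  constructor(@Inject(CACHE_MANAGER) private readonly cache: Cache) {}

  @EventPattern('product.updated')
  async handleProductUpdated(@Payload() event: { productId: string }) {
    await this.cache.del(productCacheKey(event.productId));
  }
}
```

Keeping the key scheme in one exported helper avoids the classic bug where the writer and the invalidator drift apart.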
BullMQ for Heavy Background Tasks
Not everything belongs on Kafka. For heavy, retry-sensitive background tasks — PDF generation, email sending, report processing — we introduced BullMQ backed by Redis.
The key distinction we settled on:
| Use Case | Tool |
|---|---|
| Inter-service events (order created, user registered) | Kafka |
| Heavy background jobs (PDF, email, exports) | BullMQ |
| Repeated read-heavy data | Redis Cache |
// report.processor.ts
import { Inject } from '@nestjs/common';
import { Processor, WorkerHost } from '@nestjs/bullmq';
import { Job } from 'bullmq';
import { ClientKafka } from '@nestjs/microservices';

@Processor('report-generation')
export class ReportProcessor extends WorkerHost {
  constructor(
    private readonly analyticsService: AnalyticsService,
    private readonly pdfService: PdfService,
    private readonly storageService: StorageService,
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
  ) {
    super();
  }

  async process(job: Job<ReportJobData>): Promise<void> {
    const { userId, reportType, dateRange } = job.data;

    // This can take 10–30 seconds — fine in a background job
    const reportData = await this.analyticsService.generateReport(
      userId,
      reportType,
      dateRange,
    );
    const pdfBuffer = await this.pdfService.generate(reportData);
    await this.storageService.upload(`reports/${userId}/${job.id}.pdf`, pdfBuffer);

    // Notify user via Kafka when done
    this.kafkaClient.emit('report.ready', { userId, reportUrl: `...` });
  }
}
BullMQ gives you built-in retries, concurrency control, and job prioritisation — things Kafka is not designed for at the job level.
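On the producing side, enqueueing a job looked roughly like this sketch. The queue name matches the processor above; the service name and the specific retry values are illustrative assumptions.

```typescript
// report.service.ts: the API endpoint enqueues the job and returns
// immediately; the retry policy lives on the job itself.
import { Injectable } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';

@Injectable()
export class ReportService {
  constructor(
    @InjectQueue('report-generation') private readonly queue: Queue,
  ) {}

  async requestReport(userId: string, reportType: string, dateRange: [string, string]) {
    const job = await this.queue.add(
      'generate',
      { userId, reportType, dateRange },
      {
        attempts: 3, // retry failed jobs up to 3 times
        backoff: { type: 'exponential', delay: 5_000 }, // 5s, 10s, 20s between attempts
        removeOnComplete: true, // keep Redis tidy once the job succeeds
      },
    );
    return { jobId: job.id, status: 'queued' };
  }
}
```

This is exactly the per-job retry and backoff control that Kafka consumers have to build by hand.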
Results
After the migration:
- Response times dropped significantly — the critical request path no longer waited on downstream services
- System stability improved under load — a slow consumer no longer caused cascading failures upstream
- Architecture became cleanly decoupled — adding a new service that reacted to order.created required zero changes to the Order Service
- Background tasks were offloaded — the main event loop stayed free for user-facing requests
Key Lessons Learned
1. Not everything needs Kafka. We initially over-used it. Short-lived, job-oriented tasks belong in BullMQ. Kafka shines for durable, multi-consumer event streams.
2. Consumer group naming matters. Each service needs its own unique groupId. Share a groupId across two different services and they compete for partitions, so each event reaches only one of them; give instances of the same service different groupIds and every instance processes every message, duplicating the work.
3. Schema discipline is non-negotiable. Once you have three services consuming order.created, changing the event payload becomes a coordination challenge. We moved to versioned event schemas early.
4. Idempotency is your responsibility. Kafka can deliver a message more than once. Every consumer must handle duplicate events gracefully — usually by checking if the work has already been done before processing.
5. Dead Letter Queues save you. We configured DLQ topics for every consumer so failed messages didn't just disappear — they landed somewhere we could inspect and replay.
Final Architecture
                      ┌─────────────────┐
                      │   API Gateway   │
                      └────────┬────────┘
                               │ HTTP
                      ┌────────▼────────┐
                      │  Order Service  │──── Redis Cache
                      └────────┬────────┘
                               │ Kafka: order.created
           ┌───────────────────┼───────────────────┐
           │                   │                   │
┌──────────▼──────┐   ┌────────▼────────┐   ┌──────▼──────────┐
│Inventory Service│   │  Notification   │   │Analytics Service│
│  + Redis Cache  │   │     Service     │   │                 │
└─────────────────┘   └────────┬────────┘   └─────────────────┘
                               │ BullMQ
                      ┌────────▼────────┐
                      │   Email / PDF   │
                      │    Processor    │
                      └─────────────────┘
Wrapping Up
Migrating from synchronous HTTP to Kafka-based async messaging was one of the highest-impact architectural changes we made. The system became more resilient, more scalable, and easier to extend.
If you're running NestJS microservices and hitting the same walls — timeouts, cascading failures, tight coupling — Kafka is worth the investment. Start with one event type, get comfortable with the consumer group model, and expand from there.
Ritesh Macwan is a Senior Backend Engineer specialising in NestJS, Kafka, Kubernetes, and distributed systems. Connect on LinkedIn or explore projects on GitHub.