From HTTP Chaos to Kafka: How We Fixed Inter-Service Communication in a NestJS Microservices Platform

Ritesh Macwan

A technical deep-dive into replacing synchronous HTTP calls with Kafka-based async messaging — covering architecture decisions, NestJS implementation, Redis caching, and BullMQ for background processing.


The Problem: Synchronous HTTP in a Distributed System

On a production NestJS microservices platform I was working on, services were communicating entirely over synchronous HTTP. On paper, this looks fine. In production, under load, it fell apart fast.

Here's what we were dealing with:

  • Cascading failures — if Service B was slow, Service A timed out, and the user felt it
  • Retry storms — failed requests were retried, multiplying the load on already struggling services
  • Tight coupling — every service needed to know the exact endpoint, auth, and contract of every other service it called
  • Resource pressure — every in-flight HTTP call held an open socket and pending work in memory while it waited on a downstream response, and latency stacked across every hop in the chain

A simplified version of what the architecture looked like:

User Request
    │
    ▼
[API Gateway]
    │  HTTP POST /order
    ▼
[Order Service]
    │  HTTP POST /inventory/check       ← blocking
    ▼
[Inventory Service]
    │  HTTP POST /notification/send     ← blocking
    ▼
[Notification Service]

Every step in this chain was synchronous. If Notification Service had a spike in latency, it bubbled all the way up to the user's response time. If it went down, orders failed.


The Solution: Kafka-Based Async Messaging

Apache Kafka gave us a way to decouple producers from consumers completely. Services no longer call each other — they emit events, and other services react to those events independently.

The new architecture:

User Request
    │
    ▼
[API Gateway]
    │  HTTP POST /order
    ▼
[Order Service]  ──── kafka: order.created ────▶ [Inventory Service]
                 ──── kafka: order.created ────▶ [Notification Service]
                 ──── kafka: order.created ────▶ [Analytics Service]
    │
    ▼ (immediate response — no waiting)
User gets 202 Accepted

The Order Service emits a single event. Every downstream service consumes it independently, at its own pace, with its own retry logic. The user gets an immediate response.


NestJS Implementation

Setting Up the Kafka Module

We used @nestjs/microservices with the Kafka transport. Here's how we configured it:

// main.ts
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  app.connectMicroservice<MicroserviceOptions>({
    transport: Transport.KAFKA,
    options: {
      client: {
        brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'], // env var is string | undefined; default keeps strict TS happy
        clientId: 'order-service',
      },
      consumer: {
        groupId: 'order-service-consumer',
      },
    },
  });

  await app.startAllMicroservices();
  await app.listen(3000);
}
bootstrap();
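The OrderService in the next snippet injects a producer under the 'KAFKA_SERVICE' token, which has to be registered in the module. A minimal sketch of that wiring (the module file name and producer clientId are my assumptions, mirroring the bootstrap config above):

// order.module.ts (excerpt)
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';

@Module({
  imports: [
    // Makes a ClientKafka instance injectable via @Inject('KAFKA_SERVICE')
    ClientsModule.register([
      {
        name: 'KAFKA_SERVICE',
        transport: Transport.KAFKA,
        options: {
          client: {
            brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'],
            clientId: 'order-service-producer',
          },
        },
      },
    ]),
  ],
})
export class OrderModule {}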

Producing Events (Order Service)

// order.service.ts
import { Injectable, Inject } from '@nestjs/common';
import { ClientKafka } from '@nestjs/microservices';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { Order } from './order.entity';
import { CreateOrderDto } from './dto/create-order.dto';

@Injectable()
export class OrderService {
  constructor(
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
    @InjectRepository(Order)
    private readonly orderRepository: Repository<Order>,
  ) {}

  async createOrder(createOrderDto: CreateOrderDto) {
    const order = await this.orderRepository.save(createOrderDto);

    // Fire and forget — no waiting for downstream services
    this.kafkaClient.emit('order.created', {
      orderId: order.id,
      userId: order.userId,
      items: order.items,
      totalAmount: order.totalAmount,
      createdAt: new Date().toISOString(),
    });

    return { orderId: order.id, status: 'processing' };
  }
}

Consuming Events (Inventory Service)

// inventory.controller.ts
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { InventoryService } from './inventory.service';
import { OrderCreatedEvent } from './events/order-created.event';

@Controller()
export class InventoryController {
  constructor(private readonly inventoryService: InventoryService) {}

  @EventPattern('order.created')
  async handleOrderCreated(@Payload() data: OrderCreatedEvent) {
    await this.inventoryService.reserveStock(data.orderId, data.items);
  }
}

Each service has its own consumer group, so every service receives every order.created event independently.
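Concretely, the only piece that changes per consuming service is the consumer block in its bootstrap. A sketch for the Inventory Service, mirroring the main.ts shown earlier (the ids are illustrative):

// inventory service main.ts (excerpt)
app.connectMicroservice<MicroserviceOptions>({
  transport: Transport.KAFKA,
  options: {
    client: {
      brokers: [process.env.KAFKA_BROKER ?? 'localhost:9092'],
      clientId: 'inventory-service',
    },
    // Distinct per service: each groupId receives its own copy of every event,
    // while instances sharing a groupId split the partitions between them
    consumer: { groupId: 'inventory-service-consumer' },
  },
});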


Adding Redis Caching to Reduce Repeat Load

Kafka solved the async problem. But we still had expensive database reads happening repeatedly for the same data — product catalogue, user profiles, configuration.

We introduced Redis as a caching layer using @nestjs/cache-manager:

// product.service.ts
import { Inject, Injectable } from '@nestjs/common';
import { CACHE_MANAGER } from '@nestjs/cache-manager';
import { Cache } from 'cache-manager';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { Product } from './product.entity';

@Injectable()
export class ProductService {
  constructor(
    // The cache instance must be injected via the CACHE_MANAGER token
    @Inject(CACHE_MANAGER) private cacheManager: Cache,
    @InjectRepository(Product)
    private productRepository: Repository<Product>,
  ) {}

  async getProduct(productId: string): Promise<Product | null> {
    const cacheKey = `product:${productId}`;

    // Check cache first
    const cached = await this.cacheManager.get<Product>(cacheKey);
    if (cached) return cached;

    // Miss — fetch from DB and cache the result
    const product = await this.productRepository.findOne({
      where: { id: productId },
    });
    if (!product) return null; // don't cache misses

    // NB: TTL is seconds in cache-manager v4, milliseconds in v5 (use 300_000 there)
    await this.cacheManager.set(cacheKey, product, 300); // 5 min TTL
    return product;
  }

  async invalidateProduct(productId: string): Promise<void> {
    await this.cacheManager.del(`product:${productId}`);
  }
}
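For completeness, here is roughly how the cache layer gets wired in. This is a sketch assuming cache-manager-redis-yet as the Redis store and a REDIS_URL env var; your store package and options may differ:

// app.module.ts (excerpt)
import { Module } from '@nestjs/common';
import { CacheModule } from '@nestjs/cache-manager';
import { redisStore } from 'cache-manager-redis-yet';

@Module({
  imports: [
    CacheModule.registerAsync({
      isGlobal: true, // expose CACHE_MANAGER everywhere without re-importing
      useFactory: async () => ({
        // Swap the default in-memory store for Redis
        store: await redisStore({
          url: process.env.REDIS_URL ?? 'redis://localhost:6379',
        }),
      }),
    }),
  ],
})
export class AppModule {}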

Cache invalidation was handled by listening to Kafka events. When a product was updated, the product service emitted a product.updated event, and any consumer that cached that product would invalidate its entry.
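In code, that invalidation listener is just another event handler. A minimal sketch (the controller name is mine, and I'm assuming the product.updated payload carries a productId):

// product-cache.controller.ts
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';
import { ProductService } from './product.service';

@Controller()
export class ProductCacheController {
  constructor(private readonly productService: ProductService) {}

  // Evict the stale cache entry whenever the owning service announces a change
  @EventPattern('product.updated')
  async handleProductUpdated(@Payload() data: { productId: string }) {
    await this.productService.invalidateProduct(data.productId);
  }
}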


BullMQ for Heavy Background Tasks

Not everything belongs on Kafka. For heavy, retry-sensitive background tasks — PDF generation, email sending, report processing — we introduced BullMQ backed by Redis.

The key distinction we settled on:

  • Inter-service events (order created, user registered) → Kafka
  • Heavy background jobs (PDF, email, exports) → BullMQ
  • Repeated read-heavy data → Redis Cache

A typical processor for the report-generation queue:
// report.processor.ts
import { Inject } from '@nestjs/common';
import { Processor, WorkerHost } from '@nestjs/bullmq';
import { Job } from 'bullmq';
import { ClientKafka } from '@nestjs/microservices';
import { AnalyticsService } from './analytics.service';
import { PdfService } from './pdf.service';
import { StorageService } from './storage.service';

interface ReportJobData {
  userId: string;
  reportType: string;
  dateRange: { from: string; to: string }; // shape assumed
}

@Processor('report-generation')
export class ReportProcessor extends WorkerHost {
  constructor(
    private readonly analyticsService: AnalyticsService,
    private readonly pdfService: PdfService,
    private readonly storageService: StorageService,
    @Inject('KAFKA_SERVICE') private readonly kafkaClient: ClientKafka,
  ) {
    super(); // WorkerHost requires the base constructor
  }

  async process(job: Job<ReportJobData>): Promise<void> {
    const { userId, reportType, dateRange } = job.data;

    // This can take 10–30 seconds — fine in a background job
    const reportData = await this.analyticsService.generateReport(
      userId,
      reportType,
      dateRange,
    );

    const pdfBuffer = await this.pdfService.generate(reportData);
    await this.storageService.upload(`reports/${userId}/${job.id}.pdf`, pdfBuffer);

    // Notify user via Kafka when done
    this.kafkaClient.emit('report.ready', { userId, reportUrl: `...` });
  }
}

BullMQ gives you built-in retries, concurrency control, and job prioritisation — things Kafka is not designed for at the job level.
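Those retry and priority knobs are set when the job is enqueued. A sketch of the producing side (the ReportsService wrapper is hypothetical; attempts, backoff, and priority are standard BullMQ job options, and the queue must also be registered with BullModule.registerQueue({ name: 'report-generation' })):

// reports.service.ts
import { Injectable } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';

@Injectable()
export class ReportsService {
  // Queue name must match the @Processor('report-generation') decorator
  constructor(@InjectQueue('report-generation') private reportQueue: Queue) {}

  async requestReport(
    userId: string,
    reportType: string,
    dateRange: { from: string; to: string },
  ) {
    await this.reportQueue.add('generate', { userId, reportType, dateRange }, {
      attempts: 3,                                   // retry up to 3 times on failure
      backoff: { type: 'exponential', delay: 5000 }, // 5s, 10s, 20s between attempts
      priority: 2,                                   // lower number = higher priority
      removeOnComplete: true,                        // don't let finished jobs pile up in Redis
    });
  }
}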


Results

After the migration:

  • Response times dropped significantly — the critical request path no longer waited on downstream services
  • System stability improved under load — a slow consumer no longer caused cascading failures upstream
  • Architecture became cleanly decoupled — adding a new service that reacted to order.created required zero changes to the Order Service
  • Background tasks were offloaded — the main event loop stayed free for user-facing requests

Key Lessons Learned

1. Not everything needs Kafka. We initially over-used it. Short-lived, job-oriented tasks belong in BullMQ. Kafka shines for durable, multi-consumer event streams.

2. Consumer group naming matters. Each service needs its own groupId, shared by every instance of that service. If each instance gets a unique groupId, every instance reprocesses every message and work is duplicated; if two different services share a groupId, they compete for partitions and only one of them sees each event.

3. Schema discipline is non-negotiable. Once you have three services consuming order.created, changing the event payload becomes a coordination challenge. We moved to versioned event schemas early.
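A lightweight way to do this in TypeScript is an explicit version field on the payload (a schema registry is the heavier alternative). This sketch is illustrative; the items shape and the currency field are assumptions:

// events/order-created.event.ts
// Explicit, versioned contract shared by the producer and every consumer
export interface OrderCreatedEventV1 {
  version: 1;
  orderId: string;
  userId: string;
  items: Array<{ productId: string; quantity: number }>;
  totalAmount: number;
  createdAt: string; // ISO 8601
}

// Breaking changes get a new version; consumers opt in when ready
export interface OrderCreatedEventV2 extends Omit<OrderCreatedEventV1, 'version'> {
  version: 2;
  currency: string; // example of a new required field
}

export type OrderCreatedEvent = OrderCreatedEventV1 | OrderCreatedEventV2;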

4. Idempotency is your responsibility. Kafka can deliver a message more than once. Every consumer must handle duplicate events gracefully — usually by checking if the work has already been done before processing.
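A minimal sketch of that duplicate check, extending the InventoryController from earlier and assuming an injected ioredis client (a unique constraint in your database works just as well):

// Inside InventoryController, with `private readonly redis: Redis` injected
@EventPattern('order.created')
async handleOrderCreated(@Payload() data: OrderCreatedEvent) {
  // SET ... NX succeeds only for the first delivery of a given orderId
  const firstDelivery = await this.redis.set(
    `processed:order.created:${data.orderId}`,
    '1',
    'EX',
    60 * 60 * 24, // keep the marker for 24h
    'NX',
  );
  if (!firstDelivery) return; // duplicate delivery, already handled

  await this.inventoryService.reserveStock(data.orderId, data.items);
}

Marking before processing trades a lost event on a mid-handler crash for simplicity; the stricter variant records the marker in the same transaction as the work itself.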

5. Dead Letter Queues save you. We configured DLQ topics for every consumer so failed messages didn't just disappear — they landed somewhere we could inspect and replay.
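Kafka has no built-in dead-lettering at the consumer level, so this was a convention: catch, publish to a parallel .dlq topic, and move on. A sketch of the pattern (the topic name and error envelope are assumptions):

// Inside any consumer: park failures in a replayable DLQ topic
@EventPattern('order.created')
async handleOrderCreated(@Payload() data: OrderCreatedEvent) {
  try {
    await this.inventoryService.reserveStock(data.orderId, data.items);
  } catch (err) {
    // The message lands somewhere inspectable instead of vanishing
    this.kafkaClient.emit('order.created.dlq', {
      original: data,
      error: err instanceof Error ? err.message : String(err),
      failedAt: new Date().toISOString(),
      consumer: 'inventory-service',
    });
  }
}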


Final Architecture

                         ┌─────────────────┐
                         │   API Gateway   │
                         └────────┬────────┘
                                  │ HTTP
                         ┌────────▼────────┐
                         │  Order Service  │──── Redis Cache
                         └────────┬────────┘
                                  │ Kafka: order.created
              ┌───────────────────┼───────────────────┐
              │                   │                   │
   ┌──────────▼──────┐  ┌────────▼────────┐  ┌──────▼──────────┐
   │Inventory Service│  │  Notification   │  │Analytics Service│
   │  + Redis Cache  │  │    Service      │  │                 │
   └─────────────────┘  └────────┬────────┘  └─────────────────┘
                                  │ BullMQ
                         ┌────────▼────────┐
                         │  Email / PDF    │
                         │   Processor     │
                         └─────────────────┘

Wrapping Up

Migrating from synchronous HTTP to Kafka-based async messaging was one of the highest-impact architectural changes we made. The system became more resilient, more scalable, and easier to extend.

If you're running NestJS microservices and hitting the same walls — timeouts, cascading failures, tight coupling — Kafka is worth the investment. Start with one event type, get comfortable with the consumer group model, and expand from there.


Ritesh Macwan is a Senior Backend Engineer specialising in NestJS, Kafka, Kubernetes, and distributed systems. Connect on LinkedIn or explore projects on GitHub.

Top comments (1)

Ritesh Macwan

One thing we wrestled with during this migration was schema evolution. How is everyone else handling breaking changes in your Kafka events? Would love to hear your approaches!