Nadim Chowdhury

Building a Production-Ready Real-Time Chat App with NestJS

Look, I've built a lot of chat applications over the years, and every single time I think "this one will be different." Spoiler alert: they're all the same until they're not. The difference between a toy project and something actually production-ready comes down to architecture, and that's what we're diving into today.

I recently spent three weeks building a real-time chat application with NestJS, Socket.IO, Redis, and PostgreSQL. Not because the world needs another chat app, but because I wanted to understand how to structure a WebSocket-heavy application that could actually scale. Here's everything I learned, broken down into digestible chunks.

Why NestJS Though?

Before you roll your eyes at yet another framework choice, hear me out. Express is great for simple APIs, but the moment you introduce WebSockets, background jobs, and complex business logic, things get messy fast. NestJS gave me:

  • Dependency injection that actually makes sense
  • Built-in WebSocket support with decorators
  • TypeScript everywhere (yes, this is a feature)
  • A structure that forces you to think in modules

Plus, the transition from Express isn't painful. It's more like Express grew up and got its life together.

The Architecture That Actually Works

Here's the thing about chat applications: they're deceptively simple. Send message, receive message, done. Except you need typing indicators, read receipts, user presence, message history, file uploads, and suddenly you're debugging race conditions at 2 AM.

High-Level System Design

┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   Client    │────────▶│   NestJS     │────────▶│ PostgreSQL  │
│  (React/    │◀────────│   Server     │◀────────│  Database   │
│   Next.js)  │         └──────────────┘         └─────────────┘
└─────────────┘                │                         
                               │                         
                               ▼                         
                      ┌──────────────┐                  
                      │    Redis     │                  
                      │   (Cache +   │                  
                      │   PubSub)    │                  
                      └──────────────┘                  
                               │                         
                               ▼                         
                      ┌──────────────┐                  
                      │  Socket.IO   │                  
                      │  (WebSocket) │                  
                      └──────────────┘                  

The key insight here: Redis isn't optional. It's doing double duty as our cache layer and our pub/sub system for horizontal scaling. When User A sends a message, it might hit Server 1, but User B is connected to Server 2. Redis ensures everyone gets the message.
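To make that cross-server delivery concrete: one common way to wire it up in NestJS is a custom `IoAdapter` that plugs the Redis adapter into Socket.IO. This is a sketch, assuming the `@socket.io/redis-adapter` and `ioredis` packages (neither is shown elsewhere in this post):

```
// redis-io.adapter.ts (sketch) — routes Socket.IO events through Redis
// pub/sub so an emit on Server 1 reaches clients connected to Server 2.
import { IoAdapter } from '@nestjs/platform-socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import Redis from 'ioredis';
import { ServerOptions } from 'socket.io';

export class RedisIoAdapter extends IoAdapter {
  private adapterConstructor: ReturnType<typeof createAdapter>;

  async connectToRedis(url: string): Promise<void> {
    const pubClient = new Redis(url);
    const subClient = pubClient.duplicate(); // pub and sub need separate connections
    this.adapterConstructor = createAdapter(pubClient, subClient);
  }

  createIOServer(port: number, options?: ServerOptions) {
    const server = super.createIOServer(port, options);
    server.adapter(this.adapterConstructor);
    return server;
  }
}
```

In `main.ts` you'd call `connectToRedis()` on an instance of this adapter and pass it to `app.useWebSocketAdapter()` before listening.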

Project Structure That Scales

Here's the folder structure I settled on after multiple refactors:

chat-app/
├── src/
│   ├── main.ts
│   ├── app.module.ts
│   ├── config/
│   │   ├── database.config.ts
│   │   ├── redis.config.ts
│   │   └── socket.config.ts
│   ├── common/
│   │   ├── decorators/
│   │   ├── filters/
│   │   ├── guards/
│   │   ├── interceptors/
│   │   └── pipes/
│   ├── modules/
│   │   ├── auth/
│   │   │   ├── auth.controller.ts
│   │   │   ├── auth.service.ts
│   │   │   ├── auth.module.ts
│   │   │   ├── strategies/
│   │   │   │   ├── jwt.strategy.ts
│   │   │   │   └── local.strategy.ts
│   │   │   └── dto/
│   │   │       ├── login.dto.ts
│   │   │       └── register.dto.ts
│   │   ├── users/
│   │   │   ├── users.controller.ts
│   │   │   ├── users.service.ts
│   │   │   ├── users.module.ts
│   │   │   ├── entities/
│   │   │   │   └── user.entity.ts
│   │   │   └── dto/
│   │   │       └── update-user.dto.ts
│   │   ├── chat/
│   │   │   ├── chat.gateway.ts
│   │   │   ├── chat.service.ts
│   │   │   ├── chat.module.ts
│   │   │   ├── entities/
│   │   │   │   ├── message.entity.ts
│   │   │   │   ├── conversation.entity.ts
│   │   │   │   └── participant.entity.ts
│   │   │   └── dto/
│   │   │       ├── send-message.dto.ts
│   │   │       └── create-conversation.dto.ts
│   │   ├── presence/
│   │   │   ├── presence.gateway.ts
│   │   │   ├── presence.service.ts
│   │   │   └── presence.module.ts
│   │   ├── notifications/
│   │   │   ├── notifications.gateway.ts
│   │   │   ├── notifications.service.ts
│   │   │   └── notifications.module.ts
│   │   └── upload/
│   │       ├── upload.controller.ts
│   │       ├── upload.service.ts
│   │       └── upload.module.ts
│   └── database/
│       ├── migrations/
│       └── seeds/
├── test/
├── dist/
├── node_modules/
├── .env
├── .env.example
├── .gitignore
├── nest-cli.json
├── package.json
├── tsconfig.json
└── README.md

Each module is self-contained. The chat module doesn't know about auth implementation details, only that it needs a valid user. This made testing so much easier.

Core Modules Breakdown

Authentication Module

JWT-based auth with refresh tokens. Nothing fancy, but it works:

// auth/strategies/jwt.strategy.ts
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { PassportStrategy } from '@nestjs/passport';
import { ExtractJwt, Strategy } from 'passport-jwt';

@Injectable()
export class JwtStrategy extends PassportStrategy(Strategy) {
  constructor(private configService: ConfigService) {
    super({
      jwtFromRequest: ExtractJwt.fromAuthHeaderAsBearerToken(),
      ignoreExpiration: false, // reject expired tokens
      secretOrKey: configService.get('JWT_SECRET'),
    });
  }

  // Whatever this returns is attached to the request as req.user
  async validate(payload: any) {
    return { userId: payload.sub, username: payload.username };
  }
}

The refresh token flow lives in Redis with a 7-day TTL. When a token expires, hit the refresh endpoint, get a new pair, keep going.
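The rotation logic itself is simple enough to sketch independently of Redis. Here `TokenStore` is a hypothetical minimal interface standing in for Redis SET-with-TTL / GET / DEL — in the real service those calls go through the injected Redis client:

```typescript
// Refresh-token rotation sketch. `TokenStore` is a stand-in for Redis.
interface TokenStore {
  set(key: string, value: string, ttlSeconds: number): void;
  get(key: string): string | undefined;
  del(key: string): void;
}

const REFRESH_TTL = 7 * 24 * 60 * 60; // 7 days, matching the Redis TTL above

function issueRefreshToken(store: TokenStore, userId: string, token: string): void {
  store.set(`refresh:${userId}`, token, REFRESH_TTL);
}

// Rotate only if the presented token matches the stored one;
// a mismatch means the token is stale, already used, or forged.
function rotateRefreshToken(
  store: TokenStore,
  userId: string,
  presented: string,
  next: string,
): boolean {
  if (store.get(`refresh:${userId}`) !== presented) return false;
  store.set(`refresh:${userId}`, next, REFRESH_TTL); // old token is now invalid
  return true;
}

function revokeRefreshToken(store: TokenStore, userId: string): void {
  store.del(`refresh:${userId}`); // e.g. on logout
}
```

Rotating on every refresh means a stolen refresh token is only usable until its legitimate owner refreshes next.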

Chat Gateway (The Heart)

This is where WebSockets come alive. The gateway handles real-time message delivery, typing indicators, and read receipts:

// chat/chat.gateway.ts
@WebSocketGateway({
  cors: { origin: '*' }, // lock this down to your frontend origin in production
  namespace: '/chat',
})
export class ChatGateway implements OnGatewayConnection, OnGatewayDisconnect {
  @WebSocketServer()
  server: Server;

  constructor(
    private chatService: ChatService,
    private presenceService: PresenceService,
  ) {}

  async handleConnection(client: Socket) {
    const user = await this.authenticateSocket(client);
    if (!user) {
      client.disconnect();
      return;
    }
    client.data.userId = user.id; // later handlers rely on this
    client.join(`user:${user.id}`); // personal room for targeted emits
    await this.presenceService.setUserOnline(user.id, client.id);
  }

  async handleDisconnect(client: Socket) {
    if (client.data.userId) {
      await this.presenceService.setUserOffline(client.data.userId);
    }
  }

  @SubscribeMessage('joinConversation')
  handleJoinConversation(
    @ConnectedSocket() client: Socket,
    @MessageBody() conversationId: string,
  ) {
    client.join(conversationId); // conversation room, used by typing events
  }

  @SubscribeMessage('sendMessage')
  async handleMessage(
    @ConnectedSocket() client: Socket,
    @MessageBody() payload: SendMessageDto,
  ) {
    const message = await this.chatService.createMessage(payload);

    // Emit to each participant's personal room so they get the message
    // even if they haven't opened (joined) the conversation yet
    const participants = await this.chatService.getConversationParticipants(
      payload.conversationId,
    );

    participants.forEach((participantId) => {
      this.server.to(`user:${participantId}`).emit('newMessage', message);
    });

    return message;
  }

  @SubscribeMessage('typing')
  handleTyping(
    @ConnectedSocket() client: Socket,
    @MessageBody() payload: { conversationId: string; isTyping: boolean },
  ) {
    client.to(payload.conversationId).emit('userTyping', {
      userId: client.data.userId,
      isTyping: payload.isTyping,
    });
  }
}

The trick here is using rooms. Each conversation is a room, and Socket.IO handles the heavy lifting of message distribution.

Presence System

Online/offline/away status, powered by Redis:

// presence/presence.service.ts
import { Inject, Injectable } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class PresenceService {
  constructor(@Inject('REDIS_CLIENT') private redis: Redis) {}

  async setUserOnline(userId: string, socketId: string) {
    // Per-user keys, because individual hash fields can't carry a TTL in
    // Redis — the TTL goes on the key itself.
    await this.redis.set(`user:presence:${userId}`, 'online', 'EX', 300); // 5 min timeout
    await this.redis.set(`user:socket:${userId}`, socketId, 'EX', 300);
  }

  async setUserOffline(userId: string) {
    await this.redis.del(`user:presence:${userId}`, `user:socket:${userId}`);
  }

  async getUserPresence(userId: string): Promise<string> {
    return (await this.redis.get(`user:presence:${userId}`)) || 'offline';
  }
}

Every 30 seconds, connected clients send a heartbeat. If we don't hear from them, Redis TTL handles marking them offline. Simple, effective, scalable.
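The decision the TTL enforces boils down to a pure function of the last heartbeat timestamp. A minimal sketch — note that `AWAY_AFTER_MS` is a hypothetical threshold I'm adding for the "away" state, not a value from the setup above:

```typescript
// Presence as a function of idle time (all values in milliseconds).
const PRESENCE_TTL_MS = 300_000; // offline after 5 min of silence (matches the Redis TTL)
const AWAY_AFTER_MS = 120_000;   // hypothetical: "away" after 2 min idle

function presenceFor(
  lastHeartbeatAt: number,
  now: number,
): 'online' | 'away' | 'offline' {
  const idle = now - lastHeartbeatAt;
  if (idle > PRESENCE_TTL_MS) return 'offline';
  if (idle > AWAY_AFTER_MS) return 'away';
  return 'online';
}
```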

Database Schema

PostgreSQL with TypeORM. Here's the entity relationship:

users
├── id (UUID, PK)
├── username (unique)
├── email (unique)
├── password_hash
├── avatar_url
├── created_at
└── updated_at

conversations
├── id (UUID, PK)
├── type (direct, group)
├── name (nullable, for groups)
├── created_at
└── updated_at

participants
├── id (UUID, PK)
├── conversation_id (FK → conversations)
├── user_id (FK → users)
├── joined_at
└── last_read_at

messages
├── id (UUID, PK)
├── conversation_id (FK → conversations)
├── sender_id (FK → users)
├── content (text)
├── type (text, image, file)
├── metadata (jsonb)
├── created_at
└── edited_at

The participants table is doing a lot of work here. It manages who's in what conversation and tracks read status. The last_read_at timestamp lets us calculate unread counts efficiently.
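In SQL that's roughly a `COUNT(*)` over messages with `created_at > last_read_at` for the conversation, minus your own messages. The same logic as a small pure sketch (field names mirror the schema above, in camelCase):

```typescript
// Unread count for one participant: messages created after their
// last_read_at, excluding messages they sent themselves.
interface Msg {
  senderId: string;
  createdAt: number; // epoch millis, standing in for the timestamp column
}

function unreadCount(messages: Msg[], userId: string, lastReadAt: number): number {
  return messages.filter(
    (m) => m.createdAt > lastReadAt && m.senderId !== userId,
  ).length;
}
```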

The Frontend Connection

I went with a clean, minimal design inspired by modern messaging apps. Think iMessage meets Slack, but without the clutter.

Key UI decisions:

Message List: Virtualized scrolling with react-window. Loading 10,000 messages in the DOM is a browser's worst nightmare. Virtual scrolling renders only what's visible plus a buffer.

Optimistic Updates: When you send a message, it appears immediately with a "sending" indicator. If it fails, show a retry button. Don't make users wait for the server to confirm.

Typing Indicators: Three dots animation, but throttled. If someone's typing rapidly, we're not emitting 50 events per second. Throttle to at most one event every 300ms.
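The 300ms limiter can be sketched as a small throttle. The clock is injected (`now` is a function) purely so the logic is testable without real timers — in the app you'd pass `Date.now`:

```typescript
// Rate-limit typing events to at most one per interval.
function createTypingThrottle(intervalMs: number, now: () => number) {
  let lastEmit = -Infinity;
  // Returns true if the event was emitted, false if it was swallowed.
  return (emit: () => void): boolean => {
    const t = now();
    if (t - lastEmit < intervalMs) return false;
    lastEmit = t;
    emit();
    return true;
  };
}
```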

Connection Status: A subtle bar at the top shows connection status. Green for connected, yellow for reconnecting, red for offline. Users should always know what's happening.

Performance Optimizations That Matter

Message Pagination

Never load all messages. Ever. Implement cursor-based pagination:

async getMessages(conversationId: string, cursor?: string, limit = 50) {
  const query = this.messageRepository
    .createQueryBuilder('message')
    .where('message.conversationId = :conversationId', { conversationId })
    .orderBy('message.createdAt', 'DESC')
    .limit(limit);

  if (cursor) {
    query.andWhere('message.createdAt < :cursor', { cursor });
  }

  return await query.getMany();
}
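On the consuming side, the next cursor is just the `createdAt` of the last (oldest) message in the page you got back. A pure sketch of that paging contract over an in-memory list:

```typescript
// Cursor paging over a DESC-ordered message list: each page's next
// cursor is the createdAt of its last (oldest) item.
interface PageMsg {
  id: string;
  createdAt: number; // epoch millis for the sketch
}

function pageMessages(all: PageMsg[], limit: number, cursor?: number) {
  const items = all
    .filter((m) => cursor === undefined || m.createdAt < cursor)
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
  const nextCursor =
    items.length === limit ? items[items.length - 1].createdAt : undefined;
  return { items, nextCursor };
}
```

When `nextCursor` comes back `undefined`, you've reached the top of the history.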

Redis Caching

Recent messages (last 100 per conversation) live in Redis. Database hits happen only for older messages:

async getCachedMessages(conversationId: string) {
  const cached = await this.redis.lrange(
    `conversation:${conversationId}:messages`,
    0,
    99,
  );

  if (cached.length > 0) {
    return cached.map(msg => JSON.parse(msg));
  }

  // Cache miss, hit the database
  const messages = await this.getMessagesFromDb(conversationId);
  await this.cacheMessages(conversationId, messages);
  return messages;
}
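The write path has to keep the list in the shape the read path expects: newest first, capped at 100. A sketch, assuming an injected ioredis client like elsewhere in the post (the 1-hour key TTL is my assumption, not from the original setup):

```
async function cacheNewMessage(
  redis: Redis,
  conversationId: string,
  message: object,
) {
  const key = `conversation:${conversationId}:messages`;
  await redis.lpush(key, JSON.stringify(message)); // newest lands at index 0
  await redis.ltrim(key, 0, 99);                   // keep only the last 100
  await redis.expire(key, 60 * 60);                // hypothetical 1 h TTL
}
```

LPUSH plus LTRIM keeps the key bounded, so a busy conversation can't grow the cache without limit.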

Connection Pooling

Database connections are expensive. Connection pooling is non-negotiable:

// database.config.ts
export default {
  type: 'postgres',
  host: process.env.DB_HOST,
  port: parseInt(process.env.DB_PORT ?? '5432', 10),
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_DATABASE,
  entities: [__dirname + '/../**/*.entity{.ts,.js}'],
  synchronize: false,
  logging: process.env.NODE_ENV === 'development',
  extra: {
    max: 20, // maximum pool size
    min: 5,  // minimum pool size
    idleTimeoutMillis: 30000,
  },
};

Deployment Strategy

Containerized with Docker, orchestrated with Docker Compose in development and Kubernetes in production.

# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://user:pass@db:5432/chatapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis

  db:
    image: postgres:15
    environment:
      POSTGRES_DB: chatapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

For production, each component scales independently. Three app instances behind a load balancer, Redis cluster for high availability, managed PostgreSQL from your cloud provider.

Lessons Learned

WebSockets are stateful: This complicates load balancing. Use sticky sessions or Redis adapter for Socket.IO to share state across instances.

Error handling matters more than you think: Users will have flaky connections. Implement automatic reconnection with exponential backoff. Show clear error messages.
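Socket.IO's client has exponential backoff with jitter built in, so most of this is configuration. A client-side sketch, assuming the `socket.io-client` package (the URL and attempt count are illustrative):

```
import { io } from 'socket.io-client';

const socket = io('https://chat.example.com/chat', {
  reconnection: true,
  reconnectionDelay: 1_000,     // first retry after ~1 s
  reconnectionDelayMax: 30_000, // cap the backoff at 30 s
  randomizationFactor: 0.5,     // jitter so clients don't reconnect in lockstep
  reconnectionAttempts: 20,     // then give up and surface an error
});

// Manager-level events drive the connection-status bar in the UI
socket.io.on('reconnect_attempt', (attempt) => console.log(`retry #${attempt}`));
socket.io.on('reconnect_failed', () => console.log('gave up — show offline state'));
```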

Database migrations are your friend: Never modify the database schema manually. TypeORM migrations or Prisma migrations keep everything in sync across environments.

Logging everything: When things break (and they will), detailed logs are the difference between fixing it in 10 minutes or 10 hours. I used Winston with correlation IDs to trace requests across services.

Testing WebSockets is hard: Unit tests are straightforward, but integration tests for WebSocket flows require patience. I used Socket.IO client in tests to simulate real connections.

What's Next?

This architecture handles thousands of concurrent users without breaking a sweat. But there's always room to grow:

  • End-to-end encryption for private conversations
  • Voice and video calling with WebRTC
  • Message search with Elasticsearch
  • Analytics dashboard for user behavior
  • Mobile apps with React Native
  • Push notifications for offline users

The foundation is solid. Everything else is just features.

Final Thoughts

Building a chat app taught me more about real-time systems, WebSocket management, and scalable architecture than any tutorial ever could. The code is messy in places, there are TODOs scattered around, and I'm not entirely happy with how I handled file uploads. But it works, it's fast, and it's maintainable.

That's the goal, right? Not perfect code, but code that solves problems and can evolve as requirements change.

If you're building something similar, steal these ideas. Improve on them. Break them. Then tell me what you learned.


That's a wrap 🎁

Now go touch some code 👨‍💻

Catch me here → LinkedIn | GitHub | YouTube
