Look, I've built a lot of chat applications over the years, and every single time I think "this one will be different." Spoiler alert: they're all the same until they're not. The difference between a toy project and something actually production-ready comes down to architecture, and that's what we're diving into today.
I recently spent three weeks building a real-time chat application with NestJS, Socket.IO, Redis, and PostgreSQL. Not because the world needs another chat app, but because I wanted to understand how to structure a WebSocket-heavy application that could actually scale. Here's everything I learned, broken down into digestible chunks.
Why NestJS Though?
Before you roll your eyes at yet another framework choice, hear me out. Express is great for simple APIs, but the moment you introduce WebSockets, background jobs, and complex business logic, things get messy fast. NestJS gave me:
- Dependency injection that actually makes sense
- Built-in WebSocket support with decorators
- TypeScript everywhere (yes, this is a feature)
- A structure that forces you to think in modules
Plus, the transition from Express isn't painful. It's more like Express grew up and got its life together.
The Architecture That Actually Works
Here's the thing about chat applications: they're deceptively simple. Send message, receive message, done. Except you need typing indicators, read receipts, user presence, message history, file uploads, and suddenly you're debugging race conditions at 2 AM.
High-Level System Design
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   Client    │────────▶│    NestJS    │────────▶│ PostgreSQL  │
│  (React/    │◀────────│    Server    │◀────────│  Database   │
│  Next.js)   │         └──────┬───────┘         └─────────────┘
└─────────────┘                │
                               ▼
                        ┌──────────────┐
                        │    Redis     │
                        │   (Cache +   │
                        │    PubSub)   │
                        └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │  Socket.IO   │
                        │ (WebSocket)  │
                        └──────────────┘
The key insight here: Redis isn't optional. It's doing double duty as our cache layer and our pub/sub system for horizontal scaling. When User A sends a message, it might hit Server 1, but User B is connected to Server 2. Redis ensures everyone gets the message.
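Here's the shape of that wiring. This is a minimal sketch of the Socket.IO Redis adapter, shown standalone rather than as the exact code from this project (in NestJS it usually lives inside a custom IoAdapter):

// Minimal sketch: Socket.IO Redis adapter with two ioredis connections
import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import Redis from 'ioredis';

const pubClient = new Redis(process.env.REDIS_URL);
const subClient = pubClient.duplicate();

const io = new Server({ cors: { origin: '*' } });
// Every emit is now published through Redis pub/sub, so sockets
// connected to other server instances receive it too.
io.adapter(createAdapter(pubClient, subClient));

With this in place, server.to(room).emit(...) behaves the same whether the recipient's socket lives on this instance or another one.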
Project Structure That Scales
Here's the folder structure I settled on after multiple refactors:
chat-app/
├── src/
│ ├── main.ts
│ ├── app.module.ts
│ ├── config/
│ │ ├── database.config.ts
│ │ ├── redis.config.ts
│ │ └── socket.config.ts
│ ├── common/
│ │ ├── decorators/
│ │ ├── filters/
│ │ ├── guards/
│ │ ├── interceptors/
│ │ └── pipes/
│ ├── modules/
│ │ ├── auth/
│ │ │ ├── auth.controller.ts
│ │ │ ├── auth.service.ts
│ │ │ ├── auth.module.ts
│ │ │ ├── strategies/
│ │ │ │ ├── jwt.strategy.ts
│ │ │ │ └── local.strategy.ts
│ │ │ └── dto/
│ │ │ ├── login.dto.ts
│ │ │ └── register.dto.ts
│ │ ├── users/
│ │ │ ├── users.controller.ts
│ │ │ ├── users.service.ts
│ │ │ ├── users.module.ts
│ │ │ ├── entities/
│ │ │ │ └── user.entity.ts
│ │ │ └── dto/
│ │ │ └── update-user.dto.ts
│ │ ├── chat/
│ │ │ ├── chat.gateway.ts
│ │ │ ├── chat.service.ts
│ │ │ ├── chat.module.ts
│ │ │ ├── entities/
│ │ │ │ ├── message.entity.ts
│ │ │ │ ├── conversation.entity.ts
│ │ │ │ └── participant.entity.ts
│ │ │ └── dto/
│ │ │ ├── send-message.dto.ts
│ │ │ └── create-conversation.dto.ts
│ │ ├── presence/
│ │ │ ├── presence.gateway.ts
│ │ │ ├── presence.service.ts
│ │ │ └── presence.module.ts
│ │ ├── notifications/
│ │ │ ├── notifications.gateway.ts
│ │ │ ├── notifications.service.ts
│ │ │ └── notifications.module.ts
│ │ └── upload/
│ │ ├── upload.controller.ts
│ │ ├── upload.service.ts
│ │ └── upload.module.ts
│ └── database/
│ ├── migrations/
│ └── seeds/
├── test/
├── dist/
├── node_modules/
├── .env
├── .env.example
├── .gitignore
├── nest-cli.json
├── package.json
├── tsconfig.json
└── README.md
Each module is self-contained. The chat module doesn't know about the auth module's implementation details, only that it needs a valid user. This made testing so much easier.
Core Modules Breakdown
Authentication Module
JWT-based auth with refresh tokens. Nothing fancy, but it works:
// auth/strategies/jwt.strategy.ts
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { PassportStrategy } from '@nestjs/passport';
import { ExtractJwt, Strategy } from 'passport-jwt';

@Injectable()
export class JwtStrategy extends PassportStrategy(Strategy) {
  constructor(private configService: ConfigService) {
    super({
      jwtFromRequest: ExtractJwt.fromAuthHeaderAsBearerToken(),
      secretOrKey: configService.get('JWT_SECRET'),
    });
  }

  async validate(payload: any) {
    // Whatever gets returned here is attached to request.user
    return { userId: payload.sub, username: payload.username };
  }
}
The refresh token flow lives in Redis with a 7-day TTL. When a token expires, hit the refresh endpoint, get a new pair, keep going.
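The Redis side of that is tiny. Here's a sketch of what it could look like — the method and key names are illustrative, not the exact code:

// Hypothetical excerpt from auth.service.ts — refresh tokens in Redis with a 7-day TTL
async storeRefreshToken(userId: string, refreshToken: string) {
  const hash = await bcrypt.hash(refreshToken, 10); // never store the raw token
  await this.redis.set(`refresh:${userId}`, hash, 'EX', 60 * 60 * 24 * 7);
}

async validateRefreshToken(userId: string, refreshToken: string) {
  const hash = await this.redis.get(`refresh:${userId}`);
  return hash ? bcrypt.compare(refreshToken, hash) : false;
}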
Chat Gateway (The Heart)
This is where WebSockets come alive. The gateway handles real-time message delivery, typing indicators, and read receipts:
// chat/chat.gateway.ts
import {
  ConnectedSocket,
  MessageBody,
  OnGatewayConnection,
  OnGatewayDisconnect,
  SubscribeMessage,
  WebSocketGateway,
  WebSocketServer,
} from '@nestjs/websockets';
import { Server, Socket } from 'socket.io';
import { ChatService } from './chat.service';
import { PresenceService } from '../presence/presence.service';
import { SendMessageDto } from './dto/send-message.dto';

@WebSocketGateway({
  cors: { origin: '*' },
  namespace: '/chat',
})
export class ChatGateway implements OnGatewayConnection, OnGatewayDisconnect {
  @WebSocketServer()
  server: Server;

  constructor(
    private chatService: ChatService,
    private presenceService: PresenceService,
  ) {}

  async handleConnection(client: Socket) {
    const user = await this.authenticateSocket(client);
    if (!user) {
      client.disconnect();
      return;
    }
    client.data.userId = user.id;
    // Personal room keyed by user id, so any handler can emit to this user
    client.join(user.id);
    await this.presenceService.setUserOnline(user.id, client.id);
  }

  async handleDisconnect(client: Socket) {
    if (client.data.userId) {
      await this.presenceService.setUserOffline(client.data.userId);
    }
  }

  @SubscribeMessage('sendMessage')
  async handleMessage(
    @ConnectedSocket() client: Socket,
    @MessageBody() payload: SendMessageDto,
  ) {
    const message = await this.chatService.createMessage(payload);

    // Emit to all participants in the conversation (each has a personal room)
    const participants = await this.chatService.getConversationParticipants(
      payload.conversationId,
    );
    participants.forEach((participantId) => {
      this.server.to(participantId).emit('newMessage', message);
    });

    return message;
  }

  @SubscribeMessage('typing')
  async handleTyping(
    @ConnectedSocket() client: Socket,
    @MessageBody() payload: { conversationId: string; isTyping: boolean },
  ) {
    client.to(payload.conversationId).emit('userTyping', {
      userId: client.data.userId,
      isTyping: payload.isTyping,
    });
  }
}
The trick here is using rooms. Each conversation is a room, and Socket.IO handles the heavy lifting of message distribution.
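For the typing events above to reach anyone, clients also need to join their conversation rooms. A sketch of what that handler could look like — the event name and the isParticipant check are assumptions on my part:

// Sketch of a room-join handler on the ChatGateway (names are illustrative)
@SubscribeMessage('joinConversation')
async handleJoinConversation(
  @ConnectedSocket() client: Socket,
  @MessageBody() payload: { conversationId: string },
) {
  // Verify membership before letting the socket into the room
  const isParticipant = await this.chatService.isParticipant(
    client.data.userId,
    payload.conversationId,
  );
  if (isParticipant) {
    client.join(payload.conversationId);
  }
}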
Presence System
Online/offline/away status, powered by Redis:
// presence/presence.service.ts
import { Inject, Injectable } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class PresenceService {
  constructor(@Inject('REDIS_CLIENT') private redis: Redis) {}

  async setUserOnline(userId: string, socketId: string) {
    // Per-user key with a TTL: if the heartbeat stops, the key simply expires
    await this.redis.set(`user:presence:${userId}`, 'online', 'EX', 300); // 5 min timeout
    await this.redis.hset('user:sockets', userId, socketId);
  }

  async setUserOffline(userId: string) {
    await this.redis.del(`user:presence:${userId}`);
    await this.redis.hdel('user:sockets', userId);
  }

  async getUserPresence(userId: string): Promise<string> {
    return (await this.redis.get(`user:presence:${userId}`)) || 'offline';
  }
}
Every 30 seconds, connected clients send a heartbeat. If we don't hear from them, Redis TTL handles marking them offline. Simple, effective, scalable.
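The heartbeat handler is basically one line. A minimal sketch (the 'heartbeat' event name is my assumption):

// presence/presence.gateway.ts — heartbeat sketch
@SubscribeMessage('heartbeat')
async handleHeartbeat(@ConnectedSocket() client: Socket) {
  // Re-setting presence refreshes the 5-minute TTL on the per-user key
  await this.presenceService.setUserOnline(client.data.userId, client.id);
}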
Database Schema
PostgreSQL with TypeORM. Here's the entity relationship:
users
├── id (UUID, PK)
├── username (unique)
├── email (unique)
├── password_hash
├── avatar_url
├── created_at
└── updated_at
conversations
├── id (UUID, PK)
├── type (direct, group)
├── name (nullable, for groups)
├── created_at
└── updated_at
participants
├── id (UUID, PK)
├── conversation_id (FK → conversations)
├── user_id (FK → users)
├── joined_at
└── last_read_at
messages
├── id (UUID, PK)
├── conversation_id (FK → conversations)
├── sender_id (FK → users)
├── content (text)
├── type (text, image, file)
├── metadata (jsonb)
├── created_at
└── edited_at
The participants table is doing a lot of work here. It manages who's in what conversation and tracks read status. The last_read_at timestamp lets us calculate unread counts efficiently.
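An unread count is just "messages newer than my last_read_at." A sketch of that query — the method and property names are illustrative, not the exact code:

// Sketch of an unread-count query using last_read_at
import { MoreThan } from 'typeorm';

async getUnreadCount(conversationId: string, userId: string) {
  const participant = await this.participantRepository.findOneBy({
    conversationId,
    userId,
  });
  if (!participant?.lastReadAt) return 0;

  // Count only messages created after the user's last read marker
  return this.messageRepository.count({
    where: {
      conversationId,
      createdAt: MoreThan(participant.lastReadAt),
    },
  });
}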
The Frontend Connection
I went with a clean, minimal design inspired by modern messaging apps. Think iMessage meets Slack, but without the clutter.
Key UI decisions:
Message List: Virtualized scrolling with react-window. Loading 10,000 messages in the DOM is a browser's worst nightmare. Virtual scrolling renders only what's visible plus a buffer.
Optimistic Updates: When you send a message, it appears immediately with a "sending" indicator. If it fails, show a retry button. Don't make users wait for the server to confirm.
Typing Indicators: Three dots animation, but throttled. If someone's typing rapidly, we're not emitting 50 events per second; we throttle to at most one event every 300ms.
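On the client that's just a timestamp check. A rough sketch, assuming an existing socket.io-client instance named socket:

// Client-side sketch: throttle typing events to at most one every 300 ms
let lastTypingEmit = 0;

function emitTyping(conversationId: string, isTyping: boolean) {
  const now = Date.now();
  if (now - lastTypingEmit < 300) return; // drop events inside the window
  lastTypingEmit = now;
  socket.emit('typing', { conversationId, isTyping });
}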
Connection Status: A subtle bar at the top shows connection status. Green for connected, yellow for reconnecting, red for offline. Users should always know what's happening.
Performance Optimizations That Matter
Message Pagination
Never load all messages. Ever. Implement cursor-based pagination:
// Cursor-based pagination: the cursor is the createdAt of the oldest message already loaded
async getMessages(conversationId: string, cursor?: string, limit = 50) {
  const query = this.messageRepository
    .createQueryBuilder('message')
    .where('message.conversationId = :conversationId', { conversationId })
    .orderBy('message.createdAt', 'DESC')
    .limit(limit);

  if (cursor) {
    query.andWhere('message.createdAt < :cursor', { cursor });
  }

  return await query.getMany();
}
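The client just feeds back the createdAt of the oldest message it already has as the next cursor. Roughly:

// Usage sketch: the last (oldest) message's createdAt becomes the next cursor
const firstPage = await chatService.getMessages(conversationId);
const nextCursor = firstPage[firstPage.length - 1]?.createdAt.toISOString();
const olderPage = await chatService.getMessages(conversationId, nextCursor);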
Redis Caching
Recent messages (last 100 per conversation) live in Redis. Database hits happen only for older messages:
async getCachedMessages(conversationId: string) {
  const cached = await this.redis.lrange(
    `conversation:${conversationId}:messages`,
    0,
    99,
  );

  if (cached.length > 0) {
    return cached.map((msg) => JSON.parse(msg));
  }

  // Cache miss, hit the database
  const messages = await this.getMessagesFromDb(conversationId);
  await this.cacheMessages(conversationId, messages);
  return messages;
}
Connection Pooling
Database connections are expensive. Connection pooling is non-negotiable:
// database.config.ts
export default {
  type: 'postgres',
  host: process.env.DB_HOST,
  port: parseInt(process.env.DB_PORT, 10),
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_DATABASE,
  entities: [__dirname + '/../**/*.entity{.ts,.js}'],
  synchronize: false,
  logging: process.env.NODE_ENV === 'development',
  extra: {
    max: 20,                  // maximum pool size
    min: 5,                   // minimum pool size
    idleTimeoutMillis: 30000, // release idle connections after 30s
  },
};
Deployment Strategy
Containerized with Docker, orchestrated with Docker Compose for development, Kubernetes for production.
# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://user:pass@db:5432/chatapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis

  db:
    image: postgres:15
    environment:
      POSTGRES_DB: chatapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
For production, each component scales independently. Three app instances behind a load balancer, Redis cluster for high availability, managed PostgreSQL from your cloud provider.
Lessons Learned
WebSockets are stateful: This complicates load balancing. Use sticky sessions or Redis adapter for Socket.IO to share state across instances.
Error handling matters more than you think: Users will have flaky connections. Implement automatic reconnection with exponential backoff. Show clear error messages.
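Socket.IO's client already does most of the reconnection work; you mostly just tune it. A client-side sketch — the URL and token variable are placeholders:

// Client-side reconnection sketch — socket.io-client handles the backoff itself
import { io } from 'socket.io-client';

const socket = io('https://chat.example.com/chat', {
  reconnection: true,
  reconnectionAttempts: Infinity,
  reconnectionDelay: 1000,      // first retry after 1s
  reconnectionDelayMax: 30000,  // back off up to 30s
  randomizationFactor: 0.5,     // add jitter so clients don't stampede
  auth: { token: accessToken }, // placeholder — however you pass your JWT
});

socket.on('connect_error', (err) => {
  // surface this in the connection-status bar
  console.warn('connection failed, retrying…', err.message);
});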
Database migrations are your friend: Never modify the database schema manually. TypeORM migrations or Prisma migrations keep everything in sync across environments.
Logging everything: When things break (and they will), detailed logs are the difference between fixing it in 10 minutes or 10 hours. I used Winston with correlation IDs to trace requests across services.
Testing WebSockets is hard: Unit tests are straightforward, but integration tests for WebSocket flows require patience. I used Socket.IO client in tests to simulate real connections.
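For reference, a bare-bones version of that kind of test — tokens, IDs, and the port are placeholders:

// Integration-test sketch using socket.io-client (Jest-style, names are illustrative)
import { io, Socket } from 'socket.io-client';

const aliceToken = 'jwt-for-alice'; // placeholder
const bobToken = 'jwt-for-bob';     // placeholder
const conversationId = 'conv-123';  // placeholder

it('delivers a message to the other participant', (done) => {
  const alice: Socket = io('http://localhost:3000/chat', { auth: { token: aliceToken } });
  const bob: Socket = io('http://localhost:3000/chat', { auth: { token: bobToken } });

  bob.on('newMessage', (message) => {
    expect(message.content).toBe('hello');
    alice.close();
    bob.close();
    done();
  });

  bob.on('connect', () => {
    alice.emit('sendMessage', { conversationId, content: 'hello' });
  });
});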
What's Next?
This architecture handles thousands of concurrent users without breaking a sweat. But there's always room to grow:
- End-to-end encryption for private conversations
- Voice and video calling with WebRTC
- Message search with Elasticsearch
- Analytics dashboard for user behavior
- Mobile apps with React Native
- Push notifications for offline users
The foundation is solid. Everything else is just features.
Final Thoughts
Building a chat app taught me more about real-time systems, WebSocket management, and scalable architecture than any tutorial ever could. The code is messy in places, there are TODOs scattered around, and I'm not entirely happy with how I handled file uploads. But it works, it's fast, and it's maintainable.
That's the goal, right? Not perfect code, but code that solves problems and can evolve as requirements change.
If you're building something similar, steal these ideas. Improve on them. Break them. Then tell me what you learned.
That's a wrap 🎁
Now go touch some code 👨‍💻