Abhiram Karanth

How I built a horizontally scalable chat server in Go

It's not about sending messages fast.
It's about doing three things at once:
— deliver in real time
— preserve order + durability
— scale without sticky sessions

Project: Signal-Flow

GitHub: https://github.com/abhiram-karanth-core/signal-flow

Demo: https://global-chat-app-web.vercel.app/

Here's how I broke it down


The key insight: stop treating a chat message as one thing.

Every message has a lifecycle with very different requirements at each stage. Collapsing them into one system is where most chat servers break down at scale.


The lifecycle:

  1. User sends a message (Next.js)
  2. Go WebSocket server receives it
  3. Made durable (Kafka)
  4. Fanned out to other users (Redis)
  5. Written to queryable history (Postgres)

Each step needs a different system. Each system does exactly one job.
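
To make the lifecycle concrete, here's the rough shape of a message as it travels through every stage. This is a simplified sketch; field names are illustrative rather than the exact ones in the repo.

```go
package chat

import "time"

// Message is an illustrative shape for a chat message as it moves through
// the pipeline: received by the Go WebSocket server, made durable in Kafka,
// fanned out via Redis, and persisted to Postgres for history.
type Message struct {
	ID        string    `json:"id"`         // assigned by the server when the message arrives
	RoomID    string    `json:"room_id"`    // which conversation it belongs to
	SenderID  string    `json:"sender_id"`  // authenticated sender
	Body      string    `json:"body"`       // the message text
	CreatedAt time.Time `json:"created_at"` // server timestamp, used to order history queries
}
```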


The first big lesson: real-time delivery and durable storage have opposing needs.

Real-time = sub-millisecond latency
Durable = safety, ordering, replayability

Forcing one system to do both means compromising on both. So don't.


Kafka is the source of truth. Every message hits Kafka first — before fan-out, before anything.

But Kafka doesn't deliver messages to users. It exists so the system can survive failures and recover state.

Correctness lives here. Latency does not.
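
Here's a minimal sketch of the "Kafka first" rule using the segmentio/kafka-go client. Broker address and topic name are placeholders, not the exact values from the repo.

```go
package chat

import (
	"context"
	"time"

	"github.com/segmentio/kafka-go"
)

// writer is the single entry point for every incoming chat message.
var writer = &kafka.Writer{
	Addr:         kafka.TCP("localhost:9092"), // placeholder broker address
	Topic:        "chat-messages",             // placeholder topic name
	RequiredAcks: kafka.RequireAll,            // don't ack the sender until Kafka has the message
}

// publish is called by the WebSocket handler as soon as a message arrives.
// Nothing else (no Redis fan-out, no DB write) happens until this returns nil.
func publish(ctx context.Context, roomID string, payload []byte) error {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	return writer.WriteMessages(ctx, kafka.Message{
		Key:   []byte(roomID), // irrelevant for ordering with a single partition, but documents intent
		Value: payload,
	})
}
```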


Why a single Kafka partition?

Message ordering matters more than raw throughput in chat. Ordering is only guaranteed within a partition. Cross-partition ordering breaks chat semantics.

Single partition → ~5k msg/sec cap. But deterministic ordering, simpler consumers, clean replay.

Conscious trade-off. Not a limitation.
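
The single-partition choice is literally one line of topic config. A simplified sketch with kafka-go (broker address and topic name are placeholders):

```go
package main

import (
	"log"
	"net"
	"strconv"

	"github.com/segmentio/kafka-go"
)

func main() {
	// One-time admin step: create the topic with exactly one partition.
	conn, err := kafka.Dial("tcp", "localhost:9092") // placeholder broker address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Topic creation has to go through the cluster controller.
	controller, err := conn.Controller()
	if err != nil {
		log.Fatal(err)
	}
	admin, err := kafka.Dial("tcp", net.JoinHostPort(controller.Host, strconv.Itoa(controller.Port)))
	if err != nil {
		log.Fatal(err)
	}
	defer admin.Close()

	// NumPartitions: 1 is the whole trade-off: total ordering across the
	// topic, capped throughput, trivially simple consumers and replay.
	err = admin.CreateTopics(kafka.TopicConfig{
		Topic:             "chat-messages", // placeholder topic name
		NumPartitions:     1,
		ReplicationFactor: 1,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```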


Redis makes it feel instant.

It sits on the hot path — broadcasting messages to connected WebSocket clients and coordinating fan-out across multiple Go servers.

Redis is intentionally ephemeral. If it drops a message: Kafka still has it. Clients recover on reconnect. History stays correct.
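
Roughly what the hot path looks like on each Go server, sketched with go-redis Pub/Sub. Channel name and addresses are placeholders, and the broadcast step is stubbed out:

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

// broadcast is a placeholder for writing the payload to every WebSocket
// connection held by *this* server instance.
func broadcast(payload string) {
	// ... write payload to each locally connected client ...
	_ = payload
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address

	// Every Go server subscribes to the same channel, so it doesn't matter
	// which instance holds a given client's socket.
	sub := rdb.Subscribe(ctx, "chat-messages") // placeholder channel name
	defer sub.Close()

	for msg := range sub.Channel() {
		// If Redis drops a message here, nothing is lost: Kafka still has it,
		// and clients re-sync history over HTTP when they reconnect.
		broadcast(msg.Payload)
	}
	log.Println("redis subscription closed")
}
```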


Postgres makes it usable.

Kafka consumers write to Postgres asynchronously. This means:
— DB slowness never blocks ingestion
— State can be rebuilt from Kafka replay
— Frontend reloads always return consistent history

Postgres is the materialized view users actually interact with.
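
A simplified sketch of that consumer: a kafka-go reader in its own consumer group that commits the offset only after the row is in Postgres, so a crash just replays from the last commit. Table and column names are illustrative:

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"

	_ "github.com/lib/pq" // placeholder Postgres driver choice
	"github.com/segmentio/kafka-go"
)

type Message struct {
	ID        string `json:"id"`
	RoomID    string `json:"room_id"`
	SenderID  string `json:"sender_id"`
	Body      string `json:"body"`
	CreatedAt string `json:"created_at"`
}

func main() {
	ctx := context.Background()

	db, err := sql.Open("postgres", "postgres://user:pass@localhost/chat?sslmode=disable") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker address
		Topic:   "chat-messages",            // placeholder topic name
		GroupID: "postgres-writer",          // its own consumer group, independent of the Redis fan-out
	})
	defer r.Close()

	for {
		// FetchMessage does not commit the offset; we commit only after the
		// row is written, so an uncommitted message is replayed after a restart.
		m, err := r.FetchMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}

		var msg Message
		if err := json.Unmarshal(m.Value, &msg); err != nil {
			log.Printf("skipping malformed message: %v", err)
		} else {
			// ON CONFLICT makes the insert idempotent, so replays are harmless.
			_, err := db.ExecContext(ctx,
				`INSERT INTO messages (id, room_id, sender_id, body, created_at)
				 VALUES ($1, $2, $3, $4, $5)
				 ON CONFLICT (id) DO NOTHING`,
				msg.ID, msg.RoomID, msg.SenderID, msg.Body, msg.CreatedAt)
			if err != nil {
				log.Printf("db write failed, will be retried after restart: %v", err)
				continue // skip the commit so the message isn't lost
			}
		}

		if err := r.CommitMessages(ctx, m); err != nil {
			log.Printf("offset commit failed: %v", err)
		}
	}
}
```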


The most important architectural decision: heavy decoupling between consumers.

Kafka → Redis (real-time fan-out)
Kafka → Postgres (durable writes)

Producer publishes once. Doesn't care who consumes. Each consumer fails and restarts independently.

No cascading failures.

HTTP publish → Kafka producer → Kafka topic, then in parallel:
— Kafka → Postgres (durable writes)
— Kafka → Redis → Go servers → WebSocket clients (real-time fan-out)
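
The Redis branch of that flow is just another independent consumer group reading the same topic and republishing into Redis Pub/Sub. A simplified sketch with the same placeholder names:

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker address
		Topic:   "chat-messages",            // placeholder topic name
		GroupID: "redis-fanout",             // separate group: if this dies, the Postgres writer keeps running
	})
	defer r.Close()

	for {
		// ReadMessage commits the offset automatically; losing a real-time
		// message is acceptable here because Kafka and Postgres stay correct.
		m, err := r.ReadMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		// Best-effort publish: Redis is deliberately ephemeral.
		if err := rdb.Publish(ctx, "chat-messages", m.Value).Err(); err != nil {
			log.Printf("redis publish failed (clients recover via history): %v", err)
		}
	}
}
```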

Why it scales horizontally:

— Go servers are stateless (no sticky sessions)
— Real-time delivery is decoupled from durability
— Each component has one job
— Any component can rebuild from Kafka replay

Scaling becomes an ops concern. Not an app rewrite.


The frontend never sees any of this.

WebSockets → real-time updates via Redis
HTTP APIs → message history from Postgres
Kafka stays internal. Clean boundary.
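
On the HTTP side of that boundary, history can be served by a plain handler reading Postgres. Route, table, and column names below are illustrative, not the actual API:

```go
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"net/http"

	_ "github.com/lib/pq" // placeholder Postgres driver choice
)

type Message struct {
	ID       string `json:"id"`
	RoomID   string `json:"room_id"`
	SenderID string `json:"sender_id"`
	Body     string `json:"body"`
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/chat?sslmode=disable") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}

	// GET /history?room=<id> returns recent messages for a room.
	// The frontend only ever sees this endpoint and the WebSocket;
	// Kafka never leaks past the server boundary.
	http.HandleFunc("/history", func(w http.ResponseWriter, r *http.Request) {
		room := r.URL.Query().Get("room")
		rows, err := db.QueryContext(r.Context(),
			`SELECT id, room_id, sender_id, body
			   FROM messages
			  WHERE room_id = $1
			  ORDER BY created_at DESC
			  LIMIT 50`, room)
		if err != nil {
			http.Error(w, "query failed", http.StatusInternalServerError)
			return
		}
		defer rows.Close()

		history := []Message{}
		for rows.Next() {
			var m Message
			if err := rows.Scan(&m.ID, &m.RoomID, &m.SenderID, &m.Body); err != nil {
				http.Error(w, "scan failed", http.StatusInternalServerError)
				return
			}
			history = append(history, m)
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(history)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```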


Don't build a chat server. Build three systems that each do one thing well:

Kafka → correctness
Redis → latency
Postgres → usability

Everything else is trade-offs.
