Adam - The Developer

No Redis. No Kafka. Just Postgres. I Built Chat.

Everyone reaches for Redis and Kafka the moment they need real-time. What if you didn't have to?

I'm not anti-Redis. I'm not anti-Kafka. I'm anti-"add another distributed system because we're nervous." I've watched teams ship a broker cluster before they even had a throughput problem, then spend the next year learning the operational and failure semantics of that cluster while their database sat there, underused, doing a fraction of what it was capable of.

So I built a chat application that does the hard parts (multi-node real-time delivery, per-room ordering, presence, typing indicators) using Postgres as both the source of truth and the real-time message bus.

No brokers. No Redis. No RabbitMQ. No separate pub/sub service. Just Postgres, NestJS, and raw WebSockets.

TL;DR: the repo is here: https://github.com/adamreaksmey/chat-application-with-postgres

Table of Contents

  1. The Bet: "Just Postgres" Is Enough
  2. The Core Mechanism: Triggers + LISTEN/NOTIFY as the Message Bus
  3. Schema Decision: Per-Room Sequences for Gapless Ordering
  4. Schema Decision: NOTIFY on Insert, With the Full Payload
  5. Node Mechanics: A Dedicated LISTEN Client + In-Process Event Routing
  6. Reference-Counted Room Subscriptions
  7. Exponential Backoff Reconnect for the LISTEN Connection
  8. Horizontal Scaling: Stateless Nodes + Nginx least_conn + Postgres Fanout
  9. The Unsexy Part: Postgres Settings That Mattered
  10. The Detour I Didn't Keep: Sharding LISTEN/NOTIFY
  11. The Performance Turning Point: Batched Fanout + One Batch Send per Room per Tick
  12. Proof: Load Test Results
  13. Tradeoffs (The Part You Should Actually Read)
  14. The Takeaway

1. The Bet: "Just Postgres" Is Enough

The architecture decision was intentionally provocative:

  • Persist everything in Postgres: users, rooms, room_members, messages, presence, typing.
  • Emit real-time events from Postgres using triggers + pg_notify.
  • Consume events in Node using a dedicated LISTEN connection.
  • Fan out to clients over raw WebSockets (JSON-framed events).
  • Scale horizontally by running multiple identical, stateless app nodes behind Nginx.

The thesis: in most apps, the database is already your most reliable coordination point. If you're writing a row anyway, you can publish the corresponding event from the same transaction boundary. No second system. No "eventual consistency between your DB and your broker." One write, one source of truth.

Why not Redis pub/sub?

Redis pub/sub is fast, but it introduces questions you now have to answer:

  • What happens to messages when a subscriber is offline? Redis doesn't care.
  • How do you replay history for a reconnecting client? Redis doesn't store it.
  • Who owns ordering? You do, now, in application code.

Postgres already answers all of these. Transactional writes give you ordering. The messages table gives you history. LISTEN/NOTIFY gives you the real-time transport. The only thing you give up is the comfort of reaching for a familiar tool.


2. The Core Mechanism: Triggers + LISTEN/NOTIFY as the Message Bus

At the center is a single idea:

When a message row is inserted, Postgres fires a trigger. The trigger calls pg_notify. Every app node that is LISTENing on that channel receives the payload and pushes it to its connected WebSocket clients.

That sounds almost too simple. The complexity is in doing it correctly:

  • Per-room ordering that holds under concurrent inserts.
  • Efficient subscribe/unsubscribe as clients join and leave rooms.
  • Automatic reconnect when the LISTEN connection drops.
  • Backpressure so one slow client doesn't stall delivery to everyone else.

Let's walk through each piece.


3. Schema Decision: Per-Room Sequences for Gapless Ordering

Chat is not just "messages exist." Chat requires: clients can ask for history since the last message they saw and expect a cursor that is gapless, stable, and per-room.

BIGSERIAL primary keys don't give you that. Postgres sequences are non-transactional by design: rolled-back transactions burn sequence values, leaving gaps. You can't use id > last_seen as a reliable "catch me up" cursor.

So instead, I track a per-room monotonic counter in a room_sequences table:

CREATE TABLE IF NOT EXISTS room_sequences (
  room_id   UUID PRIMARY KEY REFERENCES rooms(id) ON DELETE CASCADE,
  next_seq  BIGINT NOT NULL DEFAULT 1
);

A BEFORE INSERT trigger atomically claims the next value and stamps it onto the new message row:

CREATE OR REPLACE FUNCTION assign_room_sequence()
RETURNS trigger AS $$
BEGIN
  -- Assumes a room_sequences row is seeded when the room is created.
  -- If the row might be missing, use INSERT ... ON CONFLICT DO UPDATE
  -- instead; otherwise the UPDATE matches nothing and NEW.seq stays NULL.
  UPDATE room_sequences
  SET next_seq = next_seq + 1
  WHERE room_id = NEW.room_id
  RETURNING next_seq - 1 INTO NEW.seq;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_message_seq
  BEFORE INSERT ON messages
  FOR EACH ROW EXECUTE FUNCTION assign_room_sequence();

The UPDATE ... RETURNING is the serialization point. Postgres takes a row lock on the room_sequences row for that room, increments it atomically, and returns the previous value. Concurrent inserts into the same room queue up on that row lock. No application-level locking needed, no gaps, no duplicates.

Clients store last_seen_seq. On reconnect they send it, and the server replays exactly what they missed:

SELECT * FROM messages
WHERE room_id = $1 AND seq > $2
ORDER BY seq ASC LIMIT 100;
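A side benefit of the gapless cursor on the client: detecting a missed live message becomes a one-line check. A minimal sketch (the function name and shape are mine, not from the repo):

```typescript
// With a gapless per-room seq, a client can spot a hole in the live
// stream with a single comparison and trigger the catch-up query above.
function detectGap(lastSeenSeq: number, incomingSeq: number): boolean {
  // true → fetch `seq > lastSeenSeq` before applying the incoming message
  return incomingSeq > lastSeenSeq + 1;
}
```

Anything at or below lastSeenSeq + 1 applies directly (or is dropped as a duplicate); anything beyond it means a replay is needed first.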

4. Schema Decision: NOTIFY on Insert, With the Full Payload

Once seq is assigned, a second AFTER INSERT trigger queues the notification. Postgres delivers NOTIFY payloads only after the transaction commits, so clients never hear about a write that rolled back:

CREATE OR REPLACE FUNCTION notify_new_message()
RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'room:' || NEW.room_id::text,
    json_build_object(
      'id',         NEW.id,
      'seq',        NEW.seq,
      'room_id',    NEW.room_id,
      'user_id',    NEW.user_id,
      'username',   NEW.username,
      'content',    NEW.content,
      'created_at', NEW.created_at
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

The payload is a complete message event, not just an ID. This is a deliberate choice: the fanout path never touches the database again. No "fetch the message after receiving the notification" round-trip. One insert, one notify, one broadcast. Read amplification stays flat regardless of how many nodes are running. One caveat to design around: NOTIFY payloads are capped at just under 8000 bytes, so very long message content needs a fallback, either truncating the content in the payload or dropping back to ID-plus-fetch for oversized messages.


5. Node Mechanics: A Dedicated LISTEN Client + In-Process Event Routing

The Node app maintains two separate Postgres interfaces:

  • A connection pool for all queries and transactions.
  • A single dedicated pg.Client that never runs queries. It only LISTENs and receives notifications.

Keeping these separate matters. A pooled connection can be checked in and out by different requests. A LISTEN client needs to stay alive and stateful for the lifetime of the process. Mixing the two leads to subtle bugs where your LISTEN connection gets recycled by the pool at the worst possible moment.

When a notification arrives, it's decoded and re-emitted as an in-process EventEmitter event:

client.on('notification', (msg: { channel: string; payload?: string }) => {
  const { channel, payload } = msg;
  if (!payload) return;

  if (channel.startsWith('room:')) {
    const roomId = channel.slice('room:'.length);
    const data = JSON.parse(payload);
    emitter.emit('room_message', roomId, data);
  }
});

This keeps the service boundary clean: PostgresService owns the DB layer, ChatWsService owns the socket layer, and they communicate through typed in-process events rather than direct coupling.


6. Reference-Counted Room Subscriptions

You don't want to LISTEN to every room channel globally. A busy system might have thousands of active rooms, and you only care about the ones where this node has connected clients.

The solution is reference counting. Each node tracks how many local sockets are in each room. When the count goes from 0 to 1, issue LISTEN. When it drops back to 0, issue UNLISTEN.

async subscribeToRoomChannel(roomId: string): Promise<void> {
  const count = this.roomSubscriptionCounts.get(roomId) ?? 0;
  this.roomSubscriptionCounts.set(roomId, count + 1);

  if (count === 0) {
    await this.listenClient.query(`LISTEN "room:${roomId}"`);
  }
}

async unsubscribeFromRoomChannel(roomId: string): Promise<void> {
  const count = this.roomSubscriptionCounts.get(roomId) ?? 0;
  if (count === 0) return;

  const next = count - 1;
  if (next > 0) {
    this.roomSubscriptionCounts.set(roomId, next);
    return;
  }

  this.roomSubscriptionCounts.delete(roomId);
  await this.listenClient.query(`UNLISTEN "room:${roomId}"`);
}

One important detail: if LISTEN or UNLISTEN throws, roll back the ref count before rethrowing. Otherwise your in-memory count diverges from the actual Postgres subscription state, and the system silently delivers or drops messages incorrectly. A second detail: LISTEN takes an identifier, not a bind parameter, which is why the channel name is interpolated here. Validate roomId as a UUID before building the channel string.
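A minimal standalone sketch of that rollback discipline, with the pg client stubbed as a plain async function (the class and method names here are illustrative, not the repo's):

```typescript
type QueryFn = (sql: string) => Promise<void>;

class RoomSubscriptions {
  private counts = new Map<string, number>();
  constructor(private query: QueryFn) {}

  // Increment first, then LISTEN on the 0 → 1 transition; undo the
  // increment if LISTEN fails so memory matches Postgres state.
  async subscribe(roomId: string): Promise<void> {
    const prev = this.counts.get(roomId) ?? 0;
    this.counts.set(roomId, prev + 1);
    if (prev === 0) {
      try {
        await this.query(`LISTEN "room:${roomId}"`);
      } catch (err) {
        this.counts.delete(roomId);
        throw err;
      }
    }
  }

  count(roomId: string): number {
    return this.counts.get(roomId) ?? 0;
  }
}
```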


7. Exponential Backoff Reconnect for the LISTEN Connection

pg.Client does not auto-reconnect. If the dedicated LISTEN connection drops (network blip, Postgres restart, container reschedule), your entire real-time fanout silently dies. The app keeps running, inserts keep succeeding, but no notifications ever arrive. Nothing logs an error. Clients just stop receiving messages.

This is the kind of failure that looks like a different problem entirely until you know to look for it.

The fix is a reconnect loop with exponential backoff:

  • On error, schedule a reconnect attempt after a delay.
  • Double the delay on each failure, up to a cap (30 seconds works well).
  • On successful reconnect, re-issue LISTEN for every channel that was active before the drop.
  • Reset the backoff counter on success.

That last step, re-subscribing to all previously active channels, is why the roomSubscriptionCounts map exists beyond just reference counting. It's also the recovery manifest.
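The schedule itself is only a few lines. A sketch assuming a 1-second base delay (the base is my choice; the 30-second cap is from the text):

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// attempt 0 → 1s, 1 → 2s, 2 → 4s, ... capped at 30s.
// Reset `attempt` to 0 after a successful reconnect.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}
```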


8. Horizontal Scaling: Stateless Nodes + Nginx least_conn + Postgres Fanout

Here is where the architecture earns its keep.

Each app node is completely stateless except for its in-memory socket registry and its LISTEN subscriptions. Add another node, and it connects to the same Postgres, starts LISTENing on channels as clients join rooms, and begins receiving the same NOTIFY events as every other node.

Cross-node fanout is free. You don't configure it. You don't pay for a broker to coordinate it. Postgres notifies everyone who is listening, and each node pushes to its own clients.

upstream app {
  least_conn;
  server app-1:3000;
  server app-2:3000;
  server app-3:3000;
  server app-4:3000;
  server app-5:3000;
  keepalive 1024;
}

least_conn routes new connections to whichever node has the fewest active ones, distributing WebSocket connections naturally as load grows.

With raw WebSockets and stateless nodes, you don't need sticky sessions at all. That's one less thing to configure.
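One config detail the upstream block doesn't show: WebSocket connections only survive the proxy if the location forwards the HTTP/1.1 upgrade handshake. A sketch of the corresponding server block (the port and timeout values are assumptions, not from the repo):

```nginx
server {
  listen 80;

  location / {
    proxy_pass http://app;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 1h;  # don't reap idle long-lived sockets
  }
}
```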


9. The Unsexy Part: Postgres Settings That Mattered

Running Postgres with default container settings and then complaining it can't handle real-time load is like putting regular fuel in a race car and wondering why it's slow. The defaults are conservative. They're designed for safety, not throughput.

Under sustained insert load, Postgres periodically flushes dirty pages to disk in a checkpoint. With default settings, these checkpoints can cause multi-second write stalls. During a load test, that shows up as sudden latency spikes that look completely unrelated to your application code.

The fix:

checkpoint_completion_target = 0.9   # spread I/O across 90% of the checkpoint interval
checkpoint_timeout = 15min           # checkpoint less frequently
max_wal_size = 2GB                   # give WAL more room before forcing a checkpoint
wal_level = logical                  # needed for logical replication and event-oriented setups
max_wal_senders = 20                 # headroom for replication sender processes under load

Before applying these, p95 delivery latency occasionally spiked into the seconds mid-test. After: consistent sub-800ms p95 across consecutive runs. The application code didn't change. Only the database configuration did.

Treat Postgres like a primary system component, not a black box, and it behaves like one.


10. The Detour I Didn't Keep: Sharding LISTEN/NOTIFY

At one point I tried sharding room channels by hashing room IDs into a fixed set of channels and routing in the application layer. The idea was to reduce the number of active LISTEN channels per node.

It worked, in the sense that messages moved. But it immediately introduced everything I was trying to avoid: hot channels with uneven load distribution, routing logic in the app that Postgres used to handle transparently, and debugging sessions spent asking "why is this room not getting notifications?" because the hash put it on the wrong channel.

In short: I was reinventing a worse version of Kafka, inside my own application, on top of a system that already had a perfectly good primitive for exactly this problem.

The right model is simpler: one channel per room, LISTEN only when you have local clients, UNLISTEN when the last one leaves. Postgres handles the rest. The complexity budget is better spent elsewhere.


11. The Performance Turning Point: Batched Fanout + One Batch Send per Room per Tick

I expected the Postgres NOTIFY path to be the first bottleneck. It wasn't.

The first bottleneck was the fanout loop itself. Every notification triggered an immediate synchronous loop over all connected sockets in the room. With 300 receivers, that's 300 socket.send() calls in a single event loop tick. While that loop ran, everything else (incoming frames, heartbeats, new connections) waited.

Two changes fixed it.

First: batch messages before sending. Rather than flushing one WebSocket frame per NOTIFY, each notification appends its payload to a per-room buffer. A shared 30ms timer flushes all pending buffers, sending a single new_message_batch frame per room per tick. Under high insert rates this collapses many individual sends into one, dramatically reducing the number of socket.send() calls per second.

private enqueueRoomMessage(roomId: string, payload: unknown): void {
  const buf = this.roomMessageBatchBuffers.get(roomId);
  if (buf) buf.push(payload);
  else this.roomMessageBatchBuffers.set(roomId, [payload]);

  if (this.roomMessageBatchFlushTimer) return;
  this.roomMessageBatchFlushTimer = setTimeout(() => {
    this.roomMessageBatchFlushTimer = null;
    this.flushRoomMessageBatches();
  }, ChatWsService.ROOM_MESSAGE_BATCH_WINDOW_MS);
}
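The flush step referenced above (flushRoomMessageBatches) isn't shown, but its shape follows from the buffers: drain every pending per-room buffer and emit one batch frame per room. A standalone sketch with the send side stubbed (the frame shape and event name are assumptions):

```typescript
type BatchFrame = { event: string; data: unknown[] };

// Drain all per-room buffers accumulated during the 30ms window and
// emit exactly one batch frame per room that has pending messages.
function flushBatches(
  buffers: Map<string, unknown[]>,
  send: (roomId: string, frame: BatchFrame) => void,
): void {
  for (const [roomId, batch] of buffers) {
    send(roomId, { event: 'new_message_batch', data: batch });
  }
  buffers.clear();
}
```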

Second: yield once with setImmediate, then send to all eligible sockets. After the batch window closes, we JSON.stringify the batch once, filter to sockets that are open and not backpressured, then hand off to setImmediate. The actual send loop runs on the next event loop tick, meaning the current tick finishes cleanly before any socket I/O begins.

private broadcastToRoom(
  roomId: string,
  message: { event: string; data: unknown },
): void {
  const sockets = this.roomSockets.get(roomId);
  if (!sockets?.size) return;

  const payload = JSON.stringify(message);
  const openSockets = Array.from(sockets).filter(
    (socket) =>
      socket.readyState === WebSocket.OPEN &&
      socket.bufferedAmount <= WS_BACKPRESSURE_THRESHOLD_BYTES,
  );

  if (!openSockets.length) return;

  // setImmediate yields control back to the event loop before sending.
  // Without this, the send loop blocks incoming frames, heartbeats,
  // and new connections for the entire duration of the broadcast.
  setImmediate(() => {
    for (const socket of openSockets) {
      try {
        socket.send(payload);
      } catch {
        // ignore per-socket send errors
      }
    }
  });
}

Presence and typing notifications go out unbatched since they are low frequency and latency-sensitive. Only chat messages use the buffer and timer.


12. Proof: Load Test Results

The test script runs two concurrent scenarios: senders ramp from 0 to 120 VUs over 15 seconds, hold for 2 minutes 30 seconds, then ramp back down. Receivers hold at 300 constant VUs for the full 3 minutes. Peak concurrency is 420 VUs hitting the same single room, deliberately the worst-case fanout scenario. All tests ran locally on an Apple M4 Pro (10 cores), 5 app nodes behind Nginx. Results will vary on actual production environments.

Run                    Delivery rate   Latency p95   Seq duplicates   Seq out of order   Msgs received/s
1                      99.76%          744ms         0                0                  ~25.6k
2                      99.58%          167ms         0                0                  ~25.4k
3                      98.93%          594ms         0                0                  ~25.0k
4                      99.83%          310ms         0                0                  ~13.9k
5 (threshold tripped)  99.35%          1.11s         0                0                  ~12.6k

On the variance in p95 latency: the 30ms batch window is a tradeoff, not a constant. Shorter windows mean more flushes and more event loop pressure. Longer windows mean larger bursts when the timer fires. In a single-hot-room test like this, you can see the system “catch a bad rhythm” (GC, timer alignment, bursty flushes), and p95 will swing even when the underlying design is unchanged. The numbers that stayed consistent across every single run: zero duplicate sequences, zero out-of-order delivery. The database side of the story held perfectly.

On running locally vs cloud: the M4 Pro is not a server. There is no network hop between Docker containers, no cloud storage latency, no noisy neighbors. A properly provisioned cloud deployment with dedicated instances and low-latency networking should perform comparably on the fanout path. Postgres write throughput will depend on your storage class. These numbers are a floor, not a ceiling.

On why delivery isn't 100%: this is the most aggressive test case possible: one room, 300 receivers, 120 senders, all on the same fanout path. Real traffic spreads across rooms and varies over time. The few missed deliveries in each run are sender echo-back misses under event-loop and backpressure load, not "Postgres lost messages."

About the "failed" run (threshold tripped): this isn't the architecture failing. It's the system exposing its bottleneck: WebSocket fanout on a single Node.js event loop.

When p95 latency crosses the threshold, it’s not because ordering breaks or Postgres falls behind. It happens when a hot-room burst aligns poorly with event loop scheduling, causing fanout delays and triggering backpressure.

That’s exactly what stress tests are supposed to reveal: not whether the system works, but where it stops being predictable.

At that point, the decision becomes explicit: either distribute load across rooms, relax latency expectations for worst-case hotspots, or move fanout beyond a single event loop (e.g. multiple processes or workers).


13. Tradeoffs (The Part You Should Actually Read)

This architecture is not universally correct. It's deliberately narrow, and it earns its simplicity by accepting real constraints.

When Postgres-as-bus makes sense

  • Your events come from rows you're already writing. Messages, presence changes, typing indicators: these all exist in the DB anyway. NOTIFY is free.
  • You want transactional coupling between write and publish. If the insert fails, the notification never fires. You can't accidentally notify clients about a write that rolled back.
  • You want a small operational footprint. One database, many stateless nodes. No broker cluster to provision, monitor, or debug.
  • Your consumers are currently connected clients. NOTIFY is not a durable queue. If no one is listening, the event is gone. That's fine for real-time fanout to live connections. It's not fine for offline consumers or audit pipelines.
  • Presence fits naturally. Online/offline state is a row in a presence table, swept by a periodic cleanup job. If a single room grows to 10k+ users, you would move presence aggregation to a materialized view or a dedicated summary table rather than broadcasting raw rows. But the underlying mechanism stays the same.

When you should reach for Kafka (or friends)

  • You need durable, replayable event streams that exist independently of your database tables.
  • Consumers need to catch up from arbitrary offsets without querying the DB directly.
  • You have multiple heterogeneous consumers (analytics pipelines, audit logs, downstream services) each with their own consumption rate and retry semantics.
  • You need retention and backpressure as first-class features, not something you're building yourself on top of a messages table.
  • You're at a scale where Postgres becoming the coordination point is genuinely risky, where a slow query or a checkpoint stall has downstream consequences across the entire system.

Kafka is not a real-time messaging tool. Kafka is a distributed, durable log with consumer groups. If that's what you need, reach for it. If you're reaching for it because "that's what everyone does for real-time," slow down and check whether Postgres is already doing 80% of the work.


14. The Takeaway

Most systems don't have a broker problem.

They have an "I don't know my database well enough" problem.

Postgres is not just a place to put rows. It's a concurrency control engine, a transactional boundary, a durable store, and if you let it be, a perfectly capable event source for a whole class of real-time products. Triggers fire in the same transaction as your writes. LISTEN/NOTIFY delivers to every connected node. Per-room sequences give clients a reliable cursor with no gaps. A connection pool handles your queries. One dedicated client handles your subscriptions. That's the entire infrastructure story.

The lesson from this project isn’t “never use Redis.” It’s to understand your tools before you add more of them.

In fast-moving teams with tight deadlines, adding Redis or a message broker early is often the right call because it reduces risk and gives predictable scaling. There’s nothing wrong with that.

But if you always reach for those tools by default, you never learn where your existing stack actually breaks. You end up solving problems you might not have yet and carrying the complexity anyway.

Build the boring version when you can. Push it until it shows real limits. Then add complexity deliberately, not reflexively.
