---
title: "Replace Your Message Queue with PostgreSQL: SKIP LOCKED, LISTEN/NOTIFY, and the Transactional Outbox"
published: true
description: "A hands-on guide to building job queues, real-time pub/sub, and exactly-once event publishing using only PostgreSQL — no Redis or RabbitMQ required."
tags: postgresql, architecture, api, performance
canonical_url: https://blog.mvpfactory.co/replace-your-message-queue-with-postgresql
---
## What We're Building
By the end of this tutorial, you'll have three production-ready patterns running entirely inside PostgreSQL:
1. A **concurrent job queue** using `FOR UPDATE SKIP LOCKED` — multiple workers, zero contention.
2. **Real-time fan-out** with `LISTEN/NOTIFY` — lightweight pub/sub without polling.
3. A **transactional outbox** that eliminates dual-write bugs — the silent data loss hiding in most startup codebases.
No Redis. No RabbitMQ. Just the database you're already running. These patterns hold comfortably up to ~10,000 jobs per minute before you need something bigger. Every service you remove is a service you don't debug at 2 AM.
## Prerequisites
- PostgreSQL 9.5+ (for `SKIP LOCKED` support)
- A running application that already uses PostgreSQL for state
- Basic SQL knowledge (transactions, `UPDATE`, `INSERT`)
## Step 1: SKIP LOCKED as a Job Queue
Here is the minimal setup to get this working. The `FOR UPDATE SKIP LOCKED` clause turns any table into a concurrency-safe work queue.
Create a queue table, then dequeue with a single query.
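A minimal sketch of the table. The retry-oriented columns (`attempts`, `finished_at`) are my assumptions, not part of the pattern itself; adapt the names and types to your schema:

```sql
CREATE TABLE job_queue (
    id          bigserial   PRIMARY KEY,
    payload     jsonb       NOT NULL,
    status      text        NOT NULL DEFAULT 'pending',
    attempts    int         NOT NULL DEFAULT 0,
    created_at  timestamptz NOT NULL DEFAULT now(),
    finished_at timestamptz
);

-- Partial index so dequeue scans touch only pending rows
CREATE INDEX job_queue_pending_idx
    ON job_queue (created_at)
    WHERE status = 'pending';
```

With the table in place, here is the dequeue query: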
```sql
-- Dequeue the next available job (multiple workers, zero contention)
WITH next_job AS (
    SELECT id, payload
    FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE job_queue SET status = 'processing'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.*;
```
Workers grab the next unlocked row atomically. No polling races, no double-processing. When a job finishes, the worker also needs to resolve the row it claimed.
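A minimal sketch of that completion step, assuming the `attempts` and `finished_at` columns from the table sketch above, with `$1` standing in for the job id your worker holds:

```sql
-- On success: mark the job done
UPDATE job_queue
SET status = 'done', finished_at = now()
WHERE id = $1;

-- On failure: release the job for retry, capping total attempts at 5
UPDATE job_queue
SET status   = CASE WHEN attempts < 4 THEN 'pending' ELSE 'failed' END,
    attempts = attempts + 1
WHERE id = $1;
```

If a worker dies mid-job, its row stays stuck in `processing`; a periodic sweep that resets stale `processing` rows to `pending` covers that case. Let me show you how this stacks up against dedicated infrastructure: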
| Metric | PG SKIP LOCKED | Redis (rpoplpush) | RabbitMQ |
|---|---|---|---|
| Throughput (jobs/min) | ~10,000-12,000 | ~80,000+ | ~40,000+ |
| Latency (p99) | 5-15 ms | <1 ms | 1-3 ms |
| Exactly-once delivery | Native (transactions) | Requires Lua scripts | Requires publisher confirms + dedup |
| Additional infra | None | Redis instance + monitoring | Broker cluster + monitoring |
| Failure mode complexity | One system | Two systems | Two systems |
PostgreSQL won't win a throughput race. It doesn't need to. 10K jobs/minute covers the vast majority of startups. You hit that ceiling only when you're processing ~150 jobs/second sustained, and at that point you have the revenue to justify dedicated infrastructure.
## Step 2: LISTEN/NOTIFY for Real-Time Fan-Out
PostgreSQL's `LISTEN/NOTIFY` gives you lightweight pub/sub without polling. It works well for cache invalidation, WebSocket push, and internal microservice signaling.
```sql
-- Publisher (inside your existing transaction)
NOTIFY order_events, '{"order_id": 42, "status": "paid"}';

-- Subscriber (any connected client)
LISTEN order_events;
```
That's it. No broker, no topic configuration, no consumer groups to manage.
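Two details worth knowing. Notifications are only delivered when the publishing transaction commits, so subscribers never see rolled-back events. And the `NOTIFY` statement only accepts a string literal; for dynamic payloads, use the `pg_notify()` function instead, which also makes the pattern work from triggers. A sketch, assuming an `orders` table with a `status` column:

```sql
-- pg_notify() accepts expressions, unlike the NOTIFY statement,
-- so the payload can be built from the row being written
CREATE OR REPLACE FUNCTION notify_order_event() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify(
        'order_events',
        json_build_object('order_id', NEW.id, 'status', NEW.status)::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_events_notify
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE notify_order_event();
```

One caveat: payloads are capped at 8000 bytes by default, so send identifiers and let subscribers fetch the details.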
## Step 3: The Transactional Outbox
Here is the gotcha that will save you hours — possibly weeks. The dual-write problem is the silent data loss bug hiding in most startup codebases. You save an order to your database, then publish an event to your queue. If the publish fails after the commit, your event is lost. If the publish succeeds but the transaction rolls back, you have a phantom event. Both happen more often than people think.
The transactional outbox kills this:
```sql
BEGIN;

INSERT INTO orders (id, total) VALUES (42, 99.00);

INSERT INTO outbox (aggregate_id, event_type, payload)
VALUES (42, 'order.created', '{"id":42,"total":99.00}');

COMMIT;
```
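A minimal sketch of the outbox table itself; the exact columns are assumptions matched to the insert above:

```sql
CREATE TABLE outbox (
    id           bigserial   PRIMARY KEY,
    aggregate_id bigint      NOT NULL,
    event_type   text        NOT NULL,
    payload      jsonb       NOT NULL,
    created_at   timestamptz NOT NULL DEFAULT now(),
    published_at timestamptz  -- NULL until the poller forwards it
);
```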
A separate poller (using `SKIP LOCKED`) reads the outbox and forwards events to downstream consumers. The event write and the business write live in the same transaction. They either both happen or neither does. No distributed transactions, no Saga compensations, no eventual-inconsistency surprises.
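Here is a sketch of that poller's claim step, reusing the `SKIP LOCKED` pattern from Step 1; the batch size and `published_at` column follow the sketches above:

```sql
-- Claim a batch of unpublished events; parallel pollers skip
-- each other's rows. Run inside a transaction and COMMIT only
-- after the downstream publish succeeds: a crash before commit
-- rolls the UPDATE back, so another poller retries the batch.
WITH batch AS (
    SELECT id
    FROM outbox
    WHERE published_at IS NULL
    ORDER BY id
    LIMIT 100
    FOR UPDATE SKIP LOCKED
)
UPDATE outbox
SET published_at = now()
FROM batch
WHERE outbox.id = batch.id
RETURNING outbox.id, outbox.event_type, outbox.payload;
```

Because a crash between forwarding and commit redelivers the batch, delivery is at-least-once; consumers should deduplicate on the outbox `id`.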
This is the same foundation behind Debezium's CDC approach, and Microsoft recommends it in their .NET microservices architecture guide.
## Gotchas
**PgBouncer breaks LISTEN/NOTIFY.** The docs don't mention this prominently, but `LISTEN/NOTIFY` does not work through PgBouncer in transaction pooling mode. PgBouncer reassigns connections between transactions, so your `LISTEN` subscription gets silently dropped. You have three options:
1. Run a dedicated direct connection for NOTIFY listeners, bypassing PgBouncer entirely.
2. Set up session pooling mode on a separate PgBouncer instance for subscriber connections.
3. Fall back to polling a `notifications` table with `SKIP LOCKED` (which kind of defeats the purpose).
Go with option 1. One dedicated connection per subscriber service costs almost nothing compared to adding an entire Redis instance.
**Know your ceiling.** Reach for RabbitMQ or Kafka when:
- Throughput exceeds ~10K jobs/min sustained and vertical scaling is maxed
- You need multi-datacenter replication of your event stream
- Consumer fan-out exceeds ~10 independent subscribers on a single topic
- Message retention and replay are core product requirements (event sourcing at scale)
**Draw your migration trigger line now.** Pick a number: "When we sustain X jobs/second for Y hours, we move to dedicated infrastructure." Without that threshold written down somewhere, teams either migrate too early out of anxiety or too late after a production fire. 150 jobs/second sustained is a reasonable starting line for most startups.
**Use the outbox from day one.** Dual-write bugs are silent and cumulative. By the time you notice lost events, you've already shipped inconsistent data to customers. The outbox costs one extra INSERT per transaction — nothing compared to debugging ghost events at midnight.
## Conclusion
Let me show you a pattern I use in every project: start with `SKIP LOCKED` for all background job processing. Add a `job_queue` table today and drop your Redis dependency. Migration takes an afternoon; the operational simplification lasts forever.
The infrastructure creep problem is real. Every box on your architecture diagram is a thing that fails, needs monitoring, and requires someone who understands its failure modes. If PostgreSQL is already running your application state, making it also run your job queue and event bus is just good engineering.
Until you're sustaining 150 jobs/second, you're adding operational complexity for theoretical scale. Build with what you have. Graduate when the numbers force you to.