DEV Community

Kalio Princewill

PostgreSQL Read Replicas with Docker: A Step-by-Step Guide

As your application scales, your database becomes the first bottleneck. Every dashboard query, every report, every analytics request competes with the writes coming in from your application. The database starts spending its time arbitrating between readers and writers instead of doing either well.

One of the simplest ways to relieve that pressure is to separate reads from writes using replication. This guide walks you through setting it up locally with Docker: a primary database and a live streaming replica, all running on your machine.

By the end, you will have a working primary and replica, understand what every configuration line does, and know how to verify that replication is happening in real time.

Prerequisites

Before you start, make sure you have the following installed:

  • Docker Desktop (or Docker Engine + Compose plugin on Linux)
  • A terminal
  • Basic familiarity with PostgreSQL: you do not need to be an expert, but knowing what a table and a connection string are will help

That is all. We are not installing PostgreSQL locally. Docker handles everything.

What we are building

Here is the architecture we are going to set up:

The primary accepts all writes. The replica receives a continuous stream of changes from the primary via PostgreSQL's Write-Ahead Log and applies them in real time. Your application sends reads to the replica and writes to the primary. The two workloads stop competing with each other.

How PostgreSQL streaming replication actually works

Before we write a single line of config, it is worth understanding the mechanism. PostgreSQL records every change (insert, update, delete) to the database in a file called the Write-Ahead Log before applying it. This is PostgreSQL's core durability guarantee. If the server crashes mid-transaction, the WAL is what allows it to recover to a consistent state on restart.

Streaming replication repurposes this same log as a live feed. The replica connects to the primary as a special replication client, requests WAL records starting from a given position, and applies them continuously as they arrive.

The replica runs in what PostgreSQL calls hot standby mode. This means it accepts read queries while simultaneously applying incoming WAL changes. It will reject any write attempt with a clear error; we will verify this ourselves later.

Synchronous vs asynchronous replication

This is the one thing most guides skip, and it is worth understanding before you configure anything. The distinction is about what happens between the moment your application says "commit this transaction" and the moment PostgreSQL says "done."

Asynchronous replication (the default)

With asynchronous replication, the primary database writes the transaction to its local WAL, acknowledges the commit to your application immediately, and then ships the WAL record to the replica in the background. Your application does not wait for the replica to confirm anything.

The advantage is speed. The primary never stalls waiting for a network round-trip to the replica. The trade-off is that if the primary crashes in the window between acknowledging the commit and shipping the WAL record, that transaction is gone. For most applications this is perfectly acceptable. The window is typically milliseconds, and the performance benefit is significant.

Synchronous replication (zero data loss)

With synchronous replication, the primary will not acknowledge a commit to your application until at least one replica confirms it has received and written the WAL record to disk. Your application waits for that confirmation before getting a response.

The advantage is durability. A transaction that receives a success response is guaranteed to exist on at least two machines. If the primary dies the moment after acknowledging, the replica has the data. The trade-off is write latency: every write now includes at least one network round-trip to the replica before completing.

There is also an availability risk worth knowing about. If you are running synchronous replication and the replica goes down, commits on the primary will hang until the replica comes back. In production this is mitigated by having multiple synchronous replicas or falling back to asynchronous mode automatically, but it is a real operational concern.

The rule of thumb: use asynchronous replication unless you have a specific requirement for zero data loss, such as financial transactions, audit logs, or anything where losing a single committed record is unacceptable. For this walkthrough we are running asynchronous replication.
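If you did need the synchronous mode, the switch lives in the primary's postgresql.conf. A minimal sketch, assuming a single standby: walreceiver is the default application_name a standby reports when its primary_conninfo does not set one, so treat the name here as something you would match to your own setup.

```conf
# primary/postgresql.conf — synchronous variant (sketch, not used in this guide)
# 'walreceiver' is the default application_name a standby reports;
# set application_name in the replica's primary_conninfo to use your own.
synchronous_standby_names = 'FIRST 1 (walreceiver)'
synchronous_commit = on   # now waits for the named standby to flush before acknowledging
```

With synchronous_standby_names empty (the default, and what we use below), synchronous_commit = on only waits for the local disk flush.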

Project structure

Create a folder called pg-replication on your machine. Inside it, create the following structure; we will fill each file in as we go:


pg-replication/  
├── docker-compose.yml  
├── init/  
│   ├── 01_init.sql  
│   └── 02_schema.sql  
├── primary/  
│   ├── postgresql.conf  
│   └── pg_hba.conf  
└── replica/  
    ├── postgresql.conf  
    └── replica-entrypoint.sh  

Step 1: Configure the primary database

The primary needs two configuration files. The first controls PostgreSQL behaviour. The second controls who can connect and how they authenticate.

Create primary/postgresql.conf and paste in the following:


# Accept connections from all network interfaces.
# Required for Docker container networking: without this, PostgreSQL only
# listens on loopback and the replica container cannot reach it.
listen_addresses = '*'

# Replication  
wal_level = replica          # Include full row changes in the WAL  
max_wal_senders = 3          # Allow up to 3 simultaneous replication connections  
max_replication_slots = 3    # One slot per replica, prevents premature WAL deletion  
wal_keep_size = 256          # Keep 256MB of WAL files for replicas that fall behind

# Commit behaviour  
synchronous_commit = on      # Flush to local disk before acknowledging; do not wait for the replica

# Performance  
shared_buffers = 256MB  
work_mem = 4MB  
checkpoint_completion_target = 0.9  

listen_addresses = '*': by default PostgreSQL only listens on 127.0.0.1 and ::1, which makes it completely invisible to any other container on the Docker network. Setting this to * tells it to bind on all available interfaces.
wal_level = replica is what enables streaming replication. The lower minimal level does not record enough information for a standby to reconstruct data; replica adds the row-level change data the replica needs. It has been the default since PostgreSQL 10, but setting it explicitly documents the intent.
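Once the primary container is up (Step 6), you can confirm these settings took effect with SHOW from any psql session; the expected values match the config above:

```sql
-- Run inside the primary: docker exec -it postgres-primary psql -U postgres
SHOW wal_level;          -- expect: replica
SHOW max_wal_senders;    -- expect: 3
SHOW listen_addresses;   -- expect: *
```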

Now create primary/pg_hba.conf:


# Local connections
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust

# Allow the replica to connect for replication
# 0.0.0.0/0 covers the entire Docker network range
# Restrict this to specific IPs in production
host    replication     replicator      0.0.0.0/0               md5

# Allow app connections from the Docker network
host    all             all             0.0.0.0/0               md5


replication in the database column is not a database name; it is a special PostgreSQL keyword that matches replication connections only. The replicator user we create shortly is the one named here.

Step 2: Configure the replica

The replica configuration is simpler. It only needs to know it is running as a hot standby.

Create replica/postgresql.conf with this content:


# Enable read queries while applying WAL changes
hot_standby = on

# Tell the primary which rows the replica still needs
# Prevents vacuuming rows that a long-running read query on the replica still references
hot_standby_feedback = on

# Performance
shared_buffers = 256MB
work_mem = 4MB

hot_standby = on is what lets the replica serve read queries while it applies incoming WAL; without it, the standby would recover silently and refuse connections.
hot_standby_feedback tells the primary which rows the replica still needs, preventing the primary from vacuuming away rows that a long-running read query on the replica still references.

Step 3: Write the replica startup script

The replica cannot start from an empty data directory; it needs a physical snapshot of the primary's data as its starting point. PostgreSQL has a tool called pg_basebackup for exactly this. The entrypoint script waits for the primary to be healthy, takes the snapshot, then starts PostgreSQL in standby mode.

Create replica/replica-entrypoint.sh and paste in the following:


#!/bin/bash
set -e

echo ">>> Waiting for primary to be ready..."
until pg_isready -h postgres-primary -p 5432 -U postgres; do
  echo "    Primary not ready — retrying in 2s..."
  sleep 2
done

echo ">>> Primary is ready. Clearing data directory for base backup..."
rm -rf /var/lib/postgresql/data/*

echo ">>> Taking base backup from primary..."
PGPASSWORD=replicator_password pg_basebackup \
  -h postgres-primary \
  -U replicator \
  -D /var/lib/postgresql/data \
  -P \
  -Xs \
  -R

echo ">>> Base backup complete. Overlaying replica config..."
cp /etc/replica/postgresql.conf /var/lib/postgresql/data/postgresql.conf

echo ">>> Starting replica in hot standby mode..."
exec docker-entrypoint.sh postgres


The flags on pg_basebackup are worth understanding:
  • -P shows a progress bar during the backup
  • -Xs streams WAL during the backup itself, so no WAL records are lost in the window between the snapshot starting and replication connecting
  • -R writes a standby.signal file and appends the primary connection string to postgresql.auto.conf automatically; this is what tells PostgreSQL to start in standby mode without any further manual configuration
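To make that last flag concrete: after pg_basebackup -R completes, the data directory contains an empty standby.signal file, and postgresql.auto.conf ends with a connection line along these lines. The exact options vary by PostgreSQL version, so treat this as an illustration rather than verbatim output:

```conf
# postgresql.auto.conf (appended by pg_basebackup -R — illustrative)
primary_conninfo = 'user=replicator password=replicator_password host=postgres-primary port=5432'
```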

Step 4: Write the init SQL

These files run automatically on primary startup via Docker's docker-entrypoint-initdb.d mechanism.

Create init/01_init.sql:


-- Dedicated replication user — replication privileges only
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replicator_password';

-- Application user
CREATE USER app WITH ENCRYPTED PASSWORD 'app_password';

-- Application database
CREATE DATABASE runtime_sentinel OWNER app;

GRANT CONNECT ON DATABASE runtime_sentinel TO replicator;

Create init/02_schema.sql:

\c runtime_sentinel

-- One row per monitored host
CREATE TABLE hosts (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    hostname       TEXT NOT NULL,
    cloud_provider TEXT,
    region         TEXT,
    registered_at  TIMESTAMPTZ DEFAULT now()
);

-- Runtime events — partitioned by time for write performance
CREATE TABLE runtime_events (
    id            UUID DEFAULT gen_random_uuid(),
    event_id      TEXT NOT NULL,
    host_id       UUID REFERENCES hosts(id),
    source_type   TEXT NOT NULL,
    library_name  TEXT,
    severity      INT DEFAULT 0,
    raw_payload   JSONB,
    captured_at   TIMESTAMPTZ NOT NULL,
    received_at   TIMESTAMPTZ DEFAULT now()
) PARTITION BY RANGE (captured_at);

-- Current month partition — all new writes land here
CREATE TABLE runtime_events_current
    PARTITION OF runtime_events
    FOR VALUES FROM (DATE_TRUNC('month', NOW()))
              TO   (DATE_TRUNC('month', NOW()) + INTERVAL '1 month');

-- Idempotency table — tracks which events have already been processed
CREATE TABLE processed_events (
    event_id       TEXT PRIMARY KEY,
    processed_at   TIMESTAMPTZ DEFAULT now(),
    consumer_group TEXT NOT NULL
);

-- Partial index covering only critical unresolved events
-- Dashboard queries use this — intentionally small and fast
CREATE INDEX idx_events_critical
    ON runtime_events (captured_at DESC, severity)
    WHERE severity >= 4;

GRANT ALL ON ALL TABLES    IN SCHEMA public TO app;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO app;

-- Seed data
INSERT INTO hosts (hostname, cloud_provider, region) VALUES
    ('host-dev-001', 'aws',   'eu-west-1'),
    ('host-dev-002', 'gcp',   'us-central1'),
    ('host-dev-003', 'azure', 'westeurope');

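One operational note on the partitioning above: runtime_events_current only covers this month, so something has to create the next partition before the month rolls over. A sketch, run via cron or a scheduler (the table name runtime_events_next is a hypothetical choice; the bounds mirror the schema above):

```sql
-- Create next month's partition ahead of time
CREATE TABLE runtime_events_next
    PARTITION OF runtime_events
    FOR VALUES FROM (DATE_TRUNC('month', NOW()) + INTERVAL '1 month')
              TO   (DATE_TRUNC('month', NOW()) + INTERVAL '2 months');
```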

Step 5: Write the Docker Compose file

Create docker-compose.yml at the root of the pg-replication folder:

services:

  postgres-primary:
    image: postgres:16
    container_name: postgres-primary
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres_password
      POSTGRES_DB: postgres
    volumes:
      - primary-data:/var/lib/postgresql/data
      - ./primary/postgresql.conf:/etc/primary/postgresql.conf
      - ./primary/pg_hba.conf:/etc/primary/pg_hba.conf
      - ./init:/docker-entrypoint-initdb.d
    command: >
      postgres
        -c config_file=/etc/primary/postgresql.conf
        -c hba_file=/etc/primary/pg_hba.conf
    ports:
      - "5432:5432"
    healthcheck:
      # Check over 0.0.0.0 — proves the primary is reachable from other
      # containers, not just alive on its own loopback interface
      test: ["CMD-SHELL", "pg_isready -h 0.0.0.0 -p 5432 -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 15
      start_period: 10s

  postgres-replica:
    image: postgres:16
    container_name: postgres-replica
    user: postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres_password
    volumes:
      - replica-data:/var/lib/postgresql/data
      - ./replica/postgresql.conf:/etc/replica/postgresql.conf
      - ./replica/replica-entrypoint.sh:/replica-entrypoint.sh
    entrypoint: ["/bin/bash", "/replica-entrypoint.sh"]
    ports:
      - "5433:5432"
    depends_on:
      postgres-primary:
        condition: service_healthy

volumes:
  primary-data:
  replica-data:


A few things worth noting. The healthcheck uses -h 0.0.0.0 rather than the default loopback. Without this the healthcheck passes the moment PostgreSQL is alive inside its own container, but that does not mean other containers can reach it over the Docker network. By checking 0.0.0.0 we are verifying that the primary is listening on the right interface, which is the actual condition the replica needs.
The start_period: 10s gives the primary time to finish running the init SQL before the healthcheck starts counting retries. Without it the healthcheck can begin firing before the replicator user exists and mark the container unhealthy prematurely.
The replica uses user: postgres because we override the entrypoint. The stock image normally starts as root and steps down to the postgres system user itself; with a custom entrypoint, the container must run as postgres directly so that pg_basebackup writes the data directory with the correct file ownership.

Step 6: Start everything

From inside your pg-replication folder, run:

docker compose up  

Watch the output. The primary starts first, runs both init SQL files, and becomes healthy. Once the healthcheck passes, the replica starts, takes its base backup, and begins streaming. Look for a line like started streaming WAL from primary in the replica logs.

Both containers are now running. The replica is streaming WAL changes from the primary in real time.

Step 7: Verify replication is working

Check that the replica is connected and streaming by running the command below:

docker exec -it postgres-primary psql -U postgres -c \
  "SELECT client_addr, state, sent_lsn, replay_lsn
   FROM pg_stat_replication;"  

You should see one row. The state column should say streaming. The sent_lsn and replay_lsn values should be identical or within a few bytes, the gap between them is your replication lag expressed as a WAL byte position.
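If you would rather see the lag as a number of bytes than eyeball two LSNs, pg_wal_lsn_diff does the subtraction for you. Run this on the primary:

```sql
SELECT client_addr,
       state,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

On an idle local setup this should report 0 or a small number that drops back to 0 between writes.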

Write on the primary, read from the replica:

# Write to primary on port 5432
docker exec -it postgres-primary psql -U app -d runtime_sentinel -c \
  "INSERT INTO hosts (hostname, cloud_provider, region)
   VALUES ('replication-test', 'aws', 'us-east-1');"


# Read from replica on port 5433 — appears within milliseconds
docker exec -it postgres-replica psql -U app -d runtime_sentinel -c \
  "SELECT hostname, region, registered_at FROM hosts
   WHERE hostname = 'replication-test';" 



The row you wrote on port 5432 should be readable on port 5433 almost immediately; with asynchronous replication on a local Docker network, the lag is typically a few milliseconds at most.

Prove the replica rejects writes:

docker exec -it postgres-replica psql -U app -d runtime_sentinel -c \
  "INSERT INTO hosts (hostname, cloud_provider, region)
   VALUES ('should-fail', 'gcp', 'us-central1');"  

Expected output:
ERROR: cannot execute INSERT in a read-only transaction

This is correct behaviour. The replica enforces read-only access at the PostgreSQL engine level, not at the application level. There is no way to accidentally write to it even if your application connects to the wrong port.
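Another quick check: pg_is_in_recovery() tells you directly whether the server you are connected to is a standby, which is handy when two local ports otherwise look identical:

```sql
-- Returns t on the replica (port 5433), f on the primary (port 5432)
SELECT pg_is_in_recovery();
```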

Connecting your application

With both containers running, use two separate connection strings, one for writes and one for reads:

# Writes — always go to primary
POSTGRES_PRIMARY_DSN=postgresql://app:app_password@localhost:5432/runtime_sentinel

# Reads — always go to replica
POSTGRES_REPLICA_DSN=postgresql://app:app_password@localhost:5433/runtime_sentinel


In Python with asyncpg:

import asyncpg
from src.config import settings

# Two pools — one per role, initialised on application startup
primary_pool = await asyncpg.create_pool(dsn=settings.POSTGRES_PRIMARY_DSN)
replica_pool  = await asyncpg.create_pool(dsn=settings.POSTGRES_REPLICA_DSN)

# Writes go through the primary pool
async with primary_pool.acquire() as conn:
    await conn.execute(
        "INSERT INTO runtime_events (event_id, source_type, severity, captured_at) "
        "VALUES ($1, $2, $3, now())",
        event_id, source_type, severity
    )

# Reads go through the replica pool
async with replica_pool.acquire() as conn:
    rows = await conn.fetch(
        "SELECT * FROM runtime_events "
        "WHERE severity >= 4 "
        "ORDER BY captured_at DESC LIMIT 100"
    )

The discipline to enforce here is that no read pool reference ever leaks into a write repository, and no write pool reference ever leaks into a read repository. Keep them separated at the dependency injection layer and you eliminate an entire class of accidental primary load.
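One lightweight way to enforce that discipline is to wrap the pools in role-specific handles, so a read repository physically cannot issue a write. A minimal sketch in plain Python with stand-in pool objects; the handle names here are hypothetical, not part of asyncpg:

```python
class ReplicaWriteError(RuntimeError):
    """Raised when a write is attempted through the read-side handle."""


class ReadHandle:
    """Exposes only query methods; backed by the replica pool."""

    def __init__(self, pool):
        self._pool = pool

    def fetch(self, query, *args):
        return self._pool.fetch(query, *args)

    def execute(self, query, *args):
        # Fail loudly instead of silently writing to the wrong server
        raise ReplicaWriteError("writes must go through the primary pool")


class WriteHandle:
    """Exposes mutation methods; backed by the primary pool."""

    def __init__(self, pool):
        self._pool = pool

    def execute(self, query, *args):
        return self._pool.execute(query, *args)


# At the dependency-injection layer, each repository receives the handle
# matching its role and never sees the raw pools:
#   reads  = ReadHandle(replica_pool)
#   writes = WriteHandle(primary_pool)
```

A read repository built on a ReadHandle cannot write to the replica even by accident: the mistake fails at the first call instead of silently loading the wrong server.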

What changes in production

The local setup maps directly to a production architecture. The only differences are the network between servers, the number of replicas, and the tooling around them.


In production you would add:

  • PgBouncer in front of both primary and replica for connection pooling, because raw connections at scale are expensive
  • Monitoring on pg_stat_replication.replay_lag, alert if the replica falls more than a configurable threshold behind the primary
  • Replication slots to guarantee the primary retains WAL until each replica has consumed it, useful when replicas have unreliable connectivity
  • A promotion runbook: a documented procedure for promoting the replica to primary if the primary fails, and for reconfiguring the old primary as a new replica once it recovers
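Of those, replication slots are the cheapest to adopt: one statement on the primary plus a flag on the base backup. A sketch, where the slot name replica_1 is an arbitrary choice:

```sql
-- On the primary: reserve WAL for a named consumer
SELECT pg_create_physical_replication_slot('replica_1');

-- Then take the base backup with: pg_basebackup ... -S replica_1
-- and set primary_slot_name = 'replica_1' on the standby, so the
-- primary retains WAL until this replica has consumed it.
```

The caveat is that a slot whose replica disappears permanently will retain WAL forever, so slots need the monitoring bullet above to go with them.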

Cleaning up

When you are done, tear everything down:

docker compose down -v

What we covered

PostgreSQL streaming replication requires no extensions, no third-party tools, no licence. The whole mechanism is built into the database.

listen_addresses = '*' makes the primary reachable over the Docker network. wal_level = replica includes the change data the replica needs. pg_basebackup -R takes the snapshot and writes the standby config automatically. hot_standby = on lets the replica serve reads while applying changes. And the healthcheck over 0.0.0.0 rather than loopback is the small detail that makes the difference between a replica that starts cleanly and one that keeps retrying. Once you understand what each line does, you stop copying configs and start designing systems.

The full project (Docker Compose file, configuration files, schema, and entrypoint script) is available in the postgres-streaming-replication repository.
