Alik

Posted on Jun 23

Stream Postgres WAL to Redis: Real-Time Read Models with pg2redis

#postgres #redis #cdc #cache

Real-time data pipelines are becoming a normal part of application architecture. Product pages need fresh stock counts, order status screens need fast reads, dashboards need recent transactional state, and background services often need to react as soon as data changes.

Postgres is usually the source of truth, but many applications still keep a second read model in Redis for low-latency access. The hard part is keeping Redis in sync without pushing cache update logic into every write path.

The tempting approach is to write to both systems in the same request handler:

type User struct {
    ID        int64     `json:"id"`
    Email     string    `json:"email"`
    FullName  string    `json:"full_name"`
    UpdatedAt time.Time `json:"updated_at"`
}

func SaveUser(ctx context.Context, db *pgxpool.Pool, redisDb *redis.Client, user User) error {
    const upsertUser = `
INSERT INTO users (id, email, full_name)
VALUES ($1, $2, $3)
ON CONFLICT (id) DO UPDATE
SET email = EXCLUDED.email,
    full_name = EXCLUDED.full_name,
    updated_at = now()
RETURNING updated_at`

    if err := db.QueryRow(
        ctx,
        upsertUser,
        user.ID,
        user.Email,
        user.FullName,
    ).Scan(&user.UpdatedAt); err != nil {
        return fmt.Errorf("save user in Postgres: %w", err)
    }

    payload, err := json.Marshal(user)
    if err != nil {
        return fmt.Errorf("marshal user: %w", err)
    }

    key := fmt.Sprintf("user:%d", user.ID)
    if err := redisDb.Set(ctx, key, payload, 0).Err(); err != nil {
        return fmt.Errorf("save user in Redis: %w", err)
    }

    return nil
}

This looks reasonable until the second write fails. If the Postgres upsert succeeds and the Redis SET times out because of a network issue, the source of truth now has the new user and Redis still has the old value or no value at all. The handler returns an error, but the database change already happened.

Retried requests can make this even harder to reason about. A retry may repair Redis, or it may race with another update. Moving the Redis write before the database write only flips the failure mode: Redis can be updated and then the database transaction can fail. A regular Postgres transaction cannot include Redis, so this is not an atomic operation.

This is why cache updates and read-model updates often belong on the committed change stream, not directly in the request path.

That is the problem pg2redis is designed to solve.

pg2redis reads Postgres logical replication changes from the Write-Ahead Log, or WAL, and turns row-level inserts, updates, deletes, and snapshots into Redis commands. You configure what should happen for each table, and pg2redis keeps Redis updated from the database change stream.

The documentation is available at pgwalk/pg2redis, and the Docker image is published at alikpgwalk/pg2redis.

Why You Might Need This

Consider a few common cases:

You keep product, customer, or order read models in Redis.
You want Redis hashes, sets, sorted sets, streams, or pub/sub notifications to follow Postgres changes.
You want to initialize Redis from existing tables and then continue streaming new changes.
You want a focused Postgres-to-Redis pipeline without Kafka, Debezium, or custom cache invalidation code.

There are great tools for large CDC platforms and broad event streaming. If you need schema registries, replayable topic history, and many independent downstream consumers, those tools may be the right fit.

But if the job is "Postgres changes should update Redis", a smaller tool can be easier to deploy, reason about, and operate.

How It Works

At a high level, pg2redis does four things:

Connects to Postgres using logical replication.
Reads WAL changes through the built-in pgoutput plugin.
Expands configured Redis command templates for inserts, updates, deletes, and snapshots.
Writes those commands to Redis using batched pipelines.

The mapping is explicit. For example, a row in public.products can become:

tables:
  - public.products:
      insert:
        commands:
          - ["HSET", "product:{id}", "{pairs:*}"]
          - ["SADD", "products:all", "{id}"]
          - ["ZADD", "products:stock", "{stock_quantity}", "{id}"]
      update:
        commands:
          - ["HSET", "product:{id}", "{pairs:*}"]
          - ["ZADD", "products:stock", "{stock_quantity}", "{id}"]
      delete:
        commands:
          - ["DEL", "product:{id}"]
          - ["SREM", "products:all", "{id}"]
          - ["ZREM", "products:stock", "{id}"]

With this configuration, Postgres remains the source of truth. Redis becomes a maintained projection.

Advanced Features Under The Hood

1. Flexible Redis command mapping

pg2redis supports common Redis command families, including:

SET, SETEX, MSET, INCR, DECR, INCRBY, DECRBY, APPEND, DEL, EXPIRE,
HSET, HINCRBY, HDEL,
SADD, SREM,
ZADD, ZINCRBY, ZREM,
XADD, XDEL,
PUBLISH

That means you can model different Redis access patterns from the same database change stream:

Hash per row with HSET
JSON value per row with SET
Membership indexes with SADD and SREM
Ranked or scored indexes with ZADD and ZREM
Event feeds with XADD
Realtime notifications with PUBLISH

Templates can use column values:

["HSET", "order:{id}", "{pairs:*}"]
["SET", "order_json:{id}", "{json:*}"]
["PUBLISH", "order_updates", "{json:id,customer_id,status,total_cents}"]

2. Operation-specific behavior

Inserts, updates, and deletes usually need different Redis behavior. An insert may add a key and index membership. An update may refresh a hash and sorted set score. A delete may remove the key and clean up indexes.

pg2redis lets each operation have its own command list:

insert:
  commands:
    - ["HSET", "customer:{id}", "{pairs:*}"]
    - ["SADD", "customers:all", "{id}"]
update:
  commands:
    - ["HSET", "customer:{id}", "{pairs:*}"]
delete:
  commands:
    - ["DEL", "customer:{id}"]
    - ["SREM", "customers:all", "{id}"]

3. Conditional commands

Sometimes a Redis index should only include rows that match a condition. For example, active products can be maintained as a set:

- command: ["SADD", "products:active", "{id}"]
  condition:
    column: active
    op: "="
    value: "true"
- command: ["SREM", "products:active", "{id}"]
  condition:
    column: active
    op: "="
    value: "false"

Conditions can also compare current and previous values when Postgres provides the old row data. That is useful for status transitions, counters, and selective notifications.

4. Initial snapshots

Redis often starts empty. A streaming-only process can handle new changes, but it will not automatically load data that already exists.

pg2redis can run an initial snapshot first:

snapshot:
  mode: onetime
  batchSize: 100
  parallelWorkers: 2
  abortOnError: true
  config:
    - name: public.customers
      type: full
    - name: public.products
      type: full
    - name: public.orders
      type: full
    - name: public.order_items
      type: full

Snapshot rows use the configured insert commands. After the snapshot finishes, pg2redis continues from the snapshot LSN and streams new WAL changes.

5. Batched Redis writes and retries

pg2redis buffers row changes and flushes them to Redis in pipelines. You can tune the batch size, flush interval, queue depth, and number of flush workers:

flushInterval: 100ms
flushBufferSize: 10
flushQueueDepth: 8
flushWorkers: 2
writeTimeout: 5s
maxWriteQueueSize: 10000

Larger batches reduce round trips. Smaller batches and shorter flush intervals reduce latency.

A practical ordering note: flushWorkers controls how many Redis pipeline batches can be executed at the same time. With flushWorkers > 1, more than one batch can be in flight, so a later batch can finish before an earlier batch. If you need best-effort batch ordering for keys touched by consecutive changes, use flushWorkers: 1. That makes normal batch execution single-worker and FIFO, but it is not a total ordering guarantee.

Note: Right now, ordering is not guaranteed when an entry is retried. The retry tracker schedules failed entries for later and re-enqueues them as smaller write requests, usually one failed row entry at a time. Later WAL entries may already have been flushed to Redis before the retry is applied. For that reason, mappings that depend on exactly once, exactly ordered side effects need extra care. Prefer idempotent writes such as HSET, SADD, ZADD, and DEL for derived state. Be careful with INCRBY, ZINCRBY, XADD, and PUBLISH if delayed or duplicate application would be a problem.

Postgres transaction boundaries are preserved for tracking, not for Redis atomicity. The logical replication listener collects row changes by XID and sends one write request after the Postgres commit. That request carries one LSN, XID, and commit time, and the application does not release that commit point until all row entries are acknowledged or skipped. The Redis writer, however, enqueues those row entries individually and batches them by flushBufferSize or flushInterval. A single Postgres transaction can share a Redis batch with other transactions, or if it is large enough it can be split across more than one Redis batch. For one row change, multiple configured Redis commands are wrapped in MULTI/EXEC; the entire Postgres transaction is not wrapped in one Redis MULTI/EXEC.

Stronger ordering guarantees and better preservation of Postgres transaction boundaries are work in progress and will be addressed in future versions.

Retry behavior is configurable as well:

retryPolicy:
  maxRetries: 10
  maxConnectionRetries: 0
  initialBackoff: 1500ms
  multiplier: 2
  jitter: 0.1
  maxBackoff: 15s

The tool stores processing state so it can continue from a known WAL position. The delivery model is at-least-once, so Redis mappings should be designed to be idempotent where possible.

Getting Started With The Example App

The repository includes a complete e-commerce demo at pg2redis-example.

It contains:

Postgres tables for customers, products, orders, and order_items
Seed data for a small product catalog
A Go HTTP API that writes to Postgres
A pg2redis configuration file that maps database changes into Redis
A Docker Compose stack with Postgres, Redis, pg2redis, and the API

Start the demo:

cd pg2redis-example
docker compose up --build

The API is available at:

http://localhost:8080

The Compose file runs Postgres with logical replication enabled:

postgres:
  image: postgres:17
  command:
    - postgres
    - -c
    - wal_level=logical
    - -c
    - max_replication_slots=10
    - -c
    - max_wal_senders=10

The pg2redis service mounts the YAML config and points the app at it:

pg2redis:
  image: alikpgwalk/pg2redis:latest
  environment:
    PG2REDIS_CONFIG_PATH: /etc/pg2redis/config.yaml
    PGWALK_LIC_PG2REDIS: <license-key>
  volumes:
    - ./pg2redis/config.yaml:/etc/pg2redis/config.yaml:ro

For local testing, the example uses the postgres user and a simple Docker network. For production, create a dedicated Postgres user with replication permissions and only the table access it needs.

What The Example Projects Into Redis

The demo maps product rows into a hash and a few indexes:

- public.products:
    insert:
      commands:
        - ["HSET", "product:{id}", "{pairs:*}"]
        - ["SADD", "products:all", "{id}"]
        - ["ZADD", "products:stock", "{stock_quantity}", "{id}"]
        - command: ["SADD", "products:active", "{id}"]
          condition:
            column: active
            op: "="
            value: "true"

Orders are mapped to hashes, customer order sets, a Redis stream, and a pub/sub notification:

- public.orders:
    insert:
      commands:
        - ["HSET", "order:{id}", "{pairs:*}"]
        - ["SADD", "orders:all", "{id}"]
        - ["SADD", "customer:{customer_id}:orders", "{id}"]
        - ["XADD", "order_events", "*", "{pairs:id,customer_id,status,total_cents,currency,created_at,updated_at}"]
        - ["PUBLISH", "order_updates", "{json:id,customer_id,status,total_cents,currency,created_at,updated_at}"]

Order items are stored in a hash keyed by item ID:

- public.order_items:
    insert:
      commands:
        - ["HSET", "order:{order_id}:items", "{id}", "{json:*}"]

The resulting Redis keys include:

product:{id}
products:all
products:active
products:stock
customer:{id}
customer:{id}:orders
order:{id}
order:{id}:items
order_events
order_updates

pg2redis In Action

List products from Postgres through the API:

curl -s http://localhost:8080/products

Read the Redis projection for product 1:

curl -s http://localhost:8080/cache/products/1

Now change the product stock in Postgres through the API:

curl -s -X PATCH \
  -H 'Content-Type: application/json' \
  -d '{"stock_quantity":37}' \
  http://localhost:8080/products/1

Read Redis again:

curl -s http://localhost:8080/cache/products/1

You should see the product hash updated and the sorted set score changed.

Create an order:

curl -s -X POST \
  -H 'Content-Type: application/json' \
  -d '{"customer_id":1,"items":[{"product_id":1,"quantity":2},{"product_id":2,"quantity":1}]}' \
  http://localhost:8080/orders

This API call writes to orders, order_items, and products in one Postgres transaction. pg2redis observes the committed changes and updates Redis.

Inspect the cached order:

curl -s http://localhost:8080/cache/orders/1

List cached orders for customer 1:

curl -s http://localhost:8080/cache/customers/1/orders

Or look directly at Redis from redis-cli:

HGETALL product:1
ZRANGE products:stock 0 -1 WITHSCORES
XRANGE order_events - +

Postgres Setup Notes

Postgres needs logical replication enabled:

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

pg2redis uses publications and a logical replication slot. You can let the app manage those objects:

postgres:
  repl:
    pub: pg2redis_shop_pub
    slot: pg2redis_shop_slot
    owner: app

Or you can manage them yourself with owner: user.

For deletes and previous-value comparisons, Postgres must send the data that your Redis templates need. In the example, the tables use:

ALTER TABLE products REPLICA IDENTITY FULL;
ALTER TABLE orders REPLICA IDENTITY FULL;
ALTER TABLE order_items REPLICA IDENTITY FULL;
ALTER TABLE customers REPLICA IDENTITY FULL;

Use REPLICA IDENTITY FULL carefully on high-write tables because it increases WAL volume.

Operational Notes

There are a few things to keep in mind before using this pattern in production:

Logical replication slots retain WAL until changes are consumed, so monitor slot lag and disk usage.
Redis writes are at-least-once, so prefer idempotent commands such as HSET, SADD, ZADD, and DEL where possible.
If you use counters such as INCRBY or ZINCRBY, think through retry behavior and duplicate processing.
Use snapshots to initialize empty Redis instances from existing Postgres data.
Use explicit delete commands so Redis keys and indexes are cleaned up predictably.
Keep your publication columns aligned with the placeholders used in Redis templates.
If you need exact decimal preservation in JSON payloads, consider postgres.numericMode: string.

Where To Go Next

You can find the project documentation at pgwalk/pg2redis.

The Docker image is available at alikpgwalk/pg2redis.

Try the pg2redis-example demo if you want to see the full loop: write to Postgres, stream WAL through pg2redis, and read the Redis projection from the API.

If your application already treats Postgres as the source of truth and Redis as a fast read model, this approach keeps that boundary clean. Your write path stays focused on the database, and Redis follows along from the committed WAL stream.

DEV Community