Ricardo Ferreira for Redis

From PostgreSQL to Redis: Accelerating Your Applications with Redis Data Integration

Here's a statistic that might surprise you: 90% of all relational OLTP workloads are pure reads. Let that sink in. Nine out of ten database operations in your transactional system are simply fetching data, not modifying it. Yet these reads compete for the same resources as your critical write operations: CPU, disk I/O, and network bandwidth.

Let me illustrate the impact of this with a practical example. Say you are responsible for an e-commerce platform. Orders are flowing in, customers are browsing products, and your PostgreSQL database is handling transactions as expected. However, a problem lies beneath the surface, one that becomes apparent during peak shopping hours. Page load times creep up. Product searches feel sluggish. Cart updates lag just enough to frustrate users. And in the world of e-commerce, where Amazon has trained customers to expect sub-second responses, every millisecond of delay translates to lost revenue.

The root cause? Disk and network I/O are hindering your transactions. Your perfectly normalized PostgreSQL database, while excellent at maintaining data consistency and handling complex transactions, wasn't designed for the read-heavy, millisecond-response-time demands of modern applications. Every product view, every category browse, and every user profile fetch requires a round trip to disk-based storage, and this is expensive as it competes for resources with write operations.

Cache-Aside Pattern: A Band-Aid, Not a Cure

For years, developers have reached for the cache-aside pattern as the go-to solution. The logic seems sound: intercept reads with Redis, only hit the database on cache misses, and update the cache with fresh data. It's the "happy path" developers all dream about.

Cache-Aside Pattern
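
To make the pattern concrete, here is a minimal sketch of that read path in Python, assuming a redis-py client and a hypothetical query_database() helper around your SQL driver. It illustrates the pattern described above, not any particular library's built-in feature:

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

CACHE_TTL_SECONDS = 300  # hypothetical expiration window

def get_product(product_id):
    key = f"product:{product_id}"

    # 1. Try the cache first.
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # 2. Cache miss: fall back to the source database.
    product = query_database(product_id)  # hypothetical helper around your SQL client

    # 3. Write the fresh value back so the next reader hits the cache.
    r.set(key, json.dumps(product), ex=CACHE_TTL_SECONDS)
    return product

Every service that needs product data ends up carrying a copy of this logic, which is exactly where the trouble starts.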

Everything is great until reality sets in. The cache-aside pattern quickly reveals three critical flaws:

1. Repetitive Update Logic: Every application must implement the same caching logic. Each microservice, each new feature, and each development team reinvents the wheel. It's challenging to maintain best practices across projects, and database schema changes tend to break the caching logic with every new release.

2. The Thundering Herd Problem: When cache keys expire simultaneously, say during a flash sale starting at midnight, thousands of requests hammer your database at once. Your database must be sized not for average load, but for these sporadic read spikes. Query times slow to a crawl, eventually causing cascading failures.

3. Data Invalidation Nightmares: What happens when records are deleted from the database? How do you handle updates that affect multiple cached entries? There's no atomic way to write to both Redis and your database, leading to inconsistency windows that corrupt user experiences.
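
To see why the third flaw bites, here is a hedged sketch of the dual-write problem, again in Python with the redis-py client from the earlier sketch and a hypothetical execute_sql() helper. The database write and the cache update are two independent operations, so any failure between them leaves an inconsistency window:

def update_product_price(product_id, new_price):
    # Write 1: the database commit succeeds.
    execute_sql(
        "UPDATE products SET unit_price = %s WHERE product_id = %s",
        (new_price, product_id),
    )  # hypothetical helper around your SQL client

    # If the process crashes or the network fails right here, Redis keeps
    # serving the old price until the key expires: the inconsistency window.

    # Write 2: the cache invalidation may never happen.
    r.delete(f"product:{product_id}")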

After years of experiencing problems like these, developers came up with another pattern that extends cache-aside with a more proactive approach. This pattern is known as refresh-ahead.

Refresh-Ahead Pattern: You Don't Call Me; I Call You!

Right, so you know reads should be served by Redis, since it is faster than disk-based databases. But with cache-aside, you must wait until a read request comes in, and misses, before you go back to the source database. Why not change this paradigm and let the cache be populated proactively by a dedicated update engine?

This is what the refresh-ahead pattern is all about. You rely on an engine that is responsible for pulling records from your source database and moving the data to Redis ahead of any reads. The same engine must also periodically monitor the source database to identify changes and update Redis accordingly, including watching for deleted records so it can trigger key invalidation in Redis.

Refresh Ahead Pattern
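
Before reaching for heavier machinery, it is worth seeing how a hand-rolled version of that engine might look. The sketch below is a naive Python poller built on assumptions: a hypothetical fetch_changed_products() helper that queries rows by an updated_at column, and no handling of deletes, restarts, or schema changes, which is exactly where the real work hides.

import json
import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def refresh_loop(poll_interval_seconds=5):
    last_seen = None  # high-water mark of the last change copied to Redis

    while True:
        # Hypothetical helper returning plain dicts for rows changed since
        # last_seen, e.g. WHERE updated_at > %s ORDER BY updated_at.
        for row in fetch_changed_products(last_seen):
            key = f"product:{row['product_id']}"
            r.set(key, json.dumps(row, default=str))
            last_seen = row["updated_at"]

        # Deleted rows are invisible to this query, so key invalidation
        # needs a separate mechanism (tombstones, triggers, or CDC).
        time.sleep(poll_interval_seconds)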

This is a great pattern to implement in conjunction with Redis for read-intensive use cases. Some teams turn to Change Data Capture (CDC) using tools like Apache Kafka and Debezium to achieve this. Others decide to implement complex ETL pipelines. Regardless of the implementation stack, the idea is the same: capture database changes as events and stream them to Redis. However, this approach introduces what we call the "distributed systems hole".

It is a complexity trap that consumes entire development teams whose main job is not to maintain data pipelines. Implementing the refresh-ahead pattern manually often creates the following problems:

Developer Overutilization: Your best engineers will spend months building and maintaining data pipelines instead of working on the systems that are actually tied to the company's revenue, creating a perception that they are not working toward the organization's goals.

The Expertise Tax: Apache Kafka, Debezium, and ETL experts command premium salaries; they are hard to keep and even harder to replace. Unless staffing is planned carefully from day one, it will be hard to justify to the business why an important launch must be delayed because someone on the team has left.

Operational Complexity: Every schema change necessitates pipeline updates, and every deployment carries the risk of data inconsistency. Teams end up on call every time the domain model changes, because those changes can break the integrations that keep the data pipeline running.

Let's go back to the original problem. All you wanted was to speed up your application because reads are more frequent than writes. Instead, you've created a distributed systems monster that requires constant feeding and care.

Implementing the Refresh-Ahead Pattern with RDI

This is where Redis Data Integration (RDI) changes the game entirely. RDI implements the refresh-ahead pattern with a future-proof solution that moves data proactively from your database to Redis, keeping both in perfect sync without the complexity overhead. Unlike traditional CDC solutions, RDI requires no expertise in distributed systems. It's configuration, not code. It's operational simplicity, not complexity. It's a solution that doesn't hold you back.

RDI Architecture

Major enterprises, such as Axis Bank, are already utilizing RDI to accelerate their applications, and guess what: you can use it too. Let's see how this works with a real e-commerce dataset to showcase RDI's capabilities. For this scenario, the PostgreSQL database contains normalized tables with foreign key relationships. Exactly what you'd expect in a transactional system:

  • Categories and Products with a one-to-many relationship
  • Customers placing Orders containing multiple OrderItems
  • Suppliers connected to Products through a many-to-many relationship
  • Employees managing the operations
  • A Users table for authentication and user management

These tables represent your source of truth — but optimized for consistency, not speed. You can find the DDL for these tables, along with the DML code to load some data, here.

How RDI Works

Once you have installed RDI and deployed your data pipeline, here is what happens behind the scenes. First, RDI performs an initial cache loading, populating Redis with all your existing data. In the demo below, you can see 78 records being synchronized initially, creating data streams for each table. The RDI dashboard displays real-time metrics, including records inserted, updated, and deleted, with timestamps accurate to the millisecond.

RDI Monitor View

After this, every time you insert a new user through pgAdmin, it appears in Redis within milliseconds. When you update an order status, Redis reflects the change instantly. When you delete a product, it is automatically removed from Redis. This isn't eventual consistency with fingers crossed—it's guaranteed synchronization through CDC.

By default, records are written to Redis using the Hash data type. Hashes work well for simple entities, such as categories, that have flat, predictable fields. The primary key forms part of the Redis key (e.g., category:1), which gives you O(1) access to any field, with no table scans and no index lookups. This is an example of how the data looks in Redis using Hashes:

Data at Redis with Hashes
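
Assuming the key layout shown above (the exact prefix depends on your pipeline configuration), reading one of these Hashes from an application is a single command with redis-py:

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Fetch every field of one category in a single O(1) lookup.
category = r.hgetall("category:1")

# Or fetch just one field without pulling the whole record.
name = r.hget("category:1", "category_name")  # field name is a hypothetical example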

For more complex entities that involve nested data, arrays, or flexible schemas, you can use the JSON data type in Redis to store the data with more flexibility. This is an example of how the data looks in Redis using JSON:

Data at Redis with JSON
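
Reading these documents from redis-py goes through its JSON commands. The key and path below are hypothetical examples, modeled on the users record shown later in this article:

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Fetch the whole JSON document for one user.
doc = r.json().get("user:52")

# Or fetch a single value with a JSONPath expression (returns a list of matches).
email = r.json().get("user:52", "$.email")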

But RDI goes beyond simple replication. The stream processor layer continuously translates data into your preferred data model and layout, which you control via transformations. For example, consider the users table with the following record:

id: 52
username: riferrei  
first_name: Ricardo
last_name: Ferreira
email: ricardo.ferreira@example.com

You want to create two additional fields in the final record that will be written to Redis. You can create a transformation using YAML like this:

name: custom-job
source:
  schema: public
  table: user
transform:
  - uses: add_field
    with:
      expression: first_name || ' ' || last_name
      field: display_name
      language: sql
  - uses: add_field
    with:
      expression:
        CASE
          WHEN email LIKE '%@example.com' THEN 'internal'
          ELSE 'external'
        END
      field: user_type
      language: sql
output:
  - uses: redis.write
    with:
      connection: target
      data_type: json

After the transformation that implements data enrichment, the final record becomes a Redis JSON document with computed fields:

{
  "id": 52,
  "username": "riferrei",
  "first_name": "Ricardo", 
  "last_name": "Ferreira",
  "email": "ricardo.ferreira@example.com",
  "display_name": "Ricardo Ferreira",
  "user_type": "internal"
}

Notice the two new fields that don't exist in PostgreSQL:

  • display_name: Concatenated from first and last names
  • user_type: Computed based on email domain logic

This transformation happens in the RDI stream processor, which parses the transformations in your YAML configuration. No code is required in your application, and no cache invalidation is needed either. Just pure, configuration-driven transformation.

If you want to try this demo yourself, you can do so by following the instructions in the following GitHub repository:

https://github.com/redis-developer/postgres-to-redis-rdi-demo

The beauty of this repository is that you can run it entirely on your local machine using Kubernetes. The repository includes everything you need:

  • Automated deployment scripts for RDI (both local and cloud options)
  • A pre-configured PostgreSQL database with sample e-commerce data
  • Transformation job examples showing JSON and Hash outputs
  • Step-by-step instructions with visual guides

Within minutes, you'll have a complete CDC pipeline streaming data from PostgreSQL to Redis, transforming relational tables into high-performance key-value structures.

Beyond Simple Caching: A Living Data Layer

What sets RDI apart from other caching solutions is that it gives you more than a cache. This isn't a store that might be stale or that needs complex invalidation logic. It's a real-time materialized view of your source database, transformed and optimized for high-speed access. The configuration-driven approach RDI provides means you can evolve your data pipeline without touching application code:

  • Need to add a new computed field? Just update the YAML configuration file.
  • Want to change how data is structured in Redis? Modify the transformation job.
  • Schema changed in PostgreSQL? RDI adapts automatically. No action needed.

No redeployment, no code changes, no downtime. Just operational simplicity that lets developers focus on innovation instead of infrastructure. RDI solves the fundamental tension in modern application architecture: you no longer have to choose between consistency and speed, between simplicity and performance, or between developer productivity and operational excellence.

RDI delivers exactly that: an out-of-the-box data pipeline that offloads reads to Redis, speeding up your applications without steepening your learning curve.

The Future Is Refresh-Ahead

As we move toward software architectures where data is always in flux, where latency is measured in microseconds, and where scale is assumed rather than planned, the ability to seamlessly synchronize and transform data between complementary stores becomes essential.

Redis Data Integration represents more than a technical solution. It's a paradigm shift in how we think about data architecture. It's the realization that we don't need to accept the trade-offs we've lived with for years. We can achieve transactional consistency in PostgreSQL, and blazing-fast reads in Redis, without complexity or compromise.

The question isn't whether you need real-time data synchronization. It's whether you can afford to keep solving the 90% problem with yesterday's solutions. Welcome to the refresh-ahead revolution. Your applications and your users will thank you.
