"Learn how to achieve a near-zero-downtime migration from PostgreSQL 13 to 15. This guide covers the architecture and process using Helyx for real-time replication, ensuring no data loss while handling millions of records."
When the mandate came down to migrate our core analytics platform from an on-premise PostgreSQL 13 cluster to a cloud-based PostgreSQL 15 instance, the team was rightfully anxious. The dataset was massive—over 6 Crore (60 million) rows of data. The requirements were strict: near-zero downtime, zero data loss, and no replication lag for a system processing thousands of transactions.
Traditional methods like pg_dump/restore would have meant hours of downtime. Logical replication with native PostgreSQL tools required complex setup and risked falling behind on writes. We needed a more robust solution. We needed Helyx.
Below is the exact architecture and process we used to achieve this seamless migration.
The Challenge: High Stakes and Tight Tolerances
Source: PostgreSQL 13 (On-premise)
Target: PostgreSQL 15 (Cloud)
Data Volume: 6 Crore+ rows
Key Requirement: Zero data loss, minimal latency during cutover
Performance Goal: Sustain high write throughput without impacting source performance.
The Helyx-Powered Migration Architecture
Helyx operates as a lightweight engine. It reads the PostgreSQL write-ahead log (WAL), transforms the changes in-memory, and applies them to the target database in real time.
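Helyx's internals aren't public, but the log-based capture described here can be sketched with PostgreSQL's own logical decoding. The snippet below is a minimal sketch, assuming a placeholder DSN and slot name, that streams decoded changes from a replication slot using psycopg2:

```python
# Minimal sketch of log-based change capture on PostgreSQL using
# psycopg2's logical replication support. This illustrates the
# underlying mechanism only (reading the WAL through a replication
# slot instead of querying tables); it is not Helyx's implementation.
import psycopg2
import psycopg2.extras

SRC_DSN = "host=source-db dbname=analytics user=replicator"  # placeholder

conn = psycopg2.connect(
    SRC_DSN, connection_factory=psycopg2.extras.LogicalReplicationConnection
)
cur = conn.cursor()

# Create the slot once; it makes the server retain WAL until we confirm it.
cur.create_replication_slot("cdc_demo", output_plugin="test_decoding")
cur.start_replication(slot_name="cdc_demo", decode=True)

def handle_change(msg):
    # Each message is one decoded WAL change (INSERT/UPDATE/DELETE).
    print(msg.payload)
    # Acknowledge the LSN so the server can recycle old WAL segments.
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(handle_change)  # blocks, streaming changes in real time
```

Note the feedback call: acknowledging LSNs is what allows the server to discard old WAL, so a real consumer has to do it reliably.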
The entire process can be visualized in three key phases:
Phase 1: The Initial Sync — 6 Crore Rows in 45 Minutes
The first step was to get a consistent baseline copy of all data to the target.
Established a Snapshot: Helyx first created a consistent snapshot of the source database, ensuring data integrity at a specific point in time.
Parallelized Data Transfer: Using multiple workers, Helyx streamed the snapshot data to the PostgreSQL 15 target. The key here was parallelism and efficient batching, which allowed us to move all 6 Crore rows in a mere 45 minutes (a sketch of this snapshot-plus-parallel-copy pattern follows this list).
No Source Impact: Because Helyx reads the write-ahead log rather than constantly querying the tables, the performance impact on our live source database was minimal.
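We don't have visibility into Helyx's copy engine, so the sketch below only illustrates the underlying PostgreSQL pattern: export one snapshot, attach every worker to it, and COPY disjoint id ranges in parallel. The events table, id ranges, worker count, and DSNs are assumptions for illustration:

```python
import io
from concurrent.futures import ThreadPoolExecutor
import psycopg2

SRC_DSN = "host=source-db dbname=analytics"  # placeholders
TGT_DSN = "host=target-db dbname=analytics"

# 1. Export a snapshot and keep this transaction open while the
#    workers run; the snapshot is only valid while it stays open.
anchor = psycopg2.connect(SRC_DSN)
anchor.autocommit = True
ac = anchor.cursor()
ac.execute("BEGIN ISOLATION LEVEL REPEATABLE READ")
ac.execute("SELECT pg_export_snapshot()")
snapshot_id = ac.fetchone()[0]

def copy_chunk(lo, hi):
    src = psycopg2.connect(SRC_DSN)
    src.autocommit = True
    tgt = psycopg2.connect(TGT_DSN)
    with src.cursor() as sc, tgt.cursor() as tc:
        # Every worker attaches to the same point-in-time view.
        sc.execute("BEGIN ISOLATION LEVEL REPEATABLE READ")
        sc.execute("SET TRANSACTION SNAPSHOT %s", (snapshot_id,))
        buf = io.StringIO()  # buffered for brevity; stream in production
        sc.copy_expert(
            f"COPY (SELECT * FROM events WHERE id >= {lo} AND id < {hi}) TO STDOUT",
            buf,
        )
        sc.execute("COMMIT")
        buf.seek(0)
        tc.copy_expert("COPY events FROM STDIN", buf)
        tgt.commit()
    src.close()
    tgt.close()

# 2. Fan the table out across parallel workers in disjoint id ranges.
ranges = [(i, i + 10_000_000) for i in range(0, 60_000_000, 10_000_000)]
with ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(copy_chunk, lo, hi) for lo, hi in ranges]
    for f in futures:
        f.result()  # surface any worker errors
anchor.close()  # release the snapshot once the copy is done
```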
Phase 2: Real-Time Replication — The Magic of Helyx
Once the initial sync was complete, Helyx seamlessly transitioned into its primary mode: real-time replication.
Continuous Log Reading: Helyx continuously monitors the PostgreSQL 13 WAL, instantly capturing every INSERT, UPDATE, and DELETE.
In-Memory Processing & Delivery: Changes are processed in-memory and immediately queued for delivery to the target database. This is how we maintained a steady-state replication latency of under 500 milliseconds (a simple way to verify this end to end is sketched after this list).
Automatic Schema Evolution: Had we needed to apply a schema change (like adding a column) during the replication, Helyx would have detected it and adapted the data stream automatically, preventing a replication break.
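A latency figure like this is easy to sanity-check yourself with a heartbeat row: write a timestamp on the source and time how long it takes to appear on the target. The heartbeat table and DSNs below are hypothetical; any replicated table works:

```python
import time
import psycopg2

src = psycopg2.connect("host=source-db dbname=analytics")  # placeholders
tgt = psycopg2.connect("host=target-db dbname=analytics")
src.autocommit = True
tgt.autocommit = True

t0 = time.monotonic()
with src.cursor() as c:
    # Upsert a fresh timestamp on the source...
    c.execute(
        "INSERT INTO heartbeat (id, sent_at) VALUES (1, clock_timestamp()) "
        "ON CONFLICT (id) DO UPDATE SET sent_at = EXCLUDED.sent_at "
        "RETURNING sent_at"
    )
    sent_at = c.fetchone()[0]

# ...then poll the target until that exact value has been applied.
with tgt.cursor() as c:
    while time.monotonic() - t0 < 10:
        c.execute("SELECT sent_at FROM heartbeat WHERE id = 1")
        row = c.fetchone()
        if row and row[0] == sent_at:
            print(f"end-to-end lag ~ {(time.monotonic() - t0) * 1000:.0f} ms")
            break
        time.sleep(0.05)
```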
Phase 3: The Cutover — From "In-Sync" to "Live"
After validating data consistency and ensuring replication lag was negligible, we executed the cutover.
Brief Application Pause: We temporarily set the application to read-only mode (a pause of only a few minutes) to ensure no new transactions were written to the old source.
Final Sync: We confirmed Helyx had applied all in-flight transactions (the catch-up check is sketched after these steps).
Connection Switch: We updated the application's connection string to point to the new PostgreSQL 15 cluster.
Application Resume: The application resumed full operations, now writing to the new cloud database.
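Helyx reported the catch-up for us, but the check in step 2 can be expressed directly against PostgreSQL, whatever tool is replicating: once writes stop, freeze the source's current WAL position and wait for the replication slot to confirm it. The slot name cdc_demo and the DSN are placeholders:

```python
import time
import psycopg2

src = psycopg2.connect("host=source-db dbname=analytics")  # placeholder
src.autocommit = True

with src.cursor() as c:
    # The application is read-only, so this LSN marks the final change.
    c.execute("SELECT pg_current_wal_lsn()")
    cutover_lsn = c.fetchone()[0]

    # Wait until the replication slot confirms it has flushed past it.
    while True:
        c.execute(
            "SELECT confirmed_flush_lsn >= %s::pg_lsn "
            "FROM pg_replication_slots WHERE slot_name = %s",
            (cutover_lsn, "cdc_demo"),
        )
        if c.fetchone()[0]:
            break  # every in-flight transaction has reached the target
        time.sleep(0.2)

print("caught up: safe to switch the connection string")
```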
The result was a successful migration with zero data loss and only minutes of read-only downtime.
💡 Why This Approach Beats Native Tools
While PostgreSQL has built-in logical replication, Helyx provided critical advantages for this high-stakes migration:
Simpler Configuration: No need to manually manage publications, subscriptions, and replication identities for every table (the native steps this replaces are sketched after this list).
Superior Performance: Optimized for high-throughput, low-latency scenarios, consistently handling over 220,000 rows per second without breaking a sweat.
Resilience: Better handling of network glitches and schema changes, reducing the risk of replication failure.
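To make that first point concrete, here is roughly the native setup a tool like Helyx spares you, issued through psycopg2. Object names and connection strings are illustrative, and this is only the skeleton; per-table replica identities and target schema creation are still on you:

```python
import psycopg2

src = psycopg2.connect("host=source-db dbname=analytics")  # placeholders
tgt = psycopg2.connect("host=target-db dbname=analytics")
src.autocommit = True
tgt.autocommit = True  # CREATE SUBSCRIPTION cannot run in a transaction

with src.cursor() as c:
    # Prerequisite: wal_level = logical on the source (needs a restart).
    c.execute("CREATE PUBLICATION mig_pub FOR ALL TABLES")
    # Tables without a primary key additionally need, one by one:
    #   ALTER TABLE some_table REPLICA IDENTITY FULL;

with tgt.cursor() as c:
    # The target must already carry a matching schema.
    c.execute(
        "CREATE SUBSCRIPTION mig_sub "
        "CONNECTION 'host=source-db dbname=analytics user=replicator' "
        "PUBLICATION mig_pub"
    )
```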
Key Technical Takeaways:
Change Data Capture Is King for Live Migrations: For large, active databases, log-based change data capture is the only practical way to achieve near-zero downtime.
Test with Production-Grade Data: Always run a full PoC with a production-sized dataset to uncover bottlenecks related to data volume and write patterns.
Monitor Everything: Track key metrics like replication latency, throughput (rows/sec), and error rates in real time throughout the process (a minimal lag monitor is sketched after these takeaways).
Plan the Cutover Meticulously: A well-rehearsed, step-by-step cutover plan is crucial for minimizing the final application downtime window.
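As a starting point for that monitoring advice, a small poller against pg_replication_slots already covers lag; the slot name and alert threshold below are assumptions:

```python
import time
import psycopg2

LAG_ALERT_BYTES = 64 * 1024 * 1024  # alert past 64 MiB of unconfirmed WAL

conn = psycopg2.connect("host=source-db dbname=analytics")  # placeholder
conn.autocommit = True

with conn.cursor() as c:
    while True:
        # Bytes of WAL the slot's consumer has not yet confirmed.
        c.execute(
            "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) "
            "FROM pg_replication_slots WHERE slot_name = %s",
            ("cdc_demo",),
        )
        lag_bytes = c.fetchone()[0]
        print(f"slot lag: {lag_bytes} bytes")
        if lag_bytes > LAG_ALERT_BYTES:
            print("WARNING: replication is falling behind")
        time.sleep(5)
```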
See Helyx in Action
This migration proved that moving large, live production databases doesn't have to be a nightmare. The right tool transforms a risky, multi-day operation into a predictable, automated process.
Visit: https://www.helyx.quobotic.com. Try Helyx Free and experience seamless database migration for yourself.
We hope this detailed architecture breakdown is helpful. Have you tackled a similar database migration? What challenges did you face? Share your stories in the comments below!