Pokhraj Das
How We Migrated 60 Million PostgreSQL Rows in 40 Minutes & Now Replicate 220K Rows/Second

Learn how we achieved zero-downtime PostgreSQL 13 to 15 migration, replicating 60 million rows in 40 minutes and now handling 220,000 rows/second with sub-500ms latency using Helyx.

The 2-Hour Maintenance Window That Changed Everything

"We have a hard deadline. The analytics platform migration from PostgreSQL 13 to 15 needs to happen next week. The maintenance window? Two hours. Total."

My heart sank. We were looking at:

6 Crore (60 million) rows of time-series data

2 TB of total database size

Zero tolerance for data loss

Minimal performance impact on live services

Traditional pg_dump/pg_restore would take 6-8 hours. Logical replication with native PostgreSQL tools would require hours of configuration and still risk falling behind. We needed a different approach.
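
For context, the native route means wiring up logical replication by hand: a publication on the old server, a subscription on the new one, plus schema sync and slot monitoring on top. Here is a minimal sketch of that wiring; the host names, database, and publication/subscription names are placeholders, and the source must already run with wal_level = logical:

```python
# Sketch of the native logical-replication wiring we ruled out.
# All names and hosts below are placeholders, not our real setup.
import psycopg2

# Source (PostgreSQL 13): publish the tables to move.
# Requires wal_level = logical in postgresql.conf.
src = psycopg2.connect("dbname=analytics host=pg13-source")
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION analytics_pub FOR ALL TABLES;")

# Target (PostgreSQL 15): subscribing also kicks off the initial copy,
# but the schema must already exist here; one of several manual steps
# that eat into a 2-hour window.
dst = psycopg2.connect("dbname=analytics host=pg15-target")
dst.autocommit = True  # CREATE SUBSCRIPTION refuses transaction blocks
with dst.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION analytics_sub "
        "CONNECTION 'host=pg13-source dbname=analytics user=repl' "
        "PUBLICATION analytics_pub;"
    )
```

Workable in principle, but between schema prep, slot monitoring, and catch-up, it was hard to see this fitting a two-hour window.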

Why This Migration Was Particularly Tricky

The Technical Challenges:
Version Jump: Moving from PostgreSQL 13 → 15 introduces new data types and behavior changes

Live Data: The database was receiving 50,000+ writes per second from production applications

Complex Schema: Multiple hypertables, complex indexes, and JSONB columns

Business Critical: This analytics platform drove real-time dashboards for enterprise clients

Why Native Tools Fell Short:

pg_dump: ~4 hours
Network transfer: ~1 hour
pg_restore: ~3 hours
Total: ~8 hours, plus replication catch-up time

We had 2 hours. Not 8.

The Helyx Solution: A Two-Phase Approach

Phase 1: The Initial Bulk Migration

We decided to use Helyx for both the initial migration and ongoing replication. The setup was surprisingly simple.
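
Helyx's own configuration isn't something I can reproduce here, but the core trick behind fast bulk moves is streaming COPY output from the source straight into COPY input on the target, with no intermediate dump file. A rough stand-in for that pattern using plain psycopg2, where the DSNs and table name are hypothetical and error handling is omitted for brevity:

```python
# Generic COPY-to-COPY streaming: the pattern fast bulk movers build
# on. Not Helyx's code; DSNs and the table name are hypothetical.
import os
import threading
import psycopg2

SRC_DSN = "dbname=analytics host=pg13-source"
DST_DSN = "dbname=analytics host=pg15-target"

def stream_table(table: str) -> None:
    """Pipe one table from source to target through an OS pipe."""
    r_fd, w_fd = os.pipe()
    reader, writer = os.fdopen(r_fd, "rb"), os.fdopen(w_fd, "wb")

    def produce() -> None:
        with psycopg2.connect(SRC_DSN) as src, src.cursor() as cur:
            cur.copy_expert(f"COPY {table} TO STDOUT", writer)
        writer.close()  # EOF tells the consuming COPY it is done

    t = threading.Thread(target=produce)
    t.start()
    with psycopg2.connect(DST_DSN) as dst, dst.cursor() as cur:
        cur.copy_expert(f"COPY {table} FROM STDIN", reader)
    t.join()
    reader.close()

stream_table("sensor_readings")  # hypothetical table name
```

A real tool layers parallelism, batching, and validation on top of this basic pipe, which is what the numbers below reflect.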

The moment of truth: We triggered the initial sync at 2:00 PM during our maintenance window.

2:10 PM: 1 Crore rows replicated

2:25 PM: 3 Crore rows replicated

2:40 PM: 6 Crore rows replicated - DONE

Yes, you read that correctly. 6 Crore rows migrated in 40 minutes flat.

Phase 2: Real-Time Replication Activation

Once the initial sync completed, we seamlessly switched to Change Data Capture (CDC) mode.
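
Under the hood, CDC on PostgreSQL means reading decoded changes from a logical replication slot. Helyx's implementation isn't public, but this is the primitive that it and tools like it sit on, shown here via psycopg2's replication support; the slot name and the apply step are made up for illustration:

```python
# The logical-decoding primitive PostgreSQL CDC is built on.
# Not Helyx's internals; slot name and apply_to_target are made up.
import psycopg2
import psycopg2.extras

def apply_to_target(change: str) -> None:
    print(change)  # stand-in for replaying the change on PG 15

conn = psycopg2.connect(
    "dbname=analytics host=pg13-source",
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()

# The slot pins WAL on the source until we acknowledge receipt, which
# is what makes the bulk-to-CDC handoff lossless.
cur.create_replication_slot("cdc_demo_slot", output_plugin="test_decoding")
cur.start_replication(slot_name="cdc_demo_slot", decode=True)

def consume(msg):
    apply_to_target(msg.payload)  # one decoded INSERT/UPDATE/DELETE
    # Unacknowledged LSNs are exactly the replication lag; acking
    # promptly is how a p95 stays under 500 ms.
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(consume)  # blocks, streaming changes as they commit
```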

The Performance Numbers That Made Us Smile
Migration Performance:

| Metric | Result |
| --- | --- |
| Total Rows Migrated | 6,00,00,000 (60 million) |
| Migration Time | 40 minutes |
| Effective Throughput | ~1.5 million rows/minute |
| Peak Memory Usage | 4.2 GB |
| CPU Utilization | 65% |

Ongoing Replication Performance:

| Metric | Result |
| --- | --- |
| Current Throughput | 220,000 rows/second |
| Replication Latency | < 500 ms (p95) |
| CPU Overhead on Source | < 3% |
| Network Bandwidth | ~180 MB/s |

Key technical insights:

Parallel Workers: Helyx used 8 parallel workers to maximize I/O throughput (see the sketch after this list)

Smart Batching: 50,000-row batches optimized network transfer efficiency

Memory Mapping: Efficient memory usage prevented OOM errors

Continuous Validation: Real-time checksum validation ensured data integrity
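
To make the first two insights concrete, here is the general shape of range-partitioned parallel copy with fixed-size batches. This is an illustration of the pattern, not Helyx's implementation: it assumes a dense integer primary key, and the table, columns, and DSNs are hypothetical. A real mover would use COPY per batch rather than executemany, and would pair each batch with a checksum comparison for the validation step.

```python
# Illustration of parallel workers + batching: split the keyspace
# across workers and move fixed-size pages via keyset pagination.
# Hypothetical table/columns/DSNs; assumes a dense integer key.
from concurrent.futures import ThreadPoolExecutor
import psycopg2

SRC_DSN = "dbname=analytics host=pg13-source"
DST_DSN = "dbname=analytics host=pg15-target"
BATCH = 50_000        # batch size cited above
WORKERS = 8           # worker count cited above
TOTAL_ROWS = 60_000_000

def copy_range(lo: int, hi: int) -> int:
    """Copy rows with id in [lo, hi), one BATCH-sized page at a time."""
    moved, next_id = 0, lo
    with psycopg2.connect(SRC_DSN) as src, psycopg2.connect(DST_DSN) as dst:
        s, d = src.cursor(), dst.cursor()
        while next_id < hi:
            # Keyset pagination: seek by key, never OFFSET, so every
            # page costs the same no matter how deep into the table.
            s.execute(
                "SELECT id, payload FROM sensor_readings "
                "WHERE id >= %s AND id < %s ORDER BY id LIMIT %s",
                (next_id, hi, BATCH),
            )
            rows = s.fetchall()
            if not rows:
                break
            d.executemany(
                "INSERT INTO sensor_readings (id, payload) VALUES (%s, %s)",
                rows,
            )
            dst.commit()
            next_id = rows[-1][0] + 1
            moved += len(rows)
    return moved

span = TOTAL_ROWS // WORKERS
los = [i * span for i in range(WORKERS)]
his = [lo + span for lo in los]
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    total = sum(pool.map(copy_range, los, his))
print(f"moved {total} rows")
```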

Take the Next Step
If you're facing an upcoming database migration or building a high-volume replication system, I'd recommend:

👉 Try Helyx with your own dataset - Start Free Trial
👉 Read our technical deep dive - PostgreSQL CDC Architecture Guide
👉 Schedule a custom PoC - Book a Demo (https://helyx.quobotic.com/)

What's your experience with large database migrations? Have you faced similar challenges or found other creative solutions? Share your stories in the comments below - let's learn from each other!
