<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Postgres Pro</title>
    <description>The latest articles on DEV Community by Postgres Pro (@postgres_pro).</description>
    <link>https://dev.to/postgres_pro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3280831%2Fa4de8d11-8f30-474b-aef0-097c86588158.jpg</url>
      <title>DEV Community: Postgres Pro</title>
      <link>https://dev.to/postgres_pro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/postgres_pro"/>
    <language>en</language>
    <item>
      <title>We’ve learned how to migrate databases from Oracle to Postgres Pro at 41 TB/day</title>
      <dc:creator>Postgres Pro</dc:creator>
      <pubDate>Fri, 22 Aug 2025 09:35:57 +0000</pubDate>
      <link>https://dev.to/postgres_pro/weve-learned-how-to-migrate-databases-from-oracle-to-postgres-pro-at-41-tbday-1kd4</link>
      <guid>https://dev.to/postgres_pro/weve-learned-how-to-migrate-databases-from-oracle-to-postgres-pro-at-41-tbday-1kd4</guid>
      <description>&lt;p&gt;Imagine a typical scenario: your company runs on Oracle Database, and over the years, you’ve accumulated tens of terabytes of data. Now you want to move to Postgres Pro — but the migration process quickly turns into a nightmare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It takes forever to move all the data.&lt;/li&gt;
&lt;li&gt;The source system can’t stop — it has to keep running, even though the data is constantly changing.&lt;/li&gt;
&lt;li&gt;There’s a real risk of data loss or corruption — from transmission errors to type mismatches and other gremlins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We decided to tackle these problems head-on and created ProGate, a toolkit that makes life a lot easier for DBAs and speeds up the migration to Postgres Pro.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F293zc7v0yyaooj8ck4yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F293zc7v0yyaooj8ck4yb.png" alt=" " width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What’s ProGate?&lt;/h2&gt;

&lt;p&gt;ProGate is an all-in-one solution for moving data into Postgres Pro. It’s a set of specialized tools that cover every step of the migration process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ProCopy — high-speed initial data load.&lt;/li&gt;
&lt;li&gt;ProSync — continuous change synchronization (CDC, Change Data Capture).&lt;/li&gt;
&lt;li&gt;ProCheck — post-migration data quality and integrity checks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s go through them one by one.&lt;/p&gt;

&lt;h2&gt;Step 1. Initial data load with ProCopy&lt;/h2&gt;

&lt;p&gt;ProCopy is a console utility for blasting large amounts of data into Postgres Pro as fast as possible — without having to stop the source system. That means minimal downtime and minimal impact on running applications.&lt;/p&gt;

&lt;p&gt;In our recent synthetic tests, we reached 200–500 MB/sec for Oracle → Postgres Pro migrations; at the top end, that works out to about 41 TB/day (500 MB/s × 86,400 s/day = 43,200,000 MB ≈ 41 TiB). For PostgreSQL → Postgres Pro, we hit around 1 GB/sec.&lt;/p&gt;

&lt;h2&gt;How does it work?&lt;/h2&gt;

&lt;p&gt;ProCopy is written in Go, so it’s lightweight, highly concurrent, and ready to chew through databases of any size. Under the hood, it uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pool of parallel read/write processes (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;An internal data bus for fast process communication.&lt;/li&gt;
&lt;li&gt;Robust error handling with retries for problematic records.&lt;/li&gt;
&lt;/ul&gt;
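
&lt;p&gt;A minimal sketch of that chunking idea, in illustrative SQL rather than ProCopy’s actual queries: each reader worker is handed a disjoint key range up front, so workers never contend for the same rows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical query issued by one reader worker (names are made up).
SELECT id, name, created_at
FROM customers
WHERE id &amp;gt;= 1000000   -- this worker's chunk starts here
  AND id &amp;lt;  2000000;  -- the next worker picks up from here
-- Each batch goes onto the internal data bus and is bulk-inserted
-- into Postgres Pro by a writer worker; failed batches are retried.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;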

&lt;h2&gt;ProCopy use case&lt;/h2&gt;

&lt;p&gt;Say you’ve got a CUSTOMERS table in Oracle with billions of rows. With ProCopy, you can (see the sketch after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the migration in parallel, tuning the number of reader/writer processes and batch sizes.&lt;/li&gt;
&lt;li&gt;Exclude unnecessary columns.&lt;/li&gt;
&lt;li&gt;Rename columns or change their data types.&lt;/li&gt;
&lt;li&gt;Transform NULLs into default values on the fly.&lt;/li&gt;
&lt;/ul&gt;
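
&lt;p&gt;Conceptually, that per-table mapping boils down to something like the following plain SQL. This is a hypothetical illustration with made-up table and column names, not ProCopy’s actual syntax:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- What ProCopy's column mapping amounts to, expressed as plain SQL.
-- Source columns not listed here are simply excluded.
INSERT INTO customers (customer_id, full_name, signup_date)
SELECT
  id,                                   -- column renamed to customer_id
  cust_name::text,                      -- data type changed on the fly
  COALESCE(created, DATE '1970-01-01')  -- NULLs replaced with a default
FROM staging_customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;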

&lt;h2&gt;Highlights&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Blazing fast data transfer.&lt;/li&gt;
&lt;li&gt;Flexible configuration (YAML/JSON).&lt;/li&gt;
&lt;li&gt;Resume from where you left off.&lt;/li&gt;
&lt;li&gt;Support for complex data types (LOB, XML, JSON, etc.).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw15loyrfw0pvi2sujz5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw15loyrfw0pvi2sujz5e.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Step 2. Change synchronization with ProSync (CDC)&lt;/h2&gt;

&lt;p&gt;ProSync keeps Oracle and Postgres Pro in sync by continuously capturing changes (CDC) and applying them to the target database. This makes it possible to migrate with very short downtime.&lt;/p&gt;

&lt;h2&gt;How does it work?&lt;/h2&gt;

&lt;p&gt;Oracle stores every data change — inserts, updates, deletes — in redo logs. ProSync tails these logs in real time, detects each change, and immediately applies it to Postgres Pro.&lt;/p&gt;

&lt;p&gt;As a result, your Postgres Pro instance is always up-to-date with Oracle, down to the last committed transaction.&lt;/p&gt;
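
&lt;p&gt;Purely as an illustration of the observable effect (not ProSync’s internal mechanics, and with hypothetical table names): a transaction captured from the redo logs is replayed on the target as equivalent DML, in commit order:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- What applying one captured Oracle transaction looks like on Postgres Pro.
BEGIN;
INSERT INTO orders (order_id, customer_id, amount)
VALUES (1001, 42, 99.90);
UPDATE accounts SET balance = balance - 99.90
WHERE account_id = 42;
COMMIT;  -- applied only after the source transaction committed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;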

&lt;h2&gt;Real-world example&lt;/h2&gt;

&lt;p&gt;A bank can’t afford even an hour of downtime. With ProSync, they can migrate data and apps gradually, while the old Oracle system stays live. Customers keep working with Oracle, and every transaction is instantly mirrored to Postgres Pro. Once everything’s in sync, switching over takes minutes.&lt;/p&gt;

&lt;h2&gt;Highlights&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Minimal load on the source DB.&lt;/li&gt;
&lt;li&gt;Reliable error handling and replication health monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Step 3. Data quality check with ProCheck&lt;/h2&gt;

&lt;p&gt;ProCheck verifies that the migration went smoothly and that the data in Postgres Pro is exactly the same as in Oracle.&lt;/p&gt;

&lt;h2&gt;How does it work?&lt;/h2&gt;

&lt;p&gt;It compares tables, rows, and columns between the two databases. It catches discrepancies, conversion errors, and missing data — and gives you a detailed report.&lt;/p&gt;
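
&lt;p&gt;To give a feel for the kind of comparison involved, here’s a manual spot-check in the same spirit; this is not ProCheck’s internals, and the table name is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- 1) Row counts must match; run on both sides.
SELECT count(*) FROM customers;

-- 2) An order-independent content fingerprint on the Postgres Pro side.
--    hashtext() is Postgres-specific, so the Oracle side would need an
--    equivalent such as ORA_HASH over the same column values.
SELECT sum(hashtext(c::text)) FROM customers AS c;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;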

&lt;h2&gt;ProCheck use case&lt;/h2&gt;

&lt;p&gt;After migrating a financial system, you want to make sure every balance matches to the last cent. ProCheck will go row-by-row to confirm that nothing’s been lost or altered.&lt;/p&gt;
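
&lt;p&gt;For the balance example, the simplest cross-check is an aggregate run identically on both databases (table and column names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- The two result sets must agree to the last cent.
SELECT currency, sum(balance) AS total_balance
FROM accounts
GROUP BY currency
ORDER BY currency;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;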

&lt;p&gt;ProGate is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge databases (terabytes and up).&lt;/li&gt;
&lt;li&gt;Hot migrations with minimal downtime.&lt;/li&gt;
&lt;li&gt;Strict data consistency requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Known limitations&lt;/h2&gt;

&lt;p&gt;While ProGate solves a lot of headaches, you should keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema changes during migration may require manual handling.&lt;/li&gt;
&lt;li&gt;Custom data types may need special mapping.&lt;/li&gt;
&lt;li&gt;Tables without primary keys: ProSync works best when every row has a unique identifier (a common workaround is sketched below).&lt;/li&gt;
&lt;/ul&gt;
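
&lt;p&gt;For that last case, one common workaround is to give the keyless table a surrogate key on the source before migrating. Here’s a hedged sketch in Postgres syntax (Oracle 12c and later has an equivalent identity-column feature); whether this fits depends on your application:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical example: payments_log has no primary key.
ALTER TABLE payments_log
  ADD COLUMN mig_id bigint GENERATED ALWAYS AS IDENTITY;

-- The identity column is NOT NULL, so it can serve as the primary key,
-- giving CDC tooling a way to address individual rows.
ALTER TABLE payments_log
  ADD PRIMARY KEY (mig_id);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;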

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2inhm7eklxi67sryfvbz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2inhm7eklxi67sryfvbz.png" alt=" " width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;The public release of ProGate is planned for this fall. Our roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A graphical UI, backend, and API.&lt;/li&gt;
&lt;li&gt;New sources/targets (MS SQL Server, MySQL, Shardman).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, ProGate will let you move from Oracle to Postgres Pro quickly and safely, with zero drama: no endless downtime, no missing data, no nasty surprises.&lt;/p&gt;

&lt;p&gt;We’ll share more about the technical internals after the public launch.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>postgressql</category>
      <category>oracle</category>
      <category>migration</category>
    </item>
    <item>
      <title>On reordering expressions in Postgres</title>
      <dc:creator>Postgres Pro</dc:creator>
      <pubDate>Fri, 01 Aug 2025 10:28:11 +0000</pubDate>
      <link>https://dev.to/postgres_pro/on-reordering-expressions-in-postgres-1bhf</link>
      <guid>https://dev.to/postgres_pro/on-reordering-expressions-in-postgres-1bhf</guid>
      <description>&lt;p&gt;Today, I want to talk about one of those sneaky tricks that can help speed up query execution. Specifically, this is about reordering conditions in WHERE clauses, JOINs, HAVING clauses, and so on.&lt;/p&gt;

&lt;p&gt;The idea is simple: if a condition in an AND chain turns out to be false, or if one in an OR chain turns out to be true, there's no need to evaluate the rest. That means saved CPU cycles — and sometimes, a lot of them. Let’s break this down.&lt;/p&gt;




&lt;p&gt;From time to time, I come across queries with complex filters like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM tbl
WHERE
 date &amp;gt; min_date AND
 date &amp;lt; now() - interval '1 day' AND
 value IN (SELECT ...) AND  -- some subplan
 id = 42;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in real life, just changing the order of these conditions can noticeably improve performance. Why? Individually, each condition is cheap. But when you're applying them to millions of rows, that "cheap" quickly adds up. Especially once you've dealt with the usual suspects — like making sure the table fits into shared buffers.&lt;/p&gt;
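
&lt;p&gt;For a rough sense of scale (assumed figures, purely for intuition): if one avoidable condition costs on the order of 20 ns to evaluate, then across 10 million rows a single redundant check burns about 0.2 seconds of CPU per scan.&lt;/p&gt;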

&lt;p&gt;This effect is easiest to spot in wide tables — the kind with dozens of variable-length columns. I sometimes see sluggish IndexScans, and when you dig in, it turns out the performance hit comes from filtering on a column that's the 20th field in the tuple. Just figuring out the offset of that column takes CPU time.&lt;/p&gt;

&lt;p&gt;Looking at Postgres source code, it’s clear the community thought about this long ago. Back in 2002, Tom Lane reluctantly committed &lt;a href="https://github.com/postgres/postgres/commit/3779f7f" rel="noopener noreferrer"&gt;3779f7f&lt;/a&gt;, which added a basic reordering of expressions (see order_qual_clauses) to push subplans to the end of the list. That made sense — subplans can depend on runtime parameters and are generally expensive to evaluate.&lt;/p&gt;

&lt;p&gt;Later, in 2007, commit &lt;a href="https://github.com/postgres/postgres/commit/5a7471c" rel="noopener noreferrer"&gt;5a7471c&lt;/a&gt; changed the logic. From that point on, expressions were ordered strictly by their estimated cost. This still holds today, with a small tweak from &lt;a href="https://github.com/postgres/postgres/commit/215b43c" rel="noopener noreferrer"&gt;215b43c&lt;/a&gt; for row-level security (RLS), where evaluation order needed more control within each plan node.&lt;/p&gt;

&lt;p&gt;Let’s see what we get in upstream Postgres right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE test (
  x integer, y numeric, w timestamp DEFAULT CURRENT_TIMESTAMP, z integer
);
INSERT INTO test (x, y) SELECT gs, gs FROM generate_series(1, 1E3) AS gs;
VACUUM ANALYZE test;

EXPLAIN (COSTS ON)
SELECT * FROM test
WHERE
 z &amp;gt; 0 AND
 w &amp;gt; now() AND
 x &amp;lt; (SELECT avg(y) FROM generate_series(1,1E2) y WHERE y % 2 = x % 3) AND
 x NOT IN (SELECT avg(y) FROM generate_series(1,1E2) y OFFSET 0) AND
 w IS NOT NULL AND
 x = 42;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we check the Filter line in the plan, we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Filter: ((w IS NOT NULL) AND (z &amp;gt; 0) AND (x = 42) AND (w &amp;gt; now()) AND
         ((x)::numeric = (InitPlan 2).col1) AND ((x)::numeric &amp;lt; (SubPlan 1)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Postgres evaluates them in this exact order, left to right.&lt;/p&gt;

&lt;p&gt;Here are the estimated costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;z &amp;gt; 0: 0.0025&lt;/li&gt;
&lt;li&gt;w &amp;gt; now(): 0.005&lt;/li&gt;
&lt;li&gt;x &amp;lt; SubPlan 1: 2.0225&lt;/li&gt;
&lt;li&gt;x NOT IN SubPlan 2: 0.005&lt;/li&gt;
&lt;li&gt;w IS NOT NULL: 0.0&lt;/li&gt;
&lt;li&gt;x = 42: 0.0025&lt;/li&gt;
&lt;/ul&gt;
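
&lt;p&gt;These figures are multiples of cpu_operator_cost, which defaults to 0.0025: a simple var-op-const comparison is one unit, w &amp;gt; now() is two (the comparison plus the function call), and the null test is costed at zero. You can check the base constant yourself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW cpu_operator_cost;  -- 0.0025 by default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;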

&lt;p&gt;This ordering looks mostly reasonable. But could we do better?&lt;/p&gt;

&lt;p&gt;There are at least two pieces of low-hanging fruit here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Column position cost. If the column is far to the right in the tuple layout, accessing it costs more. We could add a tiny cost factor based on ordinal position — enough to help the planner choose x = 42 over z = 42, for example, all other things equal.&lt;/li&gt;
&lt;li&gt;Selectivity-based ordering. When two expressions have similar cost — say, x = 42 and z &amp;lt; 50 — it's better to put the more selective one first. If x = 42 is true less often, the planner should evaluate it before the rest.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s test how much performance gain we could actually get from this. First, we’ll build a table where some columns are far apart but have the same selectivity, and others are close together but differ in selectivity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TEMP TABLE test_2 (x1 numeric, x2 numeric, x3 numeric, x4 numeric);
INSERT INTO test_2 (x1, x2, x3, x4)
  SELECT x, (x::integer) % 2, (x::integer) % 100, x FROM
    (SELECT random()*1E7 FROM generate_series(1,1E7) AS x) AS q(x);
ANALYZE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s check what happens when we search for values in this "wide" tuple. The columns x1 and x4 are identical in every way, except that the offset of x1 within the tuple is known in advance, while for x4, the system has to calculate it for each row:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF)
SELECT * FROM test_2 WHERE x1 = 42 AND x4 = 42;

EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF)
SELECT * FROM test_2 WHERE x4 = 42 AND x1 = 42;

/*
 Seq Scan on test_2  (actual rows=0.00 loops=1)
   Filter: ((x1 = '42'::numeric) AND (x4 = '42'::numeric))
   Buffers: local read=94357
 Execution Time: 2372.032 ms

 Seq Scan on test_2  (actual rows=0.00 loops=1)
   Filter: ((x4 = '42'::numeric) AND (x1 = '42'::numeric))
   Buffers: local read=94357
 Execution Time: 2413.633 ms
*/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, all other things being equal, even a moderately wide tuple can cause a 2–3% difference in execution time. That’s roughly the same impact you'd expect from enabling JIT in typical cases.&lt;/p&gt;

&lt;p&gt;Now let’s look at how selectivity affects performance. The columns x1 and x2 are located close to each other in the tuple, but there's a key difference: x1 holds almost unique values, while x2 is filled with near-duplicates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF)
SELECT * FROM test_2 WHERE x2 = 1 AND x1 = 42;

EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF)
SELECT * FROM test_2 WHERE x1 = 42 AND x2 = 1;
/*
 Seq Scan on test_2  (actual rows=0.00 loops=1)
   Filter: ((x2 = '1'::numeric) AND (x1 = '42'::numeric))
   Buffers: local read=74596
 Execution Time: 2363.903 ms

 Seq Scan on test_2  (actual rows=0.00 loops=1)
   Filter: ((x1 = '42'::numeric) AND (x2 = '1'::numeric))
   Buffers: local read=74596
 Execution Time: 2034.873 ms
*/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That’s a difference of about 14% in this run (2364 ms vs. 2035 ms).&lt;/p&gt;



&lt;p&gt;If you assume that these small effects can stack across the entire plan tree — with scans, joins, groupings, etc. — then yes, reordering conditions might actually be worth doing, even if it adds a little planning overhead.&lt;/p&gt;

&lt;p&gt;So let’s try implementing it. As an extension, this won’t work — there’s no hook in the planner to intercept final plan creation. We could really use a create_plan_hook() in create_plan(), but the community hasn’t discussed it seriously yet.&lt;/p&gt;

&lt;p&gt;So I went ahead and made a core patch. You need to touch two places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost_qual_eval() — where Postgres estimates the cost of conditions.&lt;/li&gt;
&lt;li&gt;order_qual_clauses() — the function that sorts them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the final code in a branch on GitHub.&lt;/p&gt;

&lt;p&gt;Running the earlier examples on that branch, expressions are now ordered better — taking both column order and selectivity into account. No extra planning overhead observed so far.&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>dbms</category>
      <category>performance</category>
      <category>postgres</category>
    </item>
  </channel>
</rss>
