If you’ve worked with databases, pipelines, or migrations long enough, you’ve hit this moment:
“Why don’t these numbers match?”
Production says:
10,291 orders
Warehouse says:
10,248 orders
Now you’re stuck figuring out why.
The usual process (aka pain)
Every time this happens, the workflow looks something like:
- check row counts
- write a LEFT JOIN
- hunt for missing rows
- compare columns
- export CSVs when it gets too messy
- repeat for way longer than you expected
You end up writing queries like:
SELECT *
FROM a
LEFT JOIN b ON a.id = b.id
WHERE a.amount != b.amount;
And every time, it’s slightly different… but basically the same problem.
The real problem
It’s not just comparing data.
It’s:
- comparing across different databases
- comparing across environments
- dealing with schema drift
- figuring out if differences are random or systemic
Most tools either:
- don’t handle this well, or
- require way more setup than the problem deserves
So I built something focused
I wanted a tool where I could just run one command:
diffgnome compare --left mysql://prod/orders --right postgres://warehouse/orders --key id
…and immediately see:
Table: orders
Row count mismatch
prod: 10,291
warehouse: 10,248
Missing rows: 43
Changed rows: 88
Columns affected:
amount (17 rows)
status (9 rows)
No giant queries.
No exporting data.
No digging through joins just to get started.
What it actually does
At a high level, it:
- compares schema differences
- compares row counts
- splits data into chunks
- hashes each chunk
- drills into mismatched ranges
- shows sample row differences
So instead of scanning everything manually, it quickly narrows down where the data diverges.
Where this is useful
This comes up constantly in real systems:
- validating ETL pipelines
- verifying migrations
- debugging production issues
- comparing environments
- reconciling warehouse data
If data moves, this problem eventually shows up.
Why I built it
I’ve spent a lot of time working with:
- ETL pipelines
- data warehouses
- messy production databases
And I kept running into the same situation:
I know something is wrong — I just need to find where.
So I built a tool that answers that question directly.
Current status
I’ve been using this against real datasets while building it, and it’s already replaced a lot of one-off SQL and manual comparison work.
There’s still more I want to add, but the core workflow is solid.
If this sounds familiar
If you’ve ever:
- compared prod vs staging
- verified a migration
- debugged a broken pipeline
- tried to reconcile two systems
…then you’ve probably run into this exact problem.
👉 You can check it out here: https://diffgnome.com
Top comments (0)