DEV Community

Cover image for Why don’t my database numbers match? I built a tool to find out.
Harry Wynn
Harry Wynn

Posted on

Why don’t my database numbers match? I built a tool to find out.

If you’ve worked with databases, pipelines, or migrations long enough, you’ve hit this moment:

“Why don’t these numbers match?”

Production says:
10,291 orders

Warehouse says:
10,248 orders

Now you’re stuck figuring out why.


The usual process (aka pain)

Every time this happens, the workflow looks something like:

  • check row counts
  • write a LEFT JOIN
  • hunt for missing rows
  • compare columns
  • export CSVs when it gets too messy
  • repeat for way longer than you expected

You end up writing queries like:

SELECT *
FROM a
LEFT JOIN b ON a.id = b.id
WHERE a.amount != b.amount;
Enter fullscreen mode Exit fullscreen mode

And every time, it’s slightly different… but basically the same problem.


The real problem

It’s not just comparing data.

It’s:

  • comparing across different databases
  • comparing across environments
  • dealing with schema drift
  • figuring out if differences are random or systemic

Most tools either:

  • don’t handle this well, or
  • require way more setup than the problem deserves

So I built something focused

I wanted a tool where I could just run one command:

diffgnome compare --left mysql://prod/orders --right postgres://warehouse/orders --key id
Enter fullscreen mode Exit fullscreen mode

…and immediately see:

Table: orders

Row count mismatch
prod:      10,291
warehouse: 10,248

Missing rows: 43
Changed rows: 88

Columns affected:
amount (17 rows)
status (9 rows)
Enter fullscreen mode Exit fullscreen mode

No giant queries.

No exporting data.

No digging through joins just to get started.


What it actually does

At a high level, it:

  • compares schema differences
  • compares row counts
  • splits data into chunks
  • hashes each chunk
  • drills into mismatched ranges
  • shows sample row differences

So instead of scanning everything manually, it quickly narrows down where the data diverges.


Where this is useful

This comes up constantly in real systems:

  • validating ETL pipelines
  • verifying migrations
  • debugging production issues
  • comparing environments
  • reconciling warehouse data

If data moves, this problem eventually shows up.


Why I built it

I’ve spent a lot of time working with:

  • ETL pipelines
  • data warehouses
  • messy production databases

And I kept running into the same situation:

I know something is wrong — I just need to find where.

So I built a tool that answers that question directly.


Current status

I’ve been using this against real datasets while building it, and it’s already replaced a lot of one-off SQL and manual comparison work.

There’s still more I want to add, but the core workflow is solid.


If this sounds familiar

If you’ve ever:

  • compared prod vs staging
  • verified a migration
  • debugged a broken pipeline
  • tried to reconcile two systems

…then you’ve probably run into this exact problem.


👉 You can check it out here: https://diffgnome.com

Top comments (0)