Euro Format for Euro Format

How We Tamed 300+ Stored Procedures in a Legacy PostgreSQL Project

A client came to us with a deceptively simple task.

Legacy PostgreSQL project.
300+ stored procedures.
Remove a deprecated field account_id from the transactions table.

Sounds easy — until you realize no one knows where that field is used.

The business logic lived inside the database — not in the repository, not in migrations, not in version control. The database was the source of truth.

And you can’t Ctrl+Shift+F inside a running database.


First idea: use PostgreSQL system catalogs

PostgreSQL is not a black box. You can inspect:

  • pg_proc
  • pg_depend
  • information_schema
  • pg_catalog

You can even build partial dependency graphs.
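As a sketch of that route (this is an illustration, not the client project's actual query — the `find_mentions` helper and parameter names are assumptions), you can search routine bodies directly in `pg_proc` with psycopg2:

```python
# Sketch: find routines whose source text mentions a given column name.
# Assumes psycopg2; `conn` is an open connection to the target database.
# Note the limitation the article describes: prosrc only contains the
# stored source text, so dynamic SQL assembled at runtime can slip through.
CATALOG_QUERY = """
SELECT n.nspname AS schema,
       p.proname AS routine,
       pg_get_function_identity_arguments(p.oid) AS identity_args
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prosrc ILIKE '%%' || %(needle)s || '%%'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
"""

def find_mentions(conn, needle: str):
    """Return (schema, routine, identity_args) rows whose body mentions `needle`."""
    with conn.cursor() as cur:
        cur.execute(CATALOG_QUERY, {"needle": needle})
        return cur.fetchall()
```

Calling `find_mentions(conn, "account_id")` gives a first approximation — useful, but exactly the kind of partial visibility described below.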

We explored that path.

The problem? It works well for structural dependencies. But it starts breaking down with:

  • dynamic SQL (EXECUTE)
  • string-based references
  • functions calling other functions indirectly
  • conditional logic embedded in procedures

You get partial visibility. And partial visibility in legacy systems is often worse than no visibility — it creates false confidence.

We needed something deterministic.


The pragmatic solution

We wrote a small Python script.

It connects to the database, extracts every stored procedure using pg_get_functiondef, and saves each one as a file:

catalog / schema / function|procedure / routine_name__identity_args.sql

To handle overloaded routines deterministically, we export them by OID and include their identity arguments in the filename to ensure stable and predictable output.
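A minimal sketch of the file-writing half of such an exporter (the helper names `safe` and `export_routines` are hypothetical; the full script is linked at the end of the post). It takes rows as fetched from the catalog — e.g. via `pg_get_functiondef(p.oid)` — and writes them into the layout above:

```python
import re
from pathlib import Path

def safe(part: str) -> str:
    """Replace characters that are unsafe in filenames (commas, spaces, etc.)."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", part) or "_"

def export_routines(rows, outdir: str) -> list:
    """Write each routine to catalog/schema/kind/name__identity_args.sql.

    `rows` is an iterable of (schema, kind, name, identity_args, definition)
    tuples, e.g. fetched from pg_proc with pg_get_functiondef(oid).
    """
    written = []
    for schema, kind, name, identity_args, definition in rows:
        # Identity arguments in the filename keep overloads distinct and stable.
        fname = (f"{safe(name)}__{safe(identity_args)}.sql"
                 if identity_args else f"{safe(name)}.sql")
        path = Path(outdir) / "catalog" / safe(schema) / safe(kind) / fname
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(definition + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Sanitizing the identity arguments is what makes the output deterministic: `get_balance(integer, text)` always lands at `catalog/public/function/get_balance__integer_text.sql`, so re-running the export produces clean git diffs.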

Then we committed everything into the repository.

Now the database layer was searchable.

A simple Find in Files in the IDE surfaced every usage of account_id in seconds.

No guesswork. No fragile dependency assumptions. Just text search over actual definitions.
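That IDE search can also be scripted. A grep-style sketch over the exported tree (the function name is hypothetical, not part of the linked script):

```python
from pathlib import Path

def find_usages(export_dir: str, needle: str):
    """Grep-style search over the exported .sql files.

    Returns (relative path, line number, line text) for every hit —
    the same answer "Find in Files" gives you in the IDE.
    """
    hits = []
    root = Path(export_dir)
    for sql_file in sorted(root.rglob("*.sql")):
        lines = sql_file.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line:
                hits.append((str(sql_file.relative_to(root)), lineno, line.strip()))
    return hits
```

`find_usages("db_export", "account_id")` turns the audit into a deterministic, repeatable check you can run before every structural change.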


Why this mattered more than it seems

On the surface, this looks like a utility script.

In practice, it did something more important:

  • It created a snapshot of the database logic.
  • It moved business logic into version control.
  • It reduced operational blindness.
  • It gave the team ownership of the DB layer again.

Legacy is not old code.

Legacy is code without ownership.

Exporting procedures into git was the first step toward reclaiming that ownership.


Is this perfect?

No.

If someone edits a procedure directly in pgAdmin and doesn’t re-run the script, the files go stale.

We considered:

  • cron jobs
  • CI/CD triggers
  • automatic diff checks

None of those solve the root issue: the database should not be the primary editing surface.

But here’s the trade-off we accepted:

Staleness is cheaper than blindness.

The files don’t need to be perfectly current; they only need to be accurate at the moment you audit or plan a structural change.

And stored procedures in legacy systems tend to change far less frequently than application code.

So we re-run the script before any structural change.
That’s enough.


What would be the “right” way?

Ideally:

  • SQL lives in the repository.
  • Changes go through versioned migrations.
  • Flyway, Liquibase, or plain versioned .sql files define the schema.
  • The database becomes a result of code execution — not the source of truth.

That’s a different project.

We had a production system and a task to complete safely.

The goal wasn’t architectural purity.
The goal was controlled change.

Full script available here:
https://github.com/euroformatsro/postgresql-procedure-export


The outcome

The script took less than an hour to write.

The audit took minutes instead of days.

We removed the field with confidence.

We didn’t modernize the system.

We just made it understandable.

Sometimes that’s all you need.