DEV Community

ひとし 田畑
ひとし 田畑

Posted on

Order your tables for a safe TRUNCATE with graphlib — circular FKs come free

I wanted a panel that answers one boring, dangerous question: in what order can I empty these tables without a foreign key stopping me?

If orders references customers, you can't TRUNCATE customers first — a row in orders still points at it. You empty the children before the parents. With three tables you do this in your head. With forty, laced with a many-to-many join table and one employees.manager_id that points back at employees, you do it wrong, get ERROR: update or delete on table "…" violates foreign key constraint, and start bisecting by hand.

That order has a name — it's a topological sort of the foreign-key graph — and I was about to write the visited/back-edge walk from memory when I remembered Python has shipped one since 3.9. graphlib. Nobody imports it. It's perfect for exactly this.

The whole ordering is one stdlib call

Model each foreign key as an edge: the child (the table holding the FK) depends on the parent (the table it references). Feed those dependencies to graphlib.TopologicalSorter and ask for a static order:

from graphlib import TopologicalSorter, CycleError

def build(tables, edges):
    sorter = TopologicalSorter()
    for t in tables:
        sorter.add(t)                    # add every table, so isolated ones appear
    for e in edges:
        if e.child == e.parent:
            continue                     # self-reference — handled below
        sorter.add(e.child, e.parent)    # child depends on parent
    load_order = list(sorter.static_order())   # parents first
    return {
        "load_order": load_order,               # safe INSERT / restore order
        "truncate_order": list(reversed(load_order)),  # safe TRUNCATE / DELETE order
    }
Enter fullscreen mode Exit fullscreen mode

static_order() yields dependencies before the things that depend on them, so you get parents first — that's the order you'd load data in (or restore a dump). The safe order to empty tables is the exact reverse: children first. One reversed() and you have both directions from the same sort.

Two small but load-bearing details:

Add every table before adding edges. A table with no foreign key in or out never appears in an edge, so if you only add edges it silently drops off your list. sorter.add(t) with no predecessors registers it as a free-floating node that still comes out in the order.

The edge direction is the thing to get right. sorter.add(child, parent) reads as "child depends on parent." Flip it and your TRUNCATE order confidently tells you to empty the parent first — the exact mistake the panel exists to prevent. Worth a comment at the call site.

The cycle detection is the part you didn't write

Here's the payoff. Some schemas have circular foreign keys — a references b and b references a (usually via a nullable column you're meant to fill in a second pass). When that happens no valid order exists, and a hand-rolled topological sort's failure mode is to either loop forever or silently emit a partial, wrong answer.

TopologicalSorter refuses to guess. It raises:

try:
    load_order = list(sorter.static_order())
    cycle = []
except CycleError as exc:
    # CycleError.args == (message, [n1, n2, …, n1]) — the second item is the
    # node list of the cycle, first element repeated at the end.
    load_order = []
    cycle = list(exc.args[1]) if len(exc.args) > 1 else []
Enter fullscreen mode Exit fullscreen mode

CycleError doesn't just tell you a cycle exists — exc.args[1] is the actual list of tables caught in it, with the first repeated at the end so you can read the loop. So the same call that computes the order also detects the one schema footgun that makes the order impossible, and names the tables involved, with zero extra code. You get a real diagnostic for free out of an exception you were catching anyway.

The self-reference trap

employees.manager_id → employees is a foreign key from a table to itself. If you hand that to the sorter as add("employees", "employees"), you've declared that employees is its own predecessor, and it trips a false CycleError — a one-node "cycle" that isn't the kind you care about.

But you can't just drop self-references either, because they're real — they just constrain a different level. A self-referential FK doesn't stop you from emptying the table in one shot: a single-table TRUNCATE employees (or DELETE FROM employees with no WHERE) removes every row at once and the constraint is satisfied. Where it bites is row by row — you can't delete one manager while a subordinate row still points at it, so a partial delete has to walk the table in dependency order. So the honest move is to exclude self-references from the table-level ordering and report them separately:

self_refs = sorted({e.child for e in edges if e.child == e.parent})
Enter fullscreen mode Exit fullscreen mode

They don't constrain table-level order — every row's parent is in the same table — but they're worth surfacing as their own note, not swept into the graph where they'd manufacture a phantom cycle.

Getting the edges: one catalog query

The graph is engine-agnostic once you have the edges; only the extraction is Postgres-specific. Foreign keys live in pg_constraint where contype = 'f', with conrelid = the child and confrelid = the parent:

SELECT con.conname,
       ns.nspname  || '.' || cl.relname  AS child,
       fns.nspname || '.' || fcl.relname AS parent
FROM pg_catalog.pg_constraint con
JOIN pg_catalog.pg_class cl      ON cl.oid = con.conrelid
JOIN pg_catalog.pg_namespace ns  ON ns.oid = cl.relnamespace
JOIN pg_catalog.pg_class fcl     ON fcl.oid = con.confrelid
JOIN pg_catalog.pg_namespace fns ON fns.oid = fcl.relnamespace
WHERE con.contype = 'f'
  AND ns.nspname  NOT IN ('pg_catalog', 'information_schema')
  AND fns.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY child, con.conname;
Enter fullscreen mode Exit fullscreen mode

Rows in, ForeignKeyEdge(child, parent) out, straight into the builder above. The graph logic never learns which database it came from.

The same shape in MySQL

MySQL keeps the same relationship in information_schema.KEY_COLUMN_USAGE — a foreign-key row is one where REFERENCED_TABLE_NAME is not null:

SELECT kcu.CONSTRAINT_NAME,
       CONCAT(kcu.TABLE_SCHEMA, '.', kcu.TABLE_NAME)                       AS child,
       CONCAT(kcu.REFERENCED_TABLE_SCHEMA, '.', kcu.REFERENCED_TABLE_NAME) AS parent
FROM information_schema.KEY_COLUMN_USAGE kcu
WHERE kcu.TABLE_SCHEMA = ?
  AND kcu.REFERENCED_TABLE_NAME IS NOT NULL
GROUP BY kcu.CONSTRAINT_NAME, kcu.TABLE_SCHEMA, kcu.TABLE_NAME,
         kcu.REFERENCED_TABLE_SCHEMA, kcu.REFERENCED_TABLE_NAME;
Enter fullscreen mode Exit fullscreen mode

Same two columns, same edge, same TopologicalSorter. The graphlib half doesn't change one line — swapping the query is the entire port. That's the nice thing about pushing all the database-specific knowledge into "produce a list of (child, parent) pairs" and letting the standard library own the graph.

The honest part

This computes and shows the order — it doesn't run a TRUNCATE. That's deliberate: the value is answering "what's the safe order, and do you even have a valid one?" without touching a byte, and a few caveats keep it honest. A DEFERRABLE constraint can be legally violated mid-transaction, so a cycle that looks fatal here may be resolvable with SET CONSTRAINTS … DEFERRED. Composite and multi-column FKs are still just one edge between two tables. And the topological order is one valid order, not the only one — independent subgraphs can be interleaved however you like.

But the core is solid and it's almost free: the shortest correct path from "a pile of foreign keys" to "a safe delete order, plus a named list of the circular ones" is a for loop and a try/except around a standard-library class most people have never imported. You were about to write the graph algorithm. You didn't have to.


This is one piece of cli2ui — a local-only web UI over the psql commands you keep half-remembering. No AI, no SaaS. It's MIT-licensed on GitHub. What's the last stdlib module you reimplemented before realizing it already existed?

Top comments (0)