Gabriel Anhaia

Posted on Jun 13

Postgres Partitioning in 2026: When the Complexity Pays Off

#postgres #database #performance #backend

Book: Database Playbook: Choosing the Right Store for Every System You Build
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Your events table has 400 million rows. Deletes of old data take an hour and bloat the table. A DELETE FROM events WHERE created_at < ... locks rows the app still wants. Someone on the team says the word "partitioning" in standup, and now you have a project.

Partitioning is one of the few Postgres features that can make your database faster and your life worse at the same time. It splits one logical table into many physical ones. The planner can skip whole partitions. Dropping old data becomes a metadata operation instead of a row-by-row delete. That part is real.

The cost is that partitioning changes the rules around foreign keys, unique constraints, and how the planner reasons about your queries. Get those wrong and you ship a slower database with a harder schema. So the question isn't "how do I partition." It's "should I."

What partitioning actually is

A partitioned table is a parent that holds no rows. Every row lives in a child partition chosen by a key. You query the parent. Postgres routes reads and writes to the right children.

Three strategies ship in the box.

Range splits by a continuous key, almost always a timestamp or a monotonic ID:

CREATE TABLE events (
  id          bigserial,
  created_at  timestamptz NOT NULL,
  payload     jsonb,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_06
  PARTITION OF events
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

CREATE TABLE events_2026_07
  PARTITION OF events
  FOR VALUES FROM ('2026-07-01') TO ('2026-08-01');

List splits by a discrete set of values, usually a tenant, region, or status:

CREATE TABLE orders (
  id       bigserial,
  region   text NOT NULL,
  total    int,
  PRIMARY KEY (id, region)
) PARTITION BY LIST (region);

CREATE TABLE orders_eu
  PARTITION OF orders
  FOR VALUES IN ('de', 'fr', 'es', 'it');

CREATE TABLE orders_us
  PARTITION OF orders
  FOR VALUES IN ('us', 'ca');

Hash spreads rows evenly across a fixed number of partitions by a hash of the key. You reach for it when you want to split write load with no natural range or list:

CREATE TABLE sessions (
  user_id  bigint NOT NULL,
  token    text,
  PRIMARY KEY (user_id)
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_0 PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Range is what most teams need. List is for multi-tenant and geographic splits. Hash is the rarest, useful when you want even distribution and have no other axis.

The row-count threshold nobody wants to give you

Teams want a number. The honest version: row count alone does not trigger partitioning. A 500-million-row table with a good B-tree index and no archival need can run fine for years. Partitioning earns its keep when one of these is true, not when a row counter crosses a line.

You delete or archive old data on a schedule. This is the strongest signal. DROP TABLE events_2026_01 is instant and reclaims disk immediately. A DELETE of the same rows runs for an hour, bloats the heap, and leaves work for autovacuum.
Your queries almost always filter on the partition key. Time-series dashboards that read "last 7 days," tenant queries scoped to one tenant. The planner skips everything else.
The table is large enough that index maintenance and vacuum hurt. As a rough floor, think tens of gigabytes and 100M+ rows before the operational wins outweigh the added schema cost. Below that, a partial index or BRIN usually does the same job with none of the gotchas.

If none of those hold, you are adding complexity for a benchmark you will not feel. Skip it.

Partition pruning, the thing you are paying for

Pruning is the planner skipping partitions that cannot match. It is where the speedup comes from, and it only happens when your WHERE clause references the partition key.

EXPLAIN
SELECT count(*) FROM events
WHERE created_at >= '2026-06-10'
  AND created_at <  '2026-06-11';

Aggregate
  ->  Seq Scan on events_2026_06 events_1
        Filter: ((created_at >= '2026-06-10') AND ...)

One partition scanned. Every other partition never gets touched. That is the win.

Now the failure mode. Query without the partition key and pruning is gone:

-- no created_at predicate, so every partition is scanned
SELECT * FROM events WHERE payload @> '{"type":"login"}';

Append
  ->  Seq Scan on events_2026_06 events_1
  ->  Seq Scan on events_2026_07 events_2
  ->  Seq Scan on events_2026_08 events_3
  ... one scan per partition

This is worse than an unpartitioned table, because now you have N scans and N index lookups instead of one. The lesson: partition on the column your queries actually filter on. If your hot queries filter on user_id but you partitioned on created_at, you built the wrong split.

Two settings keep pruning honest. enable_partition_pruning is on by default. Runtime pruning (for parameterized queries where the value is known only at execution) works in prepared statements and IN lists, but check the plan with real parameters, not literals.

The foreign-key gotcha

This is the one that surprises people. The rule has two directions.

A foreign key from a partitioned table to a normal table works fine:

-- this is fine
ALTER TABLE events
  ADD CONSTRAINT fk_user
  FOREIGN KEY (user_id) REFERENCES users (id);

A foreign key pointing at a partitioned table is the trap. Before PG12 you could not create one at all. Modern Postgres supports it, but with a catch. The referenced partitioned table must have a unique constraint that includes the partition key, which leads straight into the next gotcha. And if a child table is later detached, references to rows in it are no longer enforced the way you expect. Test the detach path before you rely on it.

If another table needs to reference your partitioned events, ask whether it really needs a hard FK or whether the relationship can be enforced in application code. Often the answer is the latter, and you save yourself a class of migration headaches.

The unique-constraint gotcha

A unique constraint or primary key on a partitioned table must include every column of the partition key. There is no way around it. Postgres cannot enforce global uniqueness across partitions without scanning all of them, so it refuses.

This bites the most common schema in the book. You want id to be the primary key. You partition by created_at. Postgres rejects it:

-- this fails
CREATE TABLE events (
  id bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);
-- ERROR: unique constraint on partitioned table must
-- include all partitioning columns

The fix is to make the key composite, which is why the examples above used PRIMARY KEY (id, created_at):

CREATE TABLE events (
  id bigserial,
  created_at timestamptz NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

Now id is unique only within a (id, created_at) pair. If your application or another service treats id as a globally unique handle and looks rows up by id alone, two things happen: that lookup cannot prune, and you have given up the database-level guarantee that id is unique on its own. For most append-only tables with a serial id this is fine in practice, because the sequence never repeats. But it is no longer the database enforcing it. Know that before you ship.

A maintenance pattern that works

Partitions do not create themselves. The clean approach is to pre-create the next partition ahead of time and drop old ones on a schedule. Many teams run pg_partman for this, which automates creation and retention:

CREATE EXTENSION IF NOT EXISTS pg_partman;

SELECT partman.create_parent(
  p_parent_table  => 'public.events',
  p_control       => 'created_at',
  p_type          => 'range',
  p_interval      => '1 month',
  p_premake       => 3
);

p_premake => 3 keeps three future partitions ready so an insert never lands with no home. Without that buffer, a row whose created_at falls outside every defined range hits the default partition (if you made one) or errors. Always define a default partition or pre-create generously.

Retention then becomes a one-liner instead of a long-running delete:

-- instant, reclaims disk immediately
DROP TABLE events_2026_01;

When to skip partitioning entirely

Skip it when:

The table is under ~50–100GB and you have no archival schedule. A partial index or BRIN gives you most of the read win with none of the schema cost.
Your queries do not filter on a single consistent key. No common predicate means no pruning, and partitioning makes those queries slower.
You need id to be globally unique and looked up by id alone across all data. You can still partition, but weigh the composite-key cost first.
You are reaching for partitioning to fix slow queries that a missing index would fix. Add the index. Measure. Partition only if the table's size is the problem, not the query plan.

Partitioning is a maintenance and scale tool, not a query-tuning tool. If your pain is "this query is slow," start with EXPLAIN (ANALYZE, BUFFERS) and indexing. If your pain is "this table is too big to vacuum, archive, or drop from cleanly," that is when partitioning pays off.

What pushed you to partition, or what made you back out of it? The composite-key surprise, the pruning that never kicked in, the FK you could not create? Drop it in the comments.

If this was useful

This post pulls from the chapter on scaling Postgres in the Database Playbook: Choosing the Right Store for Every System You Build. The book sits one level up from partitioning: when partitioning beats a read replica, when it beats sharding, and which workloads belong in Postgres at all versus a column store or a key-value store. If you have ever partitioned a table that did not need it, the book is the slower version of that decision.