Gabriel Anhaia

Posted on May 24

JSONB vs Relational in 2026: 5 Query Shapes, 5 Verdicts

#postgres #database #performance #backend

Book: Database Playbook: Choosing the Right Store for Every System You Build
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

JSONB landed in Postgres 9.4 back in 2014. A decade later your team is still arguing about it. Half of them want every new table to have a data jsonb column "for flexibility." The other half think every JSONB column is a future migration tax. Both camps are right about half the time, which is the most unhelpful version of being right.

So let's stop arguing in the abstract. There are five query shapes that cover roughly 90% of what apps actually do against Postgres. Each one has a clear winner. The row count where the verdict flips is also knowable. Here it is.

When JSONB actually ships

Before the verdicts, a sanity check. JSONB is the right call when you genuinely don't know the shape yet, when the shape is polymorphic per row, or when you need to store an opaque blob that you'll read back as-is.

Three patterns that age well:

Audit logs: event_payload jsonb because every event type has different fields and you want to dump the whole envelope. Nobody queries individual keys at scale.
Rapid prototyping: week one of a product, you have no idea what fields the integration partner will send. Catch it in JSONB, see what the data looks like, extract columns in week three.
Polymorphic per-row data: a settings column where a kafka integration has different keys than a slack one. Forcing both into a shared schema costs more than it saves.

Outside those three, the default should be columns. The five query shapes below show why.

Shape 1: Equality on a known field (relational wins by a lot)

The most common app query in existence: "find the user with email x." If email lives inside a JSONB column, this is what it looks like:

-- email lives in profile->>'email'
SELECT id FROM users
WHERE profile->>'email' = 'sarah@example.com';

You can index it with an expression index:

CREATE INDEX users_profile_email_idx
  ON users ((profile->>'email'));

That works. It's also the moment you realize you've reinvented a column with extra steps. Every query that filters by email now has to spell the JSON path correctly, the index won't apply if anyone writes profile -> 'email' instead of profile ->> 'email', and the planner's row estimates for expression indexes are routinely worse than for plain columns.

Benchmarked on 5M rows with the email present on every row, the column version is consistently 2–4x faster on cold cache and emits a saner plan. The verdict here doesn't flip with size. If you're filtering on a known scalar field, that field belongs in a column.

Shape 2: Existence check (`?` operator), where JSONB with `jsonb_path_ops` is fine

Now the actual JSONB use case: "find every row whose features JSON has a beta_billing key."

SELECT id FROM accounts
WHERE features ? 'beta_billing';

Without an index, Postgres scans every row and parses every JSONB document. On 5M rows that's not a query you want on a hot path. The fix is a GIN index, and this is where the index choice matters more than people realise:

-- the default GIN class indexes everything: keys, values, paths.
-- bigger, supports more operators (@>, ?, ?|, ?&, @?, @@).
CREATE INDEX accounts_features_gin
  ON accounts USING gin (features);

-- jsonb_path_ops indexes only the hashed path-value pairs.
-- about 30% smaller, faster builds, but only supports @>.
CREATE INDEX accounts_features_gin_pathops
  ON accounts USING gin (features jsonb_path_ops);

If your workload is pure @> containment lookups, jsonb_path_ops is the right pick: smaller index, faster builds, faster lookups. If you need ? existence checks too, you're stuck with the default opclass. Or you keep both, which I've seen teams do when the containment index is the hot path and the existence one runs on admin pages.

A nasty subtlety: ? checks for a key at the top level only. WHERE features ? 'beta_billing' won't find {"flags": {"beta_billing": true}}. People learn this the hard way at 2am. For nested keys you need @? with a jsonpath, or restructure.

Shape 3: Aggregation across a JSONB field (relational wins below ~1M rows)

"Sum amount_cents from every order placed last week."

If amount_cents lives inside order_data jsonb, every aggregation row has to extract and cast:

SELECT SUM((order_data->>'amount_cents')::bigint)
FROM orders
WHERE created_at >= NOW() - INTERVAL '7 days';

The ->> operator returns text. The cast to bigint happens for every row in the scan. On a bigint column, the same query reads a fixed-width number directly from the tuple. On a 500k-row weekly window, the column version finishes in tens of milliseconds, while the JSONB version takes seconds. Easily a 20–40x gap on typical hardware.

The crossover point is somewhere between 100k and 1M rows depending on payload size and how warm the cache is. Below that you'll mostly notice it on slow endpoints. Above that, aggregation queries start showing up in your slow log as the dominant cost. Hot numeric fields you aggregate on belong in columns. Full stop.

Shape 4: Containment query (`@>`), where JSONB shines

"Find every event whose payload contains {"source": "stripe", "type": "invoice.paid"}." Try expressing that with relational columns and you end up with N filter predicates per query plus index combinations the planner has to choose between. With JSONB it's one operator:

SELECT id FROM events
WHERE payload @> '{"source": "stripe", "type": "invoice.paid"}'::jsonb;

With the jsonb_path_ops GIN index from Shape 2, this is fast. Really fast. The index stores hashed path-value pairs, so the lookup is essentially a hash probe against a posting list. On a 50M-row event table, this returns in a couple of milliseconds for selective predicates.

This is the shape JSONB was built for. Multi-key containment against semi-structured data with an unbounded set of possible keys. If your app does a lot of "find the subset of records whose JSON matches this prototype," JSONB plus jsonb_path_ops is genuinely the right answer and there's no relational shape that beats it without a denormalisation explosion.

Shape 5: Sparse / polymorphic schema, where JSONB is the right answer

"Store user-defined custom fields per account, where account A has {vat_id, billing_contact} and account B has {purchase_order_number, accounting_email, ap_phone}."

The relational shape is the EAV anti-pattern: a custom_fields table with (account_id, field_name, field_value) and a million JOINs to assemble one record. That works at small scale and rots at large scale. The other relational option is nullable columns for every possible custom field, which means a schema migration every time a customer asks for a new one.

JSONB makes this trivial:

CREATE TABLE accounts (
  id         bigserial PRIMARY KEY,
  name       text NOT NULL,
  custom     jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- queryable when you need it
CREATE INDEX accounts_custom_gin
  ON accounts USING gin (custom jsonb_path_ops);

You get type-flexible storage, indexed containment lookups, and no schema churn. The trade is that you have no schema enforcement at the column level. That responsibility moves to the app layer, ideally with a versioned validator (JSON Schema, a Pydantic/Zod model, whatever your stack uses). When the schema is genuinely open-ended, this trade is good.

The migration path: extract hot keys to columns, keep cold ones in JSONB

The common mistake is treating this as binary. It isn't. A table can have a relational core and a JSONB tail. The pattern that ages best in production:

Start with data jsonb when shape is unclear.
Run for a few weeks. Look at your slow query log. The fields showing up in WHERE and ORDER BY are the hot ones.
Extract those fields into typed columns. Keep the long tail in JSONB.

Here's a migration script for that extraction, written so you can paste it as a template:

BEGIN;

-- 1. add the new typed columns, nullable so the backfill can run hot.
ALTER TABLE orders
  ADD COLUMN customer_id bigint,
  ADD COLUMN amount_cents bigint,
  ADD COLUMN currency char(3),
  ADD COLUMN placed_at timestamptz;

-- 2. backfill from the JSONB blob. batch for big tables.
UPDATE orders
SET
  customer_id  = (data->>'customer_id')::bigint,
  amount_cents = (data->>'amount_cents')::bigint,
  currency     = data->>'currency',
  placed_at    = (data->>'placed_at')::timestamptz
WHERE customer_id IS NULL;

-- 3. enforce NOT NULL once the backfill is verified.
ALTER TABLE orders
  ALTER COLUMN customer_id  SET NOT NULL,
  ALTER COLUMN amount_cents SET NOT NULL,
  ALTER COLUMN currency     SET NOT NULL,
  ALTER COLUMN placed_at    SET NOT NULL;

-- 4. drop the now-redundant keys from the JSONB blob to save space.
--    skip this step if anything still reads them via the JSON path.
UPDATE orders SET data = data
  - 'customer_id' - 'amount_cents' - 'currency' - 'placed_at';

-- 5. indexes appropriate for the new columns.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
CREATE INDEX orders_placed_at_idx   ON orders (placed_at);

COMMIT;

For tables above ~5M rows, do step 2 in batches with LIMIT + a WHERE primary_key BETWEEN clause, and run them in a loop outside a single transaction. A single UPDATE over 50M rows holds a row exclusive lock long enough to wreck your p99 latency, even with the rest of the workload otherwise idle.

Also worth saying: dropping keys from JSONB in step 4 rewrites every row. On a big table, that's a lot of bloat and a vacuum after. Skip it if disk is cheap and reads through the JSON path are still happening anywhere in the codebase. The schema-as-documentation value isn't worth a multi-hour VACUUM FULL.

The gotcha: JSONB columns can't be a foreign key target

This is the trap that catches teams who push too hard into JSONB. Foreign keys can only reference columns. Not JSONB paths. Not expressions.

So this fails:

-- doesn't work. there's no syntax for "FK targeting a JSONB path."
ALTER TABLE invoices
  ADD CONSTRAINT invoices_customer_fk
  FOREIGN KEY ((data->>'customer_id'))
  REFERENCES customers(id);
-- ERROR: syntax error at or near "("

You can't fake it with a check constraint, either. Check constraints in Postgres can't run subqueries, so CHECK (EXISTS (SELECT 1 FROM customers WHERE id = (data->>'customer_id')::bigint)) is rejected. You're stuck with either trigger-based enforcement (slow and easy to bypass) or no referential integrity (and the orphaned-record problems that follow).

The escape hatch is exactly what the migration script above does: extract the FK target to a real column.

ALTER TABLE invoices
  ADD COLUMN customer_id bigint;

UPDATE invoices
SET customer_id = (data->>'customer_id')::bigint;

ALTER TABLE invoices
  ALTER COLUMN customer_id SET NOT NULL,
  ADD CONSTRAINT invoices_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers(id);

Now you have a real FK with ON DELETE behaviour, plan-time row estimates, and the rest of what relational integrity gives you. The JSONB column can still hold the long tail.

If you find yourself wanting FKs from three different JSONB keys, that's the database telling you those keys should have been columns from day one. Listen to it.

The five verdicts on one card

Query shape	Winner	Crossover row count
Equality on a known scalar	Relational column	Never flips. Column wins at all sizes
Existence check (`?`)	JSONB + GIN	Below ~10k rows the seq scan is fine; above, you need GIN
Numeric aggregation	Relational column	Flips around 100k–1M rows; columns win above
Containment (`@>`) on semi-structured data	JSONB + `jsonb_path_ops`	JSONB wins at all sizes when the schema is open
Polymorphic / per-row sparse keys	JSONB	JSONB wins at all sizes; alternative is EAV pain

The decision isn't "JSONB or relational." It's "which fields per table." Hot, typed, queried-by-equality, foreign-keyed: column. Cold, sparse, polymorphic, containment-searched: JSONB. The migration path between them is well-trodden. Use it.

What's the worst JSONB-vs-column call you've seen ship to prod, and how long did it take to undo?

If this was useful

Picking the right storage shape is the chapter most teams skim and the one that costs them six months later. The Database Playbook: Choosing the Right Store for Every System You Build walks the JSONB-versus-relational decision in more depth — the GIN opclass trade-offs, the schema-evolution patterns that don't paint you into a corner, and the queries you can't write because you picked the wrong store. If you spend any of your week tuning Postgres or arguing about it in PRs, it's probably worth the read.

DEV Community

JSONB vs Relational in 2026: 5 Query Shapes, 5 Verdicts

When JSONB actually ships

Shape 1: Equality on a known field (relational wins by a lot)

Shape 2: Existence check (`?` operator), where JSONB with `jsonb_path_ops` is fine

Shape 3: Aggregation across a JSONB field (relational wins below ~1M rows)

Shape 4: Containment query (`@>`), where JSONB shines

Shape 5: Sparse / polymorphic schema, where JSONB is the right answer

The migration path: extract hot keys to columns, keep cold ones in JSONB

The gotcha: JSONB columns can't be a foreign key target

The five verdicts on one card

If this was useful

Top comments (0)

When JSONB actually ships

Shape 1: Equality on a known field (relational wins by a lot)

Shape 2: Existence check (? operator), where JSONB with jsonb_path_ops is fine

Shape 3: Aggregation across a JSONB field (relational wins below ~1M rows)

Shape 4: Containment query (@>), where JSONB shines

Shape 5: Sparse / polymorphic schema, where JSONB is the right answer

The migration path: extract hot keys to columns, keep cold ones in JSONB

The gotcha: JSONB columns can't be a foreign key target

The five verdicts on one card

If this was useful

Shape 2: Existence check (`?` operator), where JSONB with `jsonb_path_ops` is fine

Shape 4: Containment query (`@>`), where JSONB shines