PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies
JSONB is one of PostgreSQL's killer features. You get schema-less flexibility inside a relational database -- user preferences, API payloads, feature flags, event metadata, all stored without defining every column up front. The problem is that most developers treat JSONB as a black box: throw data in, query it with -> and ->>, maybe slap a GIN index on it, and assume PostgreSQL will figure out how to make it fast. It will not. Let's walk through the three indexing strategies and when to use each.
The Fundamental Confusion
The most common JSONB indexing mistake: creating a GIN index and expecting it to speed up ->> equality queries. It doesn't.
-- You create this index
CREATE INDEX idx_events_metadata_gin ON events USING gin (metadata);
-- And expect this query to use it
SELECT * FROM events WHERE metadata->>'status' = 'active';
-- NOPE. Sequential scan. The GIN index doesn't support ->>
The default GIN index (jsonb_ops) supports @>, ?, ?|, and ?& operators -- not ->>. To accelerate that query, you either need an expression index or must rewrite the query to use @> containment.
Detecting the Problem
Find JSONB columns that are triggering sequential scans:
SELECT
t.schemaname,
t.relname AS table_name,
a.attname AS column_name,
pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
t.seq_scan,
t.seq_tup_read
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
JOIN pg_type ty ON ty.oid = a.atttypid
WHERE ty.typname = 'jsonb'
AND a.attnum > 0
AND NOT a.attisdropped
AND t.seq_scan > 100
ORDER BY t.seq_tup_read DESC;
Confirm with EXPLAIN:
-- This does NOT use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata->>'status' = 'active';
-- This DOES use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';
If you see Seq Scan on the first query despite having a GIN index, the operator mismatch is your problem.
Strategy 1: GIN Index for Containment (@>)
When your queries use containment, GIN is the right choice:
-- Default operator class: supports @>, ?, ?|, ?&
CREATE INDEX CONCURRENTLY idx_events_metadata_gin
ON events USING gin (metadata);
-- jsonb_path_ops: supports only @>, but 2-3x smaller and faster
CREATE INDEX CONCURRENTLY idx_events_metadata_pathops
ON events USING gin (metadata jsonb_path_ops);
Use jsonb_path_ops when you only need @> containment. It hashes full paths rather than indexing every key/value, creating a significantly smaller index. On 10M rows with complex documents, the difference can be 3-4x.
Rewrite ->> equality to containment to leverage GIN:
-- Before (no GIN support)
SELECT * FROM events WHERE metadata->>'status' = 'active';
-- After (uses GIN index)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';
Strategy 2: Expression Index for Specific Keys
When you repeatedly query one specific key, an expression index is more efficient:
-- B-tree on a specific extracted key
CREATE INDEX CONCURRENTLY idx_events_status
ON events ((metadata->>'status'));
-- Now this works
SELECT * FROM events WHERE metadata->>'status' = 'active';
Expression indexes are smaller than GIN (one value per row vs. decomposing the entire document), support range queries (<, >, BETWEEN), and support ORDER BY. They're the right choice when you query known, specific keys.
For JSONB arrays:
CREATE INDEX CONCURRENTLY idx_events_tags_gin
ON events USING gin ((metadata->'tags'));
-- Query: find events tagged "important"
SELECT * FROM events WHERE metadata->'tags' @> '"important"';
Strategy 3: Partial Index for Selective Conditions
When only a fraction of rows match your query, avoid indexing the entire table:
-- Index only active events (if 90% are archived)
CREATE INDEX CONCURRENTLY idx_events_active_metadata
ON events USING gin (metadata jsonb_path_ops)
WHERE metadata->>'status' = 'active';
Dramatically smaller, faster to maintain, and writes to archived rows skip the index entirely. Combine with expression indexes for laser-targeted optimization:
-- Expression index on user_id, only for purchase events
CREATE INDEX CONCURRENTLY idx_events_purchase_user
ON events ((metadata->>'user_id'))
WHERE metadata->>'type' = 'purchase';
The Decision Table
| Query Pattern | Index Type | Example |
|---|---|---|
@> containment |
GIN (jsonb_path_ops) |
WHERE data @> '{"k": "v"}' |
->> equality on known key |
Expression (B-tree) | WHERE data->>'status' = 'x' |
Key existence (?) |
GIN (default jsonb_ops) |
WHERE data ? 'email' |
| Range on extracted value | Expression (B-tree) | WHERE (data->>'score')::int > 90 |
| Array containment | GIN on sub-path | WHERE data->'tags' @> '"x"' |
The Write Performance Trade-off
GIN index maintenance is expensive. Every INSERT or UPDATE touching the JSONB column must decompose the entire document to update the index. On write-heavy tables with large documents, this can cut insert throughput by 30-50%.
If you only query 2-3 keys, expression indexes on those keys are dramatically cheaper to maintain. Monitor insert latency after adding GIN indexes -- if INSERTs slow down significantly, switch to targeted expression or partial indexes.
Prevention
Document which JSONB keys will be queried when you add the column. This drives index selection from day one instead of retrofitting after performance degrades.
Monitor GIN index size relative to table size. If the GIN index approaches the table size, you're indexing too much.
Review JSONB query patterns quarterly. Application features evolve, and the keys you query change. Unused JSONB indexes waste space and slow writes for zero benefit.
Originally published at mydba.dev/blog/postgres-jsonb-indexing
Top comments (0)