Philip McClarence

Posted on Apr 18

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

#database #performance #postgres #sql

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

JSONB is one of PostgreSQL's killer features. You get schema-less flexibility inside a relational database -- user preferences, API payloads, feature flags, event metadata, all stored without defining every column up front. The problem is that most developers treat JSONB as a black box: throw data in, query it with -> and ->>, maybe slap a GIN index on it, and assume PostgreSQL will figure out how to make it fast. It will not. Let's walk through the three indexing strategies and when to use each.

The Fundamental Confusion

The most common JSONB indexing mistake: creating a GIN index and expecting it to speed up ->> equality queries. It doesn't.

-- You create this index
CREATE INDEX idx_events_metadata_gin ON events USING gin (metadata);

-- And expect this query to use it
SELECT * FROM events WHERE metadata->>'status' = 'active';
-- NOPE. Sequential scan. The GIN index doesn't support ->>

The default GIN index (jsonb_ops) supports @>, ?, ?|, and ?& operators -- not ->>. To accelerate that query, you either need an expression index or must rewrite the query to use @> containment.

Detecting the Problem

Find JSONB columns that are triggering sequential scans:

SELECT
    t.schemaname,
    t.relname AS table_name,
    a.attname AS column_name,
    pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
    t.seq_scan,
    t.seq_tup_read
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
JOIN pg_type ty ON ty.oid = a.atttypid
WHERE ty.typname = 'jsonb'
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND t.seq_scan > 100
ORDER BY t.seq_tup_read DESC;

Confirm with EXPLAIN:

-- This does NOT use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata->>'status' = 'active';

-- This DOES use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';

If you see Seq Scan on the first query despite having a GIN index, the operator mismatch is your problem.

Strategy 1: GIN Index for Containment (`@>`)

When your queries use containment, GIN is the right choice:

-- Default operator class: supports @>, ?, ?|, ?&
CREATE INDEX CONCURRENTLY idx_events_metadata_gin
    ON events USING gin (metadata);

-- jsonb_path_ops: supports only @>, but 2-3x smaller and faster
CREATE INDEX CONCURRENTLY idx_events_metadata_pathops
    ON events USING gin (metadata jsonb_path_ops);

Use jsonb_path_ops when you only need @> containment. It hashes full paths rather than indexing every key/value, creating a significantly smaller index. On 10M rows with complex documents, the difference can be 3-4x.

Rewrite ->> equality to containment to leverage GIN:

-- Before (no GIN support)
SELECT * FROM events WHERE metadata->>'status' = 'active';

-- After (uses GIN index)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';

Strategy 2: Expression Index for Specific Keys

When you repeatedly query one specific key, an expression index is more efficient:

-- B-tree on a specific extracted key
CREATE INDEX CONCURRENTLY idx_events_status
    ON events ((metadata->>'status'));

-- Now this works
SELECT * FROM events WHERE metadata->>'status' = 'active';

Expression indexes are smaller than GIN (one value per row vs. decomposing the entire document), support range queries (<, >, BETWEEN), and support ORDER BY. They're the right choice when you query known, specific keys.

For JSONB arrays:

CREATE INDEX CONCURRENTLY idx_events_tags_gin
    ON events USING gin ((metadata->'tags'));

-- Query: find events tagged "important"
SELECT * FROM events WHERE metadata->'tags' @> '"important"';

Strategy 3: Partial Index for Selective Conditions

When only a fraction of rows match your query, avoid indexing the entire table:

-- Index only active events (if 90% are archived)
CREATE INDEX CONCURRENTLY idx_events_active_metadata
    ON events USING gin (metadata jsonb_path_ops)
    WHERE metadata->>'status' = 'active';

Dramatically smaller, faster to maintain, and writes to archived rows skip the index entirely. Combine with expression indexes for laser-targeted optimization:

-- Expression index on user_id, only for purchase events
CREATE INDEX CONCURRENTLY idx_events_purchase_user
    ON events ((metadata->>'user_id'))
    WHERE metadata->>'type' = 'purchase';

The Decision Table

Query Pattern	Index Type	Example
`@>` containment	GIN (`jsonb_path_ops`)	`WHERE data @> '{"k": "v"}'`
`->>` equality on known key	Expression (B-tree)	`WHERE data->>'status' = 'x'`
Key existence (`?`)	GIN (default `jsonb_ops`)	`WHERE data ? 'email'`
Range on extracted value	Expression (B-tree)	`WHERE (data->>'score')::int > 90`
Array containment	GIN on sub-path	`WHERE data->'tags' @> '"x"'`

The Write Performance Trade-off

GIN index maintenance is expensive. Every INSERT or UPDATE touching the JSONB column must decompose the entire document to update the index. On write-heavy tables with large documents, this can cut insert throughput by 30-50%.

If you only query 2-3 keys, expression indexes on those keys are dramatically cheaper to maintain. Monitor insert latency after adding GIN indexes -- if INSERTs slow down significantly, switch to targeted expression or partial indexes.

Prevention

Document which JSONB keys will be queried when you add the column. This drives index selection from day one instead of retrofitting after performance degrades.

Monitor GIN index size relative to table size. If the GIN index approaches the table size, you're indexing too much.

Review JSONB query patterns quarterly. Application features evolve, and the keys you query change. Unused JSONB indexes waste space and slow writes for zero benefit.

Originally published at mydba.dev/blog/postgres-jsonb-indexing

DEV Community

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

The Fundamental Confusion

Detecting the Problem

Strategy 1: GIN Index for Containment (`@>`)

Strategy 2: Expression Index for Specific Keys

Strategy 3: Partial Index for Selective Conditions

The Decision Table

The Write Performance Trade-off

Prevention

Top comments (0)

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

The Fundamental Confusion

Detecting the Problem

Strategy 1: GIN Index for Containment (@>)

Strategy 2: Expression Index for Specific Keys

Strategy 3: Partial Index for Selective Conditions

The Decision Table

The Write Performance Trade-off

Prevention

Strategy 1: GIN Index for Containment (`@>`)