DEV Community

Cover image for PostgreSQL JSONB GIN Indexes: Why Your Queries Are Slow (And How to Fix Them)
Polliog
Polliog

Posted on

PostgreSQL JSONB GIN Indexes: Why Your Queries Are Slow (And How to Fix Them)

You added a JSONB column. You created a GIN index. Your queries are still doing sequential scans.

Sound familiar?

Here's the problem: Most developers don't understand which operators GIN indexes actually support. You're probably using the wrong one.

I spent days of my life debugging slow JSONB queries before I figured this out. Here's what I learned.


The Problem: You're Using the Wrong Operator

Let's say you have a table with JSONB data:

CREATE TABLE products (
  id BIGINT PRIMARY KEY,
  data JSONB
);

-- Insert some products
INSERT INTO products (id, data) VALUES
  (1, '{"brand": "Nike", "category": "Shoes", "price": 99.99}'),
  (2, '{"brand": "Adidas", "category": "Shoes", "price": 89.99}'),
  (3, '{"brand": "Nike", "category": "Apparel", "price": 49.99}');
Enter fullscreen mode Exit fullscreen mode

You create a GIN index:

CREATE INDEX idx_products_data ON products USING GIN (data);
Enter fullscreen mode Exit fullscreen mode

Then you run a query:

SELECT data->>'brand' 
FROM products 
WHERE data->>'brand' = 'Nike';
Enter fullscreen mode Exit fullscreen mode

PostgreSQL does a sequential scan. The index is ignored.

Why?


What the Hell is a GIN Index?

GIN = Generalized Inverted Index.

Think of it like the index at the back of a book. Instead of "Page 42 contains this word," GIN says "The key brand with value Nike appears in rows 1, 3, 7, 42."

For JSONB, GIN breaks apart the JSON document and indexes every key-value pair separately:

{"brand": "Nike", "category": "Shoes"}
Enter fullscreen mode Exit fullscreen mode

Gets indexed as:

brand:Nike     → [row 1, row 3]
category:Shoes → [row 1, row 5]
Enter fullscreen mode Exit fullscreen mode

This is perfect for "containment" queries: "Find all documents that contain {"brand": "Nike"}".

But here's the catch: GIN doesn't support all JSONB operators.


The Supported Operators (This is Critical)

PostgreSQL has two GIN operator classes for JSONB:

1. jsonb_ops (Default)

Supports these operators:

Operator What it does Example
@> Contains data @> '{"brand": "Nike"}'
? Key exists data ? 'brand'
`? ` Any key exists
?& All keys exist data ?& array['brand', 'price']
@? JSONPath exists data @? '$.brand'
@@ JSONPath match data @@ '$.price > 50'

Notably missing: ->, ->>, #>, #>>

2. jsonb_path_ops

Supports only these operators:

Operator What it does
@> Contains
@? JSONPath exists
@@ JSONPath match

Smaller index, faster queries, but fewer operators.


Why Your Query Doesn't Use the Index

Remember this query?

SELECT data->>'brand' 
FROM products 
WHERE data->>'brand' = 'Nike';
Enter fullscreen mode Exit fullscreen mode

The ->> operator is NOT GIN-indexable.

PostgreSQL sees this and says: "I can't use the GIN index for ->>. Sequential scan it is."

The Fix: Use @> (Containment)

SELECT data->>'brand' 
FROM products 
WHERE data @> '{"brand": "Nike"}';
Enter fullscreen mode Exit fullscreen mode

Now PostgreSQL uses the index:

Bitmap Heap Scan on products
  Recheck Cond: (data @> '{"brand": "Nike"}'::jsonb)
  -> Bitmap Index Scan on idx_products_data
       Index Cond: (data @> '{"brand": "Nike"}'::jsonb)
Enter fullscreen mode Exit fullscreen mode

15-20x faster on large datasets.


Real Benchmark: 100k Documents

Setup:

CREATE TABLE logs (
  id BIGSERIAL PRIMARY KEY,
  metadata JSONB
);

-- Insert 100k log documents
INSERT INTO logs (metadata)
SELECT jsonb_build_object(
  'user_id', 'user_' || (random() * 1000)::int,
  'service', (ARRAY['api', 'web', 'worker'])[floor(random() * 3 + 1)],
  'duration_ms', (random() * 5000)::int,
  'status', (ARRAY['success', 'error'])[floor(random() * 2 + 1)]
)
FROM generate_series(1, 100000);
Enter fullscreen mode Exit fullscreen mode

Test 1: Without Index

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata->>'user_id' = 'user_42';
Enter fullscreen mode Exit fullscreen mode
Seq Scan on logs
  Filter: ((metadata ->> 'user_id') = 'user_42')
  Rows Removed by Filter: 99900
Execution Time: 45.2 ms
Enter fullscreen mode Exit fullscreen mode

Test 2: With GIN Index (Wrong Operator)

CREATE INDEX idx_logs_metadata ON logs USING GIN (metadata);

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata->>'user_id' = 'user_42';
Enter fullscreen mode Exit fullscreen mode
Seq Scan on logs  -- STILL sequential scan!
Execution Time: 44.8 ms
Enter fullscreen mode Exit fullscreen mode

No improvement. Index not used.

Test 3: With GIN Index (Correct Operator)

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata @> '{"user_id": "user_42"}';
Enter fullscreen mode Exit fullscreen mode
Bitmap Index Scan on idx_logs_metadata
  Index Cond: (metadata @> '{"user_id": "user_42"}'::jsonb)
Execution Time: 2.8 ms
Enter fullscreen mode Exit fullscreen mode

16x faster. Index used.


jsonb_ops vs jsonb_path_ops: Which to Use?

I tested both on a 100k row table with typical JSONB documents (~500 bytes each).

Index Size

Operator Class Index Size Table Size
No index 0 MB 27 MB
jsonb_ops 18 MB 27 MB
jsonb_path_ops 6 MB 27 MB

jsonb_path_ops is 3x smaller.

Why? jsonb_ops indexes every key separately. jsonb_path_ops uses a hash-based approach for paths.

Write Performance (INSERT 100k rows)

Operator Class Time vs No Index
No index 1.58s baseline
jsonb_ops 2.84s +79%
jsonb_path_ops 1.84s +16%

jsonb_path_ops has way less write overhead.

Query Performance (Containment @>)

WHERE metadata @> '{"service": "api"}'
Enter fullscreen mode Exit fullscreen mode
Operator Class Time
No index 45ms
jsonb_ops 3.2ms
jsonb_path_ops 2.4ms

jsonb_path_ops is 25% faster for containment queries.


Decision Matrix

Use jsonb_ops (default) if:

  • You need ?, ?|, ?& (key existence checks)
  • You query unpredictable paths (wildcards)
  • Index size doesn't matter

Use jsonb_path_ops if:

  • You only use @> (containment)
  • Write performance matters (high INSERT rate)
  • Index size matters (smaller = better)

For 90% of use cases, jsonb_path_ops is better.


Common Gotchas

1. Type Casting Hell

This doesn't work:

WHERE metadata->>'duration_ms' > 1000
Enter fullscreen mode Exit fullscreen mode

Why? ->> returns text, not a number. PostgreSQL can't use the index on a text comparison.

Fix:

WHERE (metadata->>'duration_ms')::int > 1000
Enter fullscreen mode Exit fullscreen mode

Or better, use JSONPath:

WHERE metadata @@ '$.duration_ms > 1000'
Enter fullscreen mode Exit fullscreen mode

2. Nested Field Queries

This doesn't use the index:

WHERE metadata->'user'->>'id' = '42'
Enter fullscreen mode Exit fullscreen mode

Fix with @>:

WHERE metadata @> '{"user": {"id": "42"}}'
Enter fullscreen mode Exit fullscreen mode

3. Partial Matches Don't Work

This doesn't use the index:

WHERE metadata->>'name' LIKE '%john%'
Enter fullscreen mode Exit fullscreen mode

GIN indexes are for exact key-value matches, not substring searches.

Workaround: Use pg_trgm extension for trigram indexes on extracted fields:

CREATE EXTENSION pg_trgm;

CREATE INDEX idx_name_trgm 
ON logs USING GIN ((metadata->>'name') gin_trgm_ops);
Enter fullscreen mode Exit fullscreen mode

Expression Indexes for Specific Keys

If you always query the same 1-3 keys, use expression indexes instead:

-- Instead of GIN on entire JSONB column
CREATE INDEX idx_user_id ON logs ((metadata->>'user_id'));
CREATE INDEX idx_service ON logs ((metadata->>'service'));
Enter fullscreen mode Exit fullscreen mode

Why?

  • Smaller index (B-tree)
  • Faster queries (for those specific keys)
  • Lower write overhead

When to use:

  • Stable schema (keys don't change)
  • Query patterns are predictable

When NOT to use:

  • Unpredictable query keys
  • Many different keys queried

GIN Fast Update (Write Optimization)

By default, GIN uses "fast update" mode. Inserts go to a pending list, then bulk-updated later.

Pros:

  • Faster INSERTs (batched updates)
  • Less lock contention

Cons:

  • Reads must scan pending list (slower)
  • Unpredictable latency spikes (when pending list flushes)

Disable for Consistent Read Performance

ALTER INDEX idx_products_data SET (fastupdate = off);
Enter fullscreen mode Exit fullscreen mode

Use when:

  • Read latency predictability > write throughput
  • Low INSERT rate
  • Real-time dashboards

Production Tips

1. Always EXPLAIN ANALYZE

Don't assume the index is used:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT ...
Enter fullscreen mode Exit fullscreen mode

Look for:

  • Bitmap Index Scan on <index_name>
  • Index Cond: (not Filter:)

2. Monitor Index Bloat

GIN indexes can bloat with heavy writes:

SELECT 
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%gin%'
ORDER BY pg_relation_size(indexrelid) DESC;
Enter fullscreen mode Exit fullscreen mode

Fix: REINDEX CONCURRENTLY

3. Vacuum Regularly

GIN pending lists need cleanup:

SELECT gin_clean_pending_list('idx_products_data');
Enter fullscreen mode Exit fullscreen mode

Or rely on autovacuum (recommended).


When NOT to Use JSONB + GIN

Use dedicated columns if:

  • Schema is stable (keys won't change)
  • You need foreign key constraints
  • You need strong typing (dates, numbers)
  • You're querying 80%+ of the same fields

Use JSONB + GIN if:

  • Schema is evolving (startup mode)
  • Sparse data (many optional fields)
  • Flexible querying across unknown keys
  • Storing metadata, logs, events

Real-World Example

I use this pattern in Logtide, an open-source log management platform:

CREATE TABLE logs (
  time TIMESTAMPTZ NOT NULL,
  level TEXT NOT NULL,
  service TEXT NOT NULL,
  message TEXT,
  metadata JSONB  -- User-defined fields
);

CREATE INDEX idx_logs_metadata 
ON logs USING GIN (metadata jsonb_path_ops);
Enter fullscreen mode Exit fullscreen mode

Production stats (30M logs):

  • Index size: 2.5 GB (table: 15 GB compressed)
  • Query time: P95 78ms for metadata @> '{...}'
  • Write throughput: 18.5k logs/sec

GIN on JSONB handles production load without breaking a sweat.


Summary

Stop using ->> in WHERE clauses. Use @> (containment) instead.

Quick reference:

-- Wrong (index not used)
WHERE data->>'key' = 'value'

-- Right (index used)
WHERE data @> '{"key": "value"}'

-- Create index
CREATE INDEX idx_data ON table USING GIN (data jsonb_path_ops);
Enter fullscreen mode Exit fullscreen mode

Key takeaways:

  1. GIN supports @>, ?, @?, @@ — NOT -> or ->>
  2. Use jsonb_path_ops unless you need key existence operators
  3. Always EXPLAIN ANALYZE to verify index usage
  4. Consider expression indexes for stable schemas

PostgreSQL JSONB + GIN is fast. You just need to use it correctly.


some benchmarks may vary on your machine/queries

Top comments (0)