Polliog

Posted on Feb 27

PostgreSQL JSONB GIN Indexes: Why Your Queries Are Slow (And How to Fix Them)

#programming #database #postgres #performance

You added a JSONB column. You created a GIN index. Your queries are still doing sequential scans.

Sound familiar?

Here's the problem: Most developers don't understand which operators GIN indexes actually support. You're probably using the wrong one.

I spent days of my life debugging slow JSONB queries before I figured this out. Here's what I learned.

The Problem: You're Using the Wrong Operator

Let's say you have a table with JSONB data:

CREATE TABLE products (
  id BIGINT PRIMARY KEY,
  data JSONB
);

-- Insert some products
INSERT INTO products (id, data) VALUES
  (1, '{"brand": "Nike", "category": "Shoes", "price": 99.99}'),
  (2, '{"brand": "Adidas", "category": "Shoes", "price": 89.99}'),
  (3, '{"brand": "Nike", "category": "Apparel", "price": 49.99}');

You create a GIN index:

CREATE INDEX idx_products_data ON products USING GIN (data);

Then you run a query:

SELECT data->>'brand' 
FROM products 
WHERE data->>'brand' = 'Nike';

PostgreSQL does a sequential scan. The index is ignored.

Why?

What the Hell is a GIN Index?

GIN = Generalized Inverted Index.

Think of it like the index at the back of a book. Instead of "Page 42 contains this word," GIN says "The key brand with value Nike appears in rows 1, 3, 7, 42."

For JSONB, GIN breaks apart the JSON document and indexes every key-value pair separately:

{"brand": "Nike", "category": "Shoes"}

Gets indexed as:

brand:Nike     → [row 1, row 3]
category:Shoes → [row 1, row 5]

This is perfect for "containment" queries: "Find all documents that contain {"brand": "Nike"}".

But here's the catch: GIN doesn't support all JSONB operators.

The Supported Operators (This is Critical)

PostgreSQL has two GIN operator classes for JSONB:

1. `jsonb_ops` (Default)

Supports these operators:

Operator	What it does	Example
`@>`	Contains	`data @> '{"brand": "Nike"}'`
`?`	Key exists	`data ? 'brand'`
`?	`	Any key exists
`?&`	All keys exist	`data ?& array['brand', 'price']`
`@?`	JSONPath exists	`data @? '$.brand'`
`@@`	JSONPath match	`data @@ '$.price > 50'`

Notably missing: ->, ->>, #>, #>>

2. `jsonb_path_ops`

Supports only these operators:

Operator	What it does
`@>`	Contains
`@?`	JSONPath exists
`@@`	JSONPath match

Smaller index, faster queries, but fewer operators.

Why Your Query Doesn't Use the Index

Remember this query?

SELECT data->>'brand' 
FROM products 
WHERE data->>'brand' = 'Nike';

The ->> operator is NOT GIN-indexable.

PostgreSQL sees this and says: "I can't use the GIN index for ->>. Sequential scan it is."

The Fix: Use `@>` (Containment)

SELECT data->>'brand' 
FROM products 
WHERE data @> '{"brand": "Nike"}';

Now PostgreSQL uses the index:

Bitmap Heap Scan on products
  Recheck Cond: (data @> '{"brand": "Nike"}'::jsonb)
  -> Bitmap Index Scan on idx_products_data
       Index Cond: (data @> '{"brand": "Nike"}'::jsonb)

15-20x faster on large datasets.

Real Benchmark: 100k Documents

Setup:

CREATE TABLE logs (
  id BIGSERIAL PRIMARY KEY,
  metadata JSONB
);

-- Insert 100k log documents
INSERT INTO logs (metadata)
SELECT jsonb_build_object(
  'user_id', 'user_' || (random() * 1000)::int,
  'service', (ARRAY['api', 'web', 'worker'])[floor(random() * 3 + 1)],
  'duration_ms', (random() * 5000)::int,
  'status', (ARRAY['success', 'error'])[floor(random() * 2 + 1)]
)
FROM generate_series(1, 100000);

Test 1: Without Index

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata->>'user_id' = 'user_42';

Seq Scan on logs
  Filter: ((metadata ->> 'user_id') = 'user_42')
  Rows Removed by Filter: 99900
Execution Time: 45.2 ms

Test 2: With GIN Index (Wrong Operator)

CREATE INDEX idx_logs_metadata ON logs USING GIN (metadata);

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata->>'user_id' = 'user_42';

Seq Scan on logs  -- STILL sequential scan!
Execution Time: 44.8 ms

No improvement. Index not used.

Test 3: With GIN Index (Correct Operator)

EXPLAIN ANALYZE 
SELECT metadata->>'user_id'
FROM logs
WHERE metadata @> '{"user_id": "user_42"}';

Bitmap Index Scan on idx_logs_metadata
  Index Cond: (metadata @> '{"user_id": "user_42"}'::jsonb)
Execution Time: 2.8 ms

16x faster. Index used.

`jsonb_ops` vs `jsonb_path_ops`: Which to Use?

I tested both on a 100k row table with typical JSONB documents (~500 bytes each).

Index Size

Operator Class	Index Size	Table Size
No index	0 MB	27 MB
`jsonb_ops`	18 MB	27 MB
`jsonb_path_ops`	6 MB	27 MB

jsonb_path_ops is 3x smaller.

Why? jsonb_ops indexes every key separately. jsonb_path_ops uses a hash-based approach for paths.

Write Performance (INSERT 100k rows)

Operator Class	Time	vs No Index
No index	1.58s	baseline
`jsonb_ops`	2.84s	+79%
`jsonb_path_ops`	1.84s	+16%

jsonb_path_ops has way less write overhead.

Query Performance (Containment `@>`)

WHERE metadata @> '{"service": "api"}'

Operator Class	Time
No index	45ms
`jsonb_ops`	3.2ms
`jsonb_path_ops`	2.4ms

jsonb_path_ops is 25% faster for containment queries.

Decision Matrix

Use jsonb_ops (default) if:

You need ?, ?|, ?& (key existence checks)
You query unpredictable paths (wildcards)
Index size doesn't matter

Use jsonb_path_ops if:

You only use @> (containment)
Write performance matters (high INSERT rate)
Index size matters (smaller = better)

For 90% of use cases, jsonb_path_ops is better.

Common Gotchas

1. Type Casting Hell

This doesn't work:

WHERE metadata->>'duration_ms' > 1000

Why? ->> returns text, not a number. PostgreSQL can't use the index on a text comparison.

Fix:

WHERE (metadata->>'duration_ms')::int > 1000

Or better, use JSONPath:

WHERE metadata @@ '$.duration_ms > 1000'

2. Nested Field Queries

This doesn't use the index:

WHERE metadata->'user'->>'id' = '42'

Fix with @>:

WHERE metadata @> '{"user": {"id": "42"}}'

3. Partial Matches Don't Work

This doesn't use the index:

WHERE metadata->>'name' LIKE '%john%'

GIN indexes are for exact key-value matches, not substring searches.

Workaround: Use pg_trgm extension for trigram indexes on extracted fields:

CREATE EXTENSION pg_trgm;

CREATE INDEX idx_name_trgm 
ON logs USING GIN ((metadata->>'name') gin_trgm_ops);

Expression Indexes for Specific Keys

If you always query the same 1-3 keys, use expression indexes instead:

-- Instead of GIN on entire JSONB column
CREATE INDEX idx_user_id ON logs ((metadata->>'user_id'));
CREATE INDEX idx_service ON logs ((metadata->>'service'));

Why?

Smaller index (B-tree)
Faster queries (for those specific keys)
Lower write overhead

When to use:

Stable schema (keys don't change)
Query patterns are predictable

When NOT to use:

Unpredictable query keys
Many different keys queried

GIN Fast Update (Write Optimization)

By default, GIN uses "fast update" mode. Inserts go to a pending list, then bulk-updated later.

Pros:

Faster INSERTs (batched updates)
Less lock contention

Cons:

Reads must scan pending list (slower)
Unpredictable latency spikes (when pending list flushes)

Disable for Consistent Read Performance

ALTER INDEX idx_products_data SET (fastupdate = off);

Use when:

Read latency predictability > write throughput
Low INSERT rate
Real-time dashboards

Production Tips

1. Always EXPLAIN ANALYZE

Don't assume the index is used:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT ...

Look for:

Bitmap Index Scan on <index_name>
Index Cond: (not Filter:)

2. Monitor Index Bloat

GIN indexes can bloat with heavy writes:

SELECT 
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%gin%'
ORDER BY pg_relation_size(indexrelid) DESC;

Fix: REINDEX CONCURRENTLY

3. Vacuum Regularly

GIN pending lists need cleanup:

SELECT gin_clean_pending_list('idx_products_data');

Or rely on autovacuum (recommended).

When NOT to Use JSONB + GIN

Use dedicated columns if:

Schema is stable (keys won't change)
You need foreign key constraints
You need strong typing (dates, numbers)
You're querying 80%+ of the same fields

Use JSONB + GIN if:

Schema is evolving (startup mode)
Sparse data (many optional fields)
Flexible querying across unknown keys
Storing metadata, logs, events

Real-World Example

I use this pattern in Logtide, an open-source log management platform:

CREATE TABLE logs (
  time TIMESTAMPTZ NOT NULL,
  level TEXT NOT NULL,
  service TEXT NOT NULL,
  message TEXT,
  metadata JSONB  -- User-defined fields
);

CREATE INDEX idx_logs_metadata 
ON logs USING GIN (metadata jsonb_path_ops);

Production stats (30M logs):

Index size: 2.5 GB (table: 15 GB compressed)
Query time: P95 78ms for metadata @> '{...}'
Write throughput: 18.5k logs/sec

GIN on JSONB handles production load without breaking a sweat.

Summary

Stop using ->> in WHERE clauses. Use @> (containment) instead.

Quick reference:

-- Wrong (index not used)
WHERE data->>'key' = 'value'

-- Right (index used)
WHERE data @> '{"key": "value"}'

-- Create index
CREATE INDEX idx_data ON table USING GIN (data jsonb_path_ops);

Key takeaways:

GIN supports @>, ?, @?, @@ — NOT -> or ->>
Use jsonb_path_ops unless you need key existence operators
Always EXPLAIN ANALYZE to verify index usage
Consider expression indexes for stable schemas

PostgreSQL JSONB + GIN is fast. You just need to use it correctly.

some benchmarks may vary on your machine/queries

DEV Community

PostgreSQL JSONB GIN Indexes: Why Your Queries Are Slow (And How to Fix Them)

The Problem: You're Using the Wrong Operator

What the Hell is a GIN Index?

The Supported Operators (This is Critical)

1. `jsonb_ops` (Default)

2. `jsonb_path_ops`

Why Your Query Doesn't Use the Index

The Fix: Use `@>` (Containment)

Real Benchmark: 100k Documents

Test 1: Without Index

Test 2: With GIN Index (Wrong Operator)

Test 3: With GIN Index (Correct Operator)

`jsonb_ops` vs `jsonb_path_ops`: Which to Use?

Index Size

Write Performance (INSERT 100k rows)

Query Performance (Containment `@>`)

Decision Matrix

Common Gotchas

1. Type Casting Hell

2. Nested Field Queries

3. Partial Matches Don't Work

Expression Indexes for Specific Keys

GIN Fast Update (Write Optimization)

Disable for Consistent Read Performance

Production Tips

1. Always EXPLAIN ANALYZE

2. Monitor Index Bloat

3. Vacuum Regularly

When NOT to Use JSONB + GIN

Real-World Example

Summary

Top comments (0)

The Problem: You're Using the Wrong Operator

What the Hell is a GIN Index?

The Supported Operators (This is Critical)

1. jsonb_ops (Default)

2. jsonb_path_ops

Why Your Query Doesn't Use the Index

The Fix: Use @> (Containment)

Real Benchmark: 100k Documents

Test 1: Without Index

Test 2: With GIN Index (Wrong Operator)

Test 3: With GIN Index (Correct Operator)

jsonb_ops vs jsonb_path_ops: Which to Use?

Index Size

Write Performance (INSERT 100k rows)

Query Performance (Containment @>)

Decision Matrix

Common Gotchas

1. Type Casting Hell

2. Nested Field Queries

3. Partial Matches Don't Work

Expression Indexes for Specific Keys

GIN Fast Update (Write Optimization)

Disable for Consistent Read Performance

Production Tips

1. Always EXPLAIN ANALYZE

2. Monitor Index Bloat

3. Vacuum Regularly

When NOT to Use JSONB + GIN

Real-World Example

Summary

1. `jsonb_ops` (Default)

2. `jsonb_path_ops`

The Fix: Use `@>` (Containment)

`jsonb_ops` vs `jsonb_path_ops`: Which to Use?

Query Performance (Containment `@>`)