Postgres JSONB indexes: GIN vs BTREE on the same column

#postgressql #database #performance #json

caught this in production last quarter and the answer is more boring than i expected: GIN and BTREE on the same JSONB column solve different problems, and the right choice depends on the SHAPE of your queries, not the size of the data.

heres the actual benchmark + when each one wins.

the setup

table: users with 2.3M rows. one column attributes JSONB. typical row contains:

{
  "plan": "pro",
  "country": "ID",
  "signup_source": "organic",
  "feature_flags": ["beta_search", "new_billing"],
  "preferences": {"theme": "dark", "lang": "id"}
}

queries i run regularly:

Q1: WHERE attributes->>'plan' = 'pro' — find all pro users
Q2: WHERE attributes @> '{"feature_flags": ["beta_search"]}' — find users with a feature flag
Q3: WHERE attributes->>'country' = 'ID' AND attributes->>'plan' = 'pro' — pro users in indonesia
Q4: WHERE attributes ? 'preferences' — users who have the preferences key at all

four queries. different optimal indexes for each.

the three index options

-- option A: GIN on whole column
CREATE INDEX idx_attrs_gin ON users USING GIN (attributes);

-- option B: GIN with jsonb_path_ops (faster but only for @>)
CREATE INDEX idx_attrs_gin_path ON users USING GIN (attributes jsonb_path_ops);

-- option C: BTREE on extracted scalar
CREATE INDEX idx_attrs_plan ON users ((attributes->>'plan'));

actual EXPLAIN ANALYZE results (2.3M rows)

Q1: WHERE attributes->>'plan' = 'pro'

no index: 480ms (seq scan)
GIN (option A): 220ms (bitmap scan, recheck) — GIN is generic, returns false positives, needs the recheck step
GIN path_ops (option B): doesn't work for ->> extraction. ignored.
BTREE on (attributes->>'plan') (option C): 8ms (index scan, no recheck) — winner by 27x

Q2: WHERE attributes @> '{"feature_flags": ["beta_search"]}'

no index: 510ms (seq scan)
GIN: 14ms (bitmap scan) — solid
GIN path_ops: 6ms — winner. path_ops loses other operators but is 2.3x faster for @> specifically because it stores hashed paths only.
BTREE on extracted: cant express this query. n/a.

Q3: WHERE plan='pro' AND country='ID' (compound on JSONB)

BTREE on plan only: 120ms (index on plan, then filter for country in heap)
BTREE composite on ((attributes->>'plan'), (attributes->>'country')): 3ms — winner.

if you find yourself running compound queries on JSONB scalars, the COMPOSITE BTREE on extracted columns beats every other option for that exact shape.

Q4: WHERE attributes ? 'preferences'

GIN (option A): 18ms — works because the ? operator is supported
GIN path_ops: doesnt work for the ? operator. path_ops only indexes @>.
BTREE on extracted: cant express. n/a.

the actual lesson

GIN with default ops is the safe default if you don't know your query shape yet. handles @>, ?, ?|, ?&. flexibility tax: ~2x slower than path_ops on @>, way slower than BTREE on ->>.

GIN with jsonb_path_ops is the specialist if you ONLY use @> (contains). most apps doing feature flag / array-contains queries fit here.

BTREE on extracted scalar (((attributes->>'X'))) is always the fastest for that specific scalar predicate. price: one BTREE index per accessed scalar field.

compound BTREE on multiple extracted scalars is the god mode for queries shaped like Q3. costs you write-amplification but reads are 40-100x faster than alternatives.

what i actually use in production

after benchmarking:

composite BTREE on the 2 high-cardinality scalars (plan, country) that show up in 80% of WHERE clauses → 3ms for the hot path
one GIN path_ops index for the @> feature_flags queries → 6ms for flag lookups
no general-purpose GIN. the ? operator queries (Q4) are rare; we accept seq scan for them.

trade-off: 2 BTREE + 1 GIN = 3 indexes on a single JSONB column. write amplification per row: ~30% slower INSERTs vs no indexes. acceptable because writes are 1% of our traffic.

the rule of thumb i give other engineers

if you know the query shape, use a BTREE on the extracted scalar. if you don't, use a default GIN. only add jsonb_path_ops if you've measured + you ONLY use @>.

GIN-on-everything is the lazy answer that costs you 20-50x on simple equality queries. and most JSONB queries in business code are simple equality queries.

one more thing

the BTREE on (attributes->>'plan') index works ONLY if your query writes the predicate as attributes->>'plan' = 'pro'. if you write (attributes->'plan')::text = '"pro"' (note the double quote inside the literal — JSONB vs text comparison), it won't use the index. lost an hour to this once. expression on the LEFT side of = must match the indexed expression exactly.