PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search
A GIN (Generalized Inverted Index) in PostgreSQL maps individual values inside composite data types -- JSONB keys, array elements, text search lexemes, and trigrams -- to the rows that contain them. If you're searching inside JSONB, arrays, or text, GIN is the only index type that helps. Without one, every query does a full table scan.
Why B-tree Can't Help Here
B-tree indexes work great for scalar comparisons: WHERE user_id = 123, WHERE created_at > '2025-01-01'. One value per row, standard comparison operators.
But PostgreSQL supports rich data types where a single column contains multiple searchable values:
- A JSONB column with a document containing dozens of keys
- An array column with a list of tags
- A tsvector column with lexemes from an entire text document
- A text column searched with
LIKE '%pattern%'
B-tree indexes can't decompose these into searchable parts. So without a GIN index, queries like WHERE metadata @> '{"status": "active"}' scan every row.
The performance hit is hidden during development. With 1,000 rows in dev, the seq scan is instant. With millions of rows in production, the same query takes seconds.
GIN for JSONB
There are two operator classes, and the choice matters:
-- Default (jsonb_ops): supports @>, ?, ?|, ?& operators
CREATE INDEX CONCURRENTLY idx_products_attributes_gin
ON products USING gin (attributes);
-- Queries it serves:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
SELECT * FROM products WHERE attributes ? 'warranty';
SELECT * FROM products WHERE attributes ?| array['color', 'size'];
-- jsonb_path_ops: supports ONLY @> but is 2-3x smaller and faster
CREATE INDEX CONCURRENTLY idx_products_attributes_path
ON products USING gin (attributes jsonb_path_ops);
-- Only serves containment queries:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
-- Does NOT support ?, ?|, ?& operators
If your queries only use @> (containment), jsonb_path_ops is the clear winner -- smaller index, faster lookups. If you also need key existence checks (?, ?|, ?&), you need the default.
The most common mistake: creating a jsonb_path_ops index then querying with ? (key existence). The index is silently ignored and the query does a seq scan.
GIN for Arrays
CREATE INDEX CONCURRENTLY idx_articles_tags_gin
ON articles USING gin (tags);
-- Contains all specified elements
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];
-- Contains any of the specified elements
SELECT * FROM articles WHERE tags && ARRAY['postgresql', 'mysql'];
-- Is contained by
SELECT * FROM articles WHERE tags <@ ARRAY['postgresql', 'performance', 'indexing'];
GIN for Full-Text Search
First, create a generated tsvector column (PG12+), then index it:
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
CREATE INDEX CONCURRENTLY idx_articles_search_gin
ON articles USING gin (search_vector);
-- Full-text search query
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
The setweight function assigns different weights to title vs body matches, so title matches rank higher.
GIN for Trigram Similarity (pg_trgm)
This one is a game-changer for LIKE queries with leading wildcards:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY idx_customers_name_trgm
ON customers USING gin (customer_name gin_trgm_ops);
-- Now these use the index instead of seq scan:
SELECT * FROM customers WHERE customer_name LIKE '%smith%';
SELECT * FROM customers WHERE customer_name ILIKE '%john%';
-- Similarity searches work too:
SELECT * FROM customers
WHERE customer_name % 'Jon Smith'
ORDER BY similarity(customer_name, 'Jon Smith') DESC;
Without the trigram GIN index, LIKE '%pattern%' always does a sequential scan because B-tree indexes require a fixed prefix.
Detecting Missing GIN Indexes
Find tables with JSONB/array columns that lack GIN indexes:
SELECT
t.schemaname,
t.relname AS table_name,
a.attname AS column_name,
pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
t.seq_scan,
t.seq_tup_read,
pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
WHERE a.atttypid IN ('jsonb'::regtype, 'json'::regtype)
OR pg_catalog.format_type(a.atttypid, a.atttypmod) LIKE '%[]'
OR a.atttypid = 'tsvector'::regtype
ORDER BY t.seq_tup_read DESC;
Confirm with EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE attributes @> '{"color": "blue", "size": "large"}';
-- If you see "Seq Scan" with "Filter", you need a GIN index
-- If you see "Bitmap Index Scan" on a GIN index, you're good
Tuning fastupdate
GIN indexes use a pending list to batch insertions. This makes inserts fast but can cause unpredictable query latency when the list is flushed:
-- Disable for consistent query latency (slower inserts)
ALTER INDEX idx_products_attributes_gin SET (fastupdate = off);
-- Or tune the pending list size
ALTER INDEX idx_products_attributes_gin SET (gin_pending_list_limit = 256);
If you need predictable query performance over insert throughput, disable fastupdate.
The GIN Index Checklist
Add this to your schema design process:
- JSONB column queried with
@>,?operators -> GIN index - Array column queried with
@>,&&operators -> GIN index - tsvector column -> GIN index
- Text column with
LIKE '%pattern%'-> trigram GIN index (pg_trgm) - Only need containment (
@>)? -> usejsonb_path_ops(2-3x smaller) - Need key existence (
?)? -> use default operator class
Don't wait for production performance complaints. If the column type and query pattern match, add the GIN index in the same migration that creates the column.
Top comments (0)