Philip McClarence

Posted on Apr 10

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

#postgres #database #sql #performance

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

A GIN (Generalized Inverted Index) in PostgreSQL maps individual values inside composite data types -- JSONB keys, array elements, text search lexemes, and trigrams -- to the rows that contain them. If you're searching inside JSONB, arrays, or text, GIN is the only index type that helps. Without one, every query does a full table scan.

Why B-tree Can't Help Here

B-tree indexes work great for scalar comparisons: WHERE user_id = 123, WHERE created_at > '2025-01-01'. One value per row, standard comparison operators.

But PostgreSQL supports rich data types where a single column contains multiple searchable values:

A JSONB column with a document containing dozens of keys
An array column with a list of tags
A tsvector column with lexemes from an entire text document
A text column searched with LIKE '%pattern%'

B-tree indexes can't decompose these into searchable parts. So without a GIN index, queries like WHERE metadata @> '{"status": "active"}' scan every row.

The performance hit is hidden during development. With 1,000 rows in dev, the seq scan is instant. With millions of rows in production, the same query takes seconds.

GIN for JSONB

There are two operator classes, and the choice matters:

-- Default (jsonb_ops): supports @>, ?, ?|, ?& operators
CREATE INDEX CONCURRENTLY idx_products_attributes_gin
    ON products USING gin (attributes);

-- Queries it serves:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
SELECT * FROM products WHERE attributes ? 'warranty';
SELECT * FROM products WHERE attributes ?| array['color', 'size'];

-- jsonb_path_ops: supports ONLY @> but is 2-3x smaller and faster
CREATE INDEX CONCURRENTLY idx_products_attributes_path
    ON products USING gin (attributes jsonb_path_ops);

-- Only serves containment queries:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
-- Does NOT support ?, ?|, ?& operators

If your queries only use @> (containment), jsonb_path_ops is the clear winner -- smaller index, faster lookups. If you also need key existence checks (?, ?|, ?&), you need the default.

The most common mistake: creating a jsonb_path_ops index then querying with ? (key existence). The index is silently ignored and the query does a seq scan.

GIN for Arrays

CREATE INDEX CONCURRENTLY idx_articles_tags_gin
    ON articles USING gin (tags);

-- Contains all specified elements
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];

-- Contains any of the specified elements
SELECT * FROM articles WHERE tags && ARRAY['postgresql', 'mysql'];

-- Is contained by
SELECT * FROM articles WHERE tags <@ ARRAY['postgresql', 'performance', 'indexing'];

GIN for Full-Text Search

First, create a generated tsvector column (PG12+), then index it:

ALTER TABLE articles ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED;

CREATE INDEX CONCURRENTLY idx_articles_search_gin
    ON articles USING gin (search_vector);

-- Full-text search query
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;

The setweight function assigns different weights to title vs body matches, so title matches rank higher.

GIN for Trigram Similarity (pg_trgm)

This one is a game-changer for LIKE queries with leading wildcards:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX CONCURRENTLY idx_customers_name_trgm
    ON customers USING gin (customer_name gin_trgm_ops);

-- Now these use the index instead of seq scan:
SELECT * FROM customers WHERE customer_name LIKE '%smith%';
SELECT * FROM customers WHERE customer_name ILIKE '%john%';

-- Similarity searches work too:
SELECT * FROM customers
WHERE customer_name % 'Jon Smith'
ORDER BY similarity(customer_name, 'Jon Smith') DESC;

Without the trigram GIN index, LIKE '%pattern%' always does a sequential scan because B-tree indexes require a fixed prefix.

Detecting Missing GIN Indexes

Find tables with JSONB/array columns that lack GIN indexes:

SELECT
    t.schemaname,
    t.relname AS table_name,
    a.attname AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
    t.seq_scan,
    t.seq_tup_read,
    pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
WHERE a.atttypid IN ('jsonb'::regtype, 'json'::regtype)
   OR pg_catalog.format_type(a.atttypid, a.atttypmod) LIKE '%[]'
   OR a.atttypid = 'tsvector'::regtype
ORDER BY t.seq_tup_read DESC;

Confirm with EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE attributes @> '{"color": "blue", "size": "large"}';

-- If you see "Seq Scan" with "Filter", you need a GIN index
-- If you see "Bitmap Index Scan" on a GIN index, you're good

Tuning fastupdate

GIN indexes use a pending list to batch insertions. This makes inserts fast but can cause unpredictable query latency when the list is flushed:

-- Disable for consistent query latency (slower inserts)
ALTER INDEX idx_products_attributes_gin SET (fastupdate = off);

-- Or tune the pending list size
ALTER INDEX idx_products_attributes_gin SET (gin_pending_list_limit = 256);

If you need predictable query performance over insert throughput, disable fastupdate.

The GIN Index Checklist

Add this to your schema design process:

JSONB column queried with @>, ? operators -> GIN index
Array column queried with @>, && operators -> GIN index
tsvector column -> GIN index
Text column with LIKE '%pattern%' -> trigram GIN index (pg_trgm)
Only need containment (@>)? -> use jsonb_path_ops (2-3x smaller)
Need key existence (?)? -> use default operator class

Don't wait for production performance complaints. If the column type and query pattern match, add the GIN index in the same migration that creates the column.

DEV Community

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

Why B-tree Can't Help Here

GIN for JSONB

GIN for Arrays

GIN for Full-Text Search

GIN for Trigram Similarity (pg_trgm)

Detecting Missing GIN Indexes

Tuning fastupdate

The GIN Index Checklist

Top comments (0)