Igor Nosatov

When Your Database Became a Library: Guide to PostgreSQL Indexes

Tags: #postgresql #database #performance #sql


The Day My Query Took 47 Minutes (And What a Librarian Taught Me)

It was 2 AM on a Tuesday. Production was melting. Our analytics dashboard—the one the CEO checks every morning with his coffee—was timing out. The query that usually took 2 seconds was now approaching its 47th minute.

I did what any panicked developer does: I added EXPLAIN ANALYZE to the front of the query and watched in horror as PostgreSQL decided to scan all 12 million rows of our events table. No index. Just a good old-fashioned sequential scan, reading every single row like someone searching for a specific book by checking every single volume in the Library of Congress.

That's when it clicked: databases are libraries, and indexes are the card catalog system.

Let me show you what I mean.

The Library Metaphor: Understanding Index Philosophy

Imagine you walk into a massive library with 10 million books. You need to find "The Art of PostgreSQL" by Dimitri Fontaine. You have three options:

  1. Sequential Scan: Start at the entrance and check every single book until you find it (10 million books later...)
  2. Index Scan: Walk to the card catalog, look up the book by title, and go directly to shelf B-42
  3. No Library at All: Keep all 10 million books in your backpack (memory) and hope you don't need to walk anywhere

PostgreSQL gives you six types of card catalogs (indexes), each designed for different ways people search for books. Let's meet them all.


The Six Guardians of Query Performance

1. B-tree: The Classic Card Catalog

What it is: The default. The workhorse. The "if you're not sure, use this" index.

Real-world analogy: A dictionary. Want words starting with "Q"? Flip to the Q section. Want words between "quantum" and "quartz"? They're all grouped together, sorted, ready to go.

When to use:

  • Equality comparisons (WHERE user_id = 42)
  • Range queries (WHERE created_at > '2024-01-01')
  • Sorting operations (ORDER BY price DESC)
  • Pattern matching with left-anchored patterns (WHERE email LIKE 'john%' — note this needs the text_pattern_ops opclass or a C collation)

Real example from my production disaster:

-- Before: 47 minutes ☠️
SELECT user_id, COUNT(*) 
FROM events 
WHERE event_type = 'page_view' 
  AND created_at >= '2024-11-01'
GROUP BY user_id;

-- The fix: B-tree composite index
CREATE INDEX idx_events_type_date 
ON events(event_type, created_at);

-- After: 1.2 seconds 🎉

Why it worked: B-trees store data in sorted order. PostgreSQL could jump directly to 'page_view' entries, then scan only November dates within that subset.
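
Want to verify a fix like this? EXPLAIN shows whether the planner actually picks the new index. A sanity-check sketch using the query above (the plan lines shown are illustrative):

-- Confirm the planner uses the composite index
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, COUNT(*) 
FROM events 
WHERE event_type = 'page_view' 
  AND created_at >= '2024-11-01'
GROUP BY user_id;

-- Hope to see an Index Scan or Bitmap Index Scan
-- using idx_events_type_date, instead of:
--   Seq Scan on events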

The gotcha I learned the hard way:

-- ❌ Index NOT used (function on indexed column)
WHERE LOWER(email) = 'john@example.com'

-- ✅ Index used (function-based index)
CREATE INDEX idx_email_lower ON users(LOWER(email));
WHERE LOWER(email) = 'john@example.com'

Column order matters in composite indexes:

-- Index: (event_type, created_at)
-- ✅ Fast: Uses index fully
WHERE event_type = 'click' AND created_at > '2024-01-01'

-- ✅ Fast: Uses index partially (first column)
WHERE event_type = 'click'

-- ❌ Slow: Can't use index (skips first column)
WHERE created_at > '2024-01-01'

2. Hash: The Lightning-Fast Exact Matcher

What it is: A hash table. Equality lookups in near-constant time.

Real-world analogy: A locker room where your locker number is calculated from your name. You don't search—you compute hash("John") = Locker 42 and go straight there.

When to use:

  • Equality comparisons ONLY
  • Columns with high cardinality (many unique values)
  • When you never need range queries or sorting

The catch: Hash indexes can't help with >, <, ORDER BY, or LIKE. They're specialists.

-- Perfect use case: UUID lookups
CREATE INDEX idx_users_uuid_hash 
ON users USING hash(uuid);

SELECT * FROM users WHERE uuid = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';

-- Benchmark from my tests:
-- B-tree:  0.042ms
-- Hash:    0.031ms
-- Gain: ~26% faster for exact matches

When I use it: Session tokens, API keys, UUIDs—anything where I only ever check "does this exact value exist?"

When I don't: Pretty much everything else. B-tree is almost always better unless you're doing millions of equality checks per second.

Hash index gotcha:
Before PostgreSQL 10, hash indexes weren't WAL-logged (no crash recovery!). Now they're safe, but many devs still avoid them out of habit.
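
A small bonus worth knowing: a hash index stores only a 32-bit hash code plus a row pointer, never the key itself, so for wide keys it can end up smaller than the equivalent B-tree. A quick footprint comparison (idx_users_uuid_btree is a hypothetical name for the B-tree variant):

-- Compare the on-disk size of the two approaches
CREATE INDEX idx_users_uuid_btree ON users(uuid);

SELECT pg_size_pretty(pg_relation_size('idx_users_uuid_hash'))  AS hash_size,
       pg_size_pretty(pg_relation_size('idx_users_uuid_btree')) AS btree_size;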


3. GIN: The Full-Text Search Wizard

What it is: Generalized Inverted Index. It maps each element (word, key, array item) to every row that contains it.

Real-world analogy: A book's index page. Instead of organizing by page number, it organizes by every word, pointing to all pages where that word appears.

"PostgreSQL" → pages: 1, 15, 23, 47, 89
"index"      → pages: 1, 2, 3, 15, 16, 23...

When to use:

  • JSONB queries
  • Array contains operations
  • Full-text search
  • Any scenario with "multi-valued" columns

The game-changer for my e-commerce project:

-- Product table with JSONB attributes
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name TEXT,
    attributes JSONB  -- {"color": "blue", "size": "L", "tags": ["cotton", "summer"]}
);

-- Before GIN: Sequential scan on 500K products = 8 seconds
SELECT * FROM products 
WHERE attributes @> '{"color": "blue"}';

-- The magic spell:
CREATE INDEX idx_products_attrs 
ON products USING gin(attributes);

-- After: 0.023 seconds 🪄

GIN for arrays (my favorite trick):

CREATE TABLE posts (
    id BIGSERIAL PRIMARY KEY,
    tags TEXT[]
);

CREATE INDEX idx_posts_tags ON posts USING gin(tags);

-- Find posts with ANY of these tags
SELECT * FROM posts WHERE tags && ARRAY['postgresql', 'performance'];

-- Find posts with ALL these tags
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];

-- Find posts that contain this tag (possibly among others)
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];

Full-text search example:

CREATE TABLE articles (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    search_vector tsvector GENERATED ALWAYS AS 
        (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))) STORED
);

CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Natural language search
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, 
     plainto_tsquery('english', 'postgresql performance tuning') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

The trade-off:

  • GIN indexes can be large (sometimes 2-3x the size of the column data)
  • Slower to INSERT/UPDATE (the inverted index must be maintained)
  • But for reads? Lightning fast
  • Tune fastupdate and gin_pending_list_limit to balance write performance (sketch below)
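
Here's a minimal sketch of those write-side knobs, applied to the products index from above (the 512 is illustrative, not a recommendation):

-- fastupdate (on by default) buffers new entries in a pending list,
-- keeping inverted-index maintenance off the INSERT hot path
ALTER INDEX idx_products_attrs SET (fastupdate = on);

-- Cap the pending list (value in kB): smaller keeps searches fast,
-- at the cost of merging into the main index more often
ALTER INDEX idx_products_attrs SET (gin_pending_list_limit = 512);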

4. GiST: The Geometric Genius

What it is: Generalized Search Tree. The Swiss Army knife for complex data types.

Real-world analogy: A map with zones. "Find all pizza places within 5 miles" doesn't require checking every restaurant—just the ones in nearby zones.

When to use:

  • Geometric data (PostGIS)
  • Range types (int4range, tstzrange)
  • Full-text search (alternative to GIN)
  • Nearest-neighbor searches
  • Any custom data type with spatial/hierarchical properties

Real example: Building a delivery zone checker

CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE delivery_zones (
    id SERIAL PRIMARY KEY,
    restaurant_id INT,
    zone GEOMETRY(POLYGON, 4326)
);

CREATE INDEX idx_zones_geom 
ON delivery_zones USING gist(zone);

-- Find which restaurants deliver to a location (San Francisco)
SELECT restaurant_id 
FROM delivery_zones 
WHERE ST_Contains(
    zone, 
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)
);

-- Without GiST: checks every polygon = O(n)
-- With GiST: spatial pruning = O(log n)

The time-range trick I use for booking systems:

CREATE TABLE bookings (
    id BIGSERIAL PRIMARY KEY,
    resource_id INT,
    time_range tstzrange  -- '[2024-11-25 14:00, 2024-11-25 16:00)'
);

CREATE INDEX idx_bookings_range 
ON bookings USING gist(time_range);

-- Find overlapping bookings (conflict detection)
SELECT * FROM bookings 
WHERE time_range && tstzrange('2024-11-25 15:00', '2024-11-25 17:00');

-- This is how calendar apps stay fast with thousands of events
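GiST also powers exclusion constraints, which turn that overlap check into something the database enforces for you. A sketch for the bookings table above (the constraint name is mine; btree_gist is needed so the constraint can mix = on an integer with && on a range):

CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE bookings
ADD CONSTRAINT no_double_booking
EXCLUDE USING gist (resource_id WITH =, time_range WITH &&);

-- An INSERT that overlaps an existing booking for the same resource
-- now fails with a constraint violation instead of creating a conflict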

Nearest neighbor search (k-NN):

-- Find the 5 closest coffee shops to a location
-- (assumes a GiST index on location, e.g.
--  CREATE INDEX ON coffee_shops USING gist(location);)
SELECT name, location <-> ST_MakePoint(-122.4194, 37.7749) AS distance
FROM coffee_shops
ORDER BY location <-> ST_MakePoint(-122.4194, 37.7749)
LIMIT 5;

-- With that index, the <-> operator in ORDER BY becomes an efficient
-- index-ordered scan instead of a sort over every row

GIN vs GiST for full-text search:

Feature        | GIN                        | GiST
---------------|----------------------------|-------------------------------
Query speed    | Faster (often ~3x)         | Slower
Index size     | Larger (often ~3x)         | Smaller
Update speed   | Slower                     | Faster
Use when       | Read-heavy, fast searches  | Write-heavy, space-constrained

5. SP-GiST: The Hierarchical Organizer

What it is: Space-Partitioned GiST. For data with natural hierarchies or non-balanced structures.

Real-world analogy: A phone tree or quad-tree. Instead of balanced binary splits, it partitions space intelligently based on data distribution.

When to use:

  • IP addresses (inet types)
  • Phone numbers
  • Geographic quad-trees
  • Data with natural clustering

The IP address indexing revelation:

CREATE TABLE access_logs (
    id BIGSERIAL PRIMARY KEY,
    ip_address INET,
    accessed_at TIMESTAMPTZ,
    user_agent TEXT
);

-- SP-GiST is perfect for INET types
CREATE INDEX idx_logs_ip 
ON access_logs USING spgist(ip_address);

-- Find all requests from a subnet
SELECT COUNT(*) 
FROM access_logs 
WHERE ip_address << '192.168.1.0/24';

-- Check if IP is in specific ranges
SELECT * 
FROM access_logs 
WHERE ip_address <<= '10.0.0.0/8'  -- contained in or equal to
   OR ip_address <<= '172.16.0.0/12'
   OR ip_address <<= '192.168.0.0/16';

Performance comparison on 10M rows:

-- No index: 8.2 seconds (seq scan)
-- B-tree:   1.1 seconds (can't serve << itself; query rewritten as a range)
-- GiST:     0.3 seconds (good)
-- SP-GiST:  0.08 seconds (excellent!) 🎯

Text pattern matching (my secret weapon for autocomplete):

CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    sku TEXT
);

CREATE INDEX idx_products_sku_spgist 
ON products USING spgist(sku);  -- default SP-GiST text opclass

-- Lightning-fast prefix matching
SELECT sku 
FROM products 
WHERE sku ^@ 'ELEC';  -- starts with 'ELEC'

-- This beats LIKE 'ELEC%' on large datasets

When to choose SP-GiST:

  • Your data has natural prefixes (IP addresses, phone numbers, hierarchical codes)
  • You need efficient prefix or suffix searches
  • Standard B-tree isn't cutting it for your specific data pattern

6. BRIN: The Time-Series Hero

What it is: Block Range Index. Instead of indexing every row, it indexes ranges of physical blocks.

Real-world analogy: Instead of cataloging every book individually, you label entire shelves: "Shelf A1 contains books from 1900-1950, Shelf A2 contains 1951-2000."

When to use:

  • Very large tables (1TB+)
  • Data that's naturally ordered (timestamps, auto-increment IDs)
  • When you prioritize tiny index size over maximum speed
  • Time-series data, log tables, append-only tables

The shocking size difference:

CREATE TABLE sensor_readings (
    id BIGSERIAL PRIMARY KEY,
    sensor_id INT,
    reading NUMERIC,
    recorded_at TIMESTAMPTZ DEFAULT NOW()
);

-- Insert 100 million rows (time-ordered)
-- Table size: 8.2 GB

-- B-tree index on recorded_at
CREATE INDEX idx_readings_time_btree ON sensor_readings(recorded_at);
-- Index size: 2.1 GB 😱

-- BRIN index on recorded_at
CREATE INDEX idx_readings_time_brin ON sensor_readings 
USING brin(recorded_at) WITH (pages_per_range = 128);
-- Index size: 2.3 MB 🤯

-- That's 900x smaller!

Real-world performance:

-- Query: Get last hour of data
SELECT AVG(reading) 
FROM sensor_readings 
WHERE recorded_at > NOW() - INTERVAL '1 hour';

-- B-tree:  23ms
-- BRIN:    89ms
-- No index: 45 seconds

-- BRIN is 3.8x slower than B-tree but 500x faster than no index
-- And it takes up about 0.1% of the space!

The correlation requirement:

-- Check if your data is suitable for BRIN
SELECT 
    correlation 
FROM pg_stats 
WHERE tablename = 'sensor_readings' 
  AND attname = 'recorded_at';

-- correlation close to 1.0 or -1.0 = PERFECT for BRIN
-- correlation close to 0.0 = DON'T use BRIN (use B-tree)

-- Example results:
-- recorded_at:  0.99  ✅ (naturally ordered by time)
-- sensor_id:    0.03  ❌ (randomly distributed)
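If a table started out time-ordered but updates and deletes have shuffled the physical layout, correlation drops and BRIN loses its edge. One way to restore it is to rewrite the table in index order. A sketch, assuming a B-tree on recorded_at still exists (note that CLUSTER takes an exclusive lock for the duration):

-- Rewrite the table physically sorted by recorded_at
CLUSTER sensor_readings USING idx_readings_time_btree;

-- Refresh planner statistics afterwards
ANALYZE sensor_readings;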

Tuning BRIN indexes:

-- Smaller pages_per_range = larger index, faster queries
CREATE INDEX idx_brin_fine ON logs 
USING brin(created_at) WITH (pages_per_range = 32);

-- Larger pages_per_range = smaller index, slower queries
CREATE INDEX idx_brin_coarse ON logs 
USING brin(created_at) WITH (pages_per_range = 512);

-- The default is 128, which is usually a good starting point

My BRIN use cases:

  • Application logs (append-only, timestamp-ordered)
  • IoT sensor data (millions of readings per day)
  • Financial transaction tables (always increasing IDs)
  • Archival data (historical records that never change)

The Decision Tree: Which Index Should You Use?

Start here
    ↓
Do you have time-series or append-only data that's physically ordered?
    ├─ YES → Consider BRIN (especially if table > 100GB)
    └─ NO → Continue
              ↓
Is it a geometry/geography column?
    ├─ YES → Use GiST (or PostGIS-specific indexes)
    └─ NO → Continue
              ↓
Is it JSONB, array, or full-text search?
    ├─ YES → Use GIN (primary choice) or GiST (if write-heavy)
    └─ NO → Continue
              ↓
Is it an IP address, phone number, or hierarchical text?
    ├─ YES → Consider SP-GiST
    └─ NO → Continue
              ↓
Do you ONLY need equality checks (=) and never ranges or sorting?
    ├─ YES → Consider HASH (minor speedup over B-tree)
    └─ NO → Use B-TREE (the reliable default)
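Not sure which of the six a table already uses? The pg_indexes view spells out every index definition, access method included (the table name here is just an example):

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'events';

-- indexdef shows the access method explicitly, e.g.:
--   CREATE INDEX idx_events_type_date ON public.events
--   USING btree (event_type, created_at)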

Advanced Techniques: Index Optimization Secrets

1. Partial Indexes: Index Only What Matters

-- Don't index all users—only active ones
CREATE INDEX idx_users_active_email 
ON users(email) 
WHERE status = 'active' AND deleted_at IS NULL;

-- Most queries only care about active users
-- The index covers only active, non-deleted rows, so it's far smaller
-- Lookups stay just as fast
-- Win-win-win

My favorite partial index trick:

-- Only index recent orders (roughly the last 90 days)
-- Note: the cutoff must be a literal; NOW() isn't allowed in an
-- index predicate because it isn't IMMUTABLE
CREATE INDEX idx_orders_recent 
ON orders(created_at, customer_id) 
WHERE created_at > '2024-09-01';

-- Old orders are rarely queried
-- Index stays small and fast
-- Rebuild monthly to slide the window forward (swap sketch below)
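That monthly rebuild can happen without blocking reads or writes by building the replacement first. A sketch of the swap (names and cutoff are mine; the CONCURRENTLY variants can't run inside a transaction block):

-- Build the next window alongside the old index
CREATE INDEX CONCURRENTLY idx_orders_recent_new
ON orders(created_at, customer_id)
WHERE created_at > '2024-10-01';  -- new literal cutoff

-- Drop the stale window, then take over its name
DROP INDEX CONCURRENTLY idx_orders_recent;
ALTER INDEX idx_orders_recent_new RENAME TO idx_orders_recent;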

2. Expression Indexes: When Functions Attack

-- ❌ This won't use an index
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- ✅ Create an expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now it's blazing fast
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

JSON path expression indexes:

CREATE INDEX idx_user_settings_theme 
ON users((settings->>'theme'));

SELECT * FROM users WHERE settings->>'theme' = 'dark';
-- Fast lookup on nested JSON field

3. Covering Indexes: The Index-Only Scan

-- Query that needs user_id and email
SELECT user_id, email 
FROM users 
WHERE status = 'active';

-- Regular index: PostgreSQL must visit the table
CREATE INDEX idx_users_status ON users(status);

-- Covering index: Everything needed is IN the index
CREATE INDEX idx_users_status_covering 
ON users(status) INCLUDE (user_id, email);

-- Result: Index-only scan (no table access!)
-- 3-5x faster on large tables

Check if you're getting index-only scans:

EXPLAIN (ANALYZE, BUFFERS) 
SELECT user_id, email 
FROM users 
WHERE status = 'active';

-- Look for:
-- Index Only Scan using idx_users_status_covering
-- Heap Fetches: 0  ← This is the magic number!

4. Multi-Column Index Strategy

Left-to-right rule:

CREATE INDEX idx_events_multi ON events(user_id, event_type, created_at);

-- ✅ Uses index fully
WHERE user_id = 123 AND event_type = 'click' AND created_at > '2024-01-01'

-- ✅ Uses index partially (user_id, event_type)
WHERE user_id = 123 AND event_type = 'click'

-- ✅ Uses index minimally (user_id only)
WHERE user_id = 123

-- ❌ Can't use index (skips first column)
WHERE event_type = 'click' AND created_at > '2024-01-01'

The cardinality trick: Put high-cardinality columns first when your queries filter on them with equality (your actual query patterns trump this rule of thumb).

-- user_id has 1M unique values
-- status has 3 values ('active', 'pending', 'deleted')
-- created_at has 10M unique values

-- ✅ Best order for most queries
CREATE INDEX idx_best ON users(user_id, created_at, status);

-- ❌ Worst order (low cardinality first)
CREATE INDEX idx_worst ON users(status, user_id, created_at);

Monitoring: Is Your Index Actually Being Used?

The Unused Index Detective

-- Find indexes that are never used
SELECT 
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop the dead weight
-- I found 47GB of unused indexes in one audit!
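When dropping in production, I'd reach for the non-blocking variant (the index name is hypothetical; remember idx_scan only reflects this server since the last stats reset, and says nothing about replicas or rare reports):

-- Avoids the exclusive lock a plain DROP INDEX takes
-- (can't run inside a transaction block)
DROP INDEX CONCURRENTLY idx_events_old_unused;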

Index Usage Check

-- How often is each index used, compared to sequential scans?
SELECT 
    i.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    i.idx_scan,
    i.idx_tup_read,
    i.idx_tup_fetch,
    ROUND(100.0 * i.idx_scan / NULLIF(i.idx_scan + t.seq_scan, 0), 2) AS index_usage_pct
FROM pg_stat_user_indexes i
JOIN pg_stat_user_tables t USING (relid)
WHERE i.schemaname = 'public'
ORDER BY pg_relation_size(i.indexrelid) DESC;

The EXPLAIN Decoder Ring

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM events WHERE user_id = 123;

-- What to look for:
-- "Index Scan" or "Index Only Scan" = Good ✅
-- "Seq Scan" = Bad (unless the table is tiny) ❌
-- "Bitmap Index Scan" = Good for OR conditions ✅
-- Estimated rows vs actual rows = planner accuracy check
-- "Buffers: shared read" = disk I/O (lower is better)

The Performance Paradox: When NOT to Index

Yes, you read that right. Sometimes indexes hurt.

1. Small Tables (< 10,000 rows)

-- Table with 5,000 rows
-- Sequential scan: 2ms
-- Index scan: 3ms (overhead of index lookup)
-- Winner: No index (also saves space)

2. Low Cardinality Columns

-- Column with 2-3 unique values
CREATE TABLE orders (status VARCHAR(20)); -- 'pending', 'completed', 'cancelled'

-- ❌ Index is useless
-- PostgreSQL will seq scan anyway if > 5-10% of rows match
-- Save the disk space

3. Write-Heavy Tables

-- Insert-intensive logging table
-- 10,000 inserts/second
-- Each insert updates 5 indexes
-- = 50,000 index writes/second
-- Database melts 🔥

-- Solution: Fewer indexes, or use BRIN

4. The Composite Index Trap

-- ❌ Anti-pattern: Too many similar indexes
CREATE INDEX idx1 ON my_table(a);
CREATE INDEX idx2 ON my_table(a, b);
CREATE INDEX idx3 ON my_table(a, b, c);

-- ✅ Better: One composite index
CREATE INDEX idx_all ON my_table(a, b, c);
-- One index can serve all three query patterns (leftmost-prefix rule)

Real-World Battle Stories

Story 1: The JOIN That Ate My Weekend

Problem: Query joining 3 tables taking 45 seconds.

SELECT o.id, u.name, p.title
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at > '2024-01-01'
  AND u.country = 'US'
  AND p.category = 'electronics';

The Fix:

-- Foreign keys had no indexes! 🤦
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

-- Plus a composite for the WHERE clause
CREATE INDEX idx_orders_date_optimized 
ON orders(created_at) 
WHERE created_at > '2023-01-01';  -- Partial index!

-- Related table indexes
CREATE INDEX idx_users_country ON users(country) WHERE country IS NOT NULL;
CREATE INDEX idx_products_category ON products(category);

-- Result: 45 seconds → 0.3 seconds

Lesson: Always index foreign keys. Always.
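
PostgreSQL won't create those indexes for you, but the catalogs make the gaps findable. A rough detector sketch (it only checks that the FK columns appear somewhere in an index, not that they lead it, so it can miss indexes that exist but are unusable for the join):

-- Foreign key constraints with no index covering their columns
SELECT c.conrelid::regclass AS table_name,
       c.conname            AS fk_name,
       pg_get_constraintdef(c.oid) AS definition
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        AND c.conkey <@ string_to_array(i.indkey::text, ' ')::int2[]
  );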

Story 2: The JSONB Query That Wouldn't Quit

Problem: Product search in e-commerce site timing out.

-- 2 million products, attributes stored as JSONB
SELECT * FROM products 
WHERE attributes @> '{"brand": "Apple", "category": "laptops"}';

-- Query time: 12 seconds

The Fix:

-- GIN index with jsonb_path_ops: smaller and faster than the default
-- opclass, but it only supports containment (@>) and jsonpath operators
CREATE INDEX idx_products_attrs_pathops 
ON products USING gin(attributes jsonb_path_ops);

-- Result: 12 seconds → 0.15 seconds

Bonus optimization:

-- For queries with many conditions, use expression index
CREATE INDEX idx_products_brand_category 
ON products((attributes->>'brand'), (attributes->>'category'));

-- Now these are even faster:
SELECT * FROM products 
WHERE attributes->>'brand' = 'Apple' 
  AND attributes->>'category' = 'laptops';

Story 3: The Time-Series Table That Grew Too Big

Problem: 500GB sensor data table, indexes taking 200GB, queries getting slower.

The Fix:

-- Drop B-tree, add BRIN
DROP INDEX idx_readings_timestamp;
CREATE INDEX idx_readings_timestamp_brin 
ON sensor_readings USING brin(recorded_at);

-- Index size: 200GB → 180MB
-- Query time: 50ms → 200ms
-- Trade-off: 4x slower queries, 1000x smaller index
-- Verdict: WORTH IT (freed up 200GB!)

The Maintenance Checklist

Daily

-- Check for long-running queries
SELECT 
    pid,
    now() - query_start AS duration,
    query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC
LIMIT 10;
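If one of those turns out to be a runaway, it can be stopped from SQL using the pid from the query above (12345 is a placeholder):

-- Politely cancel the backend's current query
SELECT pg_cancel_backend(12345);

-- Heavier hammer: terminate the whole connection
SELECT pg_terminate_backend(12345);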

Weekly

-- Analyze statistics (helps query planner)
ANALYZE;

-- Check index bloat
SELECT 
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 50  -- Rarely used
ORDER BY pg_relation_size(indexrelid) DESC;

Monthly

-- REINDEX if bloated (usually not needed)
REINDEX INDEX CONCURRENTLY idx_name;

-- Vacuum to reclaim space
VACUUM ANALYZE;

Key Takeaways: Your Index Cheat Sheet

  1. B-tree (default): 95% of use cases. Equality, ranges, sorting.
  2. Hash: Equality only. Slightly faster than B-tree for simple lookups.
  3. GIN: JSONB, arrays, full-text. Large but fast for reads.
  4. GiST: Geometry, ranges, nearest-neighbor. Swiss Army knife.
  5. SP-GiST: IP addresses, hierarchical data, prefix matching.
  6. BRIN: Time-series, append-only. Tiny indexes for huge tables.

The Golden Rules:

  • Index foreign keys (ALWAYS)
  • Index columns in WHERE, JOIN, ORDER BY clauses
  • Use composite indexes for multi-column queries
  • Check correlation before using BRIN
  • Monitor with pg_stat_user_indexes
  • Don't over-index (write performance suffers)
  • Partial indexes for filtered queries
  • Expression indexes for functions
  • Covering indexes for index-only scans

Go Forth and Index Wisely

That 2 AM crisis taught me something profound: indexes aren't just about making queries fast. They're about understanding your data's story—how it's organized, how it's accessed, how it grows over time.

Every table is a library. Every query is a reader searching for knowledge. Your job? Build the catalog system that helps them find it instantly.

