Hello, I'm Shrijith Venkatramana. I’m building LiveReview, a private AI code review tool that runs on your LLM key (OpenAI, Gemini, etc.) with flat, no-seat pricing -- built for small teams. Do check it out and give it a try!
If you've ever stared at a slow query in PostgreSQL and wondered why it's taking forever, you're not alone. Indexes are like the secret sauce that can speed things up dramatically, but getting them right takes some know-how. In this post, we'll dive into practical guidelines for using indexes effectively. We'll cover the basics, types, when to use them, and some real-world examples to make your database hum. Let's get started.
Why Indexes Speed Up Your Queries
Indexes in PostgreSQL work by creating a sorted structure that points to your data, much like a book's index helps you find pages quickly. Without them, PostgreSQL does a full table scan, which gets painful with large datasets.
Key benefit: Queries can skip scanning every row, reducing I/O and CPU usage.
For example, imagine a table with millions of users. Searching for a specific email without an index means checking every record. With an index, it's almost instant.
To see this in action, let's create a simple table and test query performance.
-- Create a sample table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255),
name VARCHAR(100)
);
-- Insert sample data (generate_series produces the rows for us)
INSERT INTO users (email, name)
SELECT 'user' || i || '@example.com', 'User ' || i
FROM generate_series(1, 100000) AS i;
-- Query without index
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user50000@example.com';
-- Output: Seq Scan on users (cost=0.00..1775.00 rows=1 width=36) (actual time=0.015..12.345 rows=1 loops=1)
-- Planning Time: 0.050 ms
-- Execution Time: 12.400 ms
-- Now add an index
CREATE INDEX idx_users_email ON users(email);
-- Query with index
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user50000@example.com';
-- Output: Index Scan using idx_users_email on users (cost=0.29..8.31 rows=1 width=36) (actual time=0.010..0.010 rows=1 loops=1)
-- Planning Time: 0.100 ms
-- Execution Time: 0.020 ms
Notice how the execution time drops? That's the power of indexes. For more on query planning, check the PostgreSQL EXPLAIN docs.
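To dig deeper, EXPLAIN accepts options; adding BUFFERS (standard EXPLAIN syntax, shown here as a quick sketch against the same table) reports how many data pages the query touched, which correlates directly with I/O:
-- Show page access counts alongside timing
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'user50000@example.com';
-- Look for lines like "Buffers: shared hit=..." in the output;
-- fewer pages touched generally means less I/O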
Exploring PostgreSQL's Index Types
PostgreSQL offers several index types, each suited for different data and queries. The default is B-tree, but knowing the others helps you pick the right one.
Here's a quick table comparing the main types:
| Index Type | Best For | Example Use Case |
| --- | --- | --- |
| B-tree | Equality and range queries | Searching user IDs or dates |
| Hash | Exact equality matches | Fast lookups on strings |
| GiST | Geometric and full-text search | Spatial data like polygons |
| GIN | Arrays and JSONB | Querying tags or document contents |
| BRIN | Huge tables with naturally ordered data | Time-series data in insertion order |
**Tip:** Start with B-tree unless you have a specific need.
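To make the GIN row concrete, here's a minimal sketch of a GIN index over JSONB; the documents table and its contents are made up for illustration:
-- Hypothetical table with a JSONB payload
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
data JSONB
);
-- Make 'published' rare so the planner prefers the index
INSERT INTO documents (data)
SELECT jsonb_build_object('status', CASE WHEN i % 100 = 0 THEN 'published' ELSE 'draft' END, 'n', i)
FROM generate_series(1, 10000) AS i;
-- The default jsonb_ops GIN index supports the @> containment operator
CREATE INDEX idx_documents_data ON documents USING gin(data);
EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> '{"status": "published"}';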
For a hash index example:
-- Create table with potential for hash index
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50)
);
-- Insert data
INSERT INTO products (sku)
SELECT 'SKU-' || i FROM generate_series(1, 10000) AS i;
-- Create hash index
CREATE INDEX idx_products_sku_hash ON products USING hash(sku);
-- Query using it
EXPLAIN ANALYZE SELECT * FROM products WHERE sku = 'SKU-5000';
-- Output: Index Scan using idx_products_sku_hash on products (cost=0.00..8.02 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
-- Planning Time: 0.040 ms
-- Execution Time: 0.010 ms
Hash indexes are great for equality but don't support range queries (and they only became crash-safe in PostgreSQL 10). Dive deeper into index types in the PostgreSQL index docs.
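BRIN deserves a quick sketch too, since it works so differently: it stores per-block-range summaries, so it stays tiny even on huge tables whose physical order tracks the indexed column. The measurements table here is hypothetical:
-- Hypothetical append-only time-series table
CREATE TABLE measurements (
id BIGSERIAL PRIMARY KEY,
recorded_at TIMESTAMP,
value NUMERIC
);
INSERT INTO measurements (recorded_at, value)
SELECT NOW() - INTERVAL '1 minute' * i, random() * 100
FROM generate_series(1, 1000000) AS i;
-- The BRIN index stays a few pages even at a million rows
CREATE INDEX idx_measurements_brin ON measurements USING brin(recorded_at);
EXPLAIN ANALYZE SELECT * FROM measurements WHERE recorded_at > NOW() - INTERVAL '1 hour';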
Deciding When to Add an Index
Not every column needs an index—over-indexing can slow down writes because PostgreSQL updates indexes on every INSERT, UPDATE, or DELETE.
Rule of thumb: Index if a column is frequently used in WHERE, JOIN, or ORDER BY clauses, and the table has more than a few thousand rows.
Consider selectivity: If a column has mostly unique values (high selectivity), it's a good candidate. Low selectivity (like a boolean flag) might not help.
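You don't have to guess at selectivity. After an ANALYZE, the planner's own statistics are queryable through the standard pg_stats view; here's a quick check against the users table from earlier:
-- Refresh planner statistics first
ANALYZE users;
-- n_distinct near -1 means nearly every row is unique (high selectivity);
-- a small positive count means few distinct values (low selectivity)
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'users';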
Example scenario: In an orders table, index the status column if you often query pending orders, but only if "pending" isn't the majority.
-- Sample orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER,
status VARCHAR(20),
created_at TIMESTAMP
);
-- Insert sample data
INSERT INTO orders (user_id, status, created_at)
SELECT i % 1000 + 1, CASE WHEN i % 10 = 0 THEN 'pending' ELSE 'completed' END, NOW() - INTERVAL '1 day' * (i % 365)
FROM generate_series(1, 50000) AS i;
-- Without index on status
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Output: Seq Scan on orders (cost=0.00..900.00 rows=5000 width=40) (actual time=0.020..15.000 rows=5000 loops=1)
-- Add index
CREATE INDEX idx_orders_status ON orders(status);
-- With index
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Output: Bitmap Heap Scan on orders (cost=101.00..700.00 rows=5000 width=40) (actual time=0.600..4.000 rows=5000 loops=1)
--   ->  Bitmap Index Scan on idx_orders_status (cost=0.00..100.00 rows=5000 width=0) (actual time=0.500..0.500 rows=5000 loops=1)
The index shines when filtering a small percentage of rows. For selectivity details, see PostgreSQL's planner stats.
Selecting Columns for Effective Indexing
Focus on columns that appear in filters or joins. Prioritize foreign keys, as they're common in JOINs.
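For example, a foreign key such as orders.user_id is worth indexing on its own if it drives joins; a minimal sketch (note that the composite index built later on (user_id, created_at) would cover this same leftmost column, making a separate one redundant):
-- Index the foreign key used when joining orders back to users
CREATE INDEX idx_orders_user_id ON orders(user_id);
EXPLAIN ANALYZE
SELECT u.name, count(*)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 42
GROUP BY u.name;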
Important: Index columns with high cardinality (many unique values) first.
Avoid indexing columns that change often if writes are frequent, as it increases overhead.
In practice, for a logs table:
-- Logs table
CREATE TABLE logs (
id SERIAL PRIMARY KEY,
event_type VARCHAR(50),
timestamp TIMESTAMP,
user_id INTEGER
);
-- Insert data
INSERT INTO logs (event_type, timestamp, user_id)
SELECT 'event' || (i % 5 + 1), NOW() - INTERVAL '1 hour' * i, i % 1000 + 1
FROM generate_series(1, 100000) AS i;
-- Index on event_type and timestamp (common filter)
CREATE INDEX idx_logs_event_timestamp ON logs(event_type, timestamp);
-- Query example
EXPLAIN ANALYZE SELECT * FROM logs WHERE event_type = 'event3' AND timestamp > NOW() - INTERVAL '1 day';
-- Output: Index Scan using idx_logs_event_timestamp on logs (cost=0.42..500.00 rows=1000 width=48) (actual time=0.100..2.000 rows=1000 loops=1)
This multi-column index (more on those next) speeds up queries that filter on both columns at once.
Mastering Multi-Column Indexes
When queries filter on multiple columns, a composite index can be more efficient than separate ones. PostgreSQL uses the leftmost columns first.
Design tip: Put the columns used with equality filters first, followed by the columns used with range filters.
For instance, if you query by user_id = X and created_at > Y, index (user_id, created_at).
-- Back to orders table from earlier
-- Assume we have the orders table populated
-- Create multi-column index
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- Query using both
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 500 AND created_at > NOW() - INTERVAL '30 days';
-- Output: Index Scan using idx_orders_user_created on orders (cost=0.29..200.00 rows=50 width=40) (actual time=0.010..0.500 rows=50 loops=1)
Note: This index also works for queries on just user_id, but not solely on created_at. Learn more about composite indexes in PostgreSQL's multicolumn index guide.
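You can verify the leftmost-prefix rule yourself:
-- Filtering on created_at alone can't use the leading user_id column,
-- so the planner typically falls back to a sequential scan here
EXPLAIN ANALYZE SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';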
Using Partial Indexes for Targeted Optimization
Partial indexes only cover a subset of rows, saving space and speeding up maintenance. Use them when queries focus on specific conditions.
When to use: For columns where a WHERE clause is always present, like active = true.
Example: Index only active users.
-- Users table from first example, add active column
ALTER TABLE users ADD COLUMN active BOOLEAN DEFAULT true;
-- Update some to false
UPDATE users SET active = false WHERE id % 10 = 0;
-- Partial index
CREATE INDEX idx_users_email_active ON users(email) WHERE active = true;
-- Query
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user50000@example.com' AND active = true;
-- Output: Index Scan using idx_users_email_active (cost=0.15..8.17 rows=1 width=40) (actual time=0.005..0.005 rows=1 loops=1)
This index is smaller and faster to update. For partial index syntax, refer to PostgreSQL's partial indexes doc.
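You can measure the saving directly by comparing the partial index against the full email index from earlier:
-- With ~10% of users inactive, expect the partial index to be about 10% smaller
SELECT pg_size_pretty(pg_relation_size('idx_users_email')) AS full_index,
pg_size_pretty(pg_relation_size('idx_users_email_active')) AS partial_index;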
Handling Index Maintenance and Costs
Indexes aren't free—they consume storage and slow writes. Regularly vacuum and analyze tables to keep stats current.
Maintenance commands: Use VACUUM ANALYZE; for bloated indexes, REINDEX.
Monitor with pg_stat_user_indexes to see usage.
Storage example: A B-tree index might add 20-50% overhead depending on data.
-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_users_email')) AS index_size;
-- Output: e.g., '8192 kB'
-- Reindex if needed
REINDEX INDEX idx_users_email;
-- Analyze table
ANALYZE users;
Over time, watch for unused indexes with:
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
This helps drop dead weight. Check PostgreSQL vacuuming docs for automation tips.
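To prioritize cleanup, rank unused indexes by the space they occupy (same statistics view, joined with the size functions):
-- Unused indexes, largest first, so you know what dropping them buys you
SELECT indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;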
Avoiding Common Indexing Mistakes
Many devs index everything, leading to bloat. Another pitfall: forgetting that LIKE 'prefix%' can use a B-tree index (under the C collation or with a pattern-ops operator class), while '%wildcard%' can't (unless you use trigram indexes).
Common error: Indexing low-selectivity columns, like gender in a balanced dataset.
Fix: Use EXPLAIN to verify index usage.
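For the prefix case specifically, a pattern-ops index makes LIKE 'prefix%' indexable even under non-C collations; a minimal sketch on the users table:
-- varchar_pattern_ops supports LIKE 'prefix%' regardless of collation
CREATE INDEX idx_users_email_pattern ON users(email varchar_pattern_ops);
EXPLAIN ANALYZE SELECT * FROM users WHERE email LIKE 'user500%';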
Trigram example for fuzzy search:
-- Install extension if needed
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Create trigram index
CREATE INDEX idx_users_name_trgm ON users USING gin(name gin_trgm_ops);
-- Query with wildcard
EXPLAIN ANALYZE SELECT * FROM users WHERE name ILIKE '%user 5%';
-- Output: Bitmap Heap Scan on users (cost=100.00..500.00 rows=100 width=36) (actual time=0.300..1.000 rows=100 loops=1)
--   ->  Bitmap Index Scan on idx_users_name_trgm (cost=0.00..100.00 rows=100 width=0) (actual time=0.200..0.200 rows=100 loops=1)
Without it, this query would fall back to a full sequential scan. For advanced search, see PostgreSQL full-text search.
Monitoring and Refining Your Indexes
To keep things optimal, integrate index checks into your workflow. Use tools like pgBadger for query log analysis or extensions like pg_stat_statements.
Pro tip: Set up alerts for slow queries and review EXPLAIN outputs regularly.
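For instance, with pg_stat_statements enabled (it must be listed in shared_preload_libraries before the extension can be created; the mean_exec_time column assumes PostgreSQL 13 or newer), you can surface the slowest queries:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Ten queries with the highest average execution time
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;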
In production, rebuild bloated tables and indexes without long locks using pg_repack.
Finally, remember: indexes evolve with your data. Periodically run ANALYZE, drop unused ones, and experiment with hypothetical indexes using the HypoPG extension.
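Here's a minimal HypoPG sketch, assuming the extension is installed; hypothetical indexes are visible only to plain EXPLAIN, not EXPLAIN ANALYZE:
CREATE EXTENSION IF NOT EXISTS hypopg;
-- Simulate an index without paying the cost of building it
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders(status)');
-- Plain EXPLAIN will consider the hypothetical index
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Discard all hypothetical indexes
SELECT hypopg_reset();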
By following these guidelines, you'll see noticeable performance gains. Experiment in your own setup and measure; that's the best way to master this. Happy querying!