Mustafa ERBAY

Posted on May 29 • Originally published at mustafaerbay.com.tr

Database Index Selection: Core Approaches for Performance

#tutorials #postgres #database #performance

Introduction: Why Database Index Selection Is So Important

When optimizing a production ERP system, the slow delivery reports were a serious problem. Analyzing the database queries, I saw certain tables performing full table scans. As the data volume grew, performance dropped to unacceptable levels. That’s when I realized that proper database index selection is not just an optimization trick—it’s the lifeline of the application.

Database indexes are used to speed up your queries. Think of them like the index at the back of a book; instead of reading the entire book to find a topic, you go straight to the relevant page. In databases, indexes let you locate the data you need as quickly as possible. However, choosing the wrong index can degrade performance and also slow down data‑writing operations (INSERT, UPDATE, DELETE). In this guide we’ll examine common index types, when to use them, and how they affect performance.

Core Index Types and Their Use Cases

There are many index types available in databases, but the most common are: B‑tree, Hash, GiST, GIN, and BRIN. Each has its own strengths and weaknesses. Picking the right index type directly influences query performance.

B‑tree Index

B‑tree is the most widely used index type in databases. It stores data in a sorted structure, making it highly effective for equality (=), less‑than (<), greater‑than (>), range (BETWEEN), and ordering (ORDER BY) operations. PostgreSQL’s default index type is B‑tree.

For example, imagine we create a B‑tree index on the email column of a users table. If we run a query like WHERE email = 'test@example.com', the database can locate the row directly via the index. Likewise, an ORDER BY registration_date request benefits from the index’s sorted nature, making the operation much faster.

-- Create a B‑tree index in PostgreSQL
CREATE INDEX idx_users_email ON users (email);

-- Example query that uses the index
SELECT * FROM users WHERE email = 'test@example.com';

-- Useful for ordering as well
SELECT user_id, registration_date
FROM users
ORDER BY registration_date DESC;

The downside of B‑tree indexes is that they are not optimized for complex data types (e.g., JSONB or full‑text search) or geometric types. In those scenarios other index types are more appropriate. Also, the index itself occupies disk space, and whenever rows are inserted, updated, or deleted the index must be updated too, which adds a modest write‑performance penalty.

Hash Index

Hash indexes transform the index key into a hash value using a specific function. That hash value points directly to the location of the data, making equality queries (=) extremely fast. They cannot be used for range queries or ordering because the hash function does not preserve order.

Suppose we create a hash index on the product_code column of a products table. A query like WHERE product_code = 'XYZ123' can be faster than even a B‑tree index. However, queries such as WHERE product_code LIKE 'XYZ%' or WHERE product_code > 'ABC' cannot be processed efficiently with a hash index.

-- Create a Hash index in PostgreSQL (B‑tree is usually preferred)
CREATE INDEX idx_products_code_hash ON products USING HASH (product_code);

-- Effective only for equality queries
SELECT * FROM products WHERE product_code = 'XYZ123';

While hash indexes can be useful in specific cases, B‑tree indexes are generally more flexible and performant for general‑purpose workloads. The biggest drawback of hash indexes is that they do not support range queries due to the random distribution of data. Additionally, hash collisions (different keys mapping to the same hash) can degrade performance.

ℹ️ Hash Index Disadvantage

Hash indexes support no queries other than equality comparisons. Therefore they cannot be used for range queries such as LIKE 'prefix%' or < , > , BETWEEN. In PostgreSQL, B‑tree indexes are usually a better choice because they support both equality and range queries.

Advanced Index Types: Solutions for Special Cases

When standard index types fall short, more specialized indexes come into play. These are optimized for particular data types or query patterns.

GiST (Generalized Search Tree) Index

GiST provides a generalized search structure for various data types. It is especially useful for geometric data, full‑text search, and hierarchical data. Many PostgreSQL extensions (e.g., PostGIS) rely on GiST indexes.

Imagine an application that works with a geographic information system (GIS) and needs to find all points within a certain radius. If we add a GiST index on the coordinates column (a geographic point type) of a locations table, we can use functions like ST_DWithin to perform the query very quickly.

-- Create a GiST index with the PostGIS extension
CREATE INDEX idx_locations_coordinates ON locations USING GIST (coordinates);

-- Query on geometric data
SELECT name
FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(longitude, latitude), radius_in_meters);

GiST indexes enable efficient searching on complex data types, but they can consume more disk space than B‑tree indexes and may have a larger impact on write performance. Their effectiveness depends on the data type and how the index is configured.

GIN (Generalized Inverted Index) Index

GIN indexes are typically used for “multi‑value” data types such as arrays, JSONB documents, or full‑text search. A GIN index stores each unique element as a key and records which rows contain that element, allowing rapid retrieval of rows that contain a particular word or value.

Consider an e‑commerce site where we need to search product descriptions or tags. If we add a GIN index on the tags column (an array type) of a products table, we can quickly find all products with the tag 'red' or with the tag array ['electronics', 'gadget'].

-- Create a GIN index in PostgreSQL (for Array or JSONB)
CREATE INDEX idx_products_tags ON products USING GIN (tags);

-- Query on an array
SELECT * FROM products WHERE 'red' = ANY(tags);

-- Query on JSONB
CREATE INDEX idx_products_details ON products USING GIN (details); -- details is JSONB
SELECT * FROM products WHERE details @> '{"brand": "Acme"}';

Thanks to PostgreSQL’s JSONB support, searching complex JSON data becomes very efficient with GIN indexes. The downside is slower write performance and higher disk usage compared to B‑tree indexes, so they should be used cautiously on frequently updated data.

BRIN (Block Range Index) Index

BRIN indexes are designed for very large datasets and are extremely space‑efficient. They represent ranges of blocks and store information about whether values in those blocks satisfy a certain condition. If the data is physically ordered on disk (e.g., time‑series data), BRIN indexes can be very effective.

Imagine a time‑series database where data is collected daily and stored chronologically on disk. Adding a BRIN index on the timestamp column of a sensor_readings table allows the database to scan only the relevant data blocks when querying a specific time range.

-- Create a BRIN index in PostgreSQL
CREATE INDEX idx_sensor_readings_timestamp ON sensor_readings USING BRIN (timestamp);

-- Time‑range query
SELECT *
FROM sensor_readings
WHERE timestamp BETWEEN '2026-05-28 00:00:00' AND '2026-05-28 23:59:59';

BRIN indexes save disk space on massive tables (billions of rows) when the data is naturally ordered. However, they perform poorly on randomly distributed data or on tables that are updated frequently. Their effectiveness is directly tied to the physical layout of the data.

⚠️ Things to Consider When Using BRIN Indexes

BRIN indexes heavily depend on the physical ordering of data on disk. If the data is frequently inserted or deleted and does not remain orderly, BRIN indexes may fail to deliver the expected performance and can even be worse than a full table scan. Therefore, consider using BRIN indexes only when the data is naturally sorted and large in volume.

Index Selection Strategies: Finding the Right Index

Choosing the right index is a critical part of query performance optimization. It starts with understanding which queries run most often and selecting the index type that best supports those queries.

Query Analysis and Planning

The first step is to identify the most frequently executed and most time‑consuming queries. Tools like PostgreSQL’s pg_stat_statements help by showing which queries consume the most CPU, I/O, or execution time.

Suppose I discovered that the product listing pages of an e‑commerce platform generate the heaviest load. Those queries typically filter by product name, category, and price range, and they also sort results. In that case, creating a composite B‑tree index on category_id, name, and price in the products table makes sense.

-- Composite index to support common queries
CREATE INDEX idx_products_name_cat_price ON products (category_id, name, price);

-- Example query that can be optimized with the index
SELECT product_id, name, price
FROM products
WHERE category_id = 123 AND price BETWEEN 50 AND 100
ORDER BY name;

The order of columns in a composite index matters. The most selective column (the one with the most distinct values) should usually appear first. The query planner uses the first column to start filtering, then proceeds to the next, and so on. If your query uses only the later columns, the index may not be beneficial.

Index Type and Data Type Compatibility

The chosen index type must match the column’s data type. For full‑text search, a GIN index on a tsvector column is more appropriate than a B‑tree. For geometric data, GiST is preferred, while numeric or string data often works well with B‑tree.

Imagine a CRM application with a comments table that stores customer feedback. To search for specific keywords, we can add a tsvector column derived from comment_text and attach a GIN index for full‑text search.

-- Full‑text search with tsvector and GIN index
ALTER TABLE comments ADD COLUMN comment_tsv tsvector;
UPDATE comments SET comment_tsv = to_tsvector('turkish', comment_text); -- Turkish language

-- Automatic updates via trigger or function
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON comments FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(comment_tsv, 'pg_catalog.simple', comment_text);

-- GIN index creation
CREATE INDEX idx_comments_tsv ON comments USING GIN(comment_tsv);

-- Full‑text search query
SELECT * FROM comments WHERE comment_tsv @@ to_tsquery('turkish', 'product & suitable');

This approach provides advanced search capabilities on text data and is far more efficient than scanning the whole table. Aligning index type with data type maximizes index effectiveness.

Write Performance and Index Cost

Indexes improve read performance but add a cost to write operations (INSERT, UPDATE, DELETE). Every data modification requires the associated indexes to be updated. Therefore, it’s important to avoid over‑indexing tables that are written to frequently but read rarely.

Consider a logging system that ingests millions of rows per second. Adding separate indexes for every column would overload the database with index‑maintenance work, dramatically slowing down writes. In such scenarios, limiting indexes to columns actually used in queries—or opting for low‑cost indexes like BRIN for timestamp searches—makes more sense.

-- Reduce index overhead on a heavily written table
-- Use BRIN if queries are only on the timestamp
CREATE INDEX idx_logs_timestamp ON logs USING BRIN (log_timestamp);

-- Avoid adding indexes on other columns unless truly needed

Regularly identifying and dropping unused indexes is also essential. Views like pg_stat_user_indexes reveal how often each index is scanned. Removing rarely used indexes frees disk space and improves write performance.

Index Maintenance and Optimization

Even after indexes are created, they require ongoing maintenance to keep performance optimal. As data changes, index effectiveness can degrade.

Reindexing and Vacuuming

In PostgreSQL, after many UPDATE and DELETE operations, indexes can become “bloated,” reducing their efficiency. The REINDEX command rebuilds indexes to eliminate this bloat.

In a production ERP system I worked on, tables that store orders or inventory are updated frequently, and their indexes gradually slowed down. Rebuilding those indexes with REINDEX noticeably improved query times.

-- Rebuild a specific index
REINDEX INDEX idx_orders_customer_id;

-- Rebuild all indexes on a table
REINDEX TABLE orders;

PostgreSQL’s autovacuum mechanism automatically cleans up dead rows and updates statistics, helping to keep indexes and tables performant. However, in some cases manual VACUUM or VACUUM ANALYZE is needed—especially after heavy write bursts or after a manual REINDEX—to ensure the planner has up‑to‑date statistics.

Detecting and Dropping Unused Indexes

Over‑indexing can hurt performance. Identifying and removing indexes that are seldom used saves disk space and speeds up writes. The pg_stat_user_indexes view provides an idx_scan column indicating how many times each index has been read. Low values suggest candidates for removal.

In one project I found many stale indexes on old reporting tables that were never used. After dropping them, INSERT and UPDATE operations on those tables became roughly 15 % faster.

-- Query index usage statistics
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM
    pg_stat_user_indexes
WHERE
    schemaname = 'public' -- specify your schema
ORDER BY
    idx_scan ASC, index_size DESC;

Review the results, verify that the application does not rely on the low‑usage indexes, and then drop them safely.

💡 Importance of Index Maintenance

If indexes are not maintained regularly, their performance can degrade over time. This can cause serious issues, especially in systems with heavy data traffic. Optimizing autovacuum settings, running manual REINDEX and VACUUM ANALYZE when needed, and periodically reviewing unused indexes are critical steps to keep database performance high.

Trade‑offs in Index Selection

Every index type carries a cost. Choosing an index usually means balancing read performance against write overhead.

B‑tree vs. Other Index Types

B‑tree indexes offer a solid general‑purpose balance. They support both equality and range queries. However, for searching large arrays or JSONB documents, GIN indexes are far more efficient. Likewise, for time‑series data with natural ordering, BRIN indexes provide substantial disk‑space savings. The right choice depends on your query patterns and data types.

Single‑Column vs. Composite Indexes

Single‑column indexes target one column. Composite indexes cover multiple columns. If your queries often filter on several columns (e.g., WHERE col1 = 'A' AND col2 = 'B'), a composite index can be more efficient. But if the first column of a composite index isn’t used, the whole index becomes ineffective. Therefore, ordering columns correctly in composite indexes is crucial.

For instance, if you frequently search users by both last_name and first_name, a composite index on (last_name, first_name) is more effective than two separate single‑column indexes. However, if you only ever query by first_name, the leading last_name column renders the composite index largely useless.

Index Cost and Application Performance

Each index occupies disk space and must be updated on data changes, slowing down INSERT, UPDATE, and DELETE operations. Hence, rather than indexing every column, create indexes only where performance analysis shows a clear benefit.

When I built the backend for a mobile app, adding too many indexes to the activity‑log table (which receives ~50 million rows daily) severely impacted write latency. Limiting indexes to the columns actually needed for search or analysis restored overall application performance.

Conclusion: Thoughtful Index Selection and Ongoing Optimization

Database index selection is one of the most fundamental and effective ways to optimize database performance. Picking the right index type allows the query planner to retrieve data in the fastest possible way. B‑tree, Hash, GiST, GIN, and BRIN each serve different data structures and query patterns.

Remember that indexes are not a magic wand. Every index incurs a cost, especially on write performance. Therefore, when selecting indexes, perform query analysis, consider data types, and balance read versus write workloads.

Finally, indexes are not static objects. As data evolves and query patterns change, indexes must be revisited, tuned, and maintained. Regularly cleaning up unused indexes, rebuilding fragmented ones, and vacuuming keep your database performing at its best over time. This continuous optimization is essential for scalability and user satisfaction.

DEV Community