Mustafa ERBAY

Posted on Jun 7 • Originally published at mustafaerbay.com.tr

Database Index Selection: The Balance Between Performance and

#postgres #learning #productivity

Database Index Selection: The Balance Between Performance and Maintenance

While developing a production ERP system, the execution time of a report querying sales orders had me quite stuck. At first, I thought the database server's resources might be insufficient, but the EXPLAIN ANALYZE output told a different story. The query was performing unnecessary disk reads to access just a few rows out of millions. That's when I realized that database index selection isn't just a performance optimization; it's a critical architectural decision that directly impacts maintenance overhead and complexity. In this post, we'll discuss the commonly used index types in PostgreSQL, evaluate them in terms of performance and maintenance, and explore how to make the right decision.

Choosing the right index speeds up your queries while preventing them from unnecessarily consuming disk space and slowing down write operations (INSERT, UPDATE, DELETE). An incorrect index choice can lead to the opposite effects. Striking this balance is vital for the overall health of your system. Especially in systems dealing with large datasets, this fine-tuning can make a significant difference.

Why Do We Use Indexes? The Basic Logic

Database indexes are like the index at the end of a book. They allow you to quickly find a topic you're interested in, rather than reading the entire book from start to finish. In the database context, an index is a sorted copy of data on one or more columns. This structure allows the database engine to access relevant data much faster when searching for a specific value, instead of scanning the entire table. This dramatically improves performance, especially for queries with WHERE clauses, JOIN operations, and situations requiring sorting (ORDER BY).

Imagine you have a table with 100 million rows and you need to search by a specific user ID. Without an index, the database would have to check each of those 100 million rows one by one, every time. If this search is frequent, your system will slow down, and user experience will suffer. However, with an appropriate index, the database can complete this search in milliseconds. That's the power of an index.

ℹ️ The Cost of Indexes

Indexes don't just provide benefits; they also come with costs. Every index occupies disk space and needs to be updated with every data change (INSERT, UPDATE, DELETE). This update process, while improving query performance, slightly slows down write operations. Therefore, index selection is a trade-off: sacrificing a bit of write performance to enhance read performance.

B-tree Index: The Standard of Standards

The default and most commonly used index type in PostgreSQL is B-tree. B-tree indexes are highly efficient for equality (=), comparison (<, >, <=, >=), and range queries (BETWEEN). They can also be used to speed up ORDER BY and GROUP BY clauses. It's a good starting point for almost any type of query.

B-tree indexes store data in a balanced tree structure. This structure ensures that data insertion, deletion, and search operations occur with logarithmic complexity (O(log n)). This means performance remains relatively stable even as the amount of data increases. If your queries typically involve searching for a specific value, retrieving data within a certain range, or sorting results, B-tree is usually the best choice.

For example, defining a B-tree index on the user_id column in a user table will incredibly speed up queries like SELECT * FROM users WHERE user_id = 12345;. Similarly, range queries like SELECT * FROM orders WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31'; also benefit from this index.

Practical Use of B-tree Index

In a client project, we frequently queried the products table based on the category_id and price columns. Initially, we only added an index on the category_id column. Queries became faster, but conditions like WHERE category_id = 5 AND price > 100; were still slow. The EXPLAIN ANALYZE output showed that the database first filtered by category_id and then performed a table scan for price.

The solution was to use a composite B-tree index: CREATE INDEX idx_products_category_price ON products (category_id, price);. This index indexes both the category_id and price columns together. PostgreSQL can use this index to first filter by category_id and then perform a more precise search by price within that filtered set. This reduced query time from several seconds to milliseconds.

-- Composite B-tree index example
CREATE INDEX idx_products_category_price ON products (category_id, price);

-- Query benefiting from this index
SELECT product_name, price
FROM products
WHERE category_id = 5 AND price > 100;

GIN Index: A Powerful Tool for Full-Text Search and JSONB Data

GIN (Generalized Inverted Index) has a different structure than B-tree. It's designed for searching within complex data structures like full-text search, arrays, and JSONB data types. GIN indexes store the unique keywords or values for each document or record, allowing for quick identification of which document contains which keyword.

If your application requires users to perform free-text searches within product descriptions, blog posts, or comments, GIN indexes are your perfect fit. When used with PostgreSQL's tsvector and tsquery types, you can build an incredibly powerful full-text search engine.

GIN indexes are also a lifesaver when working with JSONB data. They can be used to search for a value of a specific key within a JSONB column or to query for a specific key-value pair. This significantly improves query performance when working with unstructured or semi-structured data.

⚠️ GIN Index Maintenance Overhead

GIN indexes generally consume more disk space and can be more costly for update operations compared to B-tree indexes. This is because they need to index all unique keywords or values within each document or record. Therefore, it's important to use GIN indexes only when truly necessary and configure them correctly. The fastupdate option in GIN indexes, while improving update performance, may require a VACUUM operation for the index to be fully updated.

GIN Index Usage with JSONB

For an e-commerce site's product catalog, we stored product features in JSONB format. We had a JSONB column named features in the products table, and we needed to search based on features within this column, such as color, size, and material.

Initially, there was no index on this JSONB column. Queries like SELECT * FROM products WHERE features @> '{"material": "cotton"}'; scanned the entire table. To fix this, we added a GIN index:

-- GIN index for JSONB column
CREATE INDEX idx_products_features ON products USING GIN (features);

-- Faster query with this index
SELECT product_name
FROM products
WHERE features @> '{"material": "cotton", "color": "blue"}';

Thanks to this index, finding products with specific materials and colors transformed from operations that took hours to queries that took mere milliseconds. GIN indexes make a big difference, especially when using JSONB operators like ?, ?|, ?&, and @>.

BRIN Index: A Memory-Friendly Option for Large, Sequential Datasets

BRIN (Block Range Index) takes a very different approach compared to B-tree and GIN indexes. BRIN indexes store the value range for each data block. That is, they record the minimum and maximum value for a specific column for each data block of the table. This keeps the index size much smaller compared to B-tree or GIN indexes.

BRIN indexes are most effective when data is physically sequential on disk. For example, they can yield excellent results in a time-series database or on indexes defined over a sequential column like created_at. If your data is physically sequential, when you query for data within a specific range, a BRIN index can tell the database which data blocks might be relevant, preventing unnecessary block reads.

The biggest advantage of BRIN indexes is their small size and relatively lower maintenance cost. However, their performance may not be as good as B-tree or GIN indexes when data is not physically sequential. Therefore, BRIN indexes are generally preferred for specific use cases.

💡 BRIN Index Efficiency

The efficiency of BRIN indexes depends on the pages_per_range parameter. This parameter determines how many pages of information the index will store for each block range. Smaller pages_per_range values provide more precise searching but increase index size. Larger values reduce index size but can decrease search accuracy. The default value (pages_per_range = 128) is usually a good starting point.

BRIN Index Usage with Time-Series Data

I have a side project that analyzes financial data stored on my own servers. This data contains millions of time-stamped records, increasing daily. I need to query data within a specific time range based on the timestamp column.

For this scenario, a BRIN index turned out to be an ideal solution. With the command CREATE INDEX idx_financial_data_timestamp ON financial_data USING BRIN (timestamp);, I created the index. This index allows me to know which time range each data block belongs to. Thus, queries like SELECT * FROM financial_data WHERE timestamp BETWEEN '2026-06-01' AND '2026-06-02'; can be completed by reading only the relevant data blocks. Thanks to the BRIN index, I can meet my performance expectations even while managing thousands of GBs of data.

-- BRIN index for time-series data
CREATE INDEX idx_financial_data_timestamp ON financial_data USING BRIN (timestamp);

-- Query for a specific time range
SELECT *
FROM financial_data
WHERE timestamp BETWEEN '2026-06-01 10:00:00' AND '2026-06-01 11:00:00';

Considerations and Trade-offs in Index Selection

Choosing the right index is not just about knowing which index type provides the fastest query. It also requires considering factors like write performance, disk space usage, ease of maintenance, and the overall trends of your query patterns. Each index type has its own advantages and disadvantages.

B-tree: Best for general-purpose use. Efficient for equality, comparison, and range queries. However, it's not ideal for full-text search or complex data structures.
GIN: Powerful for data types like full-text search, arrays, and JSONB. However, it may have slower write performance and require more disk space compared to B-tree.
BRIN: Memory-friendly and fast for large, sequential datasets. However, its performance can degrade when data is not physically sequential.

In one project, the users table involved both equality queries on the email column and range queries on the registration_date column. When deciding which index to choose, I evaluated the following trade-offs:

Single B-tree index (email, registration_date): Works well for both equality and range queries. However, for searches based on registration_date, not using email as the first filtering condition could reduce performance.
Separate B-tree indexes (email and registration_date): The email index for equality queries and the registration_date index for range queries would each work very fast. However, two indexes would consume more disk space and double the write operations' slowdown.

Ultimately, considering that users typically log in with their email and rarely search by registration date, I decided to define a B-tree index only on the email column. Queries on registration_date were still completed within an acceptable time, which was a more logical choice in terms of disk space and write performance balance.

Nuances of Managing Indexes

Every index adds an extra load to your system. Therefore, it's important to create only the indexes you truly need and regularly clean up unused indexes. PostgreSQL's pg_stat_user_indexes view allows you to see how frequently indexes are used. The idx_scan column shows how many times the index has been scanned. If the idx_scan value for an index is very low, you might consider removing it.

-- To find unused or rarely used indexes
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM
    pg_stat_user_indexes
ORDER BY
    idx_scan ASC;

Furthermore, using the EXPLAIN ANALYZE command to analyze your query plans and actual execution times is the most effective way to understand which indexes are working and which are not. Remember, index selection is a dynamic process; as your database's usage patterns change, you may need to revisit your index strategy.

Conclusion: Smart Choices, Healthy Systems

Database index selection is one of the cornerstones of building performant and sustainable systems. Understanding the capabilities and limitations of different index types like B-tree, GIN, and BRIN allows you to make the right decisions. By carefully evaluating your query structures, data access patterns, and trade-offs, you not only enhance read performance but also optimize the overall health and maintenance of your database.

Remember, the best index is the least index. Every index has a cost. Therefore, analyze your needs correctly before creating an index and regularly review the indexes you've created. This approach will ensure your systems remain fast, stable, and manageable in the long run.

DEV Community