Team Timescale for Timescale

Posted on Mar 19 • Originally published at timescale.com on Mar 18

PostgreSQL Indexes for Columnstore: 1,185x Faster Lookup Queries, 224x Faster Inserts in TimescaleDB

#devops #news #database #postgres

TimescaleDB 2.18 adds PostgreSQL indexes to columnstore, speeding up lookups by 1,185x and inserts by 224x. Now you can have both fast analytics and quick data retrieval without compromise. 📄 Skip to the full article

TimescaleDB's hypercore engine made real-time analytics dramatically faster. This hybrid storage system already gives you the best of both worlds: a rowstore for rapid transactional data ingestion and a columnstore for blazing-fast analytics on compressed storage. Data automatically moves from rowstore to columnstore as it "cools," optimizing query performance and reducing your storage footprint.

But TimescaleDB 2.18 raises the bar even further with a game-changing feature: Early Access support for PostgreSQL indexes—B-tree and hash indexes—in the columnstore, delivering up to 1,185x faster record retrievals and 224x faster inserts. Most columnar databases can't do this.

Until now, developers faced a frustrating trade-off: when data moved to the columnstore, you lost PostgreSQL's powerful indexing capabilities. This loss slowed down critical operations like looking up specific financial transactions or updating historical IoT sensor data.

By bridging the gap between transactional and analytical performance, TimescaleDB continues to push the boundaries of PostgreSQL as the fastest application database for real-time analytics. It now combines high-performance columnar analytics with fast indexed lookups—without compromise.

What Are B-tree and Hash Indexes, and Why Are They Useful

Indexes are a fundamental part of database performance optimization. They allow queries to quickly locate and retrieve data without scanning entire tables. B-tree and hash indexes are among PostgreSQL’s most widely used index types, each designed for different queries.

B-tree indexes are the default and most versatile type of index in PostgreSQL. They keep data sorted in a hierarchical structure, making them ideal for queries that involve range lookups (>, <, BETWEEN) and equality lookups (=). When a query searches for a specific value or a range of values in an indexed column, the B-tree structure enables the database to quickly traverse the tree and find the relevant records in logarithmic time (O(log n)), significantly improving performance compared to a full table scan.

Hash indexes are designed for exact-match lookups (=) and use a hashing function to map values to unique disk locations. When searching for a single value—such as looking up a transaction by its transaction ID—hash indexes can be even faster than B-tree indexes as they don’t require tree traversal and have an amortized constant O(1) lookup. Their limitation is that they don’t support range queries, making them specifically suited for cases with frequently queried, unique keys.

Their performance advantage comes from their optimized data structure designed for efficient key searching. This results in fewer disk pages needing to be read, which reduces I/O spikes when locating specific data points or enforcing uniqueness. In some cases, decompression can be avoided altogether.

When are they useful?

These indexes are helpful, for example, in financial analytics, where firms continuously process vast amounts of market data, trades, and transactions in real time. Columnstores efficiently store and analyze years of historical price movements and trading patterns but would not perform well in other query patterns and would benefit from additional indexes that are common in rowstores.

For example, when looking up a specific trade by its unique trade ID, a hash index would provide the fastest results since it allows direct access to the data with minimal overhead.

You use a B-tree index to check unique constraints or for range-based filtering. For instance, a trading system might need to identify all trades with a trading volume greater than 10,000 to detect unusually large trades or analyze market liquidity. A B-tree index on trade volume would allow the database to quickly retrieve and filter the relevant rows without scanning all trades.

Our benchmarks demonstrate substantial performance improvements:

1,185x faster point lookup queries to retrieve a single record
224.3x faster inserts when checking unique constraints
2.6x faster upserts
4.5x faster range queries

By adding PostgreSQL’s indexing capabilities to our columnstore, point lookup queries are now 1,185 times faster in TimescaleDB.

TimescaleDB now delivers 224.3x faster inserts by adding a B-tree index on compressed data, according to our benchmarks.

How We Implemented B-tree and Hash Indexes in Hypercore’s Columnstore

To support indexes on hypercore’s columnstore data, we leveraged PostgreSQL’s support for pluggable index and table access methods. PostgreSQL has long offered multiple index types, such as B-tree (default), hash, GIN, and BRIN, all implemented as Index Access Methods (IAMs).

In PostgreSQL 12, a similar table access method interface (TAM) was introduced for table storage. This key technology enabled us to better integrate our columnar engine with PostgreSQL, support indexes on columnstore data, and deliver numerous other enhancements.

Since we developed hypercore’s columnstore before TAM’s introduction, we have had to retroactively adapt our implementation to incorporate this technology.

PostgreSQL architecture for table and index access methods (TAM and IAM, respectively).

As illustrated, the TAM hides the details of the storage implementation while presenting a standard interface to PostgreSQL. When the IAM builds the index, it uses the TAM API to access the data in the table. Since the IAM doesn’t know anything about how the data is stored on disk, the TAM provides an implementation-agnostic API for the IAM to speak to any table type.

Similarly, when the IAM performs an index scan, it looks up the key in the index data structure, and then finds the reference to the data in the table. It uses this reference to directly retrieve the full data record in the table via the TAM.

By default, PostgreSQL uses the heap TAM, optimized for row-oriented tables. TimescaleDB has used the heap TAM since its launch. TimescaleDB stores data in the rowstore in standard PostgreSQL row-oriented tables, so the heap TAM works out of the box. To make the heap TAM work with the columnstore, we use PostgreSQL’s TOAST to store columnar data as compressed arrays.

However, querying this “columnized” data returns compressed, opaque data. To support normal queries, TimescaleDB modifies the PostgreSQL query plan by adding a custom scan node (DecompressChunk) that decompresses the data on the fly. Unfortunately, indexing this compressed data through the heap TAM is not very useful, as it would index only the compressed values, not the original data.

TimescaleDB 2.18 changes the game by introducing PostgreSQL indexes support in columnstore with the help of a new table access method: hypercore TAM. The new table access method handles decompression behind the scenes, enabling PostgreSQL to use its standard interfaces for indexing, statistics collection, enforcing constraints, and locking tuples by reference.

This also allows PostgreSQL’s built-in scan nodes, such as sequential and index scans, to operate in the columnstore. Custom scan nodes continue to be used for analytical query performance optimizations, including vectorized filtering and aggregation.

Our Early Access release supports B-tree and hash indexes, making point lookups, upserts, and unique constraint enforcement more efficient on the columnstore. Read the documentation for more details on how these PostgreSQL indexes work and their recommended use cases.

How to Use PostgreSQL Indexes in the Columnstore

Enabling hypercore TAM

You need to enable hypercore TAM to use PostgreSQL indexes. To create a new table that leverages the hypercore TAM method to interface with the columnstore, you first convert the table to a hypertable, enable the columnstore with the hypercore TAM, and finally define a columnstore policy to automatically move chunks from the rowstore to the columnstore as they age.

create table readings (
    metric_uuid uuid default gen_random_uuid(),
    created_at timestamptz not null,
    uploaded_at timestamptz not null,
    location_id integer references locations (location_id),
    device_id integer references devices (device_id),
    temperature float,
    humidity float
);

-- convert the table to a hypertable
select create_hypertable (
       'readings',
       by_range('uploaded_at')
);

-- enable columnstore with hypercore TAM
alter table readings
set access method hypercore
set (
       timescaledb.orderby = 'created_at',
       timescaledb.segmentby = 'location_id'
);

-- move chunks older than 1 day from rowstore to columnstore
CALL add_columnstore_policy(
       readings,
       interval '1 day'
);

If you have an existing hypertable, you can set the hypercore table access method using:

ALTER TABLE readings
set access method hypercore

And you can go back to the standard heap table access method with:

ALTER TABLE readings
set access method heap

The table access method configured for the hypertable will only be applied to new chunks. To change the table access method for an existing chunk, you use the ALTER TABLE command on that specific chunk:

ALTER TABLE _timescaledb_internal._hyper_1_1_chunk SET ACCESS METHOD hypercore;

Creating B-tree and hash indexes in the columnstore

If your hypertable is using heap TAM, those indexes only cover data in the rowstore. You cannot index the data in the columnstore using the standard PostgreSQL IAMs.

If your hypertable uses hypercore TAM, those indexes will be rebuilt as chunks are converted from the rowstore to the columnstore and used by the PostgreSQL query planner when querying data in both the rowstore and columnstore. If you have existing chunks that have not been updated to use the hypercore TAM, you will need to do that before you can use B-tree and hash indexes when querying data in those chunks.

You add B-tree and hash indexes to a hypertable the same way you would do it in a regular PostgreSQL table.

-- hash index
CREATE INDEX readings_metric_uuid_hash_idx ON readings USING hash (metric_uuid);
-- B-tree index
CREATE UNIQUE INDEX readings_metric_uuid_metric_uuid_uploaded_at_idx
    ON readings (metric_uuid, uploaded_at);

Hypercore TAM With PostgreSQL Indexes in Action

To demonstrate the benefits of using indexes on columstore data, we have generated a 100 million-row dataset to represent an IoT application that tracks room temperature and humidity across multiple buildings. This is a relatively small data set but enough to demonstrate the benefits of using B-tree and hash indexes.

We stored the data in a readings hypertable and enabled the columnstore with and without using the new hypercore table access method to run our benchmarks:

create table readings (
    metric_uuid uuid default gen_random_uuid(),
    created_at timestamptz not null,
    uploaded_at timestamptz not null,
    location_id integer references locations (location_id),
    device_id integer references devices (device_id),
    temperature float,
    humidity float
);

-- convert the table to a hypertable
select create_hypertable (
       'readings',
       by_range('uploaded_at')
);

-- enable columnstore
alter table readings
set access method hypercore -- comment out to disable hypercore TAM
set (
       timescaledb.orderby = 'created_at',
       timescaledb.segmentby = 'location_id'
);

-- move chunks older than 1 day from rowstore to columnstore
CALL add_columnstore_policy(
       readings,
       interval '1 day'
);

Read our documentation on hypertables and hypercore to learn more about the different parameters used in this example.

Our benchmark ran queries 20 times against the readings table with and without the hypercore TAM and took the average of all executions.

Point lookups

Indexes are particularly useful for highly selective queries, such as retrieving a unique event by its identifier. For example:

SELECT 
    created_at,
    device_id,
    temperature
FROM readings 
WHERE metric_uuid = 'dd19f5d3-d04b-4afc-a78f-9b231fb29e52';

Without an index, the query requires scanning and filtering the entire dataset, leading to slow execution and high I/O usage. Timescale introduced min/max sparse indexes in version 2.15. They would help when using incremental numeric IDs but would not work well for random numeric values or IDs like UUIDs.

Instead, a hash index on metric_uuid allows for a direct lookup, significantly improving performance by decompressing only the relevant data segment.

CREATE INDEX readings_metric_uuid_hash_idx ON readings USING hash (metric_uuid);

The SELECT query above performed 1,185x faster when using the hypercore TAM with a hash index, coming in at 10.9 ms vs. 12,915 ms.

In this case, we used a hash index for the metric_uuid since it is optimized for equality comparisons. A B-tree index, while more versatile and supporting other queries, was slightly slower at 12.57 ms.

Backfill and updates to historical data

A very common use case in real-time applications is backfilling or updating old data.

If a sensor fails during a batch upload or gets temporarily disconnected, it may resend the data later. If you don’t check if the data already exists in the database before storing it, you could end up with duplicated records.

The backend database can enforce uniqueness using a unique index or primary key, such as <metric_uuid, uploaded_at>, to prevent duplicate entries. In this case, you need to include the hypertable partition key as well, since a UNIQUE constraint on a hypertable must include it. Below, we add a primary key to the table, which automatically creates a B-tree index.

ALTER TABLE readings ADD PRIMARY KEY (device_id, created_at);

An insert statement ensuring no duplicates might look like this:

INSERT INTO readings VALUES (...) ON CONFLICT (device_id, created_at) DO NOTHING;

Primary constraints are enforced through unique indexes, making conflict checks fast. Without an index, verifying uniqueness would require scanning and decompressing potentially large amounts of data, significantly slowing inserts and consuming excessive IOPS.

Our benchmarks showed 224.3x faster inserts, reducing the execution time from 289,139 ms to 1,289 ms.

Similarly, historical data updates are often performed as part of an ingestion pipeline using upserts. An upsert is a database operation that inserts a new row if it doesn’t exist or updates the existing row if a conflict occurs.

Doing this as part of the ingestion pipeline is convenient because it allows re-ingesting new versions of rows instead of performing separate update statements. Upserts are very frequent across Timescale Cloud’s customer base.

The query below attempts to insert a new record. If a record for the same metric_uuid and uploaded_at values already exists, it updates the temperature with the corresponding value from the new record.

INSERT INTO readings VALUES (...) ON CONFLICT (device_id, created_at) DO UPDATE SET temperature = EXCLUDED.temperature;

With a primary key index, conflicting rows are directly located within the compressed data segment in the columnstore. Without an index, the system would need to scan and decompress data, considerably slowing ingestion speed.

Our benchmarks showed 2.6x faster upserts, reducing the execution time from 24,805 ms to 9,520 ms.

Fast anomaly detection

Imagine you need to report at regular intervals how many times each device has exceeded a critical temperature. You can easily do this by counting the times the temperature exceeded and grouping by device ID.

SELECT
    device_id,
    COUNT(temperature)
FROM readings
WHERE temperature > 52.5 
GROUP BY device_id;

device_id | count  
-----------+-------
       68 | 1
      258 | 1
      192 | 1
      276 | 1
      114 | 1
      227 | 1
      153 | 1
      210 | 1
      266 | 1
      165 | 1
      296 | 1
      144 | 1
       93 | 1
      285 | 1
      221 | 1
      167 | 1
       14 | 1
      123 | 1
      152 | 1
      206 | 1
      230 | 1
      136 | 1
      256 | 2
        1 | 1
(24 rows)

We can create a partial B-tree index to make this query faster. A partial B-tree index is a B-tree index that only includes rows satisfying a specific WHERE condition, making it smaller and more efficient for queries that match that condition. Since, in this case, we only care about temperature values above 52.5, we would need to index those values:

CREATE INDEX ON readings (temperature) where temperature > 52.5;

With the existing columnstore sparse min/max index, the query takes 63.3 ms to execute and only 14 ms with the B-tree index above, so 4.5x faster.

When to Use (and When Not to Use) Indexes on Compressed Data

Adding B-tree and hash indexes to compressed data enables dramatically faster lookups and inserts, but it comes with a trade-off: increased storage usage due to additional indexing structures.

This feature can be particularly helpful if:

You need fast lookups on non-SEGMENTBY keys (e.g., querying specific records by UUID).
Query latency on compressed data is a bottleneck for your application.
You perform frequent updates to historical data and need efficient uniqueness enforcement.

Consider it carefully if:

Your queries already benefit from columnstore min/max indexes or SEGMENTBY optimizations.
Your workloads prioritize compression efficiency over lookup speed.
You primarily run aggregations and range scans, where indexes may not provide meaningful speedups.

Conclusion

Real-time analytics applications require more than fast inserts and analytical queries. They also need high performance when retrieving individual records, enforcing constraints, or performing upserts, something that OLAP/columnar databases lack.

TimescaleDB supports and accelerates real-time analytics using hypercore without missing out on important PostgreSQL features, including support for standard PostgreSQL indexes. We call hypercore a “hybrid” storage engine precisely because it supports deep analytics while staying true to PostgreSQL.

By adding full support for B-tree and hash indexes on columnstore data, developers can now perform point lookups 1,185x faster, enforce unique constraints with a performance boost of 224.3x faster inserts, and execute upserts 2.6x faster—all while maintaining columnstore compression and analytics performance.

PostgreSQL indexes via hypercore TAM are available in Early Access—check out the documentation to start using it today.

Stay tuned for more updates—there’s more to come. In the meantime, continue enjoying PostgreSQL, knowing that your analytical queries will be blazing fast and your storage footprint small. Why bolt together multiple databases, add lag and maintain endless ETL when you can do it all in TimescaleDB?

DEV Community