I had an expensive ~20 second query on a massive ~500 million row PostgreSQL table. I brought that query down to about 2 milliseconds. The quick answer is that there was a missing index. In this post, we'll explore how we came to that conclusion and how we determined what kind of index to add.
tl;dr: the query we will look at in this post was slow because of a missing index on a created_at timestamp column. We tried adding a BRIN index and a B-Tree index separately. The BRIN index brought no performance improvement. The B-Tree index, however, brought a huge one: the most expensive query dropped from ~20 seconds to ~2 milliseconds, a 10,000x speedup. The average timing of this query is now even a little lower than that.
Read on if you want more of the details.
Note: all the queries I'm about to cover were run against a locally-restored snapshot of production data. Take care when adding indexes like these in production: do so concurrently, and be sure to monitor your app in the process.
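In production, the concurrent version of the index creation we'll arrive at later would look something like this (a sketch; the index and column names mirror the ones used in this post):

-- Build the index without taking locks that block writes on the table.
-- Note: CONCURRENTLY cannot run inside a transaction block.
create index concurrently index_events_on_created_at
on events (created_at);

-- If a concurrent build fails partway, it leaves an INVALID index behind;
-- drop it and retry:
-- drop index index_events_on_created_at;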
The Data Model
First, let's look at a sufficient snapshot of the data model.
We have a massive events table; it contains hundreds of millions of rows. Each of those events is associated with a particular record in the users table.
Users do things, and those things are recorded as events in the events table. Users do lots of things, so there are lots of events.
The Slow Query
The query that our monitoring software was flagging as slow was one where we were trying to find the two most recent events for a specific user.
select * from events
where user_id = 123
order by created_at desc
limit 2;
Some users have way more events than others, so the slowness of this query can vary pretty drastically. This particular user has over 200,000 events. It's instances like this that revealed the bottleneck in this table's design.
Let's explain analyze this query to better understand where the issue is.
explain analyze
select * from events
where user_id = 123
order by created_at desc
limit 2;
Running explain analyze with this query gives us a really good idea of what is going on because it actually executes the query. Here is what the output of that looks like:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1066813.50..1066813.50 rows=2 width=114) (actual time=23207.023..23207.024 rows=2 loops=1)
   ->  Sort  (cost=1066813.50..1067596.73 rows=313292 width=114) (actual time=23207.022..23207.022 rows=2 loops=1)
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on events  (cost=5868.59..1063680.58 rows=313292 width=114) (actual time=74.682..23098.624 rows=222272 loops=1)
               Recheck Cond: (user_id = 123)
               Rows Removed by Index Recheck: 6448758
               Heap Blocks: exact=67292 lossy=128418
               ->  Bitmap Index Scan on index_events_on_user_id  (cost=0.00..5790.26 rows=313292 width=0) (actual time=57.349..57.349 rows=222272 loops=1)
                     Index Cond: (user_id = 123)
 Planning time: 0.095 ms
 Execution time: 23207.502 ms
(12 rows)
The headline number is the Execution time at the bottom of the output -- 23207.502 ms. That's 23 seconds.
The next thing to note is where the bulk of those 23 seconds is going. Fortunately, in this explain analyze output it stands out pretty clearly in a single area.
Notice this bit of the output, where the matched rows are being sorted by created_at DESC.
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on events  (cost=5868.59..1063680.58 rows=313292 width=114) (actual time=74.682..23098.624 rows=222272 loops=1)
We can see the actual time ranges from 74.682..23098.624. That's where this query is spending the vast majority of its time.
Looking at the table's indexes and the query we are running, I probably could have guessed the issue was a missing index on created_at. But I don't want to add indexes haphazardly; I want numbers to back it up, and this output confirms it. We are on track to fixing this.
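As a sanity check before adding anything, we can list the indexes the table already has using the standard pg_indexes catalog view (a quick sketch; your index names will vary):

-- Show every index currently defined on the events table
select indexname, indexdef
from pg_indexes
where tablename = 'events';

In our case this shows index_events_on_user_id but nothing covering created_at.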
BRIN vs. B-Tree
So, we know we need an index on created_at, but what kind of index? B-Tree is the default when the index type isn't specified, but I know Postgres supports other kinds of indexes as well. What about those?
Here is what the Crunchy Data blog has to say about the BRIN index.
PostgreSQL 9.5 introduced a feature called block range indexes (aka BRIN)
that is incredibly helpful in efficiently searching over large time series
data and has the benefit of taking up significantly less space on disk than a
standard B-tree index.
"Large time series data" is exactly what we are dealing with here, and an index that takes up less space on disk sounds great. So that seems worth a try. In fact, let's start there.
With BRIN Index
I'll start by adding the index.
create index index_events_on_created_at
on events using brin (created_at);
Note: Adding the BRIN index to a ~500 million row table took ~12 minutes on my dev machine.
Now I'll re-run the query from earlier, still with explain analyze:
> explain analyze select * from events where user_id = 123 order by created_at desc limit 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1066813.50..1066813.50 rows=2 width=114) (actual time=23074.657..23074.658 rows=2 loops=1)
   ->  Sort  (cost=1066813.50..1067596.73 rows=313292 width=114) (actual time=23074.655..23074.655 rows=2 loops=1)
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on events  (cost=5868.59..1063680.58 rows=313292 width=114) (actual time=68.786..22975.087 rows=222272 loops=1)
               Recheck Cond: (user_id = 123)
               Rows Removed by Index Recheck: 6448758
               Heap Blocks: exact=67292 lossy=128418
               ->  Bitmap Index Scan on index_events_on_user_id  (cost=0.00..5790.26 rows=313292 width=0) (actual time=48.569..48.569 rows=222272 loops=1)
                     Index Cond: (user_id = 123)
 Planning time: 2.097 ms
 Execution time: 23075.473 ms
(12 rows)
Unfortunately, that output looks nearly identical to the one from before we added an index. In fact, the plan shape hasn't changed at all: the planner isn't using the new BRIN index. In hindsight that makes some sense, since a BRIN index doesn't store entries in sorted order and so can't help Postgres short-circuit an order by ... limit the way a B-Tree can. The important thing is that we measured. We know that, at least for this data set, a BRIN index isn't the way forward.
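One consolation: BRIN indexes really are tiny on disk. Before dropping it, we can compare sizes with Postgres's built-in size functions (the exact numbers will vary by data set):

-- Compare the on-disk footprint of the BRIN index we just built
-- against the existing B-Tree index on user_id
select pg_size_pretty(pg_relation_size('index_events_on_created_at')) as brin_size,
       pg_size_pretty(pg_relation_size('index_events_on_user_id'))   as btree_size;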
Since we don't want the BRIN index, let's drop it and then give B-Tree a try.
drop index index_events_on_created_at;
With B-Tree Index
create index index_events_on_created_at
on events(created_at);
Note: Adding the B-Tree index to a ~500 million row table took ~8 minutes on my dev machine.
And again, we'll run our slow query and see how it performs.
> explain analyze select * from events where user_id = 123 order by created_at desc limit 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..535.62 rows=2 width=114) (actual time=0.056..1.951 rows=2 loops=1)
   ->  Index Scan Backward using index_events_on_created_at on events  (cost=0.57..83812631.72 rows=313292 width=114) (actual time=0.055..1.950 rows=2 loops=1)
         Filter: (user_id = 123)
         Rows Removed by Filter: 4819
 Planning time: 0.116 ms
 Execution time: 2.000 ms
(6 rows)
Besides the fact that the query returns immediately, I can tell by the change in shape and height of the query plan that we're on to something. The execution time has dropped to 2.000 ms as well. The second line of the query plan shows that the new index is being used.
-> Index Scan Backward using index_events_on_created_at on events (cost=0.57..83812631.72 rows=313292 width=114) (actual time=0.055..1.950 rows=2 loops=1)
That's more like it. This backs up the decision to add a B-Tree index on the created_at column in production. As I mentioned at the beginning of the post, when adding an index like this in any production setting, but especially on a table of this size, it is important to do so concurrently and to monitor your app's health metrics during the process.
Cover image credit: Kiyoshi on Unsplash
Top comments (1)
As the query is order by created_at desc, wouldn't it make sense to have the index created in the same DESC order? Many times data is accessed in the natural way, meaning we need the latest records rather than the oldest, yet indexes are created in the default ASC order, not DESC.
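For a single-column index it usually doesn't matter: Postgres can walk a B-Tree in either direction, which is exactly the Index Scan Backward we saw in the plan above. If you wanted an explicitly descending index anyway, the syntax would be (index name here is illustrative):

create index index_events_on_created_at_desc
on events (created_at desc);

-- ASC/DESC mainly matters for multicolumn indexes with mixed orderings,
-- e.g. an index sketched like this could serve
-- "order by user_id asc, created_at desc" without a sort step:
-- create index index_events_on_user_id_and_created_at
-- on events (user_id asc, created_at desc);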