By examining the indexes on the table articles we can see that, although there's an index on published_at which helps the query, there is no index on the other important filtering field: published. Published articles are hit every single time they are queried from the DB (unpublished articles are visible only to their creator, which already is looking into a smallest sample of all articles, instead every public query filters on published = true).
My suggestion is to run EXPLAIN ANALYZE on the query, create an index, re-run the EXPLAIN ANALYZE afterwards and see what happens. The estimated cost should lower.
Not being a join, there's no foreign keys involved. The other query ActiveRecord does involves only the primary key so there's not much that can be done on that, for now.
This dashboard has a sql query that probably lists columns from article table twice. It has "articles".* (all columns from articles table) and then lists more columns from the very same table.
This is the part I'm not fully understanding. How is it that the same table gets hit so many times in a single transaction? Is it because Skylight aggregates different paths/URLs in one single page? Otherwise it would be worrysome if a single page were to go back to the articles and users table so many times:
Also this uses lower function. Usually if SQL uses functions, these can result in table scans. Unless you store normalized tag version, index function expression or your database is not case sensitive.
Yeah, functions kill performance a little :D I suspect that is autogenerated by a library devto is using, it might require an investigation. There's an index on the tags table but it's not an index on lower which as you say will result in a table scan instead :(
I'm not sure if it has to select everything, but it also seems that there's no index on path column, hence - this query should scan the whole table
Yep, there's no index on articles.path either.
I'll recap:
articles most probably needs an index on published
tags most probably needs an index on lower(name)
articles definitely needs an index on path
All of these indexes can be tested on a copy of the production DB before.
These should help query performance, before diving in how to select less data or caching stuff or other tricks.
Thank you! How would first query plan change if you were to create an index I've suggested? I've got difficult time reading explain as I'm used to sql server plans.
But I see there is a scan and two sort operations.
With suggested index I wanted to eliminate any kind of sort operations as they're quite heavy on any relational database.
This is the part I'm not fully understanding. How is it that the same table gets hit so many times in a single transaction?
I believe you are looking at aggregates here, so these are all different types of queries based on different requests. This is one of the nuances of Skylight that are less intuitive IMO. A single transaction doesn't hit all of those (unless they all show 100% hits). @vaidehijoshi
correct me if I'm wrong.
articles most probably needs an index on published
Are indexes on boolean columns that beneficial? I'm all for adding it if they are.
tags most probably needs an index on lower(name)
I think there is one already or am I missing something?
articles definitely needs an index on path
Yes, definitely.
All in all I still want to bring up the question of performance vs memory. My thought was that the includes(:user) is what is eating up a good deal of memory but is not necessarily killing the query. Its purpose here is to eager load user associations but in doing so without selecting columns, it loads the whole objects into memory, which can be costly.
Queries that load users are perfectly fine, they use primary key to do so. Like you mentioned, I would just limit columns that it selects.
Now indexing a boolean column alone does not make much sense, but my intention was to create a filtered index (where published is true), the second column being published at with descending order meaning what within published articles, all of them will be already sorted in disk.
I have included rest of the columns that are in select clause so that database does not have to do extra lookups from the index to the actual table to get remaining columns.
Having such an index means it 100% satisfies published articles query.
Are indexes on boolean columns that beneficial? I'm all for adding it if they are.
The amount of "that" unfortunately needs to be measured. I don't think a full index would make a difference, but a partial one might (if published = false is significantly less used).
in reality this index might be ignored, I don't think the tags table is that big, PostgreSQL might do a sequential scan if it deems it faster
All in all I still want to bring up the question of performance vs memory. My thought was that the includes(:user) is what is eating up a good deal of memory but is not necessarily killing the query. Its purpose here is to eager load user associations but in doing so without selecting columns, it loads the whole objects into memory, which can be costly.
Have you tried the performance vs memory difference by forcing eager loading in the includes?
Hi Evaldas, great analysis, in reality though this query:
translates to the following two queries:
followed by a
SELECT *
for the users with those ids:(I'm running it on my local copy)
If we examine the query plan for the first one, we can see it's pretty basic:
Running
EXPLAIN ANALYZE
on the query itself doesn't yield much more info, because on a small subset is not a costly operation:By examining the indexes on the table
articles
we can see that, although there's an index onpublished_at
which helps the query, there is no index on the other important filtering field:published
. Published articles are hit every single time they are queried from the DB (unpublished articles are visible only to their creator, which already is looking into a smallest sample of all articles, instead every public query filters onpublished = true
).My suggestion is to run
EXPLAIN ANALYZE
on the query, create an index, re-run theEXPLAIN ANALYZE
afterwards and see what happens. The estimated cost should lower.Not being a join, there's no foreign keys involved. The other query ActiveRecord does involves only the primary key so there's not much that can be done on that, for now.
This is the part I'm not fully understanding. How is it that the same table gets hit so many times in a single transaction? Is it because Skylight aggregates different paths/URLs in one single page? Otherwise it would be worrysome if a single page were to go back to the
articles
andusers
table so many times:Yeah, functions kill performance a little :D I suspect that is autogenerated by a library devto is using, it might require an investigation. There's an index on the
tags
table but it's not an index onlower
which as you say will result in a table scan instead :(Yep, there's no index on
articles.path
either.I'll recap:
articles
most probably needs an index onpublished
tags
most probably needs an index onlower(name)
articles
definitely needs an index onpath
All of these indexes can be tested on a copy of the production DB before.
These should help query performance, before diving in how to select less data or caching stuff or other tricks.
Pinging @ben here :)
Thank you! How would first query plan change if you were to create an index I've suggested? I've got difficult time reading explain as I'm used to sql server plans.
But I see there is a scan and two sort operations.
With suggested index I wanted to eliminate any kind of sort operations as they're quite heavy on any relational database.
I believe you are looking at aggregates here, so these are all different types of queries based on different requests. This is one of the nuances of Skylight that are less intuitive IMO. A single transaction doesn't hit all of those (unless they all show 100% hits). @vaidehijoshi correct me if I'm wrong.
Are indexes on boolean columns that beneficial? I'm all for adding it if they are.
I think there is one already or am I missing something?
Yes, definitely.
All in all I still want to bring up the question of performance vs memory. My thought was that the
includes(:user)
is what is eating up a good deal of memory but is not necessarily killing the query. Its purpose here is to eager load user associations but in doing so without selecting columns, it loads the whole objects into memory, which can be costly.Queries that load users are perfectly fine, they use primary key to do so. Like you mentioned, I would just limit columns that it selects.
Now indexing a boolean column alone does not make much sense, but my intention was to create a filtered index (where published is true), the second column being published at with descending order meaning what within published articles, all of them will be already sorted in disk.
I have included rest of the columns that are in select clause so that database does not have to do extra lookups from the index to the actual table to get remaining columns.
Having such an index means it 100% satisfies published articles query.
The amount of "that" unfortunately needs to be measured. I don't think a full index would make a difference, but a partial one might (if
published
= false is significantly less used).Something like:
I'm trying to generate a massive
articles
table to see if I can have PostgreSQL activate the index.There is one, but it's on "name", not on lower(name):
with something like this:
in reality this index might be ignored, I don't think the tags table is that big, PostgreSQL might do a sequential scan if it deems it faster
Have you tried the performance vs memory difference by forcing eager loading in the
includes
?With: