Discussion on: DEV Community Is Open-Sourcing Its Performance Metrics via Skylight

View post

Replies for: Hi Evaldas, great analysis, in reality though this query: Article. includes(:user). where(published: true). select(:published_at, :process...

Thank you! How would first query plan change if you were to create an index I've suggested? I've got difficult time reading explain as I'm used to sql server plans.

But I see there is a scan and two sort operations.

With suggested index I wanted to eliminate any kind of sort operations as they're quite heavy on any relational database.

Ben Halpern The DEV Team • Feb 8 '19

This is the part I'm not fully understanding. How is it that the same table gets hit so many times in a single transaction?

I believe you are looking at aggregates here, so these are all different types of queries based on different requests. This is one of the nuances of Skylight that are less intuitive IMO. A single transaction doesn't hit all of those (unless they all show 100% hits). @vaidehijoshi correct me if I'm wrong.

articles most probably needs an index on published

Are indexes on boolean columns that beneficial? I'm all for adding it if they are.

tags most probably needs an index on lower(name)

I think there is one already or am I missing something?

articles definitely needs an index on path

Yes, definitely.

All in all I still want to bring up the question of performance vs memory. My thought was that the includes(:user) is what is eating up a good deal of memory but is not necessarily killing the query. Its purpose here is to eager load user associations but in doing so without selecting columns, it loads the whole objects into memory, which can be costly.

Evaldas Buinauskas • Feb 8 '19

Queries that load users are perfectly fine, they use primary key to do so. Like you mentioned, I would just limit columns that it selects.

Now indexing a boolean column alone does not make much sense, but my intention was to create a filtered index (where published is true), the second column being published at with descending order meaning what within published articles, all of them will be already sorted in disk.

I have included rest of the columns that are in select clause so that database does not have to do extra lookups from the index to the actual table to get remaining columns.

Having such an index means it 100% satisfies published articles query.

rhymes • Feb 8 '19

Are indexes on boolean columns that beneficial? I'm all for adding it if they are.

The amount of "that" unfortunately needs to be measured. I don't think a full index would make a difference, but a partial one might (if published = false is significantly less used).

Something like:

create index articles_published on articles (published) where published;

I'm trying to generate a massive articles table to see if I can have PostgreSQL activate the index.

I think there is one already or am I missing something?

There is one, but it's on "name", not on lower(name):

PracticalDeveloper_development> \d+ tags;
...
    "index_tags_on_name" UNIQUE, btree (name)

with something like this:

create index tags_lower_name on tags(lower(name));

in reality this index might be ignored, I don't think the tags table is that big, PostgreSQL might do a sequential scan if it deems it faster

All in all I still want to bring up the question of performance vs memory. My thought was that the includes(:user) is what is eating up a good deal of memory but is not necessarily killing the query. Its purpose here is to eager load user associations but in doing so without selecting columns, it loads the whole objects into memory, which can be costly.

Have you tried the performance vs memory difference by forcing eager loading in the includes?

With:

Article.eager_load(:user).where(published: true).select(:published_at, :processed_html, :user_id, :organization_id, :title, :path).order("published_at DESC").page(2).per(12)