(Or: How someone who doesn’t really know what they’re doing manages to improve performance of lengthy queries.)
I’m working on a system that has more than 200 tables and about 2 GB of data. The system is built using Django and relies heavily upon the ORM. We don’t write SQL – all our queries are built through Django querysets. Some of the queries are fairly complex, involving 10 or more tables.
About a month ago, my co-worker discovered that at least 5 of the more complex queries were taking longer to run on our test server than on his development laptop. That seemed odd to me, so I tried to recreate the situation – and I did. Those queries ran faster (generally about half the time) on each of our laptops than on the server available to us.
Table 1 – baseline results

| Query   | Server | Laptop |
| ------- | ------:| ------:|
| Query 1 | 31s    | 17s    |
| Query 2 | 44s    | 24s    |
| Query 3 | 1m 47s | 46s    |
| Query 4 | 28s    | 17s    |
And so the hunt began… Knowing that the amount of data was only going to continue to increase, we had to get the run times for the longer queries down into the neighborhood of our development environments.
First step was to get copies of the actual queries being run, to completely isolate the time needed to run the queries from the overhead introduced by Python and the Django ORM. These are some fairly big queries – the generated SQL for the two largest queries is about 16,000 characters each. (At a full 100 characters per line, that would be a 160-line query – and that's with every line full. In reality, formatting these queries produces 200+ lines.)
Then it was time to check the basics. We’re working with identical copies of source data. We’re building the database using the same scripts. After the database is built, we’re manually running a vacuum, reindex, analyze. We’ve got enough memory on these systems to set the buffers such that they’re larger than the data being accessed.
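For reference, the relevant memory settings live in postgresql.conf. The values below are illustrative of the sizing described above, not our exact configuration:

```
# postgresql.conf – illustrative sizing for a ~2 GB dataset (assumed values)
shared_buffers = 4GB          # larger than the data being accessed
effective_cache_size = 12GB   # planner hint: OS file cache + shared_buffers
work_mem = 64MB               # per-sort/hash memory for the large joins
maintenance_work_mem = 1GB    # speeds up the manual VACUUM/REINDEX
```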
Time to look at the differences between these systems.
Admittedly, my knowledge and understanding of PostgreSQL internals is minimal at best. I'm a lot more comfortable with my knowledge of operating systems and file systems, so I wanted to start with what I know.
Now, I'll admit to being a bit of a Linux bigot. I refused to believe a Windows laptop running PostgreSQL could be more than twice as fast as a Linux server on significantly better hardware, so at this point I discarded the OS difference as a significant factor.
The server runs PostgreSQL in a Docker container with the data stored in a Docker volume. I changed to mapping the data directory to a directory on the host. No change. I changed to a PostgreSQL instance running directly on the host. Still no difference.
This server is actually a virtual machine running under VMware ESXi. The server image is Ubuntu 20.04 with the disk mapped as an LVM volume. Could the multiple layers of disk mapping be causing the problem? What about the choice of disk interface (ATA vs. SCSI)? So we allocated two new "physical" drives to that image, one with each of the available interfaces. I didn't bother partitioning either drive; I created file systems directly on the devices. Created tablespaces on those file systems, created databases… and still no difference.
The next identified difference was the version of PostgreSQL being used. My laptop is running PostgreSQL 12, my co-worker's laptop is running PostgreSQL 10, and the test server is running PostgreSQL 13. Ahhh… is there a problem with 13? Rolled the test server back to version 12 – no difference. Installed and tested with version 10 – difference! A significant difference (~20%), but it doesn't come close to accounting for the entire gap – and backing down from 13 to 10 is not an acceptable answer. But at least I had some direction for continuing my search.
The next stage of my quest lasted for the better part of a week. I knew I had to do a lot of reading to try and understand what might be happening. I started out by looking at all the release notes and other available docs to see what might be causing this. One of the changes I saw made reference to the query planner and how it uses various settings to improve the execution plan.
So my next step was to run EXPLAIN ANALYZE on my queries across the different versions. Bingo! The slow queries on the server were running sequential scans on tables where the faster systems were using indexes. I also saw that JIT compilation was taking more than 30 seconds of the total run time.
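With 200+ line plans, eyeballing every node gets tedious; a quick way to compare plans across versions is to count the access methods in the EXPLAIN text. This is a minimal sketch – the plan fragment below is made up for illustration, not one of our actual plans:

```python
import re

def scan_summary(plan_text: str) -> dict:
    """Count Seq Scan vs Index Scan nodes in EXPLAIN (ANALYZE) text output."""
    return {
        "seq_scans": len(re.findall(r"Seq Scan on", plan_text)),
        "index_scans": len(re.findall(r"Index(?: Only)? Scan using", plan_text)),
    }

# A made-up fragment in the shape EXPLAIN ANALYZE prints (table/index names are invented):
plan = """
Nested Loop  (cost=0.43..812.34 rows=120 width=48) (actual time=0.051..3.210 rows=98 loops=1)
  ->  Seq Scan on orders  (cost=0.00..431.00 rows=120 width=16) (actual time=0.020..1.900 rows=98 loops=1)
  ->  Index Scan using customers_pkey on customers  (cost=0.43..3.17 rows=1 width=32) (actual time=0.010..0.011 rows=1 loops=98)
"""

print(scan_summary(plan))  # how many of each access method the plan used
```

Diffing these counts between the server and the laptops makes the index-vs-sequential-scan divergence jump out immediately.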
More research, more ideas. There’s a PostgreSQL mailing list just for performance issues and there’s a wealth of information in the archives. Between the mailing list and the docs, I identified some settings that may have been relevant.
I started out by making some trivial changes (turning JIT off was one of them) and immediately saw some improvement. It still wasn't an answer, but I was starting to feel confident that I was heading in the right direction.
For the record, here is the list of settings I decided to focus on. (Note: this is after making all the appropriate changes to the various buffer and cache settings.)
- seq_page_cost
- random_page_cost
- cpu_tuple_cost
- cpu_index_tuple_cost
- cpu_operator_cost
- parallel_tuple_cost
- parallel_setup_cost
At this point I was getting kinda lost. My first experiments showed some improvement, but I wasn't seeing any direct relationship between the changes I was making and the degree (or lack) of improvement. I know that's due to my limited understanding of exactly what these settings represent, but it made it difficult to know exactly what changes to look for.
So without knowing what else to try, I decided to rely upon some brute-force evaluation. I put together a script that would change settings, restart PostgreSQL, run the queries, save the results, and repeat the process. From there it was a matter of looking through all the results.
One thing I realized rather quickly is that the number of combinations of these settings multiplies out of control. Testing just 3 values for each of those 7 variables independently produces 3^7 = 2,187 combinations. If each test takes 1 minute, the complete set of tests would last more than 36 hours. With 5 values for each of the 7 variables, it would take 54 days to complete.
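The combinatorics, and the kind of command generation the sweep script did, can be sketched in a few lines of Python. The setting names are the real planner GUCs; the candidate grids and the generated ALTER SYSTEM statements are illustrative, not our actual test matrix:

```python
import itertools

# Candidate values per planner setting (illustrative grids, not our final ones)
grid = {
    "seq_page_cost":        [0.01, 0.1, 1.0],
    "random_page_cost":     [0.01, 0.1, 4.0],
    "cpu_tuple_cost":       [0.01, 0.05, 0.1],
    "cpu_index_tuple_cost": [0.005, 0.05, 0.5],
    "cpu_operator_cost":    [0.0025, 0.025, 0.25],
    "parallel_tuple_cost":  [0.1, 1.0, 10.0],
    "parallel_setup_cost":  [100, 1000, 10000],
}

combos = list(itertools.product(*grid.values()))
print(len(combos))               # 3**7 = 2187 combinations
print(len(combos) / 60)          # ~36.5 hours at one minute per test
print(5 ** len(grid) / 60 / 24)  # ~54 days with 5 values per setting

def alter_statements(values):
    """ALTER SYSTEM statements for one combination (applied before each restart)."""
    return [f"ALTER SYSTEM SET {name} = {val};"
            for name, val in zip(grid, values)]

print(alter_statements(combos[0])[0])
```

The actual script fed these statements to psql, restarted the server, and timed the queries – that part is elided here since it depends on the local setup.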
I needed a different approach, so I ran exhaustive tests over small groups of settings while holding the rest constant. For example, I ran 25 tests covering combinations of seq_page_cost and random_page_cost while keeping all other values the same. I then took the best combination from that, and ran 125 tests covering 5 values each for cpu_tuple_cost, cpu_index_tuple_cost, and cpu_operator_cost. Take the best result of that, then go back and rerun the 25 earlier tests. Repeat this process until the best values stabilize.
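This staged search amounts to coordinate descent over groups of settings. A toy version, with a synthetic cost function standing in for "run the queries and measure the time" (the target values and grids here are made up for the demonstration):

```python
import itertools

# Synthetic stand-in for the measured query time, minimized at an invented optimum.
def run_time(settings):
    target = {"seq_page_cost": 0.01, "random_page_cost": 0.01, "cpu_tuple_cost": 0.5}
    return sum((settings[k] - target[k]) ** 2 for k in target)

grids = {
    "seq_page_cost":    [0.01, 0.1, 1.0, 2.0, 4.0],
    "random_page_cost": [0.01, 0.1, 1.0, 2.0, 4.0],
    "cpu_tuple_cost":   [0.01, 0.1, 0.5, 1.0, 2.0],
}

def sweep(current, group):
    """Exhaustively test one group of settings, holding the others fixed."""
    best = dict(current)
    for values in itertools.product(*(grids[k] for k in group)):
        trial = dict(current, **dict(zip(group, values)))
        if run_time(trial) < run_time(best):
            best = trial
    return best

settings = {k: v[len(v) // 2] for k, v in grids.items()}  # start mid-grid
previous = None
while settings != previous:  # repeat until the best values stabilize
    previous = dict(settings)
    settings = sweep(settings, ["seq_page_cost", "random_page_cost"])  # 25 tests
    settings = sweep(settings, ["cpu_tuple_cost"])                     # 5 tests

print(settings)  # converges to the synthetic optimum above
```

The real version is the same loop with `run_time` replaced by "apply the settings, restart PostgreSQL, time the queries" – which is why keeping each stage to 25–125 tests matters so much.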
Many groups of these changes produced very minor overall effects – I'd get large groups of test results all within a tenth of a second on a 30-second query. However, the results clustered around certain timings, which I was happy to see: the big gaps occur when PostgreSQL switches a table access from an index scan to a sequential scan.
And so we get to the punchline. After all this, the values we’ve currently settled on are:
- seq_page_cost = 0.01
- random_page_cost = 0.01
- cpu_tuple_cost = 0.01
- cpu_index_tuple_cost = 0.5
- cpu_operator_cost = 0.25
- parallel_tuple_cost = 1
- parallel_setup_cost = 100
Note: My understanding was that these values are all relative to each other and could be rescaled together. However, that doesn't appear to be the case. I tried multiplying all the values by 100 to make them integers, but the test run times were far worse. I'm guessing there's another setting I've missed that would also need to be adjusted.
Table 2 – final results

| Query   | Server before | Server after | Laptop |
| ------- | -------------:| ------------:| ------:|
| Query 1 | 31s           | 4s           | 17s    |
| Query 2 | 44s           | 5s           | 24s    |
| Query 3 | 1m 47s        | 33s          | 46s    |
| Query 4 | 28s           | 4s           | 17s    |
So not only is it better on the server now, it’s significantly better!
If there are any lessons I learned from this, they would be:
- I am extremely grateful to the PostgreSQL team for their documentation and their mailing lists. There's no way I could have gotten this far without those resources.
- There's a lot of information available to help resolve performance issues - it may just require some digging and reading.
- There are hardware configurations where the PostgreSQL defaults are not optimal for complex data relationships.