Here on DealerOn's Team Mercury, we've had an exciting opportunity to build out entirely new systems in the last couple years, starting from scratch. This means that we've been very lucky to be a team that gets to work with exciting new technologies while a bunch of other chumps have to deal with the messes we left behind in the legacy systems (ha, suckers).
One of the exciting new technologies we get to work with is Azure SQL Database. Our reporting application is in sort of an unusual situation where we have to store a huge amount of data, and we want to be able to read it quickly in certain situations, but we can do almost all our writes off hours.
This is because we only care about most of this data on a daily timeframe. So while we have to import a lot of data, we run those imports between about midnight and 6 AM Eastern, while most users are sleeping. During the day, we turn around and read out this data to users as requested through our web API--this is when we start to really care about performance. It's far more of a big deal for a user to have to wait 10 seconds for an API call to return than it would be for an import to take 10 extra minutes to run at 3 AM.
What this means is that we've had to learn a lot about database optimization, and then how to apply what we've learned to our extremely specific use case. In this article, I'll focus on automatic tuning, and the performance recommendations that Azure regularly suggests for us.
Azure has three options for automatic tuning:
- FORCE PLAN (we have this turned on)
- CREATE INDEX (we have this turned off)
- DROP INDEX (we have this turned off)
I'll go into depth on these options, and why we chose to set them the way we did. Azure also offers "Performance Recommendations" that are basically suggestions to create or drop indexes if you won't let it do that automatically, and to parameterize your queries (which you should be doing all the time anyway). So if you don't set the tuning options to create/drop indexes automatically, you'll get suggestions from Azure to do them manually.
This is basically a magic button that makes your database go faster. Without going too far into depth, sometimes SQL Server tries to run a query using a cached execution plan that turns out to be sub-optimal, usually because the parameters have changed. When you have "Force Plan" turned on, it will automatically detect when you're using a bad plan, and switch to a faster one. We turned this setting on as part of an investigation into why some of our queries were sometimes much slower than they ought to be, and discovered that it made those queries and many others faster, often by orders of magnitude.
I still feel like this is kind of a hack--if SQL Server is smart enough to detect when it's using a bad plan, why couldn't it just use a good plan to start? But in our experience, turning this setting on made a lot of things faster and didn't slow anything noticeably down. If we wanted to see these kinds of gains manually, we'd have to roll up our sleeves and dig really deep into query plans and optimization, for just about every query that we run. While it's possible that doing that right could provide us with even more benefit, it would be a huge amount of work, and we'd likely have to keep repeating it for every new query we write, as well as checking back on old queries to make sure nothing's changed. So this option makes a lot of sense--it's one button to click that provides a major performance boost for almost no effort or drawbacks. I strongly recommend turning it on, and then bragging to your managers about your deep understanding of database optimization.
We decided to investigate the index suggestions that Azure comes up with, to evaluate whether to let it run them automatically. Overall, we found that it does often come up with good suggestions, but sometimes the suggestions it comes up with are wrong for our use case, and sometimes they're just wrong.
The right index can make a huge difference in query performance. However, every index brings performance drawbacks, so the traditional process of database tuning involves carefully considering the advantages of any new index against the drawbacks. Azure's automatic tuning claims to do the same, to the best of its ability; for example, it continuously monitors the automatic indexes it creates, and will remove them if it detects performance problems, though the criteria are not clear. However, one major drawback of the automatic tuning system is that it lacks any context of what's important to you as the user of the database. I've also found it sometimes recommends indexes that don't actually help, though I'd hope it will detect those and revert them.
To Azure, a query is a query. There's no way for our database to know whether a specific SELECT statement came from an API endpoint (which needs a response ASAP) or a daily scheduled marketing data export (which doesn't need to be nearly as fast). Therefore, some of the automatic indexes may optimize things that you didn't really need optimized in the first place, potentially at the cost of slowing down the things you do want to be fast. A cool feature for future versions (take note Microsoft) might be the ability to mark queries by priority level, which could tell Azure what to focus optimizations on, and maybe even let higher-priority queries take precendence in conflict situations.
Automatic index recommendations also seem to miss some broader context sometimes. One of the index suggestions I evaluated involved putting a non-clustered index on two columns, to help speed up a query that uses those columns in its WHERE clause. However, on investigation, I found that this exact query goes on to update one of the columns in this index. When I applied the index on a testing database, I found that it actually increased execution time on the query by up to 50%--the time taken to rebuild the index was more than the time saved by having that index in the first place! Obviously we discarded the suggestion, so we don't know what would have happened if automatic tuning had been enabled. Hopefully Azure would have detected the regression and quicky reverted the index, but this is a great example of why you should evaluate these suggestions yourself before accepting them.
Unfortunately, the process for evaluating tuning suggestions is kind of janky. When they show up in your recommendations, they only come with the table and columns suggested, not the query(s) which actually triggered the recommendation. This has been suggested, and ignored--a classic Microsoft move to get you to use their products without thinking too much about what's happening at a lower level. There may be a way to track through the system views to find the actual query text you want to see, but I'm not enough of a wizard to figure it out. So we've been doing it the old-fashioned way instead, by looking at Regressed Queries in SSMS.
Basically, any query that triggered a high-priority index recommendation is likely to be one of the top regressed queries. Queries in this view also include missing indexes in their details, so you can match those up with the indexes getting recommended in the Azure UI to trace which queries they should be helping. Once you know the actual problem that Azure has detected, you can take a step back and figure out how to solve it. Keep in mind that you may not need a new index to solve every performance problem, and with your broader human knowledge of the system, you might be able to find a fix that doesn't involve changing anything on the DB side. For example, how high priority is this query? Is there a way it could be rewritten to take advantage of existing indexes or other optimizations? Do you actually even need it at all? Sometimes we go investigate a query in our code, only to find out that we're not actually using its results anywhere.
If it still seems like an index might be helpful, you can test it. We use dev databases that are regularly refreshed by cloning the production databases, which generally make it very easy to test optimizations on. I normally start by running the offending query a few times with a few different parameters (if applicable) and recording those execution times, to get a baseline for performance. Then you can add your index and rerun the same queries as before, to get an accurate idea of the performance difference. If there's a noticeable improvement, you probably want to go ahead and create this index for real.
At DealerOn, we still create these indexes manually, instead of just accepting the suggestions through the Azure UI. This is so that we can integrate them into our normal workflow process, and so that we can give them descriptive names that fit into our own schema. We then monitor performance after they've been applied, to make sure they're actually helping out where they're supposed to, and to make sure they're not causing issues anywhere else.
The flip side of the CREATE INDEX automatic suggestions is that Azure will also monitor and detect when it thinks indexes are hurting your performance. We don't have much experience with this, because it's never given us this suggestion before--clearly all our indexes are doing great! However, if we ever see this suggestion show up, we'll follow a validation process like we do for the CREATE suggestions.
Since all our indexes are created manually with descriptive names, we should be able to trace them back to the associated JIRA ticket and find out when and why they were created. We can then determine whether this reasoning is still valid, or even whether it was valid in the first place. This may help us identify issues where we didn't properly account for a change we made, or with our original process that created the index, or even larger structural issues with our system design.
If you don't want to worry about that sort of stuff that's fine, and maybe you should turn on the CREATE and DROP INDEX options. In a normal scenario, they'll probably improve your overall database performance for a minimum of effort. But if you really care about optimization (like we do), or if some optimizations are more optimal than others (like in our use case), or even if you're just interested in knowing what SQL Server is doing at a lower level, I recommend you disable the automatic index tuning and instead start paying attention to the suggestions you get.