1. Build indices based on the actual access patterns
For a first approach, sure, go with a naïve/best-effort approach and make an educated guess.
The moment you start getting real data of usage patterns, and which queries are bottlenecking your DB, use those to dictate what indices to create and which to drop. Indices are not free, so pick accordingly.
2. Indices are normally chosen based on selectivity
Where selectivity is just how many different values are there for a given column present in an index compared to the total number of rows in that table.
If 2 indices can be used for one query, the RDBMS (Relational Database Management System) will pick the most selective one.
3. Composite/Multi-column indices take into account column order
When building a multi-columnar index always take into account the order of the columns included.
ℹ️ The rules are:
1) Left to right, no skipping
2) Stop at the first range
Left to right, no skipping → If you create an index on columns A, B and C, you can use that index on a query that searches only in A, but not for one that searches in B or C, as they require searching for a value in A first. Even if you use column A and C in your query, only column A will be checked against the index because B isn’t included as well.
Stop at the first range → The moment one of the columns is checked for a range (≤, <, >, ≥, etc) then the rest of the columns will not be checked against the index.
A good rule of thumb is to create the index with the most selective columns first, but always tailor it to access patterns as mentioned above!
4. Covering indices
We say an index is covering when it can fulfil an entire query all by itself, without having to reference back to the original table/index for more data.
They are not a separate type per se, but rather a quality any given index may have for a query. Aim to provide them for your most demanding queries so long as the extra space of dedicated indices is worth it.
5. Careful with functions and indices (index obfuscation)
The moment one of the columns in the index is placed inside a function, said index cannot be used for it, as it will have been transformed and thus, obfuscated. You may need to be creative for this, such as doing
WHERE date_created BETWEEN '2023-01-01' AND '2023-12-31'
instead of
WHERE YEAR(date_created) = 2023
They both provide the same results, yet the first one would be able to use an index in column date_created
, whereas the second wouldn't.
Note: Some RDBMSs allow indexing functions, but as far as I know, they are not best practices since they serve one very particular case and are less flexible. However, they are still an option if your usecase can benefit from them.
And there it is! 5 simple and quick tip/tidbits you now know that make you far better at DB Indexing. Hope you enjoyed the article, but if you didn't, be sure to let me know why so I can improve in the future 😉
Massive thanks to the amazing course the people at PlanetScale have made for the vast majority of information presented here. I'm thus far loving it, and I greatly recommend you check it out!
Top comments (0)