Helen Anderson

Posted on Nov 27, 2018 • Updated on Feb 9, 2021 • Originally published at helenanderson.co.nz

Speed up your queries with indexes

#sql #data #database #beginners

The classic analogy for indexes goes ... databases are like libraries.

Tables are like books stored in a library.

Rows are stored on pages of a book.

Flipping through a textbook page by page looking for that one page you need is going to take time. In the same way, scanning millions of rows is going to be time-consuming and tedious. That's where indexes come in.

Why do I need an index?
Are there different types of indexes?
Where can I find my index once it's created?
What are good candidates for indexes?
Can't I do this later?
Do I need to do this at all?
How many indexes are too many?
This is no good to me, I use Oracle
Why didn't you explain everything about b-trees?
Is this the answer to all my performance problems?

Why do I need an index?

Indexes speed up performance by either ordering the data on disk so it's quicker to find your result or telling the SQL engine where to go to find your data. If you don't apply an index, the SQL engine will scan through every row one by one.

While this isn't necessarily a bad thing, as your database grows things could start to slow down.
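To see the difference between a full scan and an index lookup, here is a minimal sketch. It makes a few assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, it borrows the actor_registration table name from the examples later in this post, and the first_name column and sample rows are made up for illustration. SQLite's EXPLAIN QUERY PLAN is not the same thing as SQL Server's Execution Plan, but it tells a similar story.

```python
import sqlite3

# SQLite stands in for SQL Server here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor_registration (actor_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO actor_registration VALUES (?, ?, ?)",
    [(i, f"first{i}", f"last{i}") for i in range(1000)],
)

query = "SELECT actor_id FROM actor_registration WHERE last_name = 'last500'"


def plan(sql):
    """Return the human-readable 'detail' text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# No index yet: the engine has to scan the whole table.
plan_without_index = plan(query)

# Add an index on the column used in the WHERE clause and ask again.
conn.execute("CREATE INDEX last_name_idx ON actor_registration (last_name)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

The first plan should describe a scan of the table, while the second should mention last_name_idx. The exact wording varies between SQLite versions.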

Are there different types of indexes?

There are two main types in SQL Server:

Clustered Index - the contents page

Physically arranges the data on disk in a way that makes it faster to get to.
You can only apply one per table because the data can only be ordered one way.

create clustered index [id_idx] --name of the index
on [dbo].[actor_registration](actor_id)

Non-clustered Index - the index at the back of a book.

These create a lookup that points to where the data is.
You can create up to 999 per table, but as each index carries storage and maintenance overhead, you'll probably want to stick to just a few.

create nonclustered index [last_name_idx] --name of the index
on [dbo].[actor_registration](last_name)

Where can I find my index once it's created?

You can find the indexes on a table by expanding the table in Object Explorer, then expanding its Indexes folder.

This is also where you can create an index using the wizard. Once a table has a clustered index, the wizard greys out the clustered option, because a table can only have one.
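If you'd rather not click through a GUI, most databases also let you list indexes with a query. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module and SQLite's PRAGMA index_list, not SQL Server. In SQL Server the equivalent information lives in the sys.indexes catalog view. The table and index names follow the examples above.

```python
import sqlite3

# SQLite stands in for SQL Server here; SQL Server exposes this via sys.indexes instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor_registration (actor_id INTEGER, last_name TEXT)")
conn.execute("CREATE UNIQUE INDEX id_idx ON actor_registration (actor_id)")
conn.execute("CREATE INDEX last_name_idx ON actor_registration (last_name)")

# PRAGMA index_list returns one row per index on the table.
# Column 1 is the index name, column 2 is 1 when the index is unique.
rows = conn.execute("PRAGMA index_list('actor_registration')").fetchall()
index_names = sorted(row[1] for row in rows)
unique_names = sorted(row[1] for row in rows if row[2] == 1)

print(index_names)   # ['id_idx', 'last_name_idx']
print(unique_names)  # ['id_idx']
```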

What are good candidates for indexes?

ID columns, names, account numbers and other columns that hold lots of distinct values. Ideally it's something unique and sequential that you use frequently in SELECTs and JOINs.
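One rough way to judge a candidate column is to compare its number of distinct values with the total number of rows. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module, the category column and its five values are hypothetical, and the selectivity helper is a made-up name, not a built-in function.

```python
import sqlite3

# SQLite stands in for SQL Server; 'category' and its five values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor_registration (actor_id INTEGER, last_name TEXT, category TEXT)"
)
categories = ["lead", "support", "extra", "stunt", "voice"]
conn.executemany(
    "INSERT INTO actor_registration VALUES (?, ?, ?)",
    [(i, f"last{i}", categories[i % 5]) for i in range(1000)],
)


def selectivity(column):
    """Distinct values divided by total rows; close to 1.0 means nearly unique.

    The column name is interpolated directly, so only pass trusted names.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM actor_registration"
    ).fetchone()
    return distinct / total


id_selectivity = selectivity("actor_id")
category_selectivity = selectivity("category")

print(id_selectivity)        # 1.0
print(category_selectivity)  # 0.005
```

A column like actor_id, where every value is different, narrows a search down to a single row. A column with only five possible values still leaves a fifth of the table to read, so an index on it alone tends to help much less.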

Can't I do this later?

Sure, no problem. Sometimes it's good to get a handle on how you are querying the data and then add them later.

If your table has no clustered index, the data is stored in no guaranteed order, more or less as it comes in. This is called a heap and is effectively an expensive way of storing a spreadsheet.

Be aware that indexes take time to build. If your tables are large by the time you get to this task, it may take a while.

Do I need to do this at all?

There's no rule saying you should or shouldn't. The advantages of not adding indexes are that your INSERTs and UPDATEs will be faster and your database will be physically smaller.
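The size cost is easy to see for yourself. This is a small sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server and made-up sample rows. It counts the pages the database uses before and after adding an index. The numbers will differ on SQL Server, but the direction is the same.

```python
import sqlite3

# SQLite stands in for SQL Server here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor_registration (actor_id INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO actor_registration VALUES (?, ?)",
    [(i, f"last{i}") for i in range(10000)],
)
conn.commit()


def page_count():
    """Number of storage pages the database currently uses."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


pages_before = page_count()

# The index is a second structure that has to be stored alongside the table.
conn.execute("CREATE INDEX last_name_idx ON actor_registration (last_name)")
conn.commit()
pages_after = page_count()

print(pages_before, pages_after)
```

Every extra index is also one more structure the engine has to update on each INSERT, which is where the write slowdown comes from.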

If you do notice things getting slow, check out the Execution Plan for suggestions and for more information on where the effort goes when your query runs.

How many indexes are too many?

As always, it depends. Too many indexes may slow down performance, because every INSERT, UPDATE and DELETE has to update each index as well as the table. Once you've created an index for your Primary Key and Unique Keys, it will be up to you, the Execution Plan and perhaps your friendly DBA as to what you do next.

This is no good to me, I use Oracle

Lucky for you there's a great resource for you right here on Dev.to

Why didn't you explain everything about b-trees?

Because there's already some excellent content here on Dev.to that goes further on this topic and I'm looking to provide a beginner's overview.

Is this the answer to all my performance problems?

Indexes need maintenance. They may improve performance initially, but need to be reviewed, updated and maintained as your database grows. They aren't a 'set and forget' magic bullet, and any that no longer help should be deleted as your requirements change.
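As a rough sketch of what that housekeeping can look like, the example below rebuilds one index and drops another. It assumes SQLite through Python's sqlite3 module, where the statements are REINDEX and DROP INDEX. On SQL Server the rebuild would be an ALTER INDEX ... REBUILD statement instead. The actor_id_idx index and the index_names helper are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; SQL Server rebuilds with ALTER INDEX ... REBUILD.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor_registration (actor_id INTEGER, last_name TEXT)")
conn.execute("CREATE INDEX last_name_idx ON actor_registration (last_name)")
conn.execute("CREATE INDEX actor_id_idx ON actor_registration (actor_id)")  # hypothetical


def index_names():
    """Names of all indexes currently defined on actor_registration."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'actor_registration'"
    )
    return sorted(name for (name,) in rows)


before = index_names()

conn.execute("REINDEX last_name_idx")    # rebuild an index that is still needed
conn.execute("DROP INDEX actor_id_idx")  # remove one that is no longer used

remaining = index_names()

print(before)     # ['actor_id_idx', 'last_name_idx']
print(remaining)  # ['last_name_idx']
```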

Your best bet at first is to use the Execution Plan to view its suggestions or ask your friendly DBA to lend a hand.

Let me know what other key concepts you think would be useful for complete beginners and junior data analysts.

Top comments (37)

Niklas • Nov 27 '18

Keep in mind that indexes are only useful if you have a high variance in your data, like username or firstname, lastname. Something that is limited in its variance, like an enum field (a category that holds only 5 possible values, for example), isn't the best choice to add an index to 😊👍

Andrey Alferov • Dec 2 '18

That's not always true. A selective index is good, but if you don't index a foreign key column you can get big problems with locks.

Niklas • Dec 2 '18

Can you explain this a little further? :)

Andrey Alferov • Dec 2 '18

Of course. For example, table2 has a column colF with a foreign key to table1. If colF is not indexed, then whenever data in table1 changes (any data) we get an exclusive table lock on table2. This is true for Oracle.
I have had this problem many times.

Niklas • Dec 2 '18

Cool, thanks for sharing! I guess it's kind of database agnostic, good to know for Oracle! ✌️😊

Helen Anderson • Dec 1 '18

Great advice! That’s certainly what I do in the real world and more often than not it will be two or three columns not just one

Hypertext • May 26 '22

Yeah, for that you can use a lookup table.

William Gathoye • Nov 28 '18 • Edited

That execution plan sounds like a great feature especially as it is nicely expressed as time. This would allow me to check whether my schema is optimized for some SQL requests, or simply to see if the SQL request structures I use are not too time consuming (that would be fine for my lab assessment, good grades come in handy!).

Do you know if this is available in other DBMS? Even if you wrote Oracle users weren't lucky here, I tried to check for that DBMS (because we are forced to use Oracle at my university) and I wasn't able to find a solution as easy as with SQL Server (expressed as seconds). Any solution for MySQL/MariaDB? If that matters, I'm used to using the CLI, but also GUI tools like DBeaver and SQL Developer.

Kasey Speakman • Nov 28 '18

Been a while, but I used to get the query execution plan in MySQL by simply putting EXPLAIN in front of it. I believe the same keyword is used in Postgres as well.

Ashley Sheridan • Nov 30 '18

Presumably it's the same for MSSQL, but MySQL allows you to create indexes across multiple columns. This is very useful in a lot of situations where the identifying feature of the data is a combination of two or more things, e.g. user ID and a friend ID, where you'll likely have a row for each of the ID links.

Helen Anderson • Dec 1 '18

Nice catch! In the real world I’ll create indexes across more than one column, as you’ve described. But in the interests of simplicity I kept it to just one in the examples.

Daniel Golant • Dec 7 '18

Great post, Helen. Really missing Execution Plan now that I use Postgres. EXPLAIN is... fine, but EP was so nice.

Helen Anderson • Dec 7 '18

Good to know... I’m migrating to Postgres so it’ll be a bit of a change

Daniel Golant • Dec 7 '18

It’s more an aspect of your tools. I know DataGrip can do some impressive stuff, if you can afford it.

Ben Greenberg • Nov 28 '18

This is great! One of the first things I did at my job was create indexes and that dramatically sped up our SSRS queries. It felt like a good low hanging fruit, which unfortunately gets overlooked.

Helen Anderson • Nov 28 '18

That’s great to hear. Did you have a friendly DBA to talk it through with?

Ben Greenberg • Nov 29 '18

In this case, I also became a bit of the friendly DBA. It's a small in house team, so everyone does a bit of everything!

Helen Anderson • Nov 29 '18

I feel like there are a lot of 'accidental DBAs' out there :)

Ben Greenberg • Nov 29 '18

That's probably true, and we're all just trying to be extra careful not to drop the database!

Helen Anderson • Dec 1 '18

Having all that power is a beautiful... and terrifying thing :)

ComputerSmiths • Nov 27 '18

So if I’m (typically) putting measurements in a database with the Unix epoch time, and then pulling out the last 24 hours to plot, do I want to add an index on epoch? Is there an SQL “query” that’ll do that? I’m mostly using mysql and mariadb on Raspberry Pi (Debian-ish) if it matters. Thanks!

Katie • Nov 27 '18

There's hardly anything to add to this awesome post, except maybe:

"I don't maintain the database; I just query it!" folks, don't be scared off!

Even if you don't have tools that show you the indexes, it's always good to ask your DBA (or some other expert) where the indexes are in tables you plan to join with SQL (or a drag-and-drop tool that generates SQL behind the scenes), so that you can potentially write faster queries.

I say "potentially" because the index won't always make your query run faster (it depends on the data and on your query). But that's the subject of a 3-hour university lecture and exercises doing "explain plan" with pencil and paper ... :-)

A good guess, if you're in a rush, is that the "primary key" for a table is often indexed to make "joining" it to other tables against "foreign keys" faster. In other words, if you join tables the way they seem intuitively related to each other, those joins may run reasonably quickly.

But ask your DBA or other database expert to be sure.

Helen Anderson • Nov 28 '18

Thanks Katie :)

Great advice to check in with your DBA before making those kinds of decisions. There's often a lot more going on behind the scenes and other indexes may have been applied for different reasons. An index may make things go faster, but there could be another way to solve the problem.