SQL Indexes 101

#database #sql

If you've ever written an app or a website that's not static, chances are you've already worked with SQL either by directly writing SQL or through an ORM. Databases play a huge role in most apps and they can easily be the bottleneck of your app if you're not careful. That's why optimizing database performance is essential. One way of optimizing SQL database is by creating indexes. But first, let me briefly explain how SQL works.

How SQL works:

SQL stores data on disk by dividing disk space into 8KB logical pages (logical means that the space is not physically divided). when inserting data, SQL adds table rows to the logical pages.

Then, to query the database, SQL uses a cost-based query optimizer to calculate all the possible methods of getting the required data (without executing them) and then executes the method with the least cost. These methods can be either By scanning all rows of the table or by using an index. Which brings us to our topic.

SQL Index:

An index in SQL is a structure that uses one or more columns of a table or a view to create new keys that order or group the data rows according to the columns used.

Indexes have 2 types, Clustered and Non-clustered:

Clustered Indexes:

A clustered index determines the order of the rows on disk, hence, there can only be one clustered index for a table. If a table has no clustered index, the rows are structured as a heap. By default, a Primary key constraint creates a clustered index.

Non-Clustered Indexes:

Unlike clustered indexes, A non-clustered index uses disk space because it is stored separated from the table itself. It's a structure that contains the index key values where each key has an entry (row locator) which is a pointer to the data rows. A row locator can be either:

a. Pointer to the row itself.
b. The clustered key index of the row if the table has a clustered index.

By default, a Unique constraint creates a non-clustered index. And since a non-clustered index does not affect the order of the rows on disk, we can create more than one.

Here are some key differences between clustered and non-clustered indexes:

Clustered	Non-Clustered
1. physically order and sort rows in memory	1. logically order rows and use row locators to access actual rows.
2. Generally faster access to data.	2. Slow access to data compared to clustered index.
3. requires no extra space.	3. requires extra space since it is stored separately. However, non-clustered indexes are much smaller since they don't store the pages themselves.
4. By default, it is the primary key of the table.	4. By default, it is created with UNIQUE constraint.

Choosing When to Use Indexing:

Whether you choose clustered or non-clustered indexes depends on what you need and the nature of your project. Actually, choosing to use indexes at all should never be your default. For example, you don't want to put an index on a field that is rarely used or used in small amount of data. You also don't want to use too many indexes or else inserting in the database would take too long since indexes need to be modified every time you INSERT, DELETE or UPDATE a row in the database.

Conclusion:

SQL indexes are a very powerful tool to reduce querying time, however, if not used carefully, they can cause write operations to take more resources. At the end of the day, it's a trade-off that you must be careful before choosing to take.

DEV Community