DEV Community

Yeeun Ju
Yeeun Ju

Posted on

Clustered vs Non-Clustered: Understanding Database Indexes

Where Indexes are Used

When a user sends a data retrieval request to the database, the DB server process first checks the memory.
It caches the queried data in the DB buffer cache, and if the desired data is not there, it copies the data file from the disk, finds the queried data, and returns it.

In this process, indexes greatly improving data retrieval efficiency.
If a column without an index is queried, the entire table must be retrieved and searched.
Using indexes to improve query speed is one way to enhance performance.

How Do Indexes Improve Query Speed?

An index is, simply put, like a table of contents in a book.
The table of contents allows you to quickly move to the desired content within the book.
However, additional pages need to be printed to accommodate the table of contents.

Indexes work the same way.
They help you query information in the table more easily but require additional space.
When an index is created for a specific column, it maintains a constantly sorted state in a new index table (external file).

index-table

And, because it always maintains a sorted state, it needs to be updated every time Insert, Update, or Delete operations occur.

Data Structures Used by Indexes

Hash Table

Time complexity of O(1)

B+ Tree

  • Unlike B-trees, all data is stored only in leaf nodes, and leaf nodes are connected as a linked list.
  • While having a time complexity of O(logN), are particularly well-suited for inequality operations.
  • MySQL and Microsoft SQL Server have indexes implemented as B+ trees.
  • PostgreSQL has indexes implemented as B-trees.

There are two types of indexes, which can be created as follows:

  • Clustered Index
CREATE CLUSTERED INDEX index_name ON table_name(column_name);
Enter fullscreen mode Exit fullscreen mode
  • Non-Clustered Index
CREATE NONCLUSTERED INDEX index_name ON table_name(column_name);
-- or
CREATE INDEX index_name ON table_name(column_name);
Enter fullscreen mode Exit fullscreen mode

*The default is a non-clustered index.
*While MongoDB doesn't have clustered indexes, its '_id' field behaves similarly to a clustered index in many ways.

Clustered Index

Original table:

ID Name Age
2 Charlie 35
3 Alice 30
1 Bob 25

After creating a clustered index based on ID (modified original table):

ID Name Age
1 Bob 25
2 Charlie 35
3 Alice 30
  • Physically sorted according to the index order (the current table is rearranged)
  • Only one can exist per table and has the conditions of a Primary Key (PK)
    • Because it's a key for sorting
    • When a PK is specified, it's automatically created and sorted
  • Uses B-tree(depth is important), O(logN)
  • It offers improved query performance.
  • While there may be some overhead during data insertion due to maintaining the order, it generally improves search performance
  • Typically, an integer column (ID) with auto_increment is used as PK for the clustered index

After creating a clustered index on the 'Name' column:

ID Name Age
3 Alice 30
1 Bob 25
2 Charlie 35

Non-Clustered Index

Original table:

ID Name Age
2 Charlie 35
3 Alice 30
1 Bob 25

After creating a non-clustered index on the 'Name' column (additional table):

Name Row Locator
Alice Locator to Row 3
Bob Locator to Row 1
Charlie Locator to Row 2
  • The index is stored in a separate structure, and the actual data is not sorted.
  • The Row Locator refers to the location of the data.
    • The content of the Row Locator depends on the type of the table (with clustered index / without clustered index)
  • Multiple non-clustered indexes can be created on a single table.
  • An additional step of accessing the actual data after index lookup is required

Additionally, both types of indexes can be used together, and indexes can be created for multiple columns.

Which Columns Should Have Indexes?

*It depends on usage, so it's best to determine based on various factors such as query patterns, data distribution, table size, etc.

  1. Columns with high cardinality (e.g., email: Yes, gender: No)
  2. Columns with high selectivity (i.e., columns with low duplication)

Top comments (0)