DEV Community

Vishnu KN
Vishnu KN

Posted on

TIL: Clustered and Non-clustered indices in SQL

The tables that we see in SQL server are not the stored data but the "presentation" of it. When a table is created and rows are inserted, behind the scenes, SQL stores the actual data relating to that table on what is known as data pages. This behind the scenes stuff is called SQL internals. And the way it is stored may be "clustered" or "non-clustered" (based on what is specified) or "heap" (if nothing is specified)

Heap is like haphazardly dumping the data wherever space is available, like an unorganized phone book. In SQL server, if no index is defined, heap is the default approach.

Clustered index is when you organize that phone book according to some parameter (like first name, etc). That means, when clustered index is defined, the heap is organized as per the parameter defined in the clustered index (and thus it is no longer a heap, it is eliminated).

Heap and clustered index are thus opposites of each other.

And non clustered index is like a yellow pages book which references the phone book (organized or un-organized).


In SQL server, IF primary key is defined, it automatically becomes the clustered index key. If not defined, there will be no clustered index key. And any column can be defined as the clustered index key or as the non clustered index key.


AI Visulaization:

1️⃣ Heap (no clustered index)

Rows are just placed wherever there’s space. No particular order:

Data Pages
+-------------------+   +-------------------+
| StudentID=3,Ravi  |   | StudentID=2,Meera |
| StudentID=1,Anil  |   |                   |
+-------------------+   +-------------------+
Enter fullscreen mode Exit fullscreen mode

👉 To find Name='Meera', SQL has to scan everything.


2️⃣ Clustered Index on StudentID

Rows are physically sorted by StudentID across data pages:

Data Pages (clustered on StudentID)
+-------------------+   +-------------------+
| StudentID=1,Anil  |   | StudentID=3,Ravi  |
| StudentID=2,Meera |   |                   |
+-------------------+   +-------------------+
Enter fullscreen mode Exit fullscreen mode

👉 To find StudentID=2, SQL can go straight to it (fast).


3️⃣ Non-Clustered Index on Name (table has clustered index on StudentID)

Separate structure, sorted by Name, pointing back to rows:

Non-Clustered Index (sorted by Name)
+--------------------+
| Anil   → StudentID=1
| Meera  → StudentID=2
| Ravi   → StudentID=3
+--------------------+

Clustered Data Pages (sorted by StudentID)
+-------------------+   +-------------------+
| 1,Anil,22         |   | 3,Ravi,20         |
| 2,Meera,21        |   |                   |
+-------------------+   +-------------------+
Enter fullscreen mode Exit fullscreen mode

👉 Steps when you query WHERE Name='Meera':

  1. SQL looks in the non-clustered index → finds Meera → StudentID=2.
  2. Uses the clustered index on StudentID → fetches row 2,Meera,21.

4️⃣ Non-Clustered Index on Name (table is a heap)

If there’s no clustered index, the non-clustered index points to the physical address (RID):

Non-Clustered Index (sorted by Name)
+--------------------+
| Anil   → Page1,Slot2
| Meera  → Page2,Slot1
| Ravi   → Page1,Slot1
+--------------------+

Heap Data Pages (unordered)
+-------------------+   +-------------------+
| Page1: Ravi(3)    |   | Page2: Meera(2)   |
| Page1: Anil(1)    |   |                   |
+-------------------+   +-------------------+
Enter fullscreen mode Exit fullscreen mode

👉 Here, SQL jumps directly to the physical location.


✅ Summary:

  • Heap = unordered, non-clustered indexes point to physical row location.
  • Clustered = ordered by clustered key, non-clustered indexes point to clustered key.

Optimizing how SQL stores data behind the scenes (internals) can make a huge difference in performance and scalability.

Non clustered indexes come in useful in scenarios where a column which is not a clustered index is frequently being queried.

🔹 Example

Suppose you have a table clustered on EmployeeID (so rows are stored ordered by ID):

EmployeeID Name Department
1 Anil IT
2 Meera Finance
3 Ravi HR
  • Query 1:
  SELECT * FROM Employees WHERE EmployeeID = 2;
Enter fullscreen mode Exit fullscreen mode

👉 Fast because clustered index is on EmployeeID.

  • Query 2:
  SELECT * FROM Employees WHERE Name = 'Meera';
Enter fullscreen mode Exit fullscreen mode

👉 Without non-clustered index, SQL must scan every row (slow).
👉 With a non-clustered index on Name, SQL can directly use the sorted Name list and jump to the right row.


✅ Analogy

  • Clustered index = phone book sorted by last name (only one order possible).
  • Non-clustered index = an extra list, say sorted by phone number or by city, pointing back to the entries in the main book.

So, a non-clustered index is useful when you want fast lookups or sorts based on a column other than the clustered key.


In the above example, SQL can directly jump to the M's and that reduces the volume of scanning that needs to be done.

Top comments (0)