Vishnu KN

Posted on Sep 25

TIL: Clustered and Non-clustered indices in SQL

#beginners #sql #database

The tables that we see in SQL server are not the stored data but the "presentation" of it. When a table is created and rows are inserted, behind the scenes, SQL stores the actual data relating to that table on what is known as data pages. This behind the scenes stuff is called SQL internals. And the way it is stored may be "clustered" or "non-clustered" (based on what is specified) or "heap" (if nothing is specified)

Heap is like haphazardly dumping the data wherever space is available, like an unorganized phone book. In SQL server, if no index is defined, heap is the default approach.

Clustered index is when you organize that phone book according to some parameter (like first name, etc). That means, when clustered index is defined, the heap is organized as per the parameter defined in the clustered index (and thus it is no longer a heap, it is eliminated).

Heap and clustered index are thus opposites of each other.

And non clustered index is like a yellow pages book which references the phone book (organized or un-organized).

In SQL server, IF primary key is defined, it automatically becomes the clustered index key. If not defined, there will be no clustered index key. And any column can be defined as the clustered index key or as the non clustered index key.

AI Visulaization:

1️⃣ Heap (no clustered index)

Rows are just placed wherever there’s space. No particular order:

Data Pages
+-------------------+   +-------------------+
| StudentID=3,Ravi  |   | StudentID=2,Meera |
| StudentID=1,Anil  |   |                   |
+-------------------+   +-------------------+

👉 To find Name='Meera', SQL has to scan everything.

2️⃣ Clustered Index on `StudentID`

Rows are physically sorted by StudentID across data pages:

Data Pages (clustered on StudentID)
+-------------------+   +-------------------+
| StudentID=1,Anil  |   | StudentID=3,Ravi  |
| StudentID=2,Meera |   |                   |
+-------------------+   +-------------------+

👉 To find StudentID=2, SQL can go straight to it (fast).

3️⃣ Non-Clustered Index on `Name` (table has clustered index on StudentID)

Separate structure, sorted by Name, pointing back to rows:

Non-Clustered Index (sorted by Name)
+--------------------+
| Anil   → StudentID=1
| Meera  → StudentID=2
| Ravi   → StudentID=3
+--------------------+

Clustered Data Pages (sorted by StudentID)
+-------------------+   +-------------------+
| 1,Anil,22         |   | 3,Ravi,20         |
| 2,Meera,21        |   |                   |
+-------------------+   +-------------------+

👉 Steps when you query WHERE Name='Meera':

SQL looks in the non-clustered index → finds Meera → StudentID=2.
Uses the clustered index on StudentID → fetches row 2,Meera,21.

4️⃣ Non-Clustered Index on `Name` (table is a heap)

If there’s no clustered index, the non-clustered index points to the physical address (RID):

Non-Clustered Index (sorted by Name)
+--------------------+
| Anil   → Page1,Slot2
| Meera  → Page2,Slot1
| Ravi   → Page1,Slot1
+--------------------+

Heap Data Pages (unordered)
+-------------------+   +-------------------+
| Page1: Ravi(3)    |   | Page2: Meera(2)   |
| Page1: Anil(1)    |   |                   |
+-------------------+   +-------------------+

👉 Here, SQL jumps directly to the physical location.

✅ Summary:

Heap = unordered, non-clustered indexes point to physical row location.
Clustered = ordered by clustered key, non-clustered indexes point to clustered key.

Optimizing how SQL stores data behind the scenes (internals) can make a huge difference in performance and scalability.

Non clustered indexes come in useful in scenarios where a column which is not a clustered index is frequently being queried.

🔹 Example

Suppose you have a table clustered on EmployeeID (so rows are stored ordered by ID):

EmployeeID	Name	Department
1	Anil	IT
2	Meera	Finance
3	Ravi	HR

Query 1:

  SELECT * FROM Employees WHERE EmployeeID = 2;

👉 Fast because clustered index is on EmployeeID.

Query 2:

  SELECT * FROM Employees WHERE Name = 'Meera';

👉 Without non-clustered index, SQL must scan every row (slow).
👉 With a non-clustered index on Name, SQL can directly use the sorted Name list and jump to the right row.

✅ Analogy

Clustered index = phone book sorted by last name (only one order possible).
Non-clustered index = an extra list, say sorted by phone number or by city, pointing back to the entries in the main book.

So, a non-clustered index is useful when you want fast lookups or sorts based on a column other than the clustered key.

In the above example, SQL can directly jump to the M's and that reduces the volume of scanning that needs to be done.

DEV Community

TIL: Clustered and Non-clustered indices in SQL

1️⃣ Heap (no clustered index)

2️⃣ Clustered Index on `StudentID`

3️⃣ Non-Clustered Index on `Name` (table has clustered index on StudentID)

4️⃣ Non-Clustered Index on `Name` (table is a heap)

🔹 Example

✅ Analogy

Top comments (0)

1️⃣ Heap (no clustered index)

2️⃣ Clustered Index on StudentID

3️⃣ Non-Clustered Index on Name (table has clustered index on StudentID)

4️⃣ Non-Clustered Index on Name (table is a heap)

🔹 Example

✅ Analogy

2️⃣ Clustered Index on `StudentID`

3️⃣ Non-Clustered Index on `Name` (table has clustered index on StudentID)

4️⃣ Non-Clustered Index on `Name` (table is a heap)