DEV Community

Cover image for SQL Server Indexes Explained: Column Order, INCLUDE, and the Mistakes That Taught Me
Mashrul Haque
Mashrul Haque

Posted on

SQL Server Indexes Explained: Column Order, INCLUDE, and the Mistakes That Taught Me

Part 3 of the SQL Server Performance Series

Last updated: January 2026

I inherited a table with 47 indexes. Here's what that disaster taught me about column order, INCLUDE columns, and knowing when enough is enough.


TL;DR

Indexes are sorted copies of your columns with pointers back to the full rows. SQL Server reads them left to right. Column order is everything. INCLUDE columns let you avoid key lookups without bloating the sorted structure. Every index slows down writes. And please, for the love of everything, check whether your indexes are actually being used before you create more.


Forty-Seven Indexes

I once inherited a database where a single table had forty-seven nonclustered indexes.

Counted twice. The number seemed wrong. It wasn't.

The history wasn't hard to piece together. Every few months, someone noticed a slow query. They created an index. The query got faster. They moved on. Repeat for a decade. Nobody ever cleaned up. The logic made sense if you didn't think too hard: indexes make queries faster, so more indexes means faster database. Right?

Inserts on that table took 800 milliseconds. Eight hundred. The team was convinced they had a hardware problem. There was serious talk of a server upgrade.

We dropped forty indexes that hadn't been read in six months. Insert time dropped to 15 milliseconds. Query performance stayed exactly the same because nobody was using those indexes anyway.

More indexes is not better. The right indexes is better.

I still think about that table.


The Thing Itself

Forget the phone book analogy. Everyone uses that one and it only gets you so far.

An index is a separate data structure. It stores a sorted copy of specific columns, plus a pointer back to the full row. That's it. That's the whole thing.

When you create an index on CustomerId, SQL Server builds this new structure where all CustomerId values live in sorted order, each pointing back to its full row in the main table. Think of it like a very efficient lookup table that knows exactly where to find things.

Query WHERE CustomerId = 12345? SQL Server binary searches the sorted index. This is fast. O(log n) fast. It jumps directly to the matching rows using those pointers.

No index? SQL Server has no choice. It reads every row in the table and checks: is this CustomerId 12345? Is this one? How about this one? That's a scan. With millions of rows, it's painful. With an index, it seeks directly to the answer. (If you want to see what this looks like in practice, Part 2 on execution plans shows you how to spot the difference.)

Where the Phone Book Breaks Down

The phone book analogy works fine for simple cases. Sorted by last name? Finding "Smith" is easy. Flip to S.

But think harder and it falls apart:

How would you find everyone named "John" regardless of last name? You'd literally have to read every page. The book's organization works against you.

A book sorted by "LastName, FirstName" is useless for finding all Johns. The data is there. You just can't get to it efficiently.

And here's the other thing: when someone moves, nobody updates the phone book. But databases change constantly. Rows get inserted, updated, deleted. The index has to keep up.

Real indexes are more flexible than phone books. But the core principle is identical: the sort order determines which queries can use the index efficiently. Get the order wrong and you might as well not have an index at all.


Clustered and Nonclustered

Two sentences. That's all you need.

Clustered index: The table data itself, physically sorted by the index columns. You get one. Just one. It's not a copy of anything. It IS the table.

Nonclustered index: A separate sorted copy that points back to the clustered index. You can have many of these.

That's it. Everything else is implementation detail.

Clustered Indexes

When you create a clustered index, you're deciding how the table's data pages are physically arranged on disk. The rows sit in sorted order based on your clustered key.

CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders(OrderDate);
Enter fullscreen mode Exit fullscreen mode

Now OrderDate IS the physical ordering of the table. New rows get inserted in OrderDate order. Pages are arranged by OrderDate.

If you don't create a clustered index, you get a "heap." A table with no physical order. Heaps aren't inherently bad. I've used them intentionally for staging tables. But they complicate other things and confuse the optimizer in edge cases.

Most tables should have a clustered index on the primary key. For OLTP systems, that usually means an integer identity column (minimal fragmentation from inserts). For time-series data, consider the datetime column instead. New rows always go at the end, which is what you want.

(The clustered index choice affects how the optimizer plans your queries. It's worth understanding that relationship.)

Nonclustered Indexes

These are the indexes you create for query optimization. The ones you'll spend 90% of your time thinking about.

Each nonclustered index stores the indexed columns (sorted), the clustered index key (so it can find the full row), and any INCLUDE columns you added.

CREATE NONCLUSTERED INDEX IX_Orders_Status ON Orders(Status);
Enter fullscreen mode Exit fullscreen mode

Query filters on Status? SQL Server seeks into IX_Orders_Status, finds the matching rows, grabs the clustered key from each one, and uses those keys to fetch the full rows from the clustered index.

That last step, fetching full rows, is called a "key lookup." It's expensive. It's the thing INCLUDE columns exist to eliminate.


Why Column Order Matters

This is the concept I got wrong for years. The one I see other developers get wrong too.

An index on (A, B, C) works great for:

  • WHERE A = 1 (just the first column, no problem)
  • WHERE A = 1 AND B = 2
  • All three columns together: WHERE A = 1 AND B = 2 AND C = 3
  • Range queries work too: WHERE A = 1 AND B > 5
  • Even WHERE A = 1 ORDER BY B (the sort comes free)

But it falls apart for:

  • WHERE B = 2 : A is missing, so SQL Server can't use the sorted structure
  • WHERE C = 3 : same problem, worse
  • WHERE A = 1 AND C = 3 : this one surprises people. B is missing, so C can't seek. SQL Server finds all the A=1 rows but then has to scan them for C=3
  • WHERE B = 2 AND C = 3 : dead on arrival without A

SQL Server uses indexes left to right. Once you skip a column, you can't seek on later columns.

Here's the quick reference:

Query Index (A, B, C) Result
WHERE A = 1 Seeks efficiently
WHERE A = 1 AND B = 2 Seeks efficiently
WHERE A = 1 AND B = 2 AND C = 3 Seeks efficiently
WHERE B = 2 Full scan, A missing
WHERE A = 1 AND C = 3 Partial Seeks on A, scans for C
WHERE B = 2 AND C = 3 Full scan, A missing

The Library

Picture a library organized by genre, then author, then title.

Finding "Science Fiction → Asimov → Foundation"? Easy. Walk to Sci-Fi, find the A shelf, grab Foundation. Ten seconds.

Finding "any genre, Asimov, anything"? Now you're checking the Asimov section of every single genre. Romance Asimov. Horror Asimov. Probably empty, but you still have to check. Way slower.

Finding "any genre, any author, Foundation"? You're walking every shelf in the building looking for that one title. Bring a lunch.

Same books. Same organization. Wildly different search times depending on which pieces of information you have when you start looking.

The Ordering Rules

When I'm designing a composite index, I follow this order. It's not the only way, but it works.

Equality predicates first. These are your = comparisons. They narrow things down the most.

Range predicates next. Your >, <, BETWEEN filters. These come after equalities because once you hit a range, you can't seek any further.

ORDER BY columns at the end. If your ORDER BY matches the index order, SQL Server skips the sort operation entirely. Free performance.

-- The query pattern
SELECT OrderId, OrderDate, Total
FROM Orders
WHERE CustomerId = @CustomerId
  AND Status = 'Pending'
  AND OrderDate > @StartDate
ORDER BY OrderDate DESC;

-- The index that matches it
CREATE INDEX IX_Orders_Optimal
ON Orders(CustomerId, Status, OrderDate DESC)
INCLUDE (Total);
Enter fullscreen mode Exit fullscreen mode

Why this order? CustomerId and Status are equality predicates. Either could be first. OrderDate is a range predicate, so it comes after the equalities. OrderDate DESC matches the ORDER BY, which means no sort operation needed. Total goes in INCLUDE because we SELECT it but don't filter on it.


INCLUDE Columns

I see this pattern all the time. Someone creates an index, the query uses it, everyone celebrates. Then it falls over in production.

-- The query
SELECT OrderId, OrderDate, Total, Status
FROM Orders
WHERE CustomerId = 123;

-- The index someone created
CREATE INDEX IX_Orders_Customer ON Orders(CustomerId);
Enter fullscreen mode Exit fullscreen mode

The index works. Technically. SQL Server seeks to CustomerId = 123. Great. But then it has to do a key lookup for every single matching row to fetch OrderDate, Total, and Status. The index doesn't have those columns.

Ten matching rows? Ten key lookups. Whatever. You won't notice.

Ten thousand matching rows? Ten thousand key lookups. Now you notice.

INCLUDE adds columns to the index leaf level without affecting sort order.

CREATE INDEX IX_Orders_Customer
ON Orders(CustomerId)
INCLUDE (OrderDate, Total, Status);
Enter fullscreen mode Exit fullscreen mode

Now the index contains everything the query needs. No key lookups. SQL Server reads only the index. It never touches the main table.

This is called a "covering index." The index covers all columns the query needs.

Where to Put Columns

Quick reference:

Filter Type Example Where It Goes
Equality filters WHERE Status = 'Active' Key column
Range filters WHERE OrderDate > @StartDate Key column, after equalities
ORDER BY columns ORDER BY CreatedDate Key column at the end
SELECT-only columns SELECT Total, Notes INCLUDE
JOIN conditions ON Orders.CustomerId = ... Key column

The rule is simple. If you filter or sort on it, key column. If you just retrieve it, INCLUDE.

A Note on Size

INCLUDE columns increase index size. A covering index might grow nearly as large as the table itself.

That's usually fine. Disk is cheap. Key lookups are expensive.

But be thoughtful. Including a VARCHAR(MAX) column might make the index enormous. In those cases, ask whether the key lookup cost is actually a problem worth solving.


The Cost of Indexes

Every index you create has a price. Multiple prices, actually.

Write overhead. Every INSERT updates every index. Every UPDATE that touches indexed columns updates those indexes. DELETE? Same story. One index is fine. Ten is noticeable. Forty-seven is where you start getting calls from the DBA at 3 AM.

Storage. This one sneaks up on you. Each nonclustered index is a complete copy of the indexed columns plus the clustered key plus INCLUDE columns. I've seen tables where the indexes used more disk space than the data. A lot more. Five times more in one memorable case.

Memory pressure. SQL Server caches index pages in the buffer pool. More indexes means more pages fighting for limited RAM. The worst part? An index that's never used still takes up buffer pool space when it gets loaded. You're paying memory rent for a tenant that contributes nothing.

Maintenance windows that keep growing. Indexes fragment over time. More indexes means more fragmentation means longer rebuild times. Your maintenance window creeps from 2 hours to 4 hours to "we need to talk about moving to a Saturday night."


Finding Unused Indexes

SQL Server tracks index usage in sys.dm_db_index_usage_stats. This DMV shows seeks, scans, lookups, and updates for each index since the last restart.

SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    i.type_desc AS IndexType,
    ISNULL(s.user_seeks, 0) AS UserSeeks,
    ISNULL(s.user_scans, 0) AS UserScans,
    ISNULL(s.user_lookups, 0) AS UserLookups,
    ISNULL(s.user_updates, 0) AS UserUpdates,
    CASE
        WHEN s.user_seeks + s.user_scans + s.user_lookups = 0 THEN 'UNUSED'
        WHEN s.user_updates > (s.user_seeks + s.user_scans) * 10 THEN 'WRITE-HEAVY'
        ELSE 'OK'
    END AS Assessment
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
    ON i.object_id = s.object_id
    AND i.index_id = s.index_id
    AND s.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
    AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY
    (ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0)) ASC,
    s.user_updates DESC;
Enter fullscreen mode Exit fullscreen mode

One catch: this data resets on SQL Server restart. Failover? Gone. Patch Tuesday reboot? Gone.

Don't drop indexes based on one week of data. Wait until you have at least one full business cycle. A month minimum. A quarter is better. Some indexes only matter during year-end close. You might need a full year of data before you're sure.

The Safe Way to Remove an Index

Don't just DROP it. That's how you learn about that one critical report that runs once a quarter.

  1. Identify candidates (zero reads, lots of writes)
  2. Disable the index first. This deletes the index data but keeps the definition in metadata. SQL Server stops considering it for queries
  3. Wait. Watch. A week at minimum. Preferably through an end-of-month close
  4. If nothing catches fire, drop it
  5. If something breaks, rebuild it. The definition is still there, so SQL Server knows exactly what to reconstruct
-- Step 1: Disable (deletes data, keeps definition)
ALTER INDEX IX_Orders_Unused ON Orders DISABLE;

-- Step 2a: Something broke? Put it back
ALTER INDEX IX_Orders_Unused ON Orders REBUILD;

-- Step 2b: Nothing broke after a week? Kill it
DROP INDEX IX_Orders_Unused ON Orders;
Enter fullscreen mode Exit fullscreen mode

Finding Missing Indexes

SQL Server tracks missing index suggestions in sys.dm_db_missing_index_* DMVs. These are indexes the optimizer wished existed while running queries.

SELECT TOP 20
    CONVERT(DECIMAL(18,2), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS ImprovementMeasure,
    DB_NAME(mid.database_id) AS DatabaseName,
    OBJECT_NAME(mid.object_id, mid.database_id) AS TableName,
    mid.equality_columns AS EqualityColumns,
    mid.inequality_columns AS InequalityColumns,
    mid.included_columns AS IncludedColumns,
    migs.user_seeks,
    migs.user_scans,
    migs.avg_user_impact AS AvgImpactPercent
FROM sys.dm_db_missing_index_group_stats migs
JOIN sys.dm_db_missing_index_groups mig
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details mid
    ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY ImprovementMeasure DESC;
Enter fullscreen mode Exit fullscreen mode

Don't Create These Blindly

Here's the thing about missing index suggestions: SQL Server is being helpful, but it's also being dumb.

It has no idea whether a similar index already exists in different column order. No clue how the new index will tank your insert performance. No context about whether this query matters to anyone or runs once a year during an audit.

Treat these as hints, not commands.

Before creating anything, I ask myself a few questions. Does this query even run that often? Is there an existing index I could tweak instead of creating a whole new one? Is the read improvement worth the write hit?

And the big one: can I consolidate three of these suggestions into a single smarter index?

Consolidation

Here's something I see all the time. The missing index DMV spits out three suggestions:

  • Index on (A, B) INCLUDE (C)
  • Index on (A, B, C)
  • Index on (A) INCLUDE (B, D)

Three separate indexes, right? No. Step back and look at the pattern.

CREATE INDEX IX_Consolidated ON Table(A, B, C) INCLUDE (D);
Enter fullscreen mode Exit fullscreen mode

One index. Covers all three query patterns. This is the kind of thing SQL Server can't figure out for you. You have to actually think about it.


A Decision Framework

Before I create an index, I make myself answer a few questions. Keeps me from creating indexes I'll regret later.

Does this query even matter? How often does it run? Who cares if it's slow? There's a difference between "the checkout page takes 3 seconds" and "the monthly audit report takes 3 seconds." A query that runs once a month at 3 AM while everyone's asleep? Maybe it doesn't need an index. Maybe 3 seconds is fine.

What's actually happening right now? Pull up the execution plan (Ctrl+M in SSMS, or just paste your query into Plan Explorer). Is it scanning when it should seek? Key lookups everywhere? How many logical reads? I need real numbers, not guesses.

Is there already an index that's close?

SELECT
    i.name AS IndexName,
    STRING_AGG(c.name, ', ') WITHIN GROUP (ORDER BY ic.key_ordinal) AS KeyColumns,
    STRING_AGG(CASE WHEN ic.is_included_column = 1 THEN c.name END, ', ') AS IncludedColumns
FROM sys.indexes i
JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = OBJECT_ID('Orders')
    AND i.type_desc = 'NONCLUSTERED'
GROUP BY i.name
ORDER BY i.name;
Enter fullscreen mode Exit fullscreen mode

Sometimes you're one INCLUDE column away from a covering index. Adding a column to an existing index beats creating a whole new one.

What's the write cost going to be? How often does this table get modified? Is it already groaning under the weight of eight indexes? Are inserts already suspiciously slow? For read-heavy tables, indexes are almost always worth it. For tables getting hammered with INSERTs all day? Be picky.

Did it actually work? This is the step people skip. Create the index, run the query again, look at the new execution plan. Did seeks replace scans? Did key lookups disappear? Did logical reads drop?

If not, you might have the wrong index. Or the problem might be something else entirely. You've been chasing the wrong thing.


Common Questions

"How many indexes is too many?"

Depends. I know. But it does.

I've seen tables that legitimately need fifteen indexes and tables where three is too many. The real question isn't the count. It's whether they're earning their keep.

If you have indexes with zero reads, you have too many. Period.

For OLTP tables with heavy writes, I try to keep it under five nonclustered indexes. Reporting tables? Different story. Go wild. Nobody's doing bulk inserts into your reporting database at 2 PM on a Tuesday.

Foreign keys: yes, you probably need to index them

SQL Server doesn't automatically create indexes on foreign key columns. This surprises people. They assume FK = index. It doesn't.

CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);
Enter fullscreen mode Exit fullscreen mode

Foreign keys show up in JOINs, and joins need indexes on both sides. Skip this and you'll get table scans on every join. I see this mistake constantly.

One exception: if the foreign key has only a handful of values, like a Status column with 5 options, the index won't help much.

Composite indexes: the leftmost column trick

Here's something useful. An index on (A, B) can serve queries on just A efficiently. The reverse isn't true. An index on just A can't help much with A AND B queries because it has to scan all the A matches looking for B.

So if you have queries on both A alone and A, B together? Just create (A, B). One index, two query patterns covered.

Index rebuilds: probably not as urgent as you think

Fragmentation matters less with modern SSDs. I've seen DBAs obsess over 15% fragmentation on SSD storage. Don't.

That said, it's not zero. The traditional rule of thumb: under 10% fragmentation, leave it alone. Between 10-30%, REORGANIZE. Over 30%, REBUILD. These are guidelines, not laws. Microsoft's current guidance actually says you shouldn't use fixed thresholds at all. Measure the actual impact on your workload instead.

SELECT
    OBJECT_NAME(ps.object_id) AS TableName,
    i.name AS IndexName,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE ps.page_count > 1000
ORDER BY ps.avg_fragmentation_in_percent DESC;
Enter fullscreen mode Exit fullscreen mode

For enterprise systems, use Ola Hallengren's Index Maintenance Solution. It's become the industry standard for good reason.

Filtered indexes: powerful but finicky

Filtered indexes only include rows matching a condition. Smaller. More targeted. Can be exactly what you need.

CREATE INDEX IX_Orders_Active
ON Orders(CustomerId, OrderDate)
WHERE Status = 'Active';
Enter fullscreen mode Exit fullscreen mode

Great when your queries always include that filter.

But here's the gotcha: the query must include the filter condition exactly, or SQL Server won't even consider using the filtered index. And I mean exactly. Parameterization can mess this up in subtle ways.


Final Thoughts

Every index is a bet. "I think read speed here matters more than write overhead." Sometimes you're right. Sometimes you end up with forty-seven indexes and 800-millisecond inserts.

Don't create indexes because you think they might help someday. Don't drop them because some article (including this one) made you paranoid. Look at the data. Look at the execution plans. Make decisions based on evidence.

That table with forty-seven indexes? I still think about it sometimes.

The goal isn't more indexes. It's the right indexes. Turns out that's a harder problem. Also a more interesting one.


About the Author

I'm Mashrul Haque, a Systems Architect with over 15 years of experience building enterprise applications with .NET, Blazor, ASP.NET Core, and SQL Server. I specialize in Azure cloud architecture, AI integration, and performance optimization.

When production catches fire at 2 AM, I'm the one they call.

This is Part 3 of the SQL Server Performance Series. Part 1 covers how the optimizer works. Part 2 teaches you to read execution plans. Part 4 covers SARGability: the query patterns that prevent SQL Server from using your carefully designed indexes.

Top comments (0)