SQL Server’s identity column misconceptions

#sql #sqlserver #misconceptions

There are a lot of misconceptions about identity columns in SQL Server – some of them are more common than others – Here’s a list of the most common misconceptions I’ve seen or heard about identity columns:

An Identity column is unique.
An Identity column generates consecutive numbers.
An identity column can’t auto-generate already existing values.
An identity column as a primary key is enough to identify the row.
Using the wrong tool to get identity values back after an insert.

Some of these misconceptions are so widely spread – so I thought I should debunk them, even though most of them are officially documented to be false.

An Identity column is unique

Official documentation clearly states:

The identity property on a column does not guarantee Uniqueness of the value. Uniqueness must be enforced by using a PRIMARY KEY or UNIQUE constraint or UNIQUE index.

Further more, SQL Server provides you a way to manually insert the value of an identity column in an insert statement,
using the SET IDENTITY_INSERT T-SQL command.

An Identity column generates consecutive numbers

This is yet another very common misconception, also documented to be false:

SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert.

Further more – Gaps in identity columns will be created each time a row is deleted from the table, or an insert statement is rolled back. SQL Server will not reuse such values – which is also documented:

For a given identity property with specific seed/increment, the identity values are not reused by the engine. If a particular insert statement fails or if the insert statement is rolled back then the consumed identity values are lost and will not be generated again. This can result in gaps when the subsequent identity values are generated.

An identity column can’t auto-generate already existing values

SQL Server will not auto-generate already existing values – unless the identity property is reseeded using a database console command called DBCC CHECKIDENT.
Using this command you can change the seed of the identity property, causing all the values computed from that point forward to be computed using the new seed. SQL Server will not check if the newly created values already exist in the table.

An identity column as a primary key is enough to identify the row

If your table’s primary key contains the identity column, but there’s no other unique constraint or index on the table, there’s nothing SQL Server can do to prevent you from inserting duplicate data into the table – meaning rows that the only difference between them is the identity value.

So, should you stop using identity columns as your primary key? Not necessarily.

A primary key that is not actually a part of the table’s data as the table’s is called a surrogate key – unlike a natural key which is a primary key that is comprised of one or more columns that contains the actual data of the table.

The debate over surrogate keys vs. natural keys is probably as old as the first rdbms that supported surrogate keys – and is yet to be determined. Some DBAs prefer using surrogate keys while others prefer using only natural keys. Personally, I prefer using surrogate keys, because they can simplify relations between tables and of course join queries.
However, keep in mind: Whenever a surrogate key is used, it’s imperative to also create a unique constraint or index on the column(s) that makes up the natural key of the table!

Using the wrong tool to get identity values back after an insert

I know I’ve already blogged about it, but it is a common enough misconception to include in the list.
The short version: A lot of developers (and DBAs) are unaware of the subtle differences between @@Identity, Scope_Identity(), and Ident_Current(), and a lot of them are also unaware of the fact that the output clause returns the values before trigger execution.

Most of the times, you want to get the value directly inserted from the statement you’ve just executed. In that case, you can use Scope_Identity() or the Output clause. If there are insert triggers involved, you might want to use a transaction with isolation level serializable and Ident_current() instead.