DEV Community

Cover image for Updating the Collation of an Existing MSSQL Database
Michael Kennedy
Michael Kennedy

Posted on • Edited on

Updating the Collation of an Existing MSSQL Database

When products are installed outside of a controlled development environment we expect the unexpected, that's why we have logging. Product databases are typically an area where this isn't as much of an issue. Beyond client data, the variables are restricted to SQL version numbers, hardware and the server database collation...

As much as some might advise otherwise, clients will install multiple products on a single SQL server to reduce licencing costs.

When we don't have control over the server collation, products that make heavy use of #temporary tables or that query system databases such as msdb can run into problems. System databases will tend to use the same collation as the server and if we're not careful we'll encounter the dreaded collation error:

Cannot resolve the collation conflict between “SQL_Latin1_General_CP1_CI_AS” and “Latin1_General_CI_AS” in the equal to operation.

This situation commonly occurs when an existing database is restored to a SQL server where the target collations don't match.

Preventative Programming

As developers, the first thing that we can do to prevent this is by deliberately working environments where the collation of the development database has been deliberately set to something different to the server collation.

This forces us to automatically consider database collations when writing our stored procedure and function code, substituting collate database_default whenever necessary.

Core Database Collation

If the proverbial ship has sailed then we can find ourselves in a position where we have to manually update the collation of large client databases.

At first glance it might look like we just need to open database properties and change the collation but it's far from being that simple.

When we update the collation in this manner we're actually changing the default collation, newly created tables and columns will use this collation but existing objects will be unaffected.

Updating the collation of an existing database is tricky, but it can be boiled down to a simple process.

Preparation

  1. Take a database backup (because not to do so is to invite disaster).
  2. Set recovery mode to Simple.

    USE [master]
    GO
    ALTER DATABASE [SampleDatabase] SET RECOVERY SIMPLE WITH NO_WAIT
    GO
    
  3. Set the database to single user mode (only necessary in an environment where another system could be accessing the database).

    USE [master]
    GO
    ALTER DATABASE [SampleDatabase] SET SINGLE_USER WITH NO_WAIT
    GO
    

Target Columns

Once our database backup has been configured we'll need to work out which database columns need to be updated. We can achieve this by querying system tables to retrieve a list of columns which don't use the new database collation.

declare @newCollationName sysname
set @newCollationName = 'Latin1_General_CI_AI'

select
    *
from
(
    select
        s.[name] as [schema],
        o.[name] as [table],
        c.[name] as [column],
        t.[name] as [datatype],
        case
            when 
                c.[max_length] = -1 AND
                (
                    lower(t.[name]) = 'char' OR
                    lower(t.[name]) = 'nchar' OR
                    lower(t.[name]) = 'varchar' OR
                    lower(t.[name]) = 'nvarchar'
                )
            then 'max'
            when lower(t.[name]) = 'varchar' OR lower(t.[name]) = 'char'  then convert(varchar,c.[max_length])
            when lower(t.[name]) = 'nvarchar' OR lower(t.[name]) = 'nchar'  then convert(varchar,c.[max_length]/2)
            else null
        end as [length],
        c.[collation_name] as [collation]
    from sys.objects o
    inner join sys.columns c on (c.[object_id] = o.[object_id])
    inner join sys.types t on (t.[system_type_id] = c.[system_type_id] AND t.[user_type_id] = c.[system_type_id])
    inner join sys.schemas s on (s.[schema_id] = o.[schema_id])
    where 
        o.[is_ms_shipped] = 0 and 
        o.[type] = 'U' and
        c.[collation_name] is not null AND
        lower(c.[collation_name]) != lower(@newCollationName)
    union select
        s.[name] as [schema],
        o.[name] as [table],
        c.[name] as [column],
        'cs' as [datatype],
        'cs' as [length],
        'cs' as [collation]
    from sys.objects o
    inner join sys.columns c on (c.[object_id] = o.[object_id])
    inner join sys.types t on (t.[system_type_id] = c.[system_type_id] AND t.[user_type_id] = c.[system_type_id])
    inner join sys.schemas s on (s.[schema_id] = o.[schema_id])
    where 
        o.[is_ms_shipped] = 0 and 
        c.is_computed = 1
) s
order by
    s.[table],
    s.[column]
Enter fullscreen mode Exit fullscreen mode

Incompatible Objects

Now we have a list of database columns which need to be updated, but we still can't act on it, we'll need to drop all items associated with these columns first - Functions, Indexes, Unique Constraints, Foreign/Primary keys and Computed columns will all need to be dropped before we can attempt to change the collation.

The products I work with make use of a standalone database upgrade tool which compares a targeted database directly against the expected database schema and will automatically re-create any missing items.
In other scenarios it will be necessary to script these objects before dropping them in order to recreate them later.

Executing the below SQL will give us the information we need. (where #targetColumns is the output of the previous SELECT operation)

--drop table valued functions
select
    'DROP FUNCTION [' + s.[name] + '].[' + o.[name] + ']' as command,
    1 as [priority]
from sys.objects o
inner join sys.schemas s on (s.[schema_id] = o.[schema_id])
where o.[type] = 'TF'

--indexes to drop
union SELECT 
    'DROP INDEX [' + s.[name] + '].[' + t.[name] + '].[' + ind.[name] + ']' as command,
    2 as [priority]
FROM sys.indexes ind 
INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
INNER JOIN sys.tables t ON ind.object_id = t.object_id 
inner join sys.schemas s on (s.[schema_id] = t.[schema_id])
inner join #targetColumns tcol 
on 
(
    lower(tcol.[schema]) = lower(s.[name]) AND
    lower(tcol.[table]) = lower(t.[name]) AND
    lower(tcol.[column]) = lower(col.[name])
)
WHERE 
    ind.is_primary_key = 0 AND 
    ind.is_unique = 0 AND 
    ind.is_unique_constraint = 0 AND 
    t.is_ms_shipped = 0 
group by
    s.[name],
    t.[name],
    ind.[name]

--drop unique constraints
union SELECT 
    'ALTER TABLE [' + s.[name] + '].[' + t.[name] + '] DROP CONSTRAINT [' + ind.[name] + ']' as command,
    2 as [priority]
FROM sys.indexes ind 
INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
INNER JOIN sys.tables t ON ind.object_id = t.object_id 
inner join sys.schemas s on (s.[schema_id] = t.[schema_id])
inner join #targetColumns tcol 
on 
(
    lower(tcol.[schema]) = lower(s.[name]) AND
    lower(tcol.[table]) = lower(t.[name]) AND
    lower(tcol.[column]) = lower(col.[name])
)
WHERE 
    ind.is_primary_key = 0 AND 
    ind.is_unique = 1 AND 
    t.is_ms_shipped = 0 
group by
    s.[name],
    t.[name],
    ind.[name]

--foreign keys to drop
union select
    'ALTER TABLE [' + s.fk_schema + '].[' + s.[fk_table] + '] DROP CONSTRAINT [' + s.[fkname] + ']' as command,
    3 as [priority]
from
(
    select 
        fk.name as [fkname],
        schema_name(tab.schema_id) as [schema],
        tab.name as [table],
        col.name as [column],
        schema_name(tab.schema_id) as [fk_schema],
        tab.name as [fk_table]
    from sys.tables tab
        inner join sys.columns col 
            on col.object_id = tab.object_id
        inner join sys.foreign_key_columns fk_cols
            on fk_cols.parent_object_id = tab.object_id
            and fk_cols.parent_column_id = col.column_id
        left outer join sys.foreign_keys fk
            on fk.object_id = fk_cols.constraint_object_id
        left outer join sys.tables pk_tab
            on pk_tab.object_id = fk_cols.referenced_object_id
        left outer join sys.columns pk_col
            on pk_col.column_id = fk_cols.referenced_column_id
            and pk_col.object_id = fk_cols.referenced_object_id
    union select 
        fk.name as [fkname],
        schema_name(pk_tab.schema_id) as [schema],
        pk_tab.name as [table],
        pk_col.name as [column]  ,  
        schema_name(tab.schema_id) as [fk_schema],
        tab.name as [fk_table]
    from sys.tables tab
        inner join sys.columns col 
            on col.object_id = tab.object_id
        inner join sys.foreign_key_columns fk_cols
            on fk_cols.parent_object_id = tab.object_id
            and fk_cols.parent_column_id = col.column_id
        left outer join sys.foreign_keys fk
            on fk.object_id = fk_cols.constraint_object_id
        left outer join sys.tables pk_tab
            on pk_tab.object_id = fk_cols.referenced_object_id
        left outer join sys.columns pk_col
            on pk_col.column_id = fk_cols.referenced_column_id
            and pk_col.object_id = fk_cols.referenced_object_id
) s
inner join #targetColumns t 
on 
(
    lower(t.[schema]) = lower(s.[schema]) AND
    lower(t.[table]) = lower(s.[table]) AND
    lower(t.[column]) = lower(s.[column])
)

--primary keys to drop
union SELECT 
    'ALTER TABLE [' + s.[name] + '].[' + t.[name] + '] DROP CONSTRAINT [' + ind.[name] + ']' as command,
    4 as [priority]
FROM sys.indexes ind 
INNER JOIN sys.index_columns ic ON  ind.object_id = ic.object_id and ind.index_id = ic.index_id 
INNER JOIN sys.columns col ON ic.object_id = col.object_id and ic.column_id = col.column_id 
INNER JOIN sys.tables t ON ind.object_id = t.object_id 
inner join sys.schemas s on (s.[schema_id] = t.[schema_id])
inner join #targetColumns tcol 
on 
(
    lower(tcol.[schema]) = lower(s.[name]) AND
    lower(tcol.[table]) = lower(t.[name]) AND
    lower(tcol.[column]) = lower(col.[name])
)
WHERE 
    ind.is_primary_key = 1 AND
    t.is_ms_shipped = 0 
group by
    s.[name],
    t.[name],
    ind.[name]

--columns to drop
union select
    'ALTER TABLE [' + s.[name] + '].[' + o.[name] + '] DROP COLUMN [' + c.[name] + ']' as command,
    6 as [priority]
from sys.objects o
inner join sys.columns c on (c.[object_id] = o.[object_id])
inner join sys.types t on (t.[system_type_id] = c.[system_type_id] AND t.[user_type_id] = c.[system_type_id])
inner join sys.schemas s on (s.[schema_id] = o.[schema_id])
where 
    o.[is_ms_shipped] = 0 and 
    c.is_computed = 1
Enter fullscreen mode Exit fullscreen mode

In larger databases, dropping this many items in one go will still add data to the database log even with simple logging mode enabled.
Our execution of this process includes a truncation of database logs every 100 records, this is particularly useful when executing in an environment where disk space is at a premium.

Update Column Collation

Once all associated items have been dropped, we can simply iterate through the list of columns and update the database collation.

ALTER TABLE [{schema}].[{table}] 
ALTER COLUMN [{column}] {data type}({length}) 
COLLATE {new collation name}
Enter fullscreen mode Exit fullscreen mode

Update Database Collation

Now we're in a position where we can update the core database collation. This is easy enough to do within SQL Management Studio, but we can also achieve this by executing the following SQL command:

ALTER DATABASE {database name} COLLATE {new collation name};
Enter fullscreen mode Exit fullscreen mode

At this point we can recreate previously dropped items and take the database out of single user mode. Depending on the number of items that were changed, it may also be prudent to reindex the database.

Top comments (0)