DEV Community

Cover image for What happens when SQL Server PAGE gets corrupted ?
Amar Abaz
Amar Abaz

Posted on

What happens when SQL Server PAGE gets corrupted ?

In this article, I will explore how SQL Server works with PAGES, the smallest unit of data storage. We will explore how SQL Server organizes data within these pages and how to interact with them. Additionally, I am gonna simulate page corruption, run a validity check, and demonstrate how to fix it.

Data files are divided into 8KB pages, which consist of

  • Page Header (96 bytes); Contains metadata about the page.
  • Data Rows (8060 bytes); Holds the actual data of your table.
  • Slot Array (36 bytes); Manages the location of data rows within the page.

Together, these components make up 8192 bytes, or 8KB, which is the standard size for a SQL Server page.
An extent in SQL Server consists of 8 contiguous 8KB pages, totaling 64KB.
There are two types of extents UNIFORM EXTENTS, where all 8 pages are used by a single object, and MIXED EXTENTS, where the pages are shared by multiple objects. This helps optimize space usage and performance, and behaviour is menagable in your database Options.

SELECT [name], [is_mixed_page_allocation_on] FROM sys.databases;
Enter fullscreen mode Exit fullscreen mode

These pages are stored in physical files, categorized as

  • Primary or Master Data File (.mdf)
    The main file where system and user data are stored by default. It contains also sys data of your database.

  • Secondary or Next Data File (.ndf)
    Additional data files used to expand storage, but optional for better arhitecture and sizing.

  • Transaction Log Data File (.ldf)
    Stores the transaction logs, which track changes made to the database to ensure consistency and recovery.

These files work together to manage your database storage and ensure data integrity.

Let's get to it..

Lets now check the allocation and structural integrity of the pages in the AdventureWorks2022 database, (skipped checking the indexes). I will now focus on table Person where we will do our corruption test.

DBCC CHECKTABLE ('AdventureWorks2022.Person.Person');
Enter fullscreen mode Exit fullscreen mode


DBCC CHECKALLOC ('AdventureWorks2022', NOINDEX);
Enter fullscreen mode Exit fullscreen mode

Identifying Pages in Your Database

You can identify which pages a table occupies by using DBCC IND.
This command provides a breakdown of where the table's data is stored across pages and files.

DBCC IND('AdventureWorks2022', 'Person.Person', -1)
Enter fullscreen mode Exit fullscreen mode


Once you've identified a page, you can go deeper into its content using the DBCC PAGE command.
Lets show the internal structure of the page, including the Page Header, m_pageId, m_objId, and the data rows stored in it.

For example, to inspect page 1314 in file 1.

DBCC PAGE('AdventureWorks2022', 1, 1314, 1) WITH TABLERESULTS;
Enter fullscreen mode Exit fullscreen mode

Simulating Page Corruption

To demonstrate how corruption can occur, let's intentionally modify a page.

ALTER DATABASE [AdventureWorks2022] SET  SINGLE_USER WITH NO_WAIT;
DBCC WRITEPAGE('AdventureWorks2022', 1, 1314, 0, 1, 0x11, 1);
Enter fullscreen mode Exit fullscreen mode

Verifying the Corruption

Now run DBCC CHECKDB or CHECKTABLE again to see how SQL Server detects the integrity corruption. Do not use CHECKALLOC because it only checks for allocation problems not integrity.

DBCC CHECKDB('AdventureWorks2022');
DBCC CHECKTABLE ('AdventureWorks2022.Person.Person');
Enter fullscreen mode Exit fullscreen mode


These commands will perform an integrity check on the database and report any issues they find, including corruption in the affected pages.
If we select the whole table we have an error.

Msg 824, Level 24, State 2, Line 32
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x246e3cb2; actual: 0x24663cb2). It occurred during a read of page (1:1314) in database ID 5 at offset 0x00000000a44000 in file 'C:\MSSQL\DATA\AdventureWorks2022.mdf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

More about DBCC Commands on my earlier topic here:
MSSQL DBCC How good are they really ? 👌
.
.
.

Now lets talk about the ways how to fix this and what to keep in mind..

🤞First Option: Try REPAIR_REBUILD

DBCC CHECKTABLE ('Person.Person', REPAIR_REBUILD); 
Enter fullscreen mode Exit fullscreen mode

REPAIR_REBUILD can fix certain types of index corruption or inconsistent structures without causing data loss. It is safer than REPAIR_ALLOW_DATA_LOSS because it doesn't drop data but will rebuild corrupted indexes and structures.
However REPAIR_REBUILD won't help in our case.

Msg 8928, Level 16, State 1, Line 39
Object ID 2101582525, index ID 1, partition ID 72057594049724416, alloc unit ID 72057594057523200 (type In-row data): Page (1:1314) could not be processed. See other errors for details.
The repair level on the DBCC statement caused this repair to be bypassed.
Msg 8939, Level 16, State 98, Line 39
Table error: Object ID 2101582525, index ID 1, partition ID 72057594049724416, alloc unit ID 72057594057523200 (type In-row data), page (1:1314). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 2057 and -4.
Repairing this error requires other errors to be corrected first.
Msg 8976, Level 16, State 1, Line 39
Table error: Object ID 2101582525, index ID 1, partition ID 72057594049724416, alloc unit ID 72057594057523200 (type In-row data). Page (1:1314) was not seen in the scan although its parent (1:1568) and previous (1:1313) refer to it. Check any previous errors.
Repairing this error requires other errors to be corrected first.
Msg 8978, Level 16, State 1, Line 39
Table error: Object ID 2101582525, index ID 1, partition ID 72057594049724416, alloc unit ID 72057594057523200 (type In-row data). Page (1:1315) is missing a reference from previous page (1:1314). Possible chain linkage problem.
Repairing this error requires other errors to be corrected first.
There are 19967 rows in 3807 pages for object "Person.Person".
CHECKTABLE found 0 allocation errors and 4 consistency errors in table 'Person.Person' (object ID 2101582525).

😥Second Option: REPAIR_ALLOW_DATA_LOSS

DBCC CHECKTABLE ('Person.Person', REPAIR_ALLOW_DATA_LOSS); 
Enter fullscreen mode Exit fullscreen mode

This should be your last option and only to be used if you are okay with the possibility of data loss. It attempts to repair the corruption in the table, but any corrupted data that cannot be repaired will be removed. Before proceeding, it is crucial to do BACKUP DATABASE to ensure you have a recovery point in case the repair process causes unintended consequences.

🤔Third Option: PAGE RESTORE
Page restore is a technique used to restore specific corrupted pages in a database from an available backup. The goal is to replace the corrupted pages with uncorrupted pages from a previous backup, minimizing the impact on the rest of the database and avoiding the need for a full restore. However, to successfully restore a page, it's critical to identify when the page was last consistent and healthy.

When you have regular log backups as part of your Recovery Time Objective (RTO) strategy, you can typically recover your database to the point of the last log backup. However, even when using CHECKSUM and VERIFY during backups, these options only validate the integrity of the backup file and do not detect database corruption.

Now let's check and confirm the corrupt page id using command bellow.
This select shows those pages that SQL Server has flagged as suspect, which means it gives you a history of past corruption events.

SELECT * FROM msdb.dbo.suspect_pages;
Enter fullscreen mode Exit fullscreen mode

Execute commands below and do not forget to apply the log restore after a page restore, if not it will break the continuity of your database's transaction LOG chain. After that make another log backup and restore, and check again the whole database with DBCC CHECKDB. This should be the way to do it with large databases, because we will not restore the whole database only page and logs.

RESTORE DATABASE [AdventureWorks2022] PAGE='1:1314' 
FROM  DISK = N'Z:\BCKP\w19$SQ\AdventureWorks2022\FULL\w19$SQ_AdventureWorks2022_FULL_20260214_122842.bak' WITH  FILE = 1,  
NORECOVERY,  NOUNLOAD,  STATS = 5, ONLINE   -- online is supported by Enterprise edition only
RESTORE LOG [AdventureWorks2022] FROM  DISK = N'Z:\BCKP\w19$SQ\AdventureWorks2022\LOG\w19$SQ_AdventureWorks2022_LOG_20260214_122900.trn' WITH  FILE = 1,  
NORECOVERY,  NOUNLOAD,  STATS = 5

BACKUP LOG [AdventureWorks2022] TO DISK = N'Z:\BCKP\w19$SQ\AdventureWorks2022\LOG\LOG_AdventureWorks2022' WITH INIT
RESTORE LOG [AdventureWorks2022] FROM DISK = N'Z:\BCKP\w19$SQ\AdventureWorks2022\LOG\LOG_AdventureWorks2022' WITH RECOVERY
Enter fullscreen mode Exit fullscreen mode

🫡Fourth Option: Remote restore and recreate
Restore to a new location and recreate that Table (For small databases).
Keep it simple if the corruption is isolated to a single table or a small part of the database, and the database size allows for it. An alternative approach is to restore the database to a new location and then recreate the corrupted table on primary db. While doing that you could also make a Trigger to deny any interaction if SQL Server still allows insert, update, delete to be performed.

CREATE TRIGGER tr_tableDENY ON Person.Person
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    RAISERROR ('This table is temporarily locked for investigation.', 16, 1);
    ROLLBACK TRANSACTION;
END
Enter fullscreen mode Exit fullscreen mode

Final Advice

Make sure to run DBCC CHECKDB regularly. This is crucial for maintaining the integrity of your database. However, don't rely solely on DBCC checks. Set up real time alerts for corruption detection.

Here's my advice, create an SQL job that constantly monitors the msdb.dbo.suspect_pages table for new entries and sends you an alert as soon as any corruption is detected. You can clean this table after alert was processed.

Moreover, since Error 823, 824, and 825 are associated with corruption in database pages, you should set up an alert for these specific errors as well. That way, you can be notified immediately when they happen.
To do this go to SQL Server Agent, Right click on Alerts and select New Alert. Make Alert for every Error and set to use Operator for mail response.

  1. Error 823: This error indicates that SQL Server encountered an I/O error while reading or writing to a disk.
  2. Error 824: This error occurs when SQL Server detects a logical consistency error in the data read from a page.
  3. Error 825: This error signifies a page failure during an I/O operation.

Reference:

Top comments (0)