I'm taking a break from writing about deploying Sitecore 10 to take some time cleaning up our existing 8.0 Sitecore databases in preparation for a move to Sitecore 10. Three DBs, Master, Preview and Web are in excess of 45 GB and are that way for a variety of reasons I list below:
- Little maintenance of content so old content not removed
- Legacy content relating to old implementations, losing a/b paths and so on have not been removed
- Media library bug not fixed until a later version where a copy of each upload is created for each language version
- Lack of version pruning
The goal is twofold:
- Shrink the size of the production DBs to improve performance, particularly on the CM, reduce backup time and disk space consumed by backups
- Shrink the size of the test DBs to < 10 GB to allow for use on developer workstations.
This will be a lengthy post for which I apologize in advance. I suppose I could break it up into multiple posts but instead I'm going to test your attention spans. Enjoy!
Diving In
The first step was to assess the state of play of the current DBs, lets start with the Core DB, the smallest one (well... in theory).
Core - Not So Small
The reported disk utilization for this DB is insane at 86 GB. Most of this is empty space so it worth running some simple cleanup operations at the SQL level. The following is a simple script that cleanups up some transitory tables, drops some unnecessary (non-Sitecore) tables and then runs a DBCC Shrink Database.
Note: Running SQL Server Shrink routines does not necessarily lead to better performance, in fact it may make it worse. However, a primary goal here is to make the DB small enough for use on workstations.
USE SiteCore_Core_8
GO
PRINT 'Shrinking Core'
PRINT 'Truncating History, EventQueue and History...'
truncate table History
truncate table EventQueue
truncate table PublishQueue
GO
PRINT 'Dropping data sync Tables... (not part of a typical SC deploy'
GO
ALTER DATABASE [SiteCore_Core_8] SET RECOVERY SIMPLE
GO
drop table ...
drop table ...
GO
PRINT 'Shrinking Database...'
GO
DBCC SHRINKDATABASE (SiteCore_Core_8, 0);
GO
ALTER DATABASE [SiteCore_Core_8] SET RECOVERY FULL
GO
The end result is a substantial reduction in physical file size, down to 6 GB total between data file and log file.
The End Result
DB | Before | After |
---|---|---|
Core | 85 GB | 6 GB |
Master | 60 GB | 50 GB |
Preview | 43 GB | 39 GB |
Web | 68 GB | 42 GB |
Not bad given we did almost nothing other than run a simple SQL Script. As I mentioned earlier this effort thus far is not designed to improve database performance but to give us a baseline to continue to shrink the DB for use on developer workstations.
The Next Phase
Now that we've plucked the low hanging fruit by running some SQL statements it's time to switch to writing some Powershell. To do this I fired up an 8.0 Sitecore CM directly on the SQL Server and pointed it at the slightly shrunken databases. I am going to be deleting A LOT of items so reducing latency is important.
Write a SPE (Sitecore Powershell Extension) script to remove old and redundant content:
- Delete obsolete content
- Delete content nodes
- Delete template nodes
- Delete layout/sublayouts/renderings
- Delete media library nodes
- Run CleanupDatabase for each database
- Run Rebuild Link Database for each database
- Run Rebuild indices (only for a production or test environment)
Below is some powershell to delete the nodes. Notice they are wrapped in a BulkUpdateContext to speed up the process. We don't care about eventing etc. because we will rebuild all the indices anyway. I'm recording the elapsed time as well just to give some context to the operation, if deleting 100K items takes 100 ms, we have a problem.
$time = Measure-Command {
New-UsingBlock (New-Object Sitecore.Data.BulkUpdateContext) {
Get-Item "master:/media library/dead" | Remove-Item -Recurse -Permanently
Get-Item "master:/content/deadcontent" | Remove-Item -Recurse -Permanently
Get-Item "master:/layout/sublayouts/deaderthancorduroy" | Remove-Item -Recurse -Permanently
Get-Item "master:/templates/deadlikeme" | Remove-Item -Recurse -Permanently
}
}
Write-Host "Completed deleting deadContent in $($time.TotalSeconds) seconds."
Write-Log "Completed deleting deadContent in $($time.TotalSeconds) seconds."
First lets take a look at the table size for the Sitecore Master DB.
As you can see a huge amount of the space consumed by the DB is in the blob table. After running our cleanup routine we should see these sizes shrink dramatically.
For the powershell above I broke the operations up into units of related content. The output of each block executing is shown below. The big one was about 25 GB of media library images that are not actually yet removed from the blob table, for that we'll need to run the cleanup database routine which I cover later in this post.
Completed deleting dead content nodes in 711.6083175 seconds
Completed deleting defunct ecommerce bits in 2333.9450871 seconds.
It's executing slower than I liked, so I decided to introduce more disablers as we really don't care about cache updates, security checks and so on.
$time = Measure-Command {
New-UsingBlock (New-Object Sitecore.Data.BulkUpdateContext) {
New-UsingBlock (New-Object Sitecore.SecurityModel.SecurityDisabler) {
New-UsingBlock (New-Object Sitecore.Data.DatabaseCacheDisabler) {
New-UsingBlock (New-Object Sitecore.Data.Events.EventDisabler) {
Get-Item "master:/media library/Images/reallylotsofimages" | Remove-Item -Recurse -Permanently
Get-Item "master:/layout/Renderings/reallyhugestuffRenderings" | Remove-Item -Recurse -Permanently
}
}
}
}
}
Write-Host "Completed deleting dead huge stuff in $($time.TotalSeconds) seconds."
Write-Log "Completed deleting dead huge stuff in $($time.TotalSeconds) seconds."
Completed deleting Really Huge Stuff in 4688.6444181 seconds.
Removing Redundant Media Library Versions
There is an old "feature" of Sitecore that was addressed in 8.1 (I believe) where during upload of media library items a copy will be created for each language version present in Sitecore. It sounds a bit more like a bug and less like a feature... regardless, we can fix that problem with some handy powershell.
Version Pruning
At this point we need to start considering the database bound for official test environment (and ultimately production) use and that to be used by developers. With developers we can aggressively prune versions and prune to one version but for test environment and production use we want to prune to a more moderate 5-10 versions (and skip some altogether). For this purpose I will take a backup of DB and set it aside as the basis for a separate migration as needed. With a backup safely set aside we can dig into version pruning.
We have used the Version Pruner marketplace module in the past but I never felt entirely comfortable with it's operation. Now that we have the amazing Powershell module at our disposal I much prefer to use it for this purpose so we have full control of it's operation. To that end I engaged the wonderful Elena Mosoff from Sitecore to write a script for us.
The full script of the pruner is here. Maybe someone can find it useful.
It's been many years since we last did any version pruning so I expect this script to have a significant impact on the size of the DB.
The log size and DB size have ballooned which is to be expected given all the operations, mass deletions etc. However, row counts should be telling...
Wow, nearly 9 million versions removed. No wonder it took 27 hours to execute! However, something is amiss... There are nearly 9 million rows in the Archive table. Looks like the RemoveItemVersion command moves things to archive by default and does not support the -permanently flag like removeItem does. After a quick chat with Elena, she pointed me at the code for RemoteItemVersion. If you look closely it's actually using the Permanently flag, it's just undocumented.
The script has been updated to use it, but in the meantime I need to empty that recycle bin.
Enter my favorite Sitecore Consultant Elena (again) who provided a handy dandy SQL snippet for emptying the recycle bin. She pointed out to not do this on a DB you care about (you have been warned). I took a backup first... (I do trust you Elena... I do... for reals).
After our naughty SQL deletion our physical table size report shows the following:
Excellent, now the Archive tables are much smaller although you'll notice the blob table has not changed at all. This is because blobs are not removed when you delete media items but only when you run the Database Cleanup routine from the control panel in Sitecore.
Database Cleanup Routines
Thanks to the venerable Michael West I was able to quickly execute some cleanup routines using powershell.
Powershell Extensions System Maintenance
I'm also rebuilding the link DB to remove any orphan links and hopefully also reduce the size of the link tables.
foreach($database in Get-Database) {
if(!$database.ReadOnly) {
Write-Log "Cleaning up the $($database) database."
$time = Measure-Command {
$database.CleanupDatabase()
}
Write-Log "Completed cleaning up the $($database) database in $($time.TotalSeconds) seconds."
}
}
# Rebuild the Sitecore link DB in each DB
foreach($database in Get-Database) {
if(!$database.ReadOnly) {
Write-Log "Rebuilding the $($database) link database."
$time = Measure-Command {
[Sitecore.Globals]::LinkDatabase.Rebuild($database)
}
Write-Log "Completed rebuilding the $($database) link database in $($time.TotalSeconds) seconds."
}
}
Trouble in Paradise
After running for almost 14 hours the cleanup routine threw an error:
Moral of the story? Give yourself plenty of space for the tempdb to grow. It consumed 100 GB or so before running out of room and throwing an error. There was also substantial log growth for the master DB on a separate drive. I ended up creating a new volume and attaching it to the EC2 instance and moving the tempDB there. I ran the process again this time it completed after another 14 hours or so and finished successfully! Phew.
Ignorant Indices
Some of those table indices look too large. Time for a little more trusty SQL.
USE SiteCore_Master_8;
GO
ALTER INDEX ALL ON dbo.VersionedFields
REBUILD
GO
That's a 24 GB reduction in blobs, impressive, and you can see the index on VersionedFields and Descendants are much smaller but we still are not at our < 10GB goal. The physical file sizes are also too large so it's time to run our DBCC shrink routine again.
Progress
Not bad, that's a significant reduction in the size of the DB. However, we are not quite there yet as our target is below 10 GB for the developer database. What else can we do?
It's quite possible there are many more unused media library items and we can do an analysis of our IIS logs to determine what is in use, but that's a future exercise as it will take some time and carries more risk. What we can do though is aggressively compress the images stored in the blob table.
The Gloves are Coming Off Sitecore - Enter MozJpeg
MozJpeg is an image compression library written by the Mozilla Foundation and designed to provide advanced image compression tools. My experiments with this library have shown some impressive results. Carnival.com already has dynamic image compression provided by Akamai so there isn't much benefit to using MozJpeg in production, however, for the purposes of shrinking the development DB it could be quite useful. I came up with the following approach to hopefully give us a little more space.
A simple console application that:
- Reads the Sitecore blob table
- Extracts the image
- Compress using MozJpeg
- Write image back to the blob table
The code is here: Blob Compressor Github
The image doesn't have to be highest quality, just good enough to give the developer an idea of the appearance of the site. I set it at 25, but 75 is a good sweet spot if you're looking to retain most of the image quality and gain some space back.
Murphy Strikes Back
Seriously, Murphy being who he is and all means that there was no way it was going to be that easy. The problem with the Image datatype in SQL is that it doesn't resize dynamically, so if you run the compression over the blob table you'll see it's output showing compression rates of up to 90% but then when you look at the physical size of the table guess how much it will have changed? Zero. Zilch. Nada. Yet gentlemen we have the technology... (yes, I'm old enough to remember THAT show).
Enter... VARBINARY(MAX). The Image datatype is actually deprecated in SQL Server and I did notice that the Blobs table in Sitecore 10 has actually been updated to use varbinary(max). The problem is that when you copy the image column data to varbinary it doesn't seem to automatically shrink, even if the image is compressed already. It's possible Sitecore is doing some magic when using the data migration tool but I'm not sure so instead I copied the blobs table into a second table with the structure modified to use varbinary(max). The table was initially the same size as the original Blobs table, but once we run the compression tool on it it will shrink.
Lets check our disk space utilization again after running our compression tool over the varbinary table:
Notice the BlobsCompressed table is 5 GB smaller! Phew.
Next we need to push the blobs back into the normal structure and table name. That first statement is scary...
truncate table dbo.Blobs
GO
INSERT INTO dbo.Blobs
SELECT [Id]
,[BlobId]
,[Index]
,[Data]
,[Created]
,[DAC_Index]
FROM dbo.BlobsCompressed
GO
To illustrate the compression, here is an example image in the media library. Sitecore reports the original image size as 64 KB, when I download the image it's 11 KB.
The actual downloaded file:
The actual image. The compression artifacts are obvious but it's fine for a developer workstation:
We'll follow up with another DBCC Shrink operation to get a final, minimized DB size.
Are We There Yet?
Yes! No more bathroom stops or long running powershell scripts.
We made it. 60 GB --> 7 GB.
To reiterate, this is not something to be applied to a production database. Instead, for production, I am going to prune to a more sensible 5 versions (with exceptions) and not compress the blobs which would leave a production database of around 12-14 GB. Now the DB is ready for migration to Sitecore 10.
Thanks for making it this far, stay tuned for future posts and more Sitecore 10 adventures.
Top comments (0)