Mike Pendon (Zym)

Posted on Mar 1, 2020 • Edited on Aug 4, 2020

C#: Bulk Operations for .NET Core (BulkDelete, BulkInsert, BulkMerge, BulkUpdate) is now very simple with RepoDb ORM

#csharp #dotnet #sql #tutorial

Introduction

We would like to share you the latest and greatest updates of RepoDb ORM. Recently, we have published the complete support to the bulk operations for SQL Server in .NET data access space.

The actual repository can be found at RepoDb.SqlServer.BulkOperations.

The updates covered the following methods.

This feature is useful if you are working with huge datasets. It is extremely improving the performance of your application, and at the same time, increasing your productivity during the development due to the simplicity of the design and implementation.

As an author of RepoDb, and being an OSS contributor, I am trying to improve the development experience of the .NET developers when it comes to data accessibility. I am doing it for free, and it will always be. In return, I only like to hear your thoughts, feedback and also the usability experiences when using this library. Through this, we can help each other to further improve this solution.

The benefits of using the Bulk Operations

Nothing else, but PERFORMANCE!

Though, the same goal can be achieved by the batch operations (i.e.: DeleteAll, InsertAll, MergeAll and UpdateAll), but, by leveraging the real bulk operations functionalities, the performance of your application will be improved drastically with a huge-percentage gap.

To explain further, when you only use the batch operations, it is only batching the multiple atomic activities. There are multiple round-trips between your application and your database server. However, when you use the bulk operations, you are bringing all the data from your application into the database server at once. In addition, it is ignoring the database specific operations (i.e.: Logging, Audits, Data-Type Checks, Constraints, etc), thus gives you maximum performance during the operation.

How does it works?

It is leveraging the ADO.NET SqlBulkCopy class from both System.Data.SqlClient and Microsoft.Data.SqlClient namespaces.

The BulkInsert operation is calling the SqlBulkCopy.WriteToServer() method underneath to bring all the data from your application into the database. No additional logic is implied. However, the BulkDelete, BulkMerge and the BulkUpdate operations has an implied logic and technique implemented.

Implied Logic

Basically, a pseudo-temporary table will be created in the database under the transaction context. It then re-uses the BulkInsert operation to bulk process the data from your application towards that pseudo-temporary table. Through this intermediary table, we brought all the data from your application into the database server at one-go.

The library will then create a CLUSTERED INDEX on the pseudo-temporary table using your qualifiers. You can pass your customized qualifiers with a list of Field objects into the qualifiers argument.

If you have not passed any values to the qualifiers argument, the PrimaryKey will be used by default. If the PrimaryKey is not present on the target table, it will use the Identity field instead. Otherwise, an exception will be thrown.

Once all the data is in the database pseudo-temporary table, an optimized SQL Statement will be used to cascade the changes towards the original table.

The generated SQL Statement will be vary on the operations you used. See the SQL Statements below per operation.

For BulkDelete

DELETE T
FROM [dbo].[OriginalTable] T
INNER JOIN [PseudoTempTable] TMP ON TMP.QualiferField1 = T.Field1 AND TMP.QualifierField2 = T.Field2;

For BulkMerge

MERGE [dbo].[OriginalTable] T
USING [PseudoTempTable] S ON S.QualiferField1 = T.Field1 AND S.QualifierField2 = T.Field2
WHEN NOT MATCHED THEN
     INSERT (...) VALUES (...)
WHEN MATCHED THEN
     UPDATE
     SET (...);

For BulkUpdate

UPDATE T
SET T.Field3 = TMP.Field3
     , T.Field4 = TMP.Field4
     , ...
FROM [OriginalTable] T
INNER JOIN [PseudoTempTable] TMP ON TMP.QualiferField1 = T.Field1 AND TMP.QualifierField2 = T.Field2;

Object Parameters

The bulk operations support the following object in all operations.

System.DataTable
System.Data.Common.DbDataReader
System.Collections.Generic.IEnumerable<T>

When you the use the IEnumerable<T> object, the library is wrapping it with a customized object named DataEntityDataReader before even calling the actual bulk operation.

Special Arguments

The arguments qualifiers and usePhysicalPseudoTempTable is provided at the BulkDelete, BulkMerge and BulkUpdate operations.

The argument qualifiers is used to define the qualifier fields to be used during the execution. It refers to the WHERE expression of SQL Statements. If not given, the PrimaryKey will be used. If not given, the Identity field will be used.

The argument usePhysicalPseudoTempTable is used to define whether a physical pseudo-table will be created during the operation. By default, a temporary table (i.e.: #TableName) is created.

Caveats

RepoDb is automatically setting the value of the options argument to SqlBulkCopyOptions.KeepIdentity when calling the (https://repodb.net/operation/bulkdelete), BulkMerge and BulkUpdate operations. Only if you have not passed any qualifiers and if your table has an Identity-PrimaryKey column. The same logic also applies if the PrimaryKey is not present but the underlying table has an Identity column.

Also, when calling the (https://repodb.net/operation/bulkdelete), BulkMerge and BulkUpdate operations, the library is creating a pseudo-temporary table behind the scene. It requires your user to have the correct privilege to create a table in your database, otherwise a SqlException will be thrown.

Getting Started

To start, simply install the library. At the Package Manager Console, type the command below.

> Install-Package RepoDb.SqlServer.BulkOperations

Then, call the boostrapper once.

SqlServerBootstrap.Initialize();

You can visit our installation page for more information.

BulkDelete

The code below only showcasing the BulkDelete via the IEnumerable<T> parameter.

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete<Customer>(customers);
}

Or with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete<Customer>(customers, qualifiers: e => new { e.LastName, e.BirthDate });
}

Or via table-name

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete("Customer", customers);
}

Or via table-name with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete("Customer", customers, qualifiers: e => new { e.LastName, e.BirthDate });
}

BulkInsert

The code below only showcasing the BulkInsert via IEnumerable<T> parameter.

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkInsert<Customer>(customers);
}

Or via table-name

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkInsert("Customer", customers);
}

BulkMerge

The codes below only showcasing the BulkMerge via IEnumerable<T> parameter.

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge<Customer>(customers);
}

Or with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge<Customer>(customers, qualifiers: e => new { e.LastName, e.BirthDate });
}

Or via table-name

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge("Customer", customers);
}

Or via table-name with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge("Customer", customers, qualifiers: e => new { e.LastName, e.BirthDate });
}

BulkUpdate

The codes below only showcasing the BulkUpdate via IEnumerable<T> parameter.

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkUpdate<Customer>(customers);
}

Or with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkUpdate<Customer>(customers, qualifiers: e => new { e.LastName, e.BirthDate });
}

Or via table-name

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkUpdate("Customer", customers);
}

Or via table-name with qualifiers

using (var connection = new SqlConnection(ConnectionString))
{
    var customers = GetCustomers();
    var qualifiers = Field.From("LastName", "BirthDate");
    var rows = connection.BulkUpdate("Customer", customers, qualifiers: qualifiers);
}

As mentioned in the earlier sections, both the System.Data.Common.DbDataReader and System.DataTable objects are also supported.

When to use the Batch and Bulk Operations?

There is no standard of when to use what. It all depends on your situation. Consider the following criteria when deciding.

No of Rows/Columns
Network Latency
Type of Data (Blob, etc)

The most common factor to choose the bulk operations over the batch operations are the number of rows. We usually use the batch operations if the rows we are processing are above 1K, otherwise, we are using the batch operations.

It is also good to understand the advantages of one from the other. The pros of using the bulk operation is performance, nothing else. The cons is, it overrides the operations of the underlying database. It also keeps blocking the target table while being under the bulk operations transaction.

Summary

By using the RepoDb.SqlServer.BulkOperations features, you will have the following.

It gives you maximum performance when processing the large datasets.
You do not need to write the complex implementation of the bulk operations. Every call is a single-line of code.
The library is reviewed by the OSS community.
It is tested and is used by the large number of developers.
High-quality. Packed with real-world Unit and Integration Tests.
Simple implementation. You code will be easy to operate and maintain.

Thank you for reading this article.

We are asking your help to please support this repository and solution. Your stars to our Github page is very valuable.