DEV Community

Xavier Abelaira Rueda
Xavier Abelaira Rueda

Posted on

Mechanisms and Performance when querying data to SQLServer from C#

๐Ÿ“ƒ Introduction

SQL database management systems are widely used and, when interacting with the data from our applications, we know that performance is a critical element to take into consideration.

How to interact with them, knowing the different alternatives and best practices to apply in these scenarios are key elements to have performant applications, avoiding bottlenecks in our interactions with the database.

โœ… Goal

We are going to benchmark different ways of retrieving data in common query scenarios, comparing the performance in each of them.

For this purpose we are going to use our beloved BenchmarkDotNet tool.

In addition, since we don't need long run benchmarks, and we want to avoid Identity resolution from EFCore, specially for a proper comparision between tracked and non tracked queries, we will use [ShortRunJob] option for these executions, also considering all the details that can affect the different scenarios in a real project / environment.

๐Ÿ” Scenario

Description

For each of the benchmark's scenario, we will test with the following preloaded data:

  • 100_000 total persons
  • 100_000 persons x 10 addresses = 1_000_000 total addresses
  • 100_000 persons x 100 posts = 10_000_000 total posts

Data model

Data model for these benchmarks

Querying from a single table

In our first scenario, we are going to check the simplest situation where we need to retrieve paginated data from a single table.

We will have two scenarios:

  • Retrieving 100 rows
  • Retrieving 1000 rows

Retrieving paginaged rows tracking entities with EFCore.

This is the most common scenario to see using EFCore and not worrying about the change tracker. Despite the fact that over the paper there will always be a decrease in performance compared to using it without tracking (AsNoTracking()), it should be mentioned that in certain scenarios, EFCore's Identity Resolution manages to match the performance of both, although they are not usually common scenarios, especially in the REST API development since it's not common to retrieve the same data over and over again within the same request scope.

var data = _dbContext.Persons
    .Take(PageSize)
    .ToList();
Enter fullscreen mode Exit fullscreen mode

This is translated from EFCore like this:

SELECT TOP(@__p_0) 
    p.Id, p.Bio, p.BirthDate, p.Email, p.Name, p.OriginId, p.PhoneNumber, p.Surname
FROM Persons_Guid AS p
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 17.627 ms 753.35 KB
1000 177.532 ms 6492.23 KB

Retrieving paginated rows using AsNoTracking() with EFCore.

Similar to the one above, but in this case we specify to EFCore that we don't want to track the entities retrived. This is specially useful when we need this information back for readonly purposes. In this case, without the overhead from EFCore tracking them, the performance boost is high.

var data = _dbContext.Persons
    .AsNoTracking()
    .Take(PageSize)
    .ToList();
Enter fullscreen mode Exit fullscreen mode

Purely from SQL perspective, EFCore translates it identically to the previous example:

SELECT TOP(@__p_0) 
    p.Id, p.Bio, p.BirthDate, p.Email, p.Name, p.OriginId, p.PhoneNumber, p.Surname
FROM Persons_Guid AS p
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 3.375 ms 324.26 KB
1000 9.093 ms 2196.32 KB

Retrieving paginated rows using a DTO projection with EFCore.

In this case we are going to use a projection through a LINQ Select statement. This scenario, since is not returning any Entity, tracking of course is disabled, and avoiding client-side evaluation we have very good performance specially for DTOs direct transformation. In our scenario we are mapping a DTO with the same number of properties that the original Entity has.
This could not be the best scenario, since it's specially useful when creating plain DTO's from more than one Entity, however it will be enough to get the idea.

var data = _dbContext.Persons
    .Select(x => new PersonDto
    {
        Id = x.Id,
        OriginId = x.OriginId,
        Name = x.Name,
        Surname = x.Surname,
        Email = x.Email,
        PhoneNumber = x.PhoneNumber,
        BirthDate = x.BirthDate,
        Bio = x.Bio
    })
    .Take(PageSize)
    .ToList();    
Enter fullscreen mode Exit fullscreen mode

Again, from SQL perspective, this is translated from EF Core similarly than the previous examples (with the difference of the columns ordering):

SELECT TOP(@__p_0)
    p.Id, p.OriginId, p.Name, p.Surname, p.Email, p.PhoneNumber, p.BirthDate, p.Bio
FROM Persons_Guid AS p
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 3.422 ms 324.26 KB
1000 8.966 ms 2162.07 KB

Retrieving paginated rows using a FromSql / FromSqlInterpolated from EFCore.

Given that EFCore also allow us to send RAW SQL queries, also parametrized, we can then try how it performs, and the important thing here is that since what is returned back is an Entity, is also being tracked by default by EFCore Change Tracker.

var data = _dbContext.Persons
     .FromSqlInterpolated
     (
         $"SELECT TOP ({PageSize}) [Id],[OriginId],[Name],[Surname],[BirthDate],[Bio],[Email],[PhoneNumber] FROM [dbo].[Persons_Guid]"
     )
     .ToList();   
Enter fullscreen mode Exit fullscreen mode

No need to specify a translation from EFCore in this case, so, using the default (tracked) mechanism:

Rows Mean Allocated
100 17.498 ms 755.45 KB
1000 153.713 ms 6490.82 KB

Same rule applies when adding AsNoTracking() for performance improvement. It's then visible than the performance is equally as the one using a classic LINQ Select statement.

Retrieving paginated rows using a view mapped from EFCore as non-indexed entity.

Perhaps it is the least useful case but we would also want to measure how EFCore performs when it has to retrieve simple data from a table through a View, so in case you need it or have it in your project this is the result:

Having this View:

CREATE VIEW PersonsSimpleView
AS
SELECT 
    [Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
FROM [dbo].[Persons_Guid]
Enter fullscreen mode Exit fullscreen mode

Invoked like:

//PersonSimpleQuery is the name given to this View mapping
var data = _dbContext.PersonSimpleQuery
    .Take(PageSize)
    .ToList();    
Enter fullscreen mode Exit fullscreen mode

This is translated from EFCore like this:

SELECT TOP(@__p_0) 
    [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [PersonsSimpleView] AS [p]
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 17.456 ms 760.53 KB
1000 152.410 ms 6551.15 KB

Retrieving paginated rows using a Stored Procedure invoked from EFCore.

Another option is to have a predefined Stored Procedure to retrieve the data. It is not recommended to do, but in case you need it or have it, the performance is very similar to the one with Views.

Having this Store Procedure:

CREATE OR ALTER PROC GetPersons(@rows INT) 
AS 
SELECT 
    [Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
FROM [dbo].[Persons_Guid] WITH(NOLOCK)
ORDER BY [Id]
OFFSET 0 ROWS
FETCH NEXT @rows ROWS ONLY
Enter fullscreen mode Exit fullscreen mode

Invoked like:

var data = _dbContext.PersonSimpleQuery
    .FromSqlInterpolated
    (
        $"[dbo].[GetPersons] {PageSize}"
    )
    .ToList(); 
Enter fullscreen mode Exit fullscreen mode

This is translated from EFCore like this:

[dbo].[GetPersons] @p0
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 19.39 ms 753.39 KB
1000 167.47 ms 6492.67 KB

Retrieving paginated rows using a SQL query executed with Dapper.

And now is when we reach the most performant option, in this case using Dapper. With Dapper we can send directly the SQL query parametrized or not having amazing performance, at the expense of losing the type safety benefits that EFCore provides as an ORM.

In readonly scenarios where performance and responsiviness is key, this option is the preferable one making it up to 3x faster to retrieve the same data compared with the fastest EFCore option. It's also worth mentioning the memory allocation is also quite awesome being much less than EFCore in similar operations.

SQL invokation through Dapper:

var sql = $@"
    SELECT top ({PageSize})
        [Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
    FROM [dbo].[Persons_Guid]
    ";

using var connection = new SqlConnection(_connectionString);

var persons = connection.Query<PersonDto>(sql).ToList(); 
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 1.281 ms 195.89 KB
1000 5.655 ms 1947.13 KB

Querying having nested data

In our second scenario, we are going to check another common situation where we need to retrieve paginated data for an specific Entity with the nested data.

We will have again two scenarios:

  • Retrieving 100 rows
  • Retrieving 1000 rows

In both sceanarios, each Entity will have 10 elements for each nested collection (Addesses and Posts)

In this case we will then discard Views, SQL queries from EFCore and Store procedures since we checked in the previous example how to work with them and the performance that these solutions may have.

Retrieving paginated rows joining tables using EFCore with LINQ.

In this common scenario, we request 100 elements by playing in turn with all the possibilities that EFCore allows with or without AsNoTracking() and AsSplitQuery(). Two nested collections are retrieved at the same time.

Scenario using both of them at the same time

var data = SqlServerDbTestDataContextHelper.New.DbContext.Persons
    .Include(x => x.Addresses)
    .Include(x => x.Posts)
    .AsNoTracking()
    .AsSplitQuery()
    .Take(PageSize)
    .ToList();
Enter fullscreen mode Exit fullscreen mode

When translated without splitting queries

SELECT 
    [t].[Id], [t].[Bio], [t].[BirthDate], [t].[Email], [t].[Name], [t].[OriginId], [t].[PhoneNumber], [t].[Surname], [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title]
FROM (
    SELECT TOP(@__p_0) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
        FROM [Persons_Guid] AS [p]
        ) 
AS [t]
    LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
    LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Enter fullscreen mode Exit fullscreen mode

And using AsSplitQuery()

SELECT TOP(@__p_0) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]
ORDER BY [p].[Id]


SELECT [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [t].[Id]
FROM (
    SELECT TOP(@__p_0) [p].[Id]
    FROM [Persons_Guid] AS [p]
     ) 
AS [t]
    INNER JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
ORDER BY [t].[Id]


SELECT [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title], [t].[Id]
FROM (
    SELECT TOP(@__p_0) [p].[Id]
    FROM [Persons_Guid] AS [p]
      ) 
AS [t]
    INNER JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id]
Enter fullscreen mode Exit fullscreen mode
AsNoTracking() AsSplitQuery() Rows Mean Allocated
โœ–๏ธ โœ–๏ธ 100 3,219.56 ms 255.52 MB
โœ”๏ธ โœ–๏ธ 100 1,300.48 ms 213.31 MB
โœ–๏ธ โœ”๏ธ 100 1,811.96 ms 59.98 MB
โœ”๏ธ โœ”๏ธ 100 81.09 ms 18.21 MB

โ„น๏ธ Note that the performance it's radically better using both methods, and especially the absence of tracking from EFCore is what has the most positive impact on performance

Retrieving paginated rows joining tables using a DTO projection with EFCore.

In this case, we will be retrieving information from several tables projected into a DTO before client's side evaluation, so EFCore smartly translates the statement to a single SQL having a very good performance, especially considering that we only retrieve the necessary data, we avoid mapping / iterating a second time between domain models and DTOs, and collaterally, since the models projected are not entities, EFCore does not perform any type of tracking on them.

It is therefore the best option for read-only models in our project if we don't want to deal with Dapper.

Projection code using EFCore:

var data = SqlServerDbTestDataContextHelper.New.DbContext.Persons
    .Select(x => new PersonDto
    {
        Id = x.Id,
        OriginId = x.OriginId,
        Name = x.Name,
        Surname = x.Surname,
        Email = x.Email,
        PhoneNumber = x.PhoneNumber,
        BirthDate = x.BirthDate,
        Bio = x.Bio,
        Addresses = x.Addresses.Select(y => new AddressDto { Id = y.Id, CityId = y.CityId, Line = y.Line, Sequence = y.Sequence }),
        Posts = x.Posts.Select(y => new PostDto { Id = y.Id, Text = y.Text, Title = y.Title })
    })
    .Take(PageSize)
    .ToList();
Enter fullscreen mode Exit fullscreen mode

This is translated to SQL from EFCore to something like this:

SELECT [t].[Id], [t].[OriginId], [t].[Name], [t].[Surname], [t].[Email], [t].[PhoneNumber], [t].[BirthDate], [t].[Bio], [a].[Id], [a].[CityId], [a].[Line], [a].[Sequence], [p0].[Id], [p0].[Text], [p0].[Title]
FROM (
    SELECT TOP(@__p_0) [p].[Id], [p].[OriginId], [p].[Name], [p].[Surname], [p].[Email], [p].[PhoneNumber], [p].[BirthDate], [p].[Bio]
    FROM [Persons_Guid] AS [p]
    ) 
AS [t]
    LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
    LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 1,226.3537 ms 1218030776 B
1000 96,768.088 ms 2217682664 B

Retrieving paginated rows joining tables using raw SQL query with Dapper

Before finishing with the paginated queries, of course, we have to evaluate the performance of Dapper with a query that retrieves the same data from the previous scenario.

In this case we see that the performance is much higher again, mainly by avoiding the translation to SQL.

SQL statement invokation from C# with Dapper:

var sql = $@"
    SELECT top ({PageSize})
        persons.[Id] as [PersonId], persons.[OriginId], persons.[Name], persons.[Surname], persons.[BirthDate], persons.[Bio], persons.[Email], persons.[PhoneNumber],
        addresses.[Id] as [AddressId], addresses.[CityId], addresses.[Line], addresses.[Sequence],
        posts.[Id] as [PostId], posts.[Text], posts.[Title]
    FROM [dbo].[Persons_Guid] persons
        join [dbo].[Addresses_Guid] addresses on persons.Id = addresses.PersonId
        join [dbo].[Posts_Guid] posts on persons.Id = posts.AuthorId
    ";

using (var connection = new SqlConnection(SqlServerDbTestDataContextHelper.TestDbConnectionString))
{
    var persons = connection.Query<PersonFullView>(sql).ToList();
}
Enter fullscreen mode Exit fullscreen mode
Rows Mean Allocated
100 1.6398 ms 384083 B
1000 11.6887 ms 3815858 B

Retrieving single entity with joining tables using LINQ FirstAsync with EFCore

Again, in this case we start with the more common case where we need to return a single filtered entity, with all its associated data in two of its collections.

We also compare the performance with AsNoTracking() and AsSplitQuery() separately and together. We will also check the performance using Explicit loading, that in any case have a similar behaviour in this case than AsSplitQuery() when being tracked.

Since the arrival of AsSplitQuery(), explicit loading has been in the background, being less comprehensible in code and being limited to loading on a single row. However, the idea of both is the same: to avoid the cartesian explosion by separating the queries of the parent element from those of the nested data.

Code example combining both:

var data = await SqlServerDbTestDataContextHelper.New.DbContext.Persons
    .Include(x => x.Addresses)
    .Include(x => x.Posts)
    .AsNoTracking()
    .AsSplitQuery()
    .FirstAsync();
Enter fullscreen mode Exit fullscreen mode

In the case of a single query, this is the translation done by EFCore (Not using AsSplitQuery() or explicit loading)

SELECT [t].[Id], [t].[Bio], [t].[BirthDate], [t].[Email], [t].[Name], [t].[OriginId], [t].[PhoneNumber], [t].[Surname], [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title]
FROM (
    SELECT TOP(1) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
    FROM [Persons_Guid] AS [p]
    )
AS [t]
    LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
    LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Enter fullscreen mode Exit fullscreen mode

and in the case of using AsSplitQuery() or explicit loading:

SELECT TOP(1) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]

SELECT a.Id, a.CityId, a.Line, a.PersonId, a.Sequence
FROM Addresses_Guid AS a
WHERE a.PersonId == @__get_Item_0)

SELECT p.Id, p.AuthorId, p.Text, p.Title
FROM Posts_Guid AS p
WHERE p.AuthorId == @__get_Item_0)

Enter fullscreen mode Exit fullscreen mode
AsNoTracking() AsSplitQuery() Explicit loading Mean Allocated
โœ–๏ธ โœ–๏ธ โœ–๏ธ 34.2348 ms 5962998 B
โœ”๏ธ โœ–๏ธ โœ–๏ธ 17.8125 ms 5530597 B
โœ–๏ธ โœ”๏ธ โœ–๏ธ 22.6501 ms 829366 B
โœ”๏ธ โœ”๏ธ โœ–๏ธ 6.0082 ms 401577 B
โœ–๏ธ โœ–๏ธ โœ”๏ธ 25.4532 ms 806981 B

โ„น๏ธ Note that the performance it's radically better using both methods, and especially the absence of tracking from EFCore is what has the most positive impact on performance

Explicit loading has a bit of overhead comparing to AsSplitQuery() both very similar in terms of performance, just as expected initially

Retrieving single element with joining tables using a projection with EFCore

Again a mandatory test would be to evaluate the performance of EFCore translating our LINQ query to SQL for a single element with its nested data.

Again, a very good option for managing read-only queries transformed directly to their corresponding DTO's without the need of dealing with Dapper.

Projection code using EFCore:

var data = SqlServerDbTestDataContextHelper.Instance.DbContext.Persons
     .Select(x => new PersonDto
     {
         Id = x.Id,
         OriginId = x.OriginId,
         Name = x.Name,
         Surname = x.Surname,
         Email = x.Email,
         PhoneNumber = x.PhoneNumber,
         BirthDate = x.BirthDate,
         Bio = x.Bio,
         Addresses = x.Addresses.Select(y => new AddressDto { Id = y.Id, CityId = y.CityId, Line = y.Line, Sequence = y.Sequence }),
         Posts = x.Posts.Select(y => new PostDto { Id = y.Id, Text = y.Text, Title = y.Title })
    })
    .First(x => x.Id == id);
Enter fullscreen mode Exit fullscreen mode

This is translated to SQL from EFCore to something like this:

SELECT [t].[Id], [t].[OriginId], [t].[Name], [t].[Surname], [t].[Email], [t].[PhoneNumber], [t].[BirthDate], [t].[Bio], [a].[Id], [a].[CityId], [a].[Line], [a].[Sequence], [p0].[Id], [p0].[Text], [p0].[Title]
FROM (
    SELECT TOP(1) [p].[Id], [p].[OriginId], [p].[Name], [p].[Surname], [p].[Email], [p].[PhoneNumber], [p].[BirthDate], [p].[Bio]
    FROM [Persons_Guid] AS [p]
    WHERE [p].[Id] = @__id_0
    ) 
AS [t]
    LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
    LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Enter fullscreen mode Exit fullscreen mode
Mean Allocated
18.6125 ms 4617402 B

Retrieving single element with joining tables using raw SQL multi-query with Dapper

Finally, our latest test puts one of Dapper's most interesting features under the microscope: its ability to launch multiple queries at once and be mapped.

This is useful to compare the performance when loading the same data from the previous scenario, although dumped in a DTO model.

Again, the peformance demonstrated with this approach is simply astonishing.

SQL queries invocation through Dapper:

var sql = $@"
    SELECT [Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber] FROM [dbo].[Persons_Guid] WHERE Id = @PersonId;

    SELECT [Id], [CityId], [Line], [Sequence] FROM [dbo].[Addresses_Guid] WHERE [PersonId] = @PersonId;

    SELECT [Id], [Title], [Text] FROM [dbo].[Posts_Guid] WHERE [AuthorId] = @PersonId
    ";

using (var connection = new SqlConnection(SqlServerDbTestDataContextHelper.TestDbConnectionString))
{
    using (var multiQuery = await connection.QueryMultipleAsync(sql, new { PersonId = id }))
    {
         var person = (await multiQuery.ReadAsync<PersonDto>()).First();
         var addresses = (await multiQuery.ReadAsync<AddressDto>()).ToList();
         var posts = (await multiQuery.ReadAsync<PostDto>()).ToList();

        person.Posts = posts;
        person.Addresses = addresses;
    }
}
Enter fullscreen mode Exit fullscreen mode
Mean Allocated
0.7817 ms 111833 B

๐Ÿ“’ Conclusion

On the one hand, during this extensive post we have been able to see different ways of querying data with both EFCore and Dapper, as well as deep diving on the details of the fuctionality and performance offered by each of these solutions.

Applying one option, another or both will depend, as always, on the needs of each project, however as a general rule of thumb, with EFCore we can fully manage the needs of a project, also taking advantage of the additional collateral advantages that it allows us such as migrations management and safe typing in entity-table mappings. However, it will require making good use of the different features that it provides at the query level such as AsNoTracking() or AsSplitQuery(), or making explicit loading according to the needs of each situation.

On the other hand, however, in higher systems, with large volumes of data, especially those where performance is critical, Dapper offers unmatched performance and low memory load, however it will require extra control at testing level with integration and/or end-to-end tests so that model changes do not impact previously defined queries.

I hope I have been able to provide ideas, tips and useful information in this post about how to take advantage of both tools in this context.

If you liked it, please give me a โญ and follow me ๐Ÿ˜‰.

๐Ÿ“‰ All results

To carry out this comparison, innumerable executions were released, many of them required specific specifications either due to problems with Windows Defender or due to the execution time, so I took the opportunity to leave all the important ones reported. Depending on the type of execution, the results may vary, however they are all really valid to establish some metrics and compare performance.

๐Ÿ”— Resources

The entire code can be found here; and the corresponding report here

Useful links

Top comments (0)