π Introduction
SQL database management systems are widely used and, when interacting with the data from our applications, we know that performance is a critical element to take into consideration.
How to interact with them, knowing the different alternatives and best practices to apply in these scenarios are key elements to have performant applications, avoiding bottlenecks in our interactions with the database.
β Goal
We are going to benchmark different ways of retrieving data in common query scenarios, comparing the performance in each of them.
For this purpose we are going to use our beloved BenchmarkDotNet tool.
In addition, since we don't need long run benchmarks, and we want to avoid Identity resolution from EFCore, specially for a proper comparision between tracked and non tracked queries, we will use [ShortRunJob]
option for these executions, also considering all the details that can affect the different scenarios in a real project / environment.
π Scenario
Description
For each of the benchmark's scenario, we will test with the following preloaded data:
- 100_000 total persons
- 100_000 persons x 10 addresses = 1_000_000 total addresses
- 100_000 persons x 100 posts = 10_000_000 total posts
Data model
Querying from a single table
In our first scenario, we are going to check the simplest situation where we need to retrieve paginated data from a single table.
We will have two scenarios:
- Retrieving 100 rows
- Retrieving 1000 rows
Retrieving paginaged rows tracking entities with EFCore.
This is the most common scenario to see using EFCore and not worrying about the change tracker. Despite the fact that over the paper there will always be a decrease in performance compared to using it without tracking (AsNoTracking()
), it should be mentioned that in certain scenarios, EFCore's Identity Resolution manages to match the performance of both, although they are not usually common scenarios, especially in the REST API development since it's not common to retrieve the same data over and over again within the same request scope.
var data = _dbContext.Persons
.Take(PageSize)
.ToList();
This is translated from EFCore like this:
SELECT TOP(@__p_0)
p.Id, p.Bio, p.BirthDate, p.Email, p.Name, p.OriginId, p.PhoneNumber, p.Surname
FROM Persons_Guid AS p
Rows | Mean | Allocated |
---|---|---|
100 | 17.627 ms | 753.35 KB |
1000 | 177.532 ms | 6492.23 KB |
Retrieving paginated rows using AsNoTracking()
with EFCore.
Similar to the one above, but in this case we specify to EFCore that we don't want to track the entities retrived. This is specially useful when we need this information back for readonly purposes. In this case, without the overhead from EFCore tracking them, the performance boost is high.
var data = _dbContext.Persons
.AsNoTracking()
.Take(PageSize)
.ToList();
Purely from SQL perspective, EFCore translates it identically to the previous example:
SELECT TOP(@__p_0)
p.Id, p.Bio, p.BirthDate, p.Email, p.Name, p.OriginId, p.PhoneNumber, p.Surname
FROM Persons_Guid AS p
Rows | Mean | Allocated |
---|---|---|
100 | 3.375 ms | 324.26 KB |
1000 | 9.093 ms | 2196.32 KB |
Retrieving paginated rows using a DTO projection with EFCore.
In this case we are going to use a projection through a LINQ Select statement. This scenario, since is not returning any Entity, tracking of course is disabled, and avoiding client-side evaluation we have very good performance specially for DTOs direct transformation. In our scenario we are mapping a DTO with the same number of properties that the original Entity has.
This could not be the best scenario, since it's specially useful when creating plain DTO's from more than one Entity, however it will be enough to get the idea.
var data = _dbContext.Persons
.Select(x => new PersonDto
{
Id = x.Id,
OriginId = x.OriginId,
Name = x.Name,
Surname = x.Surname,
Email = x.Email,
PhoneNumber = x.PhoneNumber,
BirthDate = x.BirthDate,
Bio = x.Bio
})
.Take(PageSize)
.ToList();
Again, from SQL perspective, this is translated from EF Core similarly than the previous examples (with the difference of the columns ordering):
SELECT TOP(@__p_0)
p.Id, p.OriginId, p.Name, p.Surname, p.Email, p.PhoneNumber, p.BirthDate, p.Bio
FROM Persons_Guid AS p
Rows | Mean | Allocated |
---|---|---|
100 | 3.422 ms | 324.26 KB |
1000 | 8.966 ms | 2162.07 KB |
Retrieving paginated rows using a FromSql / FromSqlInterpolated from EFCore.
Given that EFCore also allow us to send RAW SQL queries, also parametrized, we can then try how it performs, and the important thing here is that since what is returned back is an Entity, is also being tracked by default by EFCore Change Tracker.
var data = _dbContext.Persons
.FromSqlInterpolated
(
$"SELECT TOP ({PageSize}) [Id],[OriginId],[Name],[Surname],[BirthDate],[Bio],[Email],[PhoneNumber] FROM [dbo].[Persons_Guid]"
)
.ToList();
No need to specify a translation from EFCore in this case, so, using the default (tracked) mechanism:
Rows | Mean | Allocated |
---|---|---|
100 | 17.498 ms | 755.45 KB |
1000 | 153.713 ms | 6490.82 KB |
Same rule applies when adding AsNoTracking()
for performance improvement. It's then visible than the performance is equally as the one using a classic LINQ Select statement.
Retrieving paginated rows using a view mapped from EFCore as non-indexed entity.
Perhaps it is the least useful case but we would also want to measure how EFCore performs when it has to retrieve simple data from a table through a View, so in case you need it or have it in your project this is the result:
Having this View:
CREATE VIEW PersonsSimpleView
AS
SELECT
[Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
FROM [dbo].[Persons_Guid]
Invoked like:
//PersonSimpleQuery is the name given to this View mapping
var data = _dbContext.PersonSimpleQuery
.Take(PageSize)
.ToList();
This is translated from EFCore like this:
SELECT TOP(@__p_0)
[p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [PersonsSimpleView] AS [p]
Rows | Mean | Allocated |
---|---|---|
100 | 17.456 ms | 760.53 KB |
1000 | 152.410 ms | 6551.15 KB |
Retrieving paginated rows using a Stored Procedure invoked from EFCore.
Another option is to have a predefined Stored Procedure to retrieve the data. It is not recommended to do, but in case you need it or have it, the performance is very similar to the one with Views.
Having this Store Procedure:
CREATE OR ALTER PROC GetPersons(@rows INT)
AS
SELECT
[Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
FROM [dbo].[Persons_Guid] WITH(NOLOCK)
ORDER BY [Id]
OFFSET 0 ROWS
FETCH NEXT @rows ROWS ONLY
Invoked like:
var data = _dbContext.PersonSimpleQuery
.FromSqlInterpolated
(
$"[dbo].[GetPersons] {PageSize}"
)
.ToList();
This is translated from EFCore like this:
[dbo].[GetPersons] @p0
Rows | Mean | Allocated |
---|---|---|
100 | 19.39 ms | 753.39 KB |
1000 | 167.47 ms | 6492.67 KB |
Retrieving paginated rows using a SQL query executed with Dapper.
And now is when we reach the most performant option, in this case using Dapper. With Dapper we can send directly the SQL query parametrized or not having amazing performance, at the expense of losing the type safety benefits that EFCore provides as an ORM.
In readonly scenarios where performance and responsiviness is key, this option is the preferable one making it up to 3x faster to retrieve the same data compared with the fastest EFCore option. It's also worth mentioning the memory allocation is also quite awesome being much less than EFCore in similar operations.
SQL invokation through Dapper:
var sql = $@"
SELECT top ({PageSize})
[Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber]
FROM [dbo].[Persons_Guid]
";
using var connection = new SqlConnection(_connectionString);
var persons = connection.Query<PersonDto>(sql).ToList();
Rows | Mean | Allocated |
---|---|---|
100 | 1.281 ms | 195.89 KB |
1000 | 5.655 ms | 1947.13 KB |
Querying having nested data
In our second scenario, we are going to check another common situation where we need to retrieve paginated data for an specific Entity with the nested data.
We will have again two scenarios:
- Retrieving 100 rows
- Retrieving 1000 rows
In both sceanarios, each Entity will have 10 elements for each nested collection (Addesses and Posts)
In this case we will then discard Views, SQL queries from EFCore and Store procedures since we checked in the previous example how to work with them and the performance that these solutions may have.
Retrieving paginated rows joining tables using EFCore with LINQ.
In this common scenario, we request 100 elements by playing in turn with all the possibilities that EFCore allows with or without AsNoTracking()
and AsSplitQuery()
. Two nested collections are retrieved at the same time.
Scenario using both of them at the same time
var data = SqlServerDbTestDataContextHelper.New.DbContext.Persons
.Include(x => x.Addresses)
.Include(x => x.Posts)
.AsNoTracking()
.AsSplitQuery()
.Take(PageSize)
.ToList();
When translated without splitting queries
SELECT
[t].[Id], [t].[Bio], [t].[BirthDate], [t].[Email], [t].[Name], [t].[OriginId], [t].[PhoneNumber], [t].[Surname], [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title]
FROM (
SELECT TOP(@__p_0) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]
)
AS [t]
LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
And using AsSplitQuery()
SELECT TOP(@__p_0) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]
ORDER BY [p].[Id]
SELECT [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [t].[Id]
FROM (
SELECT TOP(@__p_0) [p].[Id]
FROM [Persons_Guid] AS [p]
)
AS [t]
INNER JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
ORDER BY [t].[Id]
SELECT [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title], [t].[Id]
FROM (
SELECT TOP(@__p_0) [p].[Id]
FROM [Persons_Guid] AS [p]
)
AS [t]
INNER JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id]
AsNoTracking() | AsSplitQuery() | Rows | Mean | Allocated |
---|---|---|---|---|
βοΈ | βοΈ | 100 | 3,219.56 ms | 255.52 MB |
βοΈ | βοΈ | 100 | 1,300.48 ms | 213.31 MB |
βοΈ | βοΈ | 100 | 1,811.96 ms | 59.98 MB |
βοΈ | βοΈ | 100 | 81.09 ms | 18.21 MB |
βΉοΈ Note that the performance it's radically better using both methods, and especially the absence of tracking from EFCore is what has the most positive impact on performance
Retrieving paginated rows joining tables using a DTO projection with EFCore.
In this case, we will be retrieving information from several tables projected into a DTO before client's side evaluation, so EFCore smartly translates the statement to a single SQL having a very good performance, especially considering that we only retrieve the necessary data, we avoid mapping / iterating a second time between domain models and DTOs, and collaterally, since the models projected are not entities, EFCore does not perform any type of tracking on them.
It is therefore the best option for read-only models in our project if we don't want to deal with Dapper.
Projection code using EFCore:
var data = SqlServerDbTestDataContextHelper.New.DbContext.Persons
.Select(x => new PersonDto
{
Id = x.Id,
OriginId = x.OriginId,
Name = x.Name,
Surname = x.Surname,
Email = x.Email,
PhoneNumber = x.PhoneNumber,
BirthDate = x.BirthDate,
Bio = x.Bio,
Addresses = x.Addresses.Select(y => new AddressDto { Id = y.Id, CityId = y.CityId, Line = y.Line, Sequence = y.Sequence }),
Posts = x.Posts.Select(y => new PostDto { Id = y.Id, Text = y.Text, Title = y.Title })
})
.Take(PageSize)
.ToList();
This is translated to SQL from EFCore to something like this:
SELECT [t].[Id], [t].[OriginId], [t].[Name], [t].[Surname], [t].[Email], [t].[PhoneNumber], [t].[BirthDate], [t].[Bio], [a].[Id], [a].[CityId], [a].[Line], [a].[Sequence], [p0].[Id], [p0].[Text], [p0].[Title]
FROM (
SELECT TOP(@__p_0) [p].[Id], [p].[OriginId], [p].[Name], [p].[Surname], [p].[Email], [p].[PhoneNumber], [p].[BirthDate], [p].[Bio]
FROM [Persons_Guid] AS [p]
)
AS [t]
LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Rows | Mean | Allocated |
---|---|---|
100 | 1,226.3537 ms | 1218030776 B |
1000 | 96,768.088 ms | 2217682664 B |
Retrieving paginated rows joining tables using raw SQL query with Dapper
Before finishing with the paginated queries, of course, we have to evaluate the performance of Dapper with a query that retrieves the same data from the previous scenario.
In this case we see that the performance is much higher again, mainly by avoiding the translation to SQL.
SQL statement invokation from C# with Dapper:
var sql = $@"
SELECT top ({PageSize})
persons.[Id] as [PersonId], persons.[OriginId], persons.[Name], persons.[Surname], persons.[BirthDate], persons.[Bio], persons.[Email], persons.[PhoneNumber],
addresses.[Id] as [AddressId], addresses.[CityId], addresses.[Line], addresses.[Sequence],
posts.[Id] as [PostId], posts.[Text], posts.[Title]
FROM [dbo].[Persons_Guid] persons
join [dbo].[Addresses_Guid] addresses on persons.Id = addresses.PersonId
join [dbo].[Posts_Guid] posts on persons.Id = posts.AuthorId
";
using (var connection = new SqlConnection(SqlServerDbTestDataContextHelper.TestDbConnectionString))
{
var persons = connection.Query<PersonFullView>(sql).ToList();
}
Rows | Mean | Allocated |
---|---|---|
100 | 1.6398 ms | 384083 B |
1000 | 11.6887 ms | 3815858 B |
Retrieving single entity with joining tables using LINQ FirstAsync with EFCore
Again, in this case we start with the more common case where we need to return a single filtered entity, with all its associated data in two of its collections.
We also compare the performance with AsNoTracking()
and AsSplitQuery()
separately and together. We will also check the performance using Explicit loading, that in any case have a similar behaviour in this case than AsSplitQuery()
when being tracked.
Since the arrival of AsSplitQuery()
, explicit loading has been in the background, being less comprehensible in code and being limited to loading on a single row. However, the idea of both is the same: to avoid the cartesian explosion by separating the queries of the parent element from those of the nested data.
Code example combining both:
var data = await SqlServerDbTestDataContextHelper.New.DbContext.Persons
.Include(x => x.Addresses)
.Include(x => x.Posts)
.AsNoTracking()
.AsSplitQuery()
.FirstAsync();
In the case of a single query, this is the translation done by EFCore (Not using AsSplitQuery()
or explicit loading)
SELECT [t].[Id], [t].[Bio], [t].[BirthDate], [t].[Email], [t].[Name], [t].[OriginId], [t].[PhoneNumber], [t].[Surname], [a].[Id], [a].[CityId], [a].[Line], [a].[PersonId], [a].[Sequence], [p0].[Id], [p0].[AuthorId], [p0].[Text], [p0].[Title]
FROM (
SELECT TOP(1) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]
)
AS [t]
LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
and in the case of using AsSplitQuery()
or explicit loading:
SELECT TOP(1) [p].[Id], [p].[Bio], [p].[BirthDate], [p].[Email], [p].[Name], [p].[OriginId], [p].[PhoneNumber], [p].[Surname]
FROM [Persons_Guid] AS [p]
SELECT a.Id, a.CityId, a.Line, a.PersonId, a.Sequence
FROM Addresses_Guid AS a
WHERE a.PersonId == @__get_Item_0)
SELECT p.Id, p.AuthorId, p.Text, p.Title
FROM Posts_Guid AS p
WHERE p.AuthorId == @__get_Item_0)
AsNoTracking() | AsSplitQuery() | Explicit loading | Mean | Allocated |
---|---|---|---|---|
βοΈ | βοΈ | βοΈ | 34.2348 ms | 5962998 B |
βοΈ | βοΈ | βοΈ | 17.8125 ms | 5530597 B |
βοΈ | βοΈ | βοΈ | 22.6501 ms | 829366 B |
βοΈ | βοΈ | βοΈ | 6.0082 ms | 401577 B |
βοΈ | βοΈ | βοΈ | 25.4532 ms | 806981 B |
βΉοΈ Note that the performance it's radically better using both methods, and especially the absence of tracking from EFCore is what has the most positive impact on performance
Explicit loading has a bit of overhead comparing to AsSplitQuery() both very similar in terms of performance, just as expected initially
Retrieving single element with joining tables using a projection with EFCore
Again a mandatory test would be to evaluate the performance of EFCore translating our LINQ query to SQL for a single element with its nested data.
Again, a very good option for managing read-only queries transformed directly to their corresponding DTO's without the need of dealing with Dapper.
Projection code using EFCore:
var data = SqlServerDbTestDataContextHelper.Instance.DbContext.Persons
.Select(x => new PersonDto
{
Id = x.Id,
OriginId = x.OriginId,
Name = x.Name,
Surname = x.Surname,
Email = x.Email,
PhoneNumber = x.PhoneNumber,
BirthDate = x.BirthDate,
Bio = x.Bio,
Addresses = x.Addresses.Select(y => new AddressDto { Id = y.Id, CityId = y.CityId, Line = y.Line, Sequence = y.Sequence }),
Posts = x.Posts.Select(y => new PostDto { Id = y.Id, Text = y.Text, Title = y.Title })
})
.First(x => x.Id == id);
This is translated to SQL from EFCore to something like this:
SELECT [t].[Id], [t].[OriginId], [t].[Name], [t].[Surname], [t].[Email], [t].[PhoneNumber], [t].[BirthDate], [t].[Bio], [a].[Id], [a].[CityId], [a].[Line], [a].[Sequence], [p0].[Id], [p0].[Text], [p0].[Title]
FROM (
SELECT TOP(1) [p].[Id], [p].[OriginId], [p].[Name], [p].[Surname], [p].[Email], [p].[PhoneNumber], [p].[BirthDate], [p].[Bio]
FROM [Persons_Guid] AS [p]
WHERE [p].[Id] = @__id_0
)
AS [t]
LEFT JOIN [Addresses_Guid] AS [a] ON [t].[Id] = [a].[PersonId]
LEFT JOIN [Posts_Guid] AS [p0] ON [t].[Id] = [p0].[AuthorId]
ORDER BY [t].[Id], [a].[Id]
Mean | Allocated |
---|---|
18.6125 ms | 4617402 B |
Retrieving single element with joining tables using raw SQL multi-query with Dapper
Finally, our latest test puts one of Dapper's most interesting features under the microscope: its ability to launch multiple queries at once and be mapped.
This is useful to compare the performance when loading the same data from the previous scenario, although dumped in a DTO model.
Again, the peformance demonstrated with this approach is simply astonishing.
SQL queries invocation through Dapper:
var sql = $@"
SELECT [Id], [OriginId], [Name], [Surname], [BirthDate], [Bio], [Email], [PhoneNumber] FROM [dbo].[Persons_Guid] WHERE Id = @PersonId;
SELECT [Id], [CityId], [Line], [Sequence] FROM [dbo].[Addresses_Guid] WHERE [PersonId] = @PersonId;
SELECT [Id], [Title], [Text] FROM [dbo].[Posts_Guid] WHERE [AuthorId] = @PersonId
";
using (var connection = new SqlConnection(SqlServerDbTestDataContextHelper.TestDbConnectionString))
{
using (var multiQuery = await connection.QueryMultipleAsync(sql, new { PersonId = id }))
{
var person = (await multiQuery.ReadAsync<PersonDto>()).First();
var addresses = (await multiQuery.ReadAsync<AddressDto>()).ToList();
var posts = (await multiQuery.ReadAsync<PostDto>()).ToList();
person.Posts = posts;
person.Addresses = addresses;
}
}
Mean | Allocated |
---|---|
0.7817 ms | 111833 B |
π Conclusion
On the one hand, during this extensive post we have been able to see different ways of querying data with both EFCore and Dapper, as well as deep diving on the details of the fuctionality and performance offered by each of these solutions.
Applying one option, another or both will depend, as always, on the needs of each project, however as a general rule of thumb, with EFCore we can fully manage the needs of a project, also taking advantage of the additional collateral advantages that it allows us such as migrations management and safe typing in entity-table mappings. However, it will require making good use of the different features that it provides at the query level such as AsNoTracking()
or AsSplitQuery()
, or making explicit loading according to the needs of each situation.
On the other hand, however, in higher systems, with large volumes of data, especially those where performance is critical, Dapper offers unmatched performance and low memory load, however it will require extra control at testing level with integration and/or end-to-end tests so that model changes do not impact previously defined queries.
I hope I have been able to provide ideas, tips and useful information in this post about how to take advantage of both tools in this context.
If you liked it, please give me a β and follow me π.
π All results
To carry out this comparison, innumerable executions were released, many of them required specific specifications either due to problems with Windows Defender or due to the execution time, so I took the opportunity to leave all the important ones reported. Depending on the type of execution, the results may vary, however they are all really valid to establish some metrics and compare performance.
π Resources
The entire code can be found here; and the corresponding report here
Top comments (0)