When our team imported about 90,000 historical records into our production database, we discovered that our Entity Framework Core queries, which had performed well with smaller datasets, suddenly began timing out. This case study examines the performance bottlenecks we encountered and the optimization techniques that resolved them.
The Context
Our application had been running smoothly with a modest dataset of about 1,000 records. The dashboard loaded consistently in under 2 seconds, monitoring showed healthy performance metrics, and users were satisfied with the response times.
The data migration was intended to be routine—importing historical records from a legacy system to provide users with years of valuable information.
However, when users began accessing the system, we encountered significant performance issues.
What Went Wrong
The data import exposed fundamental scaling issues in our EF Core queries. What worked with 1,000 records failed completely with 90,000 records.
The Problem Manifests
The frontend team reported that API endpoints were returning 500 errors. Our investigation revealed:
User Impact:
- Frontend applications receiving 500 errors from API endpoints
- Dashboard pages failing to load with error messages
- API requests timing out after 30 seconds
- Previously fast-loading data lists now completely failing
Technical Symptoms:
- Application logs filled with database timeout exceptions
- Database connection pool reaching capacity
- Server CPU utilization spiking during peak usage
- Previously reliable endpoints failing consistently
Initial Troubleshooting
Our first step was examining the application logs, which revealed a clear pattern:
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding.
---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
...
These timeout errors appeared across multiple endpoints, but all shared a common characteristic: they involved querying our primary data table, which had grown from 1,000 to about 90,000 records.
The correlation was unmistakable—every failing operation was related to queries against the expanded dataset.
The Scaling Challenge
This scenario illustrates a common scaling challenge in application development. Code that performs adequately with small datasets can become unusable when data volume increases significantly. This wasn't a defect in our implementation—it was a fundamental scalability limitation.
Why the increase in data volume caused complete failure:
- Query Complexity Scaling: Operations that appear linear often carry hidden superlinear costs as joins and scans grow with the data
- Database Execution Plan Changes: Query optimizers choose different strategies for larger datasets
- Memory Allocation Issues: Loading complete result sets becomes problematic at scale
- Index Efficiency: Indexing strategies effective for small tables may not scale appropriately
The application was functioning exactly as designed—the issue was that our design assumptions no longer matched our operational reality.
Identifying the EF Core Bottlenecks
With the correlation established between our data volume increase and the timeout errors, we needed to understand what was happening at the database level. The next step was enabling Entity Framework Core's query logging to see exactly what SQL was being generated.
Enabling EF Core Query Logging
First, we added detailed logging to our application configuration:
// In Program.cs or Startup.cs
builder.Services.AddDbContext<ApplicationDbContext>(options =>
{
    options.UseSqlServer(connectionString);
    options.EnableSensitiveDataLogging(); // For development only
    options.LogTo(Console.WriteLine, LogLevel.Information);
});
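Because EnableSensitiveDataLogging() writes parameter values (which can include user data) into the logs, one way to keep it out of production is to guard it behind an environment check. This is a minimal sketch, assuming the standard WebApplicationBuilder from an ASP.NET Core Program.cs:

```csharp
// Sketch: scope sensitive logging to development only.
// Assumes "builder" is the usual WebApplicationBuilder from Program.cs.
builder.Services.AddDbContext<ApplicationDbContext>(options =>
{
    options.UseSqlServer(connectionString);

    if (builder.Environment.IsDevelopment())
    {
        // Parameter values appear in logs only during development
        options.EnableSensitiveDataLogging();
        options.LogTo(Console.WriteLine, LogLevel.Information);
    }
});
```

With this guard in place, production logs contain the generated SQL shape but not the parameter values.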
The Include() Performance Trap
We discovered that our dashboard queries were using Entity Framework's .Include() method to load related data. This seemed reasonable for our original small dataset:
// The problematic code that worked fine with 1,000 records
public async Task<List<InfrastructureItem>> GetDashboardDataAsync(string status)
{
    return await _context.Infrastructures
        .Include(i => i.LocalGovernmentArea)
        .Include(i => i.Region)
        .Include(i => i.SenatorialDistrict)
        .Include(i => i.AssessedBy)
        .Include(i => i.CreatedBy)
        .Where(i => i.Status == status)
        .ToListAsync();
}
What this generated in SQL:
SELECT [i].[Id], [i].[Name], [i].[Status], [i].[CreatedAt],
[l].[Id], [l].[Name], [l].[Code],
[r].[Id], [r].[Name], [r].[Code],
[s].[Id], [s].[Name], [s].[Code],
[u].[Id], [u].[UserName], [u].[Email],
[u2].[Id], [u2].[UserName], [u2].[Email]
FROM [Infrastructures] AS [i]
LEFT JOIN [LocalGovernmentAreas] AS [l] ON [i].[LocalGovernmentAreaId] = [l].[Id]
LEFT JOIN [Regions] AS [r] ON [i].[RegionId] = [r].[Id]
LEFT JOIN [SenatorialDistricts] AS [s] ON [i].[SenatorialDistrictId] = [s].[Id]
LEFT JOIN [AspNetUsers] AS [u] ON [i].[AssessedById] = [u].[Id]
LEFT JOIN [AspNetUsers] AS [u2] ON [i].[CreatedById] = [u2].[Id]
WHERE [i].[Status] = @status
The Hidden Cost
With 90,000 records, this single query was:
- Performing five LEFT JOINs for every row returned
- Loading unnecessary data — we only needed names, not full user profiles
- Transferring massive amounts of data from database to application
- Consuming excessive memory to hold all the joined data
The database was doing far more work per request, and the network was transferring far more data than needed.
The Optimization Solution
We replaced the expensive joins with a more efficient approach:
// Optimized version - get only what we need
public async Task<List<InfrastructureItem>> GetDashboardDataAsync(string status)
{
    // Step 1: Get core infrastructure data only
    var infrastructureData = await _context.Infrastructures
        .AsNoTracking() // Important: no change tracking needed
        .Where(i => i.Status == status)
        .Select(i => new
        {
            i.Id,
            i.Name,
            i.Status,
            i.CreatedAt,
            i.LocalGovernmentAreaId,
            i.AssessedById,
            i.CreatedById
        })
        .ToListAsync();

    // Step 2: Get lookup data efficiently
    var lgaIds = infrastructureData
        .Where(i => i.LocalGovernmentAreaId.HasValue)
        .Select(i => i.LocalGovernmentAreaId.Value)
        .Distinct()
        .ToList();

    var lgaLookup = await _context.LocalGovernmentAreas
        .Where(lga => lgaIds.Contains(lga.Id))
        .ToDictionaryAsync(lga => lga.Id, lga => lga.Name);

    // Step 3: Map to final result
    return infrastructureData.Select(i => new InfrastructureItem
    {
        Id = i.Id,
        Name = i.Name,
        Status = i.Status,
        CreatedAt = i.CreatedAt,
        LocalGovernmentArea = i.LocalGovernmentAreaId.HasValue
            ? lgaLookup.GetValueOrDefault(i.LocalGovernmentAreaId.Value, "Unknown")
            : "Unknown"
    }).ToList();
}
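An alternative worth noting (a sketch, not the approach we shipped): EF Core can also translate navigation-property access inside .Select() into a single SQL query that joins only the columns actually projected, avoiding both the full-entity loading of .Include() and the extra round trip of the two-step lookup. The method name here is hypothetical:

```csharp
// Hypothetical single-query alternative: project navigation properties directly.
// EF Core translates the lga.Name access into a LEFT JOIN that selects only
// the Name column, not the full related entities that .Include() would load.
public async Task<List<InfrastructureItem>> GetDashboardDataProjectedAsync(string status)
{
    return await _context.Infrastructures
        .AsNoTracking()
        .Where(i => i.Status == status)
        .Select(i => new InfrastructureItem
        {
            Id = i.Id,
            Name = i.Name,
            Status = i.Status,
            CreatedAt = i.CreatedAt,
            // Null-check the optional navigation so the translation stays a LEFT JOIN
            LocalGovernmentArea = i.LocalGovernmentArea != null
                ? i.LocalGovernmentArea.Name
                : "Unknown"
        })
        .ToListAsync();
}
```

Whether this beats the two-query approach depends on the dataset and indexes; measuring both against your own data is the only reliable way to choose.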
Immediate Results
The optimization had an immediate and dramatic impact on our most problematic endpoints. Most importantly, our users could access their dashboards again without encountering timeout errors.
Key Lessons Learned
- .Include() doesn't scale linearly - each join multiplies the complexity
- .AsNoTracking() is crucial for read-only operations
- Projection with .Select() dramatically reduces data transfer
- Separate queries can be faster than complex joins for large datasets
Conclusion
Our EF Core optimizations transformed query performance from 30+ second timeouts to 2-3 second responses. The key lesson: .Include() doesn't scale linearly with data volume.
Key takeaways:
- Use .Select() for projection instead of .Include() for large datasets
- Add .AsNoTracking() for read-only operations
- Separate queries often outperform complex joins
- Enable EF Core logging to identify bottlenecks
While this solved our immediate crisis, some queries still struggled with sorting and pagination on large datasets. The real breakthrough came with database indexing—the subject of Part 2.
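For context on why those remaining queries struggled, the offset-based pagination pattern involved looks roughly like the sketch below (the method name and parameters are illustrative, not our production code). Without a supporting index, SQL Server may still scan large portions of the table to satisfy the ORDER BY and OFFSET:

```csharp
// Illustrative offset pagination. Skip/Take translate to
// OFFSET ... ROWS FETCH NEXT ... ROWS ONLY in SQL Server.
public async Task<List<InfrastructureItem>> GetPageAsync(string status, int page, int pageSize)
{
    return await _context.Infrastructures
        .AsNoTracking()
        .Where(i => i.Status == status)
        .OrderByDescending(i => i.CreatedAt) // a stable ordering is required for paging
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(i => new InfrastructureItem
        {
            Id = i.Id,
            Name = i.Name,
            Status = i.Status,
            CreatedAt = i.CreatedAt
        })
        .ToListAsync();
}
```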