
Dominika Sikorska

Posted on • Originally published at dominika-sikorska.Medium

LINQ Performance Optimization: 5 Patterns Every C# Developer Should Know

This is Part 3 of the "LINQ Performance & Best Practices" series. If you haven't read the foundational articles on deferred execution (Part 1) and multiple enumerations (Part 2) yet, start there.

Our encryption service API response time jumped from 200ms to 3 seconds after adding a "simple" LINQ query to filter document metadata. The query looked clean, passed code review without comments, and the unit tests ran fine. But production told a different story.

The problem wasn't LINQ itself—it was how we used it. We were materializing too early, projecting too late, and processing large datasets inefficiently. After profiling our .NET microservices and analyzing slow query patterns, I identified 5 optimization patterns that consistently improved performance by 40-60%.

These aren't micro-optimizations that save nanoseconds. They're architectural patterns that fundamentally change how you think about data flow in C# applications—patterns that determine whether your API responds in 200ms or 3 seconds.

If you've read my previous articles on multiple enumerations and deferred execution, you understand the fundamentals. Now let's apply them to real production scenarios where performance actually matters.

Pattern 1: Strategic Materialization—Know When to Call ToList()

The Problem: When Memory Becomes the Bottleneck

Tuesday afternoon, performance testing. Our document encryption service endpoint was consuming 2.4GB of memory for a dataset that should've been 400MB maximum. The culprit was a single line of code:

var documents = await GetAllDocumentsAsync();
var encryptedDocuments = documents.Where(d => d.IsEncrypted && d.Status == "Active");

The developer loaded all 150,000 document metadata records into memory before filtering. Every single document record was loaded, deserialized, and then... 90% of them were immediately discarded by the Where clause.

This is what premature materialization cost us:

  • 2.4GB memory usage for a dataset that needed 400MB after filtering
  • 3.2 seconds to load all documents from storage
  • Timeout errors when multiple encryption requests hit the endpoint simultaneously
  • 83% wasted bandwidth loading data we'd immediately discard
  • Unnecessary processing of 135,000 records we'd never use

The worst part? This pattern was repeated across 12 different API endpoints. We were paying for compute resources to waste memory and CPU on data we didn't need.

Why This Happens

As developers, we're taught to "separate concerns." Query the data, then process it. So we write:

var data = GetAllData(); // Get everything first
var filtered = data.Where(condition); // Then filter

It feels clean. It feels organized. But it's a performance disaster. When you materialize a large collection early, you're loading everything into memory before you know what you actually need.

The Solution: Filter Before You Materialize

The fix is simple—filter first, materialize last:

// BAD: Materialize first, filter in memory
var documents = await GetAllDocumentsAsync(); // Loads 150,000 records
var encryptedDocs = documents.Where(d => d.IsEncrypted && d.Status == "Active");

// GOOD: Filter before materializing. The source still returns all
// 150,000 records, but only the matches are kept in memory
var encryptedDocs = (await GetAllDocumentsAsync())
  .Where(d => d.IsEncrypted && d.Status == "Active")
  .ToList(); // List holds only ~15,000 records

// EVEN BETTER: If using IQueryable (EF, database)
var encryptedDocs = dbContext.Documents
  .Where(d => d.IsEncrypted && d.Status == "Active") // Filters in SQL
  .ToList(); // Loads 15,000 records

The impact:

  • 400MB memory usage (down from 2.4GB)
  • 320ms query time (down from 3.2 seconds)
  • 90% reduction in bandwidth and processing
  • Zero timeout errors after deployment
  • Efficient data source filtering (SQL WHERE, API query params, etc.)

The Rule: Materialize After Filtering, Before Multiple Iterations

Here's the decision framework:

Materialize (call .ToList() or .ToArray()) when:

  • You'll iterate over the results multiple times
  • You need to pass the data to multiple methods
  • You're done filtering and ready to process in memory
  • You want to close the database connection immediately

Don't materialize when:

  • You're still building the query with more .Where(), .Select(), or .OrderBy() clauses
  • You'll only iterate once with a foreach loop
  • You're working with large datasets that should be streamed

For more on why multiple iterations require materialization, see Part 2 on multiple enumerations.
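The decision framework is easiest to see with a toy in-memory source. This is a sketch, not EF: the hypothetical `Numbers` generator stands in for any lazy data source, and a `pulls` counter shows that a deferred query re-walks its source on every iteration while a materialized list is walked only once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MaterializationDemo
{
    static int pulls = 0; // counts how many items the source has produced

    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 10; i++)
        {
            pulls++;
            yield return i;
        }
    }

    static void Main()
    {
        // Deferred: building the query enumerates nothing
        var evens = Numbers().Where(n => n % 2 == 0);
        Console.WriteLine(pulls); // 0

        // Each iteration walks the full source again
        _ = evens.Count();
        _ = evens.Count();
        Console.WriteLine(pulls); // 20

        // Materialize once; further iterations hit the list, not the source
        var snapshot = evens.ToList();
        _ = snapshot.Count + snapshot.Count;
        Console.WriteLine(pulls); // 30
    }
}
```

With a real data source, each of those extra walks is a repeated query or I/O round-trip, which is why "materialize before multiple iterations" matters.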


Pattern 2: Project Before You Materialize—Select Only What You Need

The second most common performance issue I've found: loading entire objects when you only need 2-3 properties.

The Problem

// Loading full document metadata objects (15+ properties)
var documents = await GetDocumentsAsync(); // Returns List<DocumentMetadata>

// Then projecting to lightweight objects
return documents.Select(d => new DocumentInfo
{
  Id = d.Id,
  FileName = d.FileName,
  EncryptionStatus = d.EncryptionStatus
});

This code loads all 15+ properties of each document metadata object into memory, deserializes them, and then immediately throws away 12 of those properties when creating the lightweight DocumentInfo. Wasteful.

The Solution: Early Projection

// Project to lightweight objects before materializing
var documents = (await GetDocumentsAsync())
  .Select(d => new DocumentInfo
  {
    Id = d.Id,
    FileName = d.FileName,
    EncryptionStatus = d.EncryptionStatus
  })
  .ToList();

// EVEN BETTER: If using IQueryable (database queries)
var documents = await dbContext.Documents
  .Where(d => d.IsActive)
  .Select(d => new DocumentInfo
  {
    Id = d.Id,
    FileName = d.FileName,
    EncryptionStatus = d.EncryptionStatus
  })
  .ToListAsync();

When you use .Select() before .ToList() on an IQueryable, the data source (database, API, etc.) can optimize what it retrieves. For Entity Framework queries, it generates SQL that only retrieves the columns you need:

-- EF generates optimized SQL with early projection
SELECT d.Id, d.FileName, d.EncryptionStatus
FROM Documents d
WHERE d.IsActive = 1

Performance Impact:

  • 40-60% faster for entities with many properties or large text fields
  • 70% reduction in network bandwidth between database and application
  • No Entity Framework change tracking overhead for DTOs
  • Smaller result sets mean less memory and faster serialization

Pattern 3: Avoid the N+1 Problem with Eager Loading

Note: This pattern is specific to database queries with Entity Framework or similar ORMs. If you're working with in-memory collections, skip to Pattern 4.

The N+1 problem is the silent killer of database-backed API performance. One parent query, then N queries for related children. Your code looks clean, but your database server is crying.

The Problem: The Classic N+1 Trap

// Looks innocent, right?
var documents = await dbContext.Documents
  .Where(d => d.UserId == userId)
  .ToListAsync();

foreach (var document in documents)
{
  // This line triggers a separate database query for EACH document!
  var versionCount = document.Versions.Count;
  Console.WriteLine($"Document {document.Id} has {versionCount} versions");
}

If the user has 50 documents, this code executes 51 database queries:

  • 1 query to get all documents
  • 50 queries to get Versions for each document (one per iteration)

The Solution: Eager Loading with Include()

// Load documents AND their versions in a single query
var documents = await dbContext.Documents
  .Where(d => d.UserId == userId)
  .Include(d => d.Versions) // Eager loading
  .ToListAsync();

foreach (var document in documents)
{
  // No database query here - data already loaded
  var versionCount = document.Versions.Count;
  Console.WriteLine($"Document {document.Id} has {versionCount} versions");
}

Entity Framework generates a SQL query with a JOIN:

SELECT d.*, v.*
FROM Documents d
LEFT JOIN DocumentVersions v ON d.Id = v.DocumentId
WHERE d.UserId = @userId  -- EF parameterizes the value

Performance Impact:

  • 1 query instead of 51 (98% reduction)
  • Response time: 214ms (down from 2,847ms)
  • 92% faster API endpoint
  • Reduced database load and connection pool pressure

When NOT to Use Eager Loading

Eager loading isn't always the answer. Avoid it when:

1. You're loading too much data (the "over-eager" problem):

// BAD: Loading entire object graph
var document = await dbContext.Documents
  .Include(d => d.Versions)
    .ThenInclude(v => v.EncryptionKeys)
    .ThenInclude(k => k.KeyMetadata)
    .ThenInclude(m => m.AuditLogs)
  .FirstOrDefaultAsync(d => d.Id == documentId);
// If document has 100 versions with multiple keys each, you just loaded 1000+ entities

2. You need only a subset of the related data (in EF Core 5+, use a filtered include instead of loading everything):

// EF Core 5+: Filtered includes
var documents = await dbContext.Documents
  .Include(d => d.Versions.Where(v => v.IsActive))
  .ToListAsync();

The balance: Use .Include() for data you'll definitely use. For optional or conditional data, consider split queries or explicit loading.
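Those two alternatives look like this in EF Core. A sketch only, reusing the article's hypothetical dbContext and entities (it isn't runnable without a real database); AsSplitQuery() requires EF Core 5+:

```csharp
// Split query: EF issues one SQL statement for Documents and one for
// Versions instead of a single big JOIN, avoiding row duplication
// ("cartesian explosion") when a parent has many children.
var documents = await dbContext.Documents
    .Where(d => d.UserId == userId)
    .Include(d => d.Versions)
    .AsSplitQuery()
    .ToListAsync();

// Explicit loading: fetch related data on demand for one entity,
// optionally filtered, via the change tracker's Entry API.
var document = await dbContext.Documents
    .FirstAsync(d => d.Id == documentId);

await dbContext.Entry(document)
    .Collection(d => d.Versions)
    .Query()
    .Where(v => v.IsActive) // load only the active versions
    .LoadAsync();
```

Split queries trade one round-trip for several smaller ones; explicit loading trades eager simplicity for control over exactly when and what you fetch.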


Pattern 4: Stream Large Datasets with IAsyncEnumerable

When you need to process 100,000+ records, .ToList() becomes your enemy. You'll consume gigabytes of memory loading data you'll process once and discard.

The Problem: Memory Exhaustion with Large Datasets

// Processing 500,000 encryption audit records for a batch job
var auditRecords = await dbContext.EncryptionAudits
  .Where(a => a.ProcessedDate == null)
  .ToListAsync(); // Loads 500K records into memory (2.8GB!)

foreach (var record in auditRecords)
{
  await ProcessAuditRecordAsync(record);
}

This works fine for 1,000 records. At 10,000 records, it's slow. At 100,000+ records, you'll get OutOfMemoryException or push your server into memory pressure, triggering garbage collection pauses.

The Solution: Streaming with IAsyncEnumerable

// Stream records one at a time from the database
await foreach (var record in dbContext.EncryptionAudits
  .Where(a => a.ProcessedDate == null)
  .AsAsyncEnumerable())
{
  await ProcessAuditRecordAsync(record);
}

With IAsyncEnumerable, Entity Framework streams results from the open database reader and yields them one at a time as you iterate. You process each record, and once you've moved past it, it becomes eligible for garbage collection; the full result set never sits in memory at once.
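The same shape works outside EF using C#'s async iterators (C# 8+). A self-contained sketch, with a hypothetical GetPendingAuditIdsAsync standing in for the database query:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class StreamingDemo
{
    // Stand-in for a database cursor: an async iterator that yields one
    // record at a time instead of building the whole list up front.
    static async IAsyncEnumerable<int> GetPendingAuditIdsAsync()
    {
        for (int id = 1; id <= 5; id++)
        {
            await Task.Yield(); // simulate an asynchronous fetch
            yield return id;
        }
    }

    static async Task Main()
    {
        // Only one record is "in flight" at any moment
        await foreach (var id in GetPendingAuditIdsAsync())
        {
            Console.WriteLine($"processing audit record {id}");
        }
    }
}
```

Whether the source is a generator like this or an EF query via AsAsyncEnumerable(), the consuming code is identical: a single await foreach with constant memory use.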

When to Use Streaming

Use IAsyncEnumerable when:

  • Processing large datasets (10K+ records)
  • Building batch jobs or background workers
  • Memory is constrained
  • You process each record independently

Don't use streaming when:

  • You need the total count upfront (requires loading all records)
  • You need to iterate multiple times (use materialization instead — see Part 2)
  • You need to sort or group the entire dataset in memory

For more on how deferred execution enables streaming, see Part 1 on deferred execution.


Pattern 5: Compiled Queries for Hot Code Paths

Every time you execute a LINQ query against Entity Framework, EF has to:

  1. Parse your LINQ expression tree
  2. Translate it to SQL
  3. Cache the query plan (hopefully)
  4. Execute the query

For a single query this overhead is small: a few milliseconds on the first execution, and far less afterwards thanks to EF's query plan cache. But when you're executing the same query 10,000 times per minute in a high-traffic API endpoint, even the residual overhead adds up.

The Problem: Query Translation Overhead

// This method gets called 10,000+ times per minute
public async Task<Document?> GetDocumentByIdAsync(int documentId)
{
  return await dbContext.Documents
    .Include(d => d.EncryptionKey)
    .FirstOrDefaultAsync(d => d.Id == documentId);
  // EF translates this expression tree on every call
}

Even with EF's query caching, there's still expression tree traversal and cache lookup overhead.

The Solution: Compiled Queries

// Compile the query once, reuse thousands of times
private static readonly Func<AppDbContext, int, Task<Document?>>
  GetDocumentByIdQuery = EF.CompileAsyncQuery(
    (AppDbContext db, int documentId) =>
      db.Documents
        .Include(d => d.EncryptionKey)
        .FirstOrDefault(d => d.Id == documentId)
);

public async Task<Document?> GetDocumentByIdAsync(int documentId)
{
  return await GetDocumentByIdQuery(dbContext, documentId);
}

The query is compiled once when your application starts. After that, EF skips the expression tree parsing and directly executes the pre-compiled SQL.

When to Use Compiled Queries

Use compiled queries for:

  • High-traffic API endpoints (1000+ requests/minute)
  • Queries executed in tight loops
  • Microservices with high query volume
  • Performance-critical lookup operations

Don't use compiled queries when:

  • The query is rarely executed
  • The query structure changes frequently
  • You're building dynamic queries with conditional filters
  • The performance gain isn't worth the code complexity

Trade-offs

Compiled queries have limitations:

  • Must be static (defined at startup)
  • Can't use conditional logic in the query
  • Slightly more complex code
  • The query shape is fixed at compile time; only parameter values vary per call

For dynamic queries, consider specification patterns or expression tree composition instead.
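One common shape for that composition: apply filters conditionally while the query is still an IQueryable, so each applied Where remains in the expression tree and still translates to SQL against a real provider. A minimal sketch with a hypothetical Doc type and ApplyFilters helper, runnable here against an in-memory AsQueryable() source:

```csharp
using System;
using System.Linq;

record Doc(int Id, bool IsEncrypted, string Status);

static class DynamicQueryDemo
{
    // Each conditional Where extends the expression tree; against EF the
    // final composed query would still execute as one SQL statement.
    public static IQueryable<Doc> ApplyFilters(
        IQueryable<Doc> source, bool? encrypted, string? status)
    {
        if (encrypted is not null)
            source = source.Where(d => d.IsEncrypted == encrypted.Value);
        if (status is not null)
            source = source.Where(d => d.Status == status);
        return source;
    }

    static void Main()
    {
        var docs = new[]
        {
            new Doc(1, true, "Active"),
            new Doc(2, false, "Active"),
            new Doc(3, true, "Archived"),
        }.AsQueryable();

        var active = ApplyFilters(docs, encrypted: true, status: "Active").ToList();
        Console.WriteLine(active.Count); // 1
    }
}
```

Unlike a compiled query, this pattern handles optional filters cleanly; the trade-off is that you pay the normal translation cost on each execution.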


Putting It All Together: A Production Example

Here's a real-world API method that combines multiple patterns:

public async Task<DocumentSummaryDto> GetUserDocumentSummaryAsync(int userId)
{
  // Pattern 1: Filter first, materialize last
  // Pattern 2: Project to a DTO before materializing
  // Pattern 3: Aggregating d.Versions inside the projection avoids N+1,
  // because EF translates those aggregates to SQL subqueries. (An Include
  // would be ignored here; Include only applies when loading full entities.)
  var summary = await dbContext.Documents
    .Where(d => d.UserId == userId && d.Status == DocumentStatus.Active)
    .Select(d => new DocumentInfoDto
    {
      DocumentId = d.Id,
      FileName = d.FileName,
      EncryptionStatus = d.EncryptionStatus,
      VersionCount = d.Versions.Count,
      LastModified = d.Versions.Max(v => v.CreatedDate),
      TotalSize = d.Versions.Sum(v => v.FileSizeBytes)
    })
    .OrderByDescending(d => d.LastModified)
    .Take(50) // Limit results in the database
    .ToListAsync(); // Materialize after all transformations

  return new DocumentSummaryDto
  {
    UserId = userId,
    Documents = summary,
    TotalDocuments = summary.Count,
    TotalSizeBytes = summary.Sum(d => d.TotalSize)
  };
}

Before optimization:

  • 127 database queries (N+1 problem)
  • 3,200ms response time
  • 450MB memory per request

After applying patterns:

  • 1 database query
  • 187ms response time
  • 28MB memory per request

Improvement: 94% faster, 93% less memory 🚀


Performance Optimization Checklist

When reviewing or writing LINQ queries, ask yourself:

  1. Am I filtering before calling .ToList()? (Pattern 1)
  2. Do I really need all properties, or can I project? (Pattern 2)
  3. Am I loading related data, or will I trigger N+1 queries? (Pattern 3)
  4. Is this dataset large enough to stream instead of loading? (Pattern 4)
  5. Is this a high-frequency query that would benefit from compilation? (Pattern 5)
  6. Have I profiled this code path? Use Application Insights, MiniProfiler, or SQL Profiler
  7. What does the generated SQL look like? Enable EF Core query logging with optionsBuilder.LogTo(Console.WriteLine) in OnConfiguring (the dbContext.Database.Log property is EF6-only)

Common Mistakes to Avoid

1. Materializing inside a loop

// BAD: Queries database on every iteration
foreach (var userId in userIds)
{
  var documents = dbContext.Documents
    .Where(d => d.UserId == userId)
    .ToList(); // Separate query per user
}

// GOOD: Single query for all users
var documents = dbContext.Documents
  .Where(d => userIds.Contains(d.UserId))
  .ToList();

2. Mixing IEnumerable and IQueryable

// BAD: AsEnumerable() forces in-memory processing
var results = dbContext.Documents
  .AsEnumerable() // Everything after this runs in C# memory
  .Where(d => d.IsEncrypted) // No SQL WHERE clause
  .ToList();

// GOOD: Keep it IQueryable until materialization
var results = dbContext.Documents
  .Where(d => d.IsEncrypted) // SQL WHERE clause
  .ToList();
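The provider boundary is easy to verify in plain code. This self-contained sketch uses an in-memory AsQueryable() source as a stand-in for a DbSet: composition before AsEnumerable() stays on the queryable provider, while composition after it runs as ordinary in-memory iterators:

```csharp
using System;
using System.Linq;

class ProviderBoundaryDemo
{
    static void Main()
    {
        var source = Enumerable.Range(1, 100).AsQueryable();

        // Before AsEnumerable(): Where is recorded in an expression tree,
        // so a real provider (like EF) could translate it to a SQL WHERE.
        var onProvider = source.Where(n => n > 50);
        Console.WriteLine(onProvider is IQueryable<int>); // True

        // After AsEnumerable(): Where is a plain C# iterator; with EF this
        // means every row crosses the wire before being filtered.
        var inMemory = source.AsEnumerable().Where(n => n > 50);
        Console.WriteLine(inMemory is IQueryable<int>); // False
    }
}
```

A quick `is IQueryable<T>` check like this in a unit test can catch an accidental AsEnumerable() slipping into a query pipeline.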

3. Over-eager loading

// BAD: Loading entire object graph
var document = dbContext.Documents
  .Include(d => d.Versions)
    .ThenInclude(v => v.EncryptionKeys)
    .ThenInclude(k => k.AuditLogs)
  .FirstOrDefault(d => d.Id == id);
// Might load 10,000+ entities if document has many versions

// GOOD: Load only what you need
var document = dbContext.Documents
  .Include(d => d.Versions.Where(v => v.IsActive))
    .ThenInclude(v => v.EncryptionKeys)
  .FirstOrDefault(d => d.Id == id);

Conclusion

LINQ is one of the most powerful features of C#, but power without understanding creates performance problems. These 5 patterns aren't complex — they're about understanding data flow and making intentional decisions about when and how to execute queries.

Start with Pattern 1 and 2. Filter before materializing and project only what you need. These two patterns alone will solve 80% of your LINQ performance issues.

Then profile your application to find N+1 problems, memory pressure from large datasets, or hot paths that need optimization. Use Application Insights, SQL Profiler, or MiniProfiler to see what's really happening.

The difference between a 200ms API response and a 3-second timeout often comes down to these patterns. Master them, and you'll write LINQ queries that scale.


Series Navigation

Coming next: More advanced LINQ techniques, IEnumerable vs IQueryable deep-dive, and anti-patterns to avoid!

Follow me on dev.to for the next parts in this series!


What's your experience with LINQ performance? What patterns have you found effective? Share your biggest performance gotcha in the comments below!


