When working with Entity Framework Core (EF Core), developers often notice that queries involving collection navigation properties (especially when using .Include() — generate an additional ORDER BY clause in the SQL output. This behaviour can be puzzling, particularly since the relationship keys are already available in the result set. This article explains why EF Core does this, demonstrates the behaviour in practice, and provides references for further reading.
Example Model
Consider the following simple domain model:
public class Blog
{
public int Id { get; set; }
public string Title { get; set; }
public List<Post> Posts { get; set; } = new();
}
public class Post
{
public int Id { get; set; }
public string Content { get; set; }
public int BlogId { get; set; }
public Blog Blog { get; set; }
}
The Query
A straightforward EF Core query to eagerly load related posts might look like this:
var blogs = context.Blogs
.Include(b => b.Posts)
.ToList();
The Generated SQL
When logging is enabled (via ToQueryString() or a database profiler), EF Core might generate a query similar to the following:
SELECT [b].[Id], [b].[Title],
[p].[Id], [p].[Content], [p].[BlogId]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id]
Why EF Core Adds ORDER BY
At first glance, this extra ordering may appear unnecessary — after all, the BlogId field in the Posts table already indicates the relationship between posts and blogs. However, EF Core's query pipeline is not simply returning raw rows; it is materialising an object graph.
When a LEFT JOIN is performed for eager loading:
- The result set is flattened — each blog is repeated for each of its posts.
- EF Core must group dependent rows (Posts) under their principal (Blog) in memory.
- Without a guaranteed order, rows could arrive interleaved, requiring EF Core to maintain a more complex in-memory lookup (such as a dictionary) to correctly assign posts to blogs.
By ordering the results by the primary key(s) of the principal entity (Blog.Id in this case), EF Core ensures:
- Rows belonging to the same blog are contiguous in the result set.
- Materialisation is faster and requires less memory overhead.
- The grouping operation becomes a simple sequential process.
Without Ordering – Potential Issue
Without ordering, the query result from the database could look like this:
In this case, EF Core would have to detect and match entities in a non-contiguous manner, increasing processing complexity.
With Ordering – Efficient Grouping
With ordering by Blog.Id:
EF Core can now process the result set sequentially, attaching posts to blogs without additional lookups.
Official Notes and References
The EF Core team has acknowledged this behaviour in discussions and issue trackers:
"The ORDER BY clause is added when materializing 1-to-many relations, so that the principal is grouped together while loading the dependent. Without this, rows can arrive in an arbitrary order..."
— EF Core GitHub Issue #19571
Further Reading
For deeper exploration of EF Core query behaviour and performance:
"Entity Framework Core in Action" – Jon P Smith
Detailed coverage of loading strategies and query optimisation."Pro Entity Framework Core 7" – Adam Freeman
Includes explanations of query translation, eager loading, and materialisation performance considerations.
Conclusion
The ORDER BY clause added by EF Core during eager loading of collection navigation properties is not an arbitrary decision. It is a deliberate design choice aimed at optimising the materialisation process by ensuring contiguous grouping of related data. While it may seem redundant when relationship keys are present, this approach simplifies EF Core's internal processing and can improve performance for large datasets.
Top comments (1)
I like this kind of academic article👍