What do these three things have in common?
They are frequently used together. Is that a good idea though? No, not at all, and that's what I'm going to convince you of in this blog post.
But first, let's talk a little bit about what an ORM is, what it's for, and what lazy loading is in the context of ORMs.
ORM is an acronym for "Object Relational Mapper". Two popular examples of ORMs are Entity Framework and Hibernate.
In relational databases (like SQL Server, MySQL, etc.) the data is represented using tables, columns and rows. There are also constraints such as primary and foreign keys, as well as indexes and many other things that have no direct equivalent in an object-oriented programming language.
Before ORMs were popular, if you needed to read data from a database you would write a SQL query, and that SQL code would live somewhere together with your source code.
ORMs changed that by providing us with a way to declaratively specify how the data in a database maps to object-oriented constructs. For example, how a table maps to a class and how a column maps to a property.
By having this intermediate layer it was now possible to write code that looks like it's only manipulating objects but that, under the covers, is converted to SQL queries that are transparently sent to the database.
This is great because it simplifies how we interact with data. Previously we'd have to write SQL statements, which effectively were strings in our code. Concatenating SQL with user input was very common, and that is basically what enables SQL injection. SQL injection happens when a user types SQL into an input field (a textbox, for example) instead of normal input, and that SQL ends up being executed in your database. ORMs generally avoid this by sending user-supplied values as query parameters rather than splicing them into the SQL text.
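To make the injection risk concrete, here is a minimal sketch that only builds strings and never touches a database. The class name, the query text and the `@name` parameter are made up for illustration; the point is just the difference between user input becoming part of the SQL and user input travelling next to it.

```csharp
using System;
using System.Collections.Generic;

public static class SqlInjectionSketch
{
    // Unsafe: the user input becomes part of the SQL text itself.
    public static string BuildByConcatenation(string userInput) =>
        "SELECT * FROM Customers WHERE Name = '" + userInput + "'";

    // Safer: the SQL text is fixed and the input is carried separately as a parameter value.
    public static (string Sql, Dictionary<string, object> Parameters) BuildParameterized(string userInput) =>
        ("SELECT * FROM Customers WHERE Name = @name",
         new Dictionary<string, object> { ["@name"] = userInput });

    public static void Main()
    {
        var malicious = "x' OR '1'='1";

        Console.WriteLine(BuildByConcatenation(malicious));
        // prints: SELECT * FROM Customers WHERE Name = 'x' OR '1'='1'

        var (sql, parameters) = BuildParameterized(malicious);
        Console.WriteLine(sql);
        // prints: SELECT * FROM Customers WHERE Name = @name
        Console.WriteLine(parameters["@name"]);
        // prints: x' OR '1'='1
    }
}
```

In the concatenated version the WHERE clause now matches every row. In the parameterized version the same input is just a value to compare against.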
So that is roughly what an ORM is about.
What about lazy loading?
To really understand lazy loading we need to talk a little bit more about a concept in ORMs called Context (or Session). Whenever data is loaded and converted to objects in memory by an ORM, these objects are stored in a Context.
Usually an interaction that involves fetching data from the database starts by creating a Context and then performing the operation that triggers the fetching of the data. Here's an example using Entity Framework that fetches all the Customers from the database:
var myContext = new MyContext();
var allCustomers = myContext.Customers.ToList();
//...
After accessing the Customers property in the context and calling .ToList() on it (which is the operation that triggers the fetching of the data from the database), the customer data is stored in the context itself (for example in Entity Framework you would be able to access it in myContext.Customers.Local). This is so that if you make changes to a customer the ORM can figure out what changed and generate the appropriate SQL statements.
The context also has the ability to give you a slightly altered version of a Customer. This is particularly relevant when lazy loading is enabled. Imagine that in the database each customer can have many orders. The corresponding Customer class could look like this:
public class Customer
{
    public int CustomerId { get; set; }
    //...
    public virtual ICollection<Order> Orders { get; set; }
}
Notice the virtual keyword on the Orders collection. It just means that you can create a subclass of Customer and override what that property does. And that is precisely what the context will do if you enable lazy loading.
With lazy loading enabled the context will keep track of whether that property has been accessed. When it is accessed for the first time the context will transparently fetch the associated data (in this case the customer's orders).
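To give a feel for what such a subclass might do, here is a hand-written sketch. It is not the code Entity Framework actually generates: LazyCustomerProxy, FakeLoadOrders and QueryCount are hypothetical names, and the "database" is just a method that counts how often it is called.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    public int OrderId { get; set; }
}

public class Customer
{
    public int CustomerId { get; set; }
    public virtual ICollection<Order> Orders { get; set; }
}

// Hypothetical stand-in for the subclass an ORM would create at runtime.
public class LazyCustomerProxy : Customer
{
    private readonly Func<int, ICollection<Order>> _loadOrders;
    private bool _ordersLoaded;

    public LazyCustomerProxy(Func<int, ICollection<Order>> loadOrders)
    {
        _loadOrders = loadOrders;
    }

    public override ICollection<Order> Orders
    {
        get
        {
            if (!_ordersLoaded)
            {
                // First access: go to the "database" and remember the result.
                base.Orders = _loadOrders(CustomerId);
                _ordersLoaded = true;
            }
            return base.Orders;
        }
        set
        {
            base.Orders = value;
            _ordersLoaded = true;
        }
    }
}

public static class LazyProxyDemo
{
    public static int QueryCount;

    // Assumption: each call stands in for one SQL query.
    public static ICollection<Order> FakeLoadOrders(int customerId)
    {
        QueryCount++;
        return new List<Order> { new Order { OrderId = 1 }, new Order { OrderId = 2 } };
    }

    public static void Main()
    {
        Customer customer = new LazyCustomerProxy(FakeLoadOrders) { CustomerId = 42 };
        Console.WriteLine(QueryCount);            // 0: nothing has been loaded yet
        Console.WriteLine(customer.Orders.Count); // 2: the first access triggers the load
        Console.WriteLine(customer.Orders.Count); // 2: served from memory this time
        Console.WriteLine(QueryCount);            // 1: only one "query" was issued
    }
}
```

The calling code only ever sees a Customer, which is why the extra database call is so easy to miss.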
This allows us to write code like this:
var myContext = new MyContext();
var johnDoe = myContext.Customers.Single(customer => customer.Name == "John Doe");
foreach (var order in johnDoe.Orders)
{
    //do something with John Doe's order
}
This all looks fine. It is easy to read and understand, but it could be made more efficient, and it is not obvious why unless you are familiar with how the particular ORM you are using works.
In this case the ORM is Entity Framework, where data is fetched when .Single(...) is called and again when the Orders are enumerated. So in this example there are two database calls.
It is very easy to make this much worse and create what is called the N+1 problem. Here's an example:
var myContext = new MyContext();
var thisYearsCustomers = myContext.Customers.Where(customer => customer.JoinYear == 2017);
foreach (var customer in thisYearsCustomers)
{
    foreach (var order in customer.Orders)
    {
        //do something with the customer's order
    }
}
This will trigger one database request when the customers are enumerated (the foreach over the customers) and then one more for each customer's orders. With N customers that is N+1 requests, or 1+N if you prefer.
In case you're thinking that lazy loading is terrible and never makes sense, in this situation you're probably right. A situation that leads to an N+1 problem is very likely a mistake. However, that doesn't mean that lazy loading is never useful.
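The arithmetic is easy to check with a small simulation. This sketch does not use Entity Framework at all: FakeDatabase and its two methods are invented stand-ins, and each call to them is assumed to represent one round trip to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FakeDatabase
{
    public int QueryCount { get; private set; }

    // One query that returns the ids of all matching customers.
    public List<int> QueryCustomerIds(int customerCount)
    {
        QueryCount++;
        return Enumerable.Range(1, customerCount).ToList();
    }

    // One query per call, which is what a lazy-loaded Orders access would cost.
    public List<string> QueryOrdersFor(int customerId)
    {
        QueryCount++;
        return new List<string> { $"order-for-{customerId}" };
    }
}

public static class NPlusOneDemo
{
    public static int CountQueries(int customerCount)
    {
        var db = new FakeDatabase();
        foreach (var customerId in db.QueryCustomerIds(customerCount)) // 1 query
        {
            foreach (var order in db.QueryOrdersFor(customerId))       // 1 query per customer
            {
                // do something with the customer's order
            }
        }
        return db.QueryCount;
    }

    public static void Main()
    {
        Console.WriteLine(CountQueries(100)); // prints 101
    }
}
```

With 100 customers the loop costs 101 round trips, and the number keeps growing with the size of the result set.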
Imagine a desktop application where the context is created and lives through several user interactions. For example the user opens a customer screen, looks at it and then decides to look at that customer's orders.
In this scenario lazy loading is very convenient and makes sense.
Where it does not make sense is in a web application. That's because the context will not live for more than one user interaction. It's just not possible.
In a web application the user's actions result in an HTTP request being sent from the user's browser to the server. The server then does all the required processing for that request and sends a response back to the user. And then this process repeats for every user action.
Between requests the server forgets about the user, so if a context is created in response to a user's action it will be gone after the response is sent. That is just the stateless nature of the web.
The only thing you can achieve using lazy loading in a web application is extra database calls you could have avoided. That's because if the user asks for the orders of a particular customer, the code that runs on the server has to load both the customer and the orders while handling that one request. It can do so in a single database call or in two. Lazy loading just makes it really easy to end up in the two database call scenario without realizing it.
For completeness I'll just mention what happens if you have lazy loading disabled in most ORMs. In the example above the Orders property would be null. You would have to instruct the ORM to fetch it together with the Customers all in one go. This is called eager loading:
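// The lambda overload of Include needs "using System.Data.Entity;" (EF6)
// or "using Microsoft.EntityFrameworkCore;" (EF Core).
var myContext = new MyContext();
var johnDoe = myContext.Customers
    .Include(customer => customer.Orders)
    .Single(customer => customer.Name == "John Doe");
foreach (var order in johnDoe.Orders) //no extra db access
{
    //do something with John Doe's order
}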
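Conceptually, eager loading trades the follow-up queries for one larger one. Here is a rough sketch of the idea, assuming the ORM issues a single JOIN and then groups the flat rows back into objects. The exact SQL depends on the ORM and its version, and the row shape and names below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerLoadingSketch
{
    // Groups flat JOIN rows (customer columns repeated once per order)
    // into one entry per customer holding that customer's order ids.
    public static Dictionary<int, List<int>> GroupOrdersByCustomer(
        IEnumerable<(int CustomerId, int OrderId)> joinedRows) =>
        joinedRows
            .GroupBy(row => row.CustomerId)
            .ToDictionary(
                group => group.Key,
                group => group.Select(row => row.OrderId).ToList());

    public static void Main()
    {
        // Hypothetical result of one query joining Customers to Orders.
        var rows = new[]
        {
            (CustomerId: 1, OrderId: 10),
            (CustomerId: 1, OrderId: 11),
            (CustomerId: 2, OrderId: 20),
        };

        var ordersByCustomer = GroupOrdersByCustomer(rows);
        Console.WriteLine(ordersByCustomer[1].Count); // prints 2
        Console.WriteLine(ordersByCustomer[2].Count); // prints 1
    }
}
```

All the data arrives in one round trip, and walking the orders afterwards never goes back to the database.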
There's nothing to be gained by using lazy loading in web applications, yet in "the wild" it's much more common to see it enabled than not. Even if we accept the argument that it might be more convenient at times, the likelihood of running into serious performance problems (like N+1) outweighs any possible benefit.
Top comments (6)
I think that, given what you've illustrated (and my experience), EF got it right. Back when we'd write our own SQL, if I wanted the customer information and the orders, I'd have written two queries - the first would be a PK hit, and the second would be a FK hit. Pretty efficient, especially if I'm just selecting the columns I want to display.
In the EF realm, if I want customer info, I don't have to load the orders; this is the "avoiding extra database calls" scenario. However, I can still have my object model with an Orders collection (and the Orders have a property for the owning Customer), and if I need to edit a specific order, I can do
var order = ctx.Orders.Include(o => o.Customer).Single(o => o.Id == myId)
EF also has the concept of tracked vs. non-tracked entities, and with non-tracked entities, it's either eager-loaded or it's null. I'd wager that most web app calls could get away with sticking an .AsNoTracking() right after the DbSet.
I believe, at least in the EF realm, if you were to loop foreach (var order in cust.Orders), you're going to end up with a fetch per iteration; if you want to constrain it to two fetches, you'd need to loop foreach (var order in cust.Orders.ToList()). If there are 5 orders, it would be negligible, but you could end up with N+1 without needing to go 2 levels deep. (source: SQL debug on an app whose slowness I was diagnosing; they could have changed that since then.)
All that being said - you have to really dig into EF to learn these things, but you don't have to dig in that deep to get something that works. It would be nice if there were some way to include these concepts in beginner tutorials. I don't know that I'd judge the potential issues as offsetting the benefits, though; in many cases, the efficiency using the defaults is acceptable, and keeps down the allocations for the instantiated classes. And, with the simplified syntax, ORMs (with their associated downsides) can be an accessible way for people to learn data persistence.
Hi Daniel,
.AsNoTracking() isn't related to lazy-loading, it's a way of saying to EF that you don't want it to keep track of those entities. This means that if you make changes to them, those changes won't be persisted when you call .SaveChanges().
Also, if you do something like foreach (var order in cust.Orders) you are not going to end up with a fetch per iteration. There's only going to be 1 db call (adding .ToList() makes no difference in this case). The way you end up with N+1 is if Orders is lazy loaded and you do something like this:
You are absolutely right about having to really dig deep into the ORM that you are using. Lazy loading and N+1 problems are just one of the ways you can shoot yourself in the foot. Another that comes to mind is validation introducing performance issues when doing bulk inserts.
You are assuming that the context is always attached to the request cycle. But there are several use cases for web applications to attach the context to the user session, in which case lazy loading is perfectly acceptable.
Hi Jonathan,
Yes, you could store the context in the user session. However, I've never seen that being done and I suspect that is because that wouldn't scale very well (a lot of overhead per user) and it would introduce difficulties if you were to have load balancing.
First, arguably nothing scales very well if badly implemented.
And second, not every software is Twitter. There is a huge realm of enterprise software where the domain and the user base is well defined enough that throwing out the context after every request would be a waste. Like I said, use cases.
Furthermore, the application doesn't even have to stick to one model or another, large systems can define a context for entities that are to be discarded after every request, like reporting data. And have another context attached to session for data related to the user. It is even possible to have a specific context that is application scoped, and never gets destroyed, that can hold general configuration data.
That's what profiling is for.