1. What is the N+1 problem?
N+1 query problem is a common performance antipattern when you use the ORM library. Specifically with Java is Hibernate
Assume we have two entities in our system Cat and Hat. One Cat has many Hat, each Hat has only one Cat. Cat and Hat is one of many relationships.
Cat 1-N Hats
public class Cat {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "cat_seq_generator")
@SequenceGenerator(name = "cat_seq_generator", sequenceName = "cat_seq_seq")
private Long id;
@Column(name = "name", columnDefinition = "TEXT")
private String name;
@OneToMany(fetch = FetchType.LAZY, mappedBy = "cat", cascade = CascadeType.ALL)
private List<Hat> hats;
}
public class Hat {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hat_seq_generator")
@SequenceGenerator(name = "hat_seq_generator", sequenceName = "hat_seq_seq")
private Long id;
@Column(name = "color", columnDefinition = "TEXT")
private String color;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "cat_id", nullable = false)
private Cat cat;
}
How many queries do we need to send to the database for N cats to get the data about N cats and all hat that belongs to this cat?
With naive implements, we will fetch all the cats, and after that, with each cat, we will get all hats belonging to this hat.
@Transactional(readOnly = true)
public void getAllCatInfo() {
catRepository.findAll().forEach(cat -> {
log.info("Cat name: {}", cat.getName());
log.info("Hats color: {}",
cat.getHats().stream().map(Hat::getColor).collect(Collectors.joining(",")));
});;
}
@Test
@DisplayName("Given cat and hats, get all, Should return ok")
void givenCatAndHat_getAll_shouldReturnOk() {
var cat1 = new Cat();
cat1.setName("Tom");
var cat2 = new Cat();
cat2.setName("Margot");
catRepository.saveAll(List.of(cat1, cat2));
var hat1 = new Hat();
var hat2 = new Hat();
var hat3 = new Hat();
hat1.setColor("Red");
hat2.setColor("Blue");
hat3.setColor("Yellow");
hat1.setCat(cat1);
hat2.setCat(cat2);
hat3.setCat(cat2);
hatRepository.saveAll(List.of(hat1, hat2, hat3));
catService.getAllCatInfo();
}
And look at the log, we have:
Hibernate: select c1_0.id,c1_0.name from cats c1_0
2024-05-01T23:23:46.159+07:00 INFO 20972 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Tom
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id=?
2024-05-01T23:23:46.171+07:00 INFO 20972 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: Red
2024-05-01T23:23:46.171+07:00 INFO 20972 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Margot
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id=?
2024-05-01T23:23:46.172+07:00 INFO 20972 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: Blue,Yellow
We need one query to fetch all cats, after that we need more N queries for each cat to get the hat information. Totaly is N+1 queries.
As you can see, we have not to query so many to do that. Hibernate provides some solutions for that problem. Which helps us reduce the time call to DB.
2. Specify the batch size.
A solution is to specify @BatchSize()
for relations in Cat. With that annotation, hibernate will not fetch sub-entity one by one anymore. Each proxy object is involved, it will fetch all hats belonging to a size
cat. It will not be very efficient in the case you want to access the list not in sequential order. For example, the size is 3, and you only want access to cat number 1 and number 50, because for now, still takes 3 queries instead of 2 queries as I best solution. But don't worry, we can limit the number of cats we fetch each time. This case rarely happened in real life, I think so.
Okay, let's add @BatchSize
to declare of hats
field.
@OneToMany(fetch = FetchType.LAZY, mappedBy = "cat", cascade = CascadeType.ALL)
@BatchSize(size = 3)
private List<Hat> hats;
For you to see the difference. I will create many cats and hats and see how Hibernate handles that.
@Test
@DisplayName("Given cat and hats, get all, Should return ok")
public void givenManyCatAndHat_getAll_shouldReturnOk() {
IntStream.range(1, 10).forEach((i) -> {
var cat = new Cat();
cat.setName(RandomStringUtils.random(4, "qwertyuio"));
var hats = IntStream.range(1, 10).mapToObj((j) -> new Hat())
.peek((hat) -> hat.setColor(RandomStringUtils.random(4, "qwertyuio")))
.peek((hat -> hat.setCat(cat)))
.toList();
cat.setHats(hats);
catRepository.save(cat);
});
catService.getAllCatInfo();
}
And lock up the log. We can see Hibernate only need one query with 3 cats.
Hibernate: select c1_0.id,c1_0.name from cats c1_0
2024-05-02T21:40:58.962+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: iwoo
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T21:40:58.971+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: iqtr,ueou,ewwi,iito,iuqe,reqy,yiwr,yeoy,weru
2024-05-02T21:40:58.971+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: ouoq
2024-05-02T21:40:58.971+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: uywy,eeqo,rtye,yiee,qwye,tury,towy,wwii,oeit
2024-05-02T21:40:58.972+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: iqri
2024-05-02T21:40:58.972+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: qiuq,wuio,trwu,wiqe,oieo,tyet,ruew,uoyt,itri
2024-05-02T21:40:58.972+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: yuyy
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: wueu,qwoi,uotu,eqei,rwuo,teti,oiyq,yeqt,owuq
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: itoi
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: yoqq,rure,oqoi,eoeq,etou,utyt,reew,itqw,uoqo
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: tqww
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hats color: riio,uqyi,tqoi,itut,rwwu,twou,ryew,oqeo,wiiy
2024-05-02T21:40:58.975+07:00 INFO 9545 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: ewwi
One more test case about random access
@Transactional(readOnly = true)
public void getAllCatAndRandomAccess() {
List<Cat> cats = catRepository.findAll();
IntStream.of(2, 9, 20, 30)
.forEach((index) -> {
Cat cat = cats.get(index);
printCatAndHat(cat);
});
}
private void printCatAndHat(Cat cat) {
log.info("Cat name: {}", cat.getName());
log.info("Hat colors: {}",
cat.getHats().stream().map(Hat::getColor).collect(Collectors.joining(",")));
}
See the log.
Hibernate: select c1_0.id,c1_0.name from cats c1_0
2024-05-02T22:03:53.925+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: iytr
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T22:03:53.926+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: qeww,iwoo,yyor,ertq,yrwr,etyi,errq,uwrq,iewt
2024-05-02T22:03:53.926+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: qqtu
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T22:03:53.927+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: euir,yweo,yruq,eiou,eqei,quiu,yroy,tuwe,yuoy
2024-05-02T22:03:53.927+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: qiyr
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T22:03:53.928+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: rroi,toti,wquq,iquu,rtui,qiti,uuqo,qeiq,yqrw
2024-05-02T22:03:53.928+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: ywqe
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id = any (?)
2024-05-02T22:03:53.929+07:00 INFO 13132 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: yirw,eiue,wwyw,qroo,iqwq,uuuu,qttt,uttt,owir
We can see each time the proxy class is invoked, and one more time Hibernate reaches DB for us.
3. Using join fetch.
You can also use fetch query like this.
@Query("""
SELECT a FROM Cat a left join fetch a.hats
""")
List<Cat> findAllCatAndPopulateHat();
@Transactional(readOnly = true)
public void getAllCatAndHatByJoinFetchQuery() {
catRepository.findAllCatAndPopulateHat().forEach(this::printCatAndHat);
}
And see the log
Hibernate: select c1_0.id,h1_0.cat_id,h1_0.id,h1_0.color,c1_0.name from cats c1_0 left join hats h1_0 on c1_0.id=h1_0.cat_id
2024-05-02T22:12:49.092+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: tweo
2024-05-02T22:12:49.092+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: iuww,wiqi,eirt,yoow,woqo,itre,rruq,ywqu,wooe
2024-05-02T22:12:49.093+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: etue
2024-05-02T22:12:49.093+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: wqtq,eqqt,iuyy,uqyw,iiyo,yqyt,teqo,euuo,eooo
2024-05-02T22:12:49.093+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: qqoq
2024-05-02T22:12:49.093+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: iiqy,ytrr,ioiy,treo,tuee,teii,truq,uyyy,tequ
2024-05-02T22:12:49.093+07:00 INFO 14415 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: wqyr
Hibernate will use fetch join to get all data about cat and hat in one query.
I see it as more effective than specifying the @BatchSize
but this way makes the query very difficult when using page and paginations.
4. Specify the fetch mode.
Hibernate provided us 3 fetch modes is
- SELECT
- JOIN
- SUBSELECT
- SELECT
It uses a secondary query to load a single associated entity. The behavior is the same as above.
- JOIN
This is the default behavior for fetch type EAGER
Let's change the declaration of hats and see the difference.
@OneToMany(mappedBy = "cat", cascade = CascadeType.ALL)
@Fetch(FetchMode.JOIN)
private List<Hat> hats;
And the log
Hibernate: select c1_0.id,c1_0.name from cats c1_0
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id=?
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id=?
2024-05-02T22:34:14.088+07:00 INFO 16360 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Tom
2024-05-02T22:34:14.088+07:00 INFO 16360 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: Red
2024-05-02T22:34:14.088+07:00 INFO 16360 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Margot
2024-05-02T22:34:14.088+07:00 INFO 16360 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: Blue,Yellow
We see that it will fetch all the data at the beginning of the method. This means having no subquery, and data be fetched at once. But with so many queries as needed, the n+1 problem still happened.
- SUBSELECT
Let's change the code and see.
@OneToMany(fetch = FetchType.LAZY, mappedBy = "cat", cascade = CascadeType.ALL)
@Fetch(FetchMode.SUBSELECT)
private List<Hat> hats;
And the log
Hibernate: select c1_0.id,c1_0.name from cats c1_0
2024-05-02T22:38:30.796+07:00 INFO 16974 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Tom
Hibernate: select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id in (select c1_0.id from cats c1_0)
2024-05-02T22:38:30.802+07:00 INFO 16974 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: Red
2024-05-02T22:38:30.802+07:00 INFO 16974 --- [Demo] [ Test worker] com.example.demo.services.CatService : Cat name: Margot
2024-05-02T22:38:30.802+07:00 INFO 16974 --- [Demo] [ Test worker] com.example.demo.services.CatService : Hat colors: Blue,Yellow
It will issue two queries. The first one is select c1_0.id,c1_0.name from cats c1_0
for fetch all cat, the second one is select h1_0.cat_id,h1_0.id,h1_0.color from hats h1_0 where h1_0.cat_id in (select c1_0.id from cats c1_0)
for select all hat from the query. And for now, no N+1 problems anymore.
5. Conclude.
By understanding the N+1 problem and the available solutions, you can significantly improve the performance of your Hibernate applications, especially when dealing with large datasets and complex relationships.
I hope you can choose the right solutions for your work.
Top comments (2)
Excellent article.
However, I've been thinking about:
join fetch
solution with query, I'm not sure I understand what you mean when you say that "this way makes the query very difficult when using page and paginations"; after all, you use the raw query, so you're free to add thelimit
andoffset
, right?Sorry for my slow response.
=> It depends on the situation if I usually get all relations or not. But to be honest, I prefer SUBSELECT because through my expertise we rarely need a part of the list sub-entity, almost in case we need all the list sub-entities to calculate.
I don't use raw query, it is a JPQL query.
What I mentioned is when you want to use JPA to fetch page data like that.
And write some test to figure out what happening internal.
And we have log like that.
Do you see the abnormal thing in here? We only use 10 entities of cat but hibernate fetch and managed 99 objects in persistence context for us. The reason is that hibernate will fetch and evaluate the value of the page query in memory. And imagine if someone gets the object in position 1000, 1000 objects will be fetched for each type, leading to OOM. That all.