AnkitDevCode

Posted on Mar 14

JPA Mapping with Hibernate: Many-to-Many Relationship

#java #jpa #hibernate #springboot

In the previous section, we discussed the One-to-Many and Many-to-One relationships. Now, let’s look at the Many-to-Many relationship.

Introduction
The Relational Model Behind @ManyToMany
Unidirectional @ManyToMany — Simplest Form
Bidirectional @ManyToMany
equals() and hashCode() — The Critical Foundation
Intermediate Entity for Join Table With Extra Columns
Fetch Strategies & The N+1 Problem
Cascade Types — What to Use and When
Serialization — Avoiding Infinite Recursion
Performance Best Practices
Quick Reference — Best Practices vs Pitfalls
Conclusion

1. Introduction

A many-to-many (M:N) relationship is one of the most common yet most misunderstood associations in relational modeling. When mapped carelessly in JPA/Hibernate, it becomes a prime source of N+1 query problems, infinite JSON recursion, unnecessary eager loading, and subtle data-integrity bugs.

This article walks through every important aspect of @ManyToMany, from the simplest unidirectional form to a full intermediate-entity approach, and pairs every concept with the best practices that keep your application performant and maintainable.

2. The Relational Model Behind @ManyToMany

In a relational database, an M:N relationship is always implemented via a join table (also called a bridge or association table). For example, a Student ↔ Course relationship requires a student_course join table:

students           student_course         courses
──────────────     ─────────────────      ──────────────────
student_id (PK)    student_id  (FK) ─▶    course_id (PK)
name               course_id   (FK) ─▶    title
email              enrolled_at            credits
                   grade

If the join table carries extra columns (enrolled_at, grade), you must model it as a separate entity — a plain @JoinTable annotation cannot capture those columns.

3. Unidirectional @ManyToMany — Simplest Form

Use this when only one side needs to navigate to the other, and the join table has no extra columns.

3.1 Mapping

@Entity
public class Student {
    @Id @GeneratedValue
    private Long id;
    private String name;

    @ManyToMany
    @JoinTable(
        name = "student_course",
        joinColumns = @JoinColumn(name = "student_id"),
        inverseJoinColumns = @JoinColumn(name = "course_id")
    )
    private Set<Course> courses = new HashSet<>();   // ← Set, never List
}

@Entity
public class Course {
    @Id @GeneratedValue
    private Long id;
    private String title;
    // No back-reference here → unidirectional
}

✅ Best Practice — Use Set, not List

Use Set<> for @ManyToMany collections. Hibernate treats a List<> without @OrderColumn as a bag: removing a single element makes it delete every join row for that owner and re-insert the remaining ones, the same link can be added twice, and fetching two bags in the same query throws MultipleBagFetchException.

4. Bidirectional @ManyToMany

Bidirectional mapping lets both sides navigate to each other. Exactly one side must be the owning side (holds @JoinTable); the other is the inverse side (uses mappedBy).

4.1 Mapping

@Entity
public class Student {                            // OWNING SIDE
    @ManyToMany
    @JoinTable(
        name = "student_course",
        joinColumns = @JoinColumn(name = "student_id"),
        inverseJoinColumns = @JoinColumn(name = "course_id")
    )
    private Set<Course> courses = new HashSet<>();
}

@Entity
public class Course {                            // INVERSE SIDE
    @ManyToMany(mappedBy = "courses")            // mappedBy is mandatory
    private Set<Student> students = new HashSet<>();
}

❌ Pitfall — Forgetting mappedBy

Without mappedBy on the inverse side, JPA treats the two fields as two independent unidirectional associations, each with its own join table, so every link is written twice. Declare mappedBy on exactly one side, the inverse one.

4.2 Keeping Both Sides in Sync

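In a bidirectional relationship you must update both sides yourself, typically in helper methods. Hibernate writes the join table from the owning side only and never updates the inverse collection in memory, so a Course that is already loaded will not see a newly enrolled Student until it is reloaded: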

// Add a convenience method on the owning side
public void enroll(Course course) {
    this.courses.add(course);
    course.getStudents().add(this);   // keep inverse in sync
}

public void unenroll(Course course) {
    this.courses.remove(course);
    course.getStudents().remove(this);
}
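
The helper methods above can be tried without a database. The following is a minimal sketch that uses plain-Java stand-ins for the entities (JPA annotations are left out, and the SyncDemo wrapper class is only there for illustration) to show that both collections stay consistent after enroll() and unenroll():

```java
import java.util.HashSet;
import java.util.Set;

class SyncDemo {

    // Plain-Java stand-ins for the entities; JPA annotations are left out on purpose.
    static class Course {
        private final String title;
        private final Set<Student> students = new HashSet<>();

        Course(String title) { this.title = title; }

        Set<Student> getStudents() { return students; }
    }

    static class Student {
        private final String name;
        private final Set<Course> courses = new HashSet<>();

        Student(String name) { this.name = name; }

        Set<Course> getCourses() { return courses; }

        // Owning-side helper: updates both collections together.
        void enroll(Course course) {
            courses.add(course);
            course.getStudents().add(this);
        }

        void unenroll(Course course) {
            courses.remove(course);
            course.getStudents().remove(this);
        }
    }

    public static void main(String[] args) {
        Student alice = new Student("Alice");
        Course cs101 = new Course("CS-101");

        alice.enroll(cs101);
        System.out.println(cs101.getStudents().contains(alice)); // true

        alice.unenroll(cs101);
        System.out.println(cs101.getStudents().isEmpty());       // true
    }
}
```

Keeping the synchronization inside the entity means callers cannot update one side and forget the other.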

5. equals() and hashCode() — The Critical Foundation

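Java collections such as HashSet rely on equals() and hashCode() to decide whether two entity instances are the same element, which matters when adding to or removing from Set-based associations and when comparing a detached instance with a freshly loaded copy of the same row. The default Object identity implementation only works while both references come from the same persistence context; two instances of the same row loaded in different sessions (or one detached and one managed) compare as different.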

5.1 Correct Implementation (Business Key or UUID)

@Entity
public class Course {
    @Id @GeneratedValue
    private Long id;

    @NaturalId                     // Hibernate annotation
    @Column(nullable = false, unique = true)
    private String courseCode;     // e.g. "CS-101" — stable business key

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Course)) return false;
        Course other = (Course) o;
        return Objects.equals(courseCode, other.courseCode);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(courseCode);  // must be stable across states
    }
}

❌ Pitfall — Using id for hashCode

Never base hashCode on @Id if entities can be in a Set before being persisted. A transient entity has id = null, so its hashCode changes on persist, which silently corrupts any Set or HashMap that contained it.
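
The effect is easy to reproduce with a plain HashSet. The sketch below uses a hypothetical IdBasedCourse class (not part of the mapping above) whose equals() and hashCode() depend on the generated id, and simulates the id being assigned on persist by setting the field directly:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class IdHashCodeDemo {

    // Hypothetical entity stand-in whose equals/hashCode use the generated id.
    static class IdBasedCourse {
        Long id; // null until "persisted"

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdBasedCourse)) return false;
            return id != null && id.equals(((IdBasedCourse) o).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // 0 while id is null, different once id is set
        }
    }

    // Returns whether the set can still find the course after its id changes.
    static boolean foundAfterIdAssigned() {
        Set<IdBasedCourse> courses = new HashSet<>();
        IdBasedCourse course = new IdBasedCourse();
        courses.add(course);   // stored under the hash of a null id
        course.id = 42L;       // simulates the id assigned on persist
        return courses.contains(course);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned()); // prints false
    }
}
```

The object is still in the set, but the lookup uses the new hash and misses it, so contains() and remove() silently stop working for that element.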

6. Intermediate Entity for Join Table With Extra Columns

When the join table needs to store data (enrollment date, grade, seat number, etc.), replace the @ManyToMany shortcut with an explicit intermediate entity. This is the most robust and recommended pattern in production systems.

6.1 Composite Key Class

@Embeddable
public class EnrollmentId implements Serializable {
    @Column(name = "student_id")
    private Long studentId;

    @Column(name = "course_id")
    private Long courseId;

    // equals() + hashCode() required for @Embeddable PKs
    @Override public boolean equals(Object o) { ... }
    @Override public int hashCode() { ... }
}
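
The elided equals() and hashCode() bodies are not specified in this article. One common way to write them, shown here as a sketch rather than the definitive implementation, compares both key fields. The @Embeddable and @Column annotations are left out so the class compiles without a JPA dependency, and the two-argument constructor and the EnrollmentIdSketch wrapper are additions for illustration:

```java
import java.io.Serializable;
import java.util.Objects;

class EnrollmentIdSketch {

    // Same fields as the EnrollmentId key class; JPA annotations omitted for this sketch.
    static class EnrollmentId implements Serializable {
        private Long studentId;
        private Long courseId;

        protected EnrollmentId() { }   // JPA needs a no-arg constructor

        EnrollmentId(Long studentId, Long courseId) {   // convenience constructor (assumption)
            this.studentId = studentId;
            this.courseId = courseId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EnrollmentId)) return false;
            EnrollmentId other = (EnrollmentId) o;
            return Objects.equals(studentId, other.studentId)
                && Objects.equals(courseId, other.courseId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(studentId, courseId);
        }
    }

    public static void main(String[] args) {
        EnrollmentId a = new EnrollmentId(1L, 2L);
        EnrollmentId b = new EnrollmentId(1L, 2L);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```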

6.2 Enrollment (Intermediate) Entity

@Entity
@Table(name = "student_course")
public class Enrollment {

    @EmbeddedId
    private EnrollmentId id = new EnrollmentId();

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("studentId")
    private Student student;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("courseId")
    private Course course;

    @Column(nullable = false)
    private LocalDate enrolledAt;

    private BigDecimal grade;
}

6.3 Parent Entities

@Entity
public class Student {
    @OneToMany(mappedBy = "student", cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Enrollment> enrollments = new HashSet<>();

    public void enroll(Course course, LocalDate date) {
        Enrollment e = new Enrollment();
        e.setStudent(this);
        e.setCourse(course);
        e.setEnrolledAt(date);
        enrollments.add(e);
        course.getEnrollments().add(e);   // keep the inverse side in sync (assumes a getter on Course)
    }
}

@Entity
public class Course {
    @OneToMany(mappedBy = "course")   // no cascade from Course side
    private Set<Enrollment> enrollments = new HashSet<>();
}

✅ Best Practice — Cascade only from the aggregate root

Apply CascadeType.ALL + orphanRemoval only on the owning aggregate root side (Student). Do not cascade from Course — it is a separate aggregate and should not delete enrollments when a course is touched.
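
The article shows enroll() but no matching removal helper. The following plain-Java sketch (no JPA annotations; the unenroll() method and the EnrollmentDemo wrapper are assumptions added for illustration) shows one way to keep both enrollment collections consistent. With orphanRemoval = true on Student.enrollments, removing the Enrollment from the student's set is what would make Hibernate delete the join row:

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class EnrollmentDemo {

    static class Course {
        final String title;
        final Set<Enrollment> enrollments = new HashSet<>();

        Course(String title) { this.title = title; }
    }

    static class Enrollment {
        final Student student;
        final Course course;
        final LocalDate enrolledAt;

        Enrollment(Student student, Course course, LocalDate enrolledAt) {
            this.student = student;
            this.course = course;
            this.enrolledAt = enrolledAt;
        }
    }

    static class Student {
        final String name;
        final Set<Enrollment> enrollments = new HashSet<>();

        Student(String name) { this.name = name; }

        // Creates the link object and registers it on both sides.
        Enrollment enroll(Course course, LocalDate date) {
            Enrollment e = new Enrollment(this, course, date);
            enrollments.add(e);
            course.enrollments.add(e);
            return e;
        }

        // Removes the link from both sides; the iterator avoids
        // ConcurrentModificationException while removing during iteration.
        void unenroll(Course course) {
            for (Iterator<Enrollment> it = enrollments.iterator(); it.hasNext(); ) {
                Enrollment e = it.next();
                if (e.course == course) {
                    it.remove();
                    course.enrollments.remove(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        Student alice = new Student("Alice");
        Course cs101 = new Course("CS-101");

        alice.enroll(cs101, LocalDate.of(2024, 9, 1));
        System.out.println(cs101.enrollments.size()); // 1

        alice.unenroll(cs101);
        System.out.println(cs101.enrollments.size()); // 0
    }
}
```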

7. Fetch Strategies & The N+1 Problem

Fetch strategy is one of the most impactful performance decisions in a JPA application.

7.1 Always Use LAZY — Never EAGER

// ✅ Correct — LAZY is the safe default
@ManyToMany(fetch = FetchType.LAZY)
private Set<Course> courses = new HashSet<>();

// ❌ Wrong: loads every course of every Student that is loaded, whether or not it is needed
@ManyToMany(fetch = FetchType.EAGER)
private Set<Course> courses;

7.2 Solving N+1 With JOIN FETCH

Even with LAZY loading, touching the collection of each parent inside a loop triggers one extra SQL query per parent: one query for the list plus N for the collections. Fix this with a JOIN FETCH JPQL query:

// N+1 — fires one extra query per student
List<Student> students = em.createQuery("SELECT s FROM Student s", Student.class)
                           .getResultList();
students.forEach(s -> s.getCourses().size()); // N hits

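// ✅ Fixed: a single query loads students and their courses together.
// LEFT JOIN FETCH keeps students without courses; a plain JOIN FETCH would drop them.
List<Student> studentsWithCourses = em.createQuery(
    "SELECT DISTINCT s FROM Student s LEFT JOIN FETCH s.courses",
    Student.class).getResultList();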

7.3 Using @BatchSize as a Middle Ground

@ManyToMany(fetch = FetchType.LAZY)
@BatchSize(size = 25)   // loads 25 students' courses in one IN (...) query
private Set<Course> courses = new HashSet<>();
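
To compare the three strategies in this section, the rough arithmetic can be written down as a small sketch. The FetchQueryCount class is hypothetical and only models the query counts; the exact number of batch queries Hibernate issues can vary by version and configuration, so treat the batch figure as an approximation:

```java
class FetchQueryCount {

    // Lazy loading in a loop: 1 query for the students + 1 per student for its courses.
    static int lazyLoopQueries(int students) {
        return 1 + students;
    }

    // JOIN FETCH: students and courses arrive in a single query.
    static int joinFetchQueries(int students) {
        return 1;
    }

    // @BatchSize: 1 query for the students + one IN (...) query per batch of collections.
    static int batchQueries(int students, int batchSize) {
        int batches = (students + batchSize - 1) / batchSize; // ceiling division
        return 1 + batches;
    }

    public static void main(String[] args) {
        System.out.println(lazyLoopQueries(100));   // 101
        System.out.println(joinFetchQueries(100));  // 1
        System.out.println(batchQueries(100, 25));  // 5
    }
}
```

For 100 students, batching with size 25 cuts the 101 queries of the naive loop to about 5, without the wide result set that a JOIN FETCH produces.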

❌ Pitfall — MultipleBagFetchException

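You cannot JOIN FETCH two List<> collections (bags) in the same JPQL query; Hibernate throws MultipleBagFetchException. Changing both to Set<> removes the exception, but the query still returns the Cartesian product of the two collections, so the safer fix is to fetch one collection in JPQL and load the second with @BatchSize or a separate query.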

8. Cascade Types — What to Use and When

Cascade types control which JPA lifecycle operations (PERSIST, MERGE, REMOVE, etc.) are propagated from parent to child.

8.1 Recommended Cascade Matrix

Scenario	Cascade	orphanRemoval	Notes
Simple `@ManyToMany` (no extra cols)	`PERSIST, MERGE`	`false`	Do NOT use `REMOVE`
Intermediate entity (owned)	`ALL`	`true`	Only from aggregate root
Intermediate entity (shared)	`PERSIST, MERGE`	`false`	Shared = don't remove
`Course → Enrollment` (inverse)	(none)	`false`	Let `Student` own it

❌ Pitfall — CascadeType.REMOVE on @ManyToMany

Using CascadeType.REMOVE (or ALL) on a plain @ManyToMany cascades the delete to the related entities themselves, not just the join rows. Removing one Student then tries to delete every Course that student is enrolled in, which either removes courses that other students still need or fails with a foreign-key violation.

9. Serialization — Avoiding Infinite Recursion

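Bidirectional relationships create circular object graphs. When Jackson (or any JSON library) tries to serialize a Student that contains Courses, which contain Students, which contain Courses, and so on, it recurses until the stack overflows; Jackson typically reports this as a JsonMappingException for infinite recursion caused by a StackOverflowError.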
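
The cycle can be reproduced without Jackson. The sketch below uses a deliberately naive, hand-written serializer (the CycleDemo class and its methods are hypothetical, not a real library API) that follows every reference, and then a variant that simply stops at Course, which is the same idea the fixes in this section rely on:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class CycleDemo {

    static class Student {
        final String name;
        final Set<Course> courses = new HashSet<>();
        Student(String name) { this.name = name; }
    }

    static class Course {
        final String title;
        final Set<Student> students = new HashSet<>();
        Course(String title) { this.title = title; }
    }

    // Naive serializer: follows Student -> Course -> Student -> ... without end.
    static String studentJson(Student s) {
        return "{\"name\":\"" + s.name + "\",\"courses\":["
            + s.courses.stream().map(c -> courseJson(c)).collect(Collectors.joining(",")) + "]}";
    }

    static String courseJson(Course c) {
        return "{\"title\":\"" + c.title + "\",\"students\":["
            + c.students.stream().map(s -> studentJson(s)).collect(Collectors.joining(",")) + "]}";
    }

    // Cycle-free variant: a course is written without its students.
    static String safeStudentJson(Student s) {
        return "{\"name\":\"" + s.name + "\",\"courses\":["
            + s.courses.stream()
                .map(c -> "{\"title\":\"" + c.title + "\"}")
                .collect(Collectors.joining(",")) + "]}";
    }

    static boolean overflows(Student s) {
        try {
            studentJson(s);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    static Student sample() {
        Student alice = new Student("Alice");
        Course cs101 = new Course("CS-101");
        alice.courses.add(cs101);
        cs101.students.add(alice);
        return alice;
    }

    public static void main(String[] args) {
        Student alice = sample();
        System.out.println(overflows(alice));        // true
        System.out.println(safeStudentJson(alice));  // {"name":"Alice","courses":[{"title":"CS-101"}]}
    }
}
```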

9.1 Jackson Annotations

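// On the owning side (Student): serialized as usual
private Set<Course> courses;

// On the inverse side (Course): excluded from JSON, which breaks the cycle.
// @JsonBackReference is not an option here because Jackson does not allow it
// on Collection-typed properties; @JsonIgnore (or @JsonIdentityInfo) is used instead.
@JsonIgnore
private Set<Student> students;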

9.2 Better: Use DTOs (Recommended)

Never serialize JPA entities directly to your API layer. Use dedicated DTO/response classes:

// DTO — safe, no cycles, no Hibernate proxies
public record CourseDTO(Long id, String title, int credits) {
    public static CourseDTO from(Course c) {
        return new CourseDTO(c.getId(), c.getTitle(), c.getCredits());
    }
}

public record StudentDTO(Long id, String name, Set<CourseDTO> courses) {
    public static StudentDTO from(Student s) {
        return new StudentDTO(
            s.getId(), s.getName(),
            s.getCourses().stream().map(CourseDTO::from).collect(Collectors.toSet())
        );
    }
}

10. Performance Best Practices

10.1 Projections and DTO Queries

For read-heavy endpoints, skip entity loading entirely and query directly into DTOs:

@Query("SELECT new com.example.dto.StudentCourseDTO(s.name, c.title) " +
       "FROM Student s JOIN s.courses c WHERE s.id = :studentId")
List<StudentCourseDTO> findCoursesByStudent(@Param("studentId") Long id);

10.2 Pagination — Never Paginate With JOIN FETCH

// ❌ Wrong — Hibernate loads ALL rows into memory, then paginates
@Query("SELECT DISTINCT s FROM Student s JOIN FETCH s.courses")
Page<Student> findAll(Pageable pageable);   // issues HHH90003004 warning

// ✅ Correct — paginate the root entity, load collection separately
@Query(value = "SELECT s FROM Student s",
       countQuery = "SELECT COUNT(s) FROM Student s")
Page<Student> findAll(Pageable pageable);
// Then use @BatchSize or a second query to load courses

10.3 Use @Transactional on Service, Not Repository

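Keep your transactions at the service layer, where the full unit of work is clear. A transaction opened by a repository method ends when that method returns, so lazy collections touched later in the service are loaded outside it and, unless Open Session in View is enabled, fail with LazyInitializationException.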

11. Quick Reference — Best Practices vs Pitfalls

✅ Best Practice	❌ Pitfall to Avoid
Use an intermediate entity (instead of `@ManyToMany`) when the join table has extra columns	Using plain `@JoinTable` when join table has extra data
Keep `fetch = FetchType.LAZY` on both sides	Using `FetchType.EAGER` (loads every collection on every read)
Define owning side clearly with `mappedBy` on inverse	Bidirectional mapping without `mappedBy`
Use `Set<>` instead of `List<>` to avoid duplicates	Using `List<>` and getting `MultipleBagFetchException`
Use `orphanRemoval` + `CascadeType.ALL` on parent side only	Cascading `ALL` on both sides (infinite loops / dual deletes)
Implement `equals()`/`hashCode()` based on business key	Using default `Object` identity for `equals`/`hashCode`
Use `@BatchSize` or `JOIN FETCH` to load related data	Loading collections in a loop (classic N+1 problem)
Use DTOs and projections for read-heavy queries	Serializing full entity graphs to JSON (`StackOverflow` risk)

12. Conclusion

Many-to-many associations are powerful but require deliberate design. The key takeaways are:

Use Set<> — always. List<> in M:N leads to bags, duplicates, and MultipleBagFetchException.
Prefer intermediate entity — as soon as the join table has any extra column, model it explicitly.
Keep fetch=LAZY everywhere — solve loading problems with JOIN FETCH or @BatchSize, not EAGER.
Define equals()/hashCode() on a stable business key — never rely on the database-generated id.
Cascade carefully — REMOVE and orphanRemoval belong only on aggregate-root-owned children.
Use DTOs at the API layer — never serialize entity graphs directly to JSON.
Measure first — use Hibernate's statistics or a query logger (P6Spy/datasource-proxy) to confirm you have no N+1 queries before shipping.

DEV Community