DEV Community

Li
Li

Posted on

Subquery to GROUP JOIN — A Method for Merging Multiple Subqueries

Problem Scenario

Consider the following entity relationships:

  • User: has many comments
  • Comment: belongs to a user, references a post
  • Post: belongs to a category
  • Category

Business requirement: Find the top 5 users with the most comments in the ".NET" category over the past 7 days.


Standard Approach: Multiple Subqueries

ORMs typically generate a separate subquery for each condition:

SELECT * FROM t_user t 
WHERE EXISTS (SELECT 1 FROM t_comment WHERE user_id = t.id)
ORDER BY (
    SELECT COUNT(*) FROM t_comment c
    JOIN t_post p ON p.id = c.post_id
    JOIN t_category cat ON cat.id = p.category_id
    WHERE c.user_id = t.id 
      AND c.create_at > '7 days ago'
      AND cat.name = '.NET'
) DESC
LIMIT 5
Enter fullscreen mode Exit fullscreen mode

This query contains two subqueries:

  1. EXISTS subquery: checks if a user has any comments
  2. COUNT subquery: counts comments matching the criteria

Each user requires both subqueries to execute, causing the comment table to be scanned repeatedly. More conditions mean more subqueries and worse performance.


Optimization Strategy

Key Insight: One-to-Many Can Only Do Aggregation

For a one-to-many relationship (User → Comments), the only operations possible in WHERE and ORDER BY are:

Operation Essence
Has comments? EXISTS = COUNT > 0
How many comments? COUNT
Latest comment time? MAX
No comments of type X? NOT EXISTS = COUNT = 0

All aggregations. Since they're all aggregations, why compute them separately?

Merge Strategy: One GROUP BY, Multiple CASE WHENs

Group the comment table by user ID and compute all required values in a single aggregation:

SELECT 
    user_id,
    (COUNT(*) > 0) AS has_comments,
    COUNT(CASE WHEN create_at > '7 days ago' AND category.name = '.NET' THEN 1 END) AS net_count
FROM t_comment
LEFT JOIN t_post ON ...
LEFT JOIN t_category ON ...
GROUP BY user_id
Enter fullscreen mode Exit fullscreen mode

This produces an "aggregation table" with one row per user, containing all aggregated values needed for the conditions.

Then LEFT JOIN back to the user table and use the aggregated results in WHERE and ORDER BY:

SELECT t.* 
FROM t_user t
LEFT JOIN (aggregation subquery above) agg ON agg.user_id = t.id
WHERE IFNULL(agg.has_comments, false) = true
ORDER BY IFNULL(agg.net_count, 0) DESC
LIMIT 5
Enter fullscreen mode Exit fullscreen mode

The comment table is scanned only once.


Why Doesn't the Database Do This Automatically?

Because the database sees two independent subqueries and doesn't know:

  1. Both subqueries operate on the same table (t_comment)
  2. They share the same join condition (user_id = t.id)
  3. They can be merged into a single GROUP BY

Database optimizers don't understand "object relationships" — they only see SQL text.


The ORM Advantage

The ORM knows that user.comments() is a one-to-many relationship with user_id as the foreign key.

When developers write:

u.comments().any();                           // Condition 1: has comments
u.comments().where(...).count().desc();       // Condition 2: order by count
Enter fullscreen mode Exit fullscreen mode

The ORM can recognize: these are two aggregation operations on the same comments relationship — they can be merged.


Implementation in easy-query

Standard Approach

LocalDateTime dateTime = LocalDateTime.now().plusDays(-7);
List<User> list = entityQuery.queryable(User.class)
    .where(u -> {
        u.comments().any();
    })
    .orderBy(u -> {
        u.comments().where(c -> {
            c.createAt().isAfter(dateTime);
            c.post().category().name().eq(".NET");
        }).count().desc();
    })
    .limit(5).toList();
Enter fullscreen mode Exit fullscreen mode

Enabling GROUP JOIN Optimization

Just add one line of configuration:

List<User> list = entityQuery.queryable(User.class)
    .configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))  // Add this line
    .where(u -> {
        u.comments().any();
    })
    .orderBy(u -> {
        u.comments().where(c -> {
            c.createAt().isAfter(dateTime);
            c.post().category().name().eq(".NET");
        }).count().desc();
    })
    .limit(5).toList();
Enter fullscreen mode Exit fullscreen mode

Generated SQL:

SELECT t.`id`, t.`username` 
FROM `t_user` t 
LEFT JOIN (
    SELECT 
        t1.`user_id` AS `userId`, 
        (COUNT(*) > 0) AS `__any2__`, 
        COUNT(CASE 
            WHEN t1.`create_at` > '2025-10-19 22:30:12' AND t4.`name` = '.NET' 
            THEN 1 ELSE NULL 
        END) AS `__count3__` 
    FROM `t_comment` t1 
    LEFT JOIN `t_post` t3 ON t3.`id` = t1.`post_id` 
    LEFT JOIN `t_category` t4 ON t4.`id` = t3.`category_id` 
    GROUP BY t1.`user_id`
) t2 ON t2.`userId` = t.`id` 
WHERE IFNULL(t2.`__any2__`, false) = true 
ORDER BY IFNULL(t2.`__count3__`, 0) DESC 
LIMIT 5
Enter fullscreen mode Exit fullscreen mode

Other Ways to Enable

// Option 1: Query-level configuration
.configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))

// Option 2: Specify the relationship
.subQueryToGroupJoin(user -> user.comments())

// Option 3: Annotation
@Navigate(subQueryToGroupJoin = true)
private List<Comment> comments;
Enter fullscreen mode Exit fullscreen mode

More Complex Scenarios

If you add another subquery in SELECT (e.g., total comment count), GROUP JOIN still requires only one scan:

entityQuery.queryable(User.class)
    .configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))
    .where(u -> u.comments().any())
    .orderBy(u -> {
        u.comments().where(c -> {
            c.createAt().isAfter(dateTime);
            c.post().category().name().eq(".NET");
        }).count().desc();
    })
    .limit(5)
    .select(u -> Select.DRAFT.of(
        u.id(),
        u.username(),
        u.comments().count()  // Additional: total comment count
    )).toList();
Enter fullscreen mode Exit fullscreen mode

All three aggregations (any, conditional count, total count) are computed in a single GROUP BY subquery.


Performance Test: easy-query vs EF Core

Test Environment

easy-query with Standard Subqueries

LocalDateTime dateTime = LocalDateTime.now().plusDays(-7);
List<User> list = entityQuery.queryable(User.class)
    .where(u -> {
        u.comments().any();
    })
    .orderBy(u -> {
        u.comments().where(c -> {
            c.createAt().isAfter(dateTime);
            c.post().category().name().eq(".NET");
        }).count().desc();
    })
    .limit(5).toList();
Enter fullscreen mode Exit fullscreen mode

Generated SQL:

SELECT t.`id`, t.`username` 
FROM `t_user` t 
WHERE EXISTS (
    SELECT 1 FROM `t_comment` t1 
    WHERE t1.`user_id` = t.`id` LIMIT 1
) 
ORDER BY (
    SELECT COUNT(*) FROM `t_comment` t2 
    LEFT JOIN `t_post` t3 ON t3.`id` = t2.`post_id` 
    LEFT JOIN `t_category` t4 ON t4.`id` = t3.`category_id` 
    WHERE t2.`user_id` = t.`id` 
      AND t2.`create_at` > '2025-10-19 22:28:18' 
      AND t4.`name` = '.NET'
) DESC 
LIMIT 5
Enter fullscreen mode Exit fullscreen mode

Duration: ~11 seconds

EF Core

var dateTime = DateTime.Now.AddDays(-7);
var users = context.Set<User>()
    .AsNoTracking()
    .Where(u => u.Comments.Any())
    .OrderByDescending(u => u.Comments
        .Count(c =>
            c.CreatedAt >= dateTime &&
            c.Post.Category.Name == ".NET")
    )
    .Take(5)
    .ToList();
Enter fullscreen mode Exit fullscreen mode

Generated SQL:

SELECT `t`.`id`, `t`.`username`
FROM `t_user` AS `t`
WHERE EXISTS (
    SELECT 1 FROM `t_comment` AS `t0`
    WHERE `t`.`id` = `t0`.`user_id`
)
ORDER BY (
    SELECT COUNT(*)
    FROM `t_comment` AS `t1`
    INNER JOIN `t_post` AS `t2` ON `t1`.`post_id` = `t2`.`id`
    INNER JOIN `t_category` AS `t3` ON `t2`.`category_id` = `t3`.`id`
    WHERE (`t`.`id` = `t1`.`user_id`) 
      AND ((`t1`.`create_at` >= @__dateTime_0) AND (`t3`.`name` = '.NET'))
) DESC
LIMIT @__p_1
Enter fullscreen mode Exit fullscreen mode

Duration: ~11 seconds

easy-query with GROUP JOIN

List<User> list = entityQuery.queryable(User.class)
    .configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))
    .where(u -> u.comments().any())
    .orderBy(u -> {
        u.comments().where(c -> {
            c.createAt().isAfter(dateTime);
            c.post().category().name().eq(".NET");
        }).count().desc();
    })
    .limit(5).toList();
Enter fullscreen mode Exit fullscreen mode

Duration: ~2.7 seconds

Performance Comparison

Framework Approach Duration
easy-query Standard subqueries ~11s
EF Core Subqueries ~11s
easy-query GROUP JOIN ~2.7s

4x performance improvement!


Summary

The essence of Subquery to GROUP JOIN:

  1. Identify: Multiple subqueries operating on the same one-to-many relationship
  2. Merge: Use CASE WHEN to compute all conditions in a single GROUP BY
  3. Join: LEFT JOIN the aggregated result back to the main table

This optimization requires semantic information at the ORM level — database optimizers can't do it. You write subqueries the same way as before, but the framework automatically merges multiple subqueries (in WHERE / ORDER BY / SELECT) into one — this is "Implicit Group", solving both readability and performance issues.


Links

Top comments (0)