Problem Scenario
Consider the following entity relationships:
- User: has many comments
- Comment: belongs to a user, references a post
- Post: belongs to a category
- Category
Business requirement: Find the top 5 users with the most comments in the ".NET" category over the past 7 days.
Standard Approach: Multiple Subqueries
ORMs typically generate a separate subquery for each condition:
SELECT * FROM t_user t
WHERE EXISTS (SELECT 1 FROM t_comment WHERE user_id = t.id)
ORDER BY (
SELECT COUNT(*) FROM t_comment c
JOIN t_post p ON p.id = c.post_id
JOIN t_category cat ON cat.id = p.category_id
WHERE c.user_id = t.id
AND c.create_at > '7 days ago'
AND cat.name = '.NET'
) DESC
LIMIT 5
This query contains two subqueries:
-
EXISTSsubquery: checks if a user has any comments -
COUNTsubquery: counts comments matching the criteria
Each user requires both subqueries to execute, causing the comment table to be scanned repeatedly. More conditions mean more subqueries and worse performance.
Optimization Strategy
Key Insight: One-to-Many Can Only Do Aggregation
For a one-to-many relationship (User → Comments), the only operations possible in WHERE and ORDER BY are:
| Operation | Essence |
|---|---|
| Has comments? | EXISTS = COUNT > 0 |
| How many comments? | COUNT |
| Latest comment time? | MAX |
| No comments of type X? | NOT EXISTS = COUNT = 0 |
All aggregations. Since they're all aggregations, why compute them separately?
Merge Strategy: One GROUP BY, Multiple CASE WHENs
Group the comment table by user ID and compute all required values in a single aggregation:
SELECT
user_id,
(COUNT(*) > 0) AS has_comments,
COUNT(CASE WHEN create_at > '7 days ago' AND category.name = '.NET' THEN 1 END) AS net_count
FROM t_comment
LEFT JOIN t_post ON ...
LEFT JOIN t_category ON ...
GROUP BY user_id
This produces an "aggregation table" with one row per user, containing all aggregated values needed for the conditions.
Then LEFT JOIN back to the user table and use the aggregated results in WHERE and ORDER BY:
SELECT t.*
FROM t_user t
LEFT JOIN (aggregation subquery above) agg ON agg.user_id = t.id
WHERE IFNULL(agg.has_comments, false) = true
ORDER BY IFNULL(agg.net_count, 0) DESC
LIMIT 5
The comment table is scanned only once.
Why Doesn't the Database Do This Automatically?
Because the database sees two independent subqueries and doesn't know:
- Both subqueries operate on the same table (t_comment)
- They share the same join condition (
user_id = t.id) - They can be merged into a single GROUP BY
Database optimizers don't understand "object relationships" — they only see SQL text.
The ORM Advantage
The ORM knows that user.comments() is a one-to-many relationship with user_id as the foreign key.
When developers write:
u.comments().any(); // Condition 1: has comments
u.comments().where(...).count().desc(); // Condition 2: order by count
The ORM can recognize: these are two aggregation operations on the same comments relationship — they can be merged.
Implementation in easy-query
Standard Approach
LocalDateTime dateTime = LocalDateTime.now().plusDays(-7);
List<User> list = entityQuery.queryable(User.class)
.where(u -> {
u.comments().any();
})
.orderBy(u -> {
u.comments().where(c -> {
c.createAt().isAfter(dateTime);
c.post().category().name().eq(".NET");
}).count().desc();
})
.limit(5).toList();
Enabling GROUP JOIN Optimization
Just add one line of configuration:
List<User> list = entityQuery.queryable(User.class)
.configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN)) // Add this line
.where(u -> {
u.comments().any();
})
.orderBy(u -> {
u.comments().where(c -> {
c.createAt().isAfter(dateTime);
c.post().category().name().eq(".NET");
}).count().desc();
})
.limit(5).toList();
Generated SQL:
SELECT t.`id`, t.`username`
FROM `t_user` t
LEFT JOIN (
SELECT
t1.`user_id` AS `userId`,
(COUNT(*) > 0) AS `__any2__`,
COUNT(CASE
WHEN t1.`create_at` > '2025-10-19 22:30:12' AND t4.`name` = '.NET'
THEN 1 ELSE NULL
END) AS `__count3__`
FROM `t_comment` t1
LEFT JOIN `t_post` t3 ON t3.`id` = t1.`post_id`
LEFT JOIN `t_category` t4 ON t4.`id` = t3.`category_id`
GROUP BY t1.`user_id`
) t2 ON t2.`userId` = t.`id`
WHERE IFNULL(t2.`__any2__`, false) = true
ORDER BY IFNULL(t2.`__count3__`, 0) DESC
LIMIT 5
Other Ways to Enable
// Option 1: Query-level configuration
.configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))
// Option 2: Specify the relationship
.subQueryToGroupJoin(user -> user.comments())
// Option 3: Annotation
@Navigate(subQueryToGroupJoin = true)
private List<Comment> comments;
More Complex Scenarios
If you add another subquery in SELECT (e.g., total comment count), GROUP JOIN still requires only one scan:
entityQuery.queryable(User.class)
.configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))
.where(u -> u.comments().any())
.orderBy(u -> {
u.comments().where(c -> {
c.createAt().isAfter(dateTime);
c.post().category().name().eq(".NET");
}).count().desc();
})
.limit(5)
.select(u -> Select.DRAFT.of(
u.id(),
u.username(),
u.comments().count() // Additional: total comment count
)).toList();
All three aggregations (any, conditional count, total count) are computed in a single GROUP BY subquery.
Performance Test: easy-query vs EF Core
Test Environment
- Database: MySQL 8 (local)
- Java 8 / EF Core 9
- Data: 16 users, 9 categories, 150K posts, 1M comments
- Source code: https://github.com/xuejmnet/eqefcoresamples
easy-query with Standard Subqueries
LocalDateTime dateTime = LocalDateTime.now().plusDays(-7);
List<User> list = entityQuery.queryable(User.class)
.where(u -> {
u.comments().any();
})
.orderBy(u -> {
u.comments().where(c -> {
c.createAt().isAfter(dateTime);
c.post().category().name().eq(".NET");
}).count().desc();
})
.limit(5).toList();
Generated SQL:
SELECT t.`id`, t.`username`
FROM `t_user` t
WHERE EXISTS (
SELECT 1 FROM `t_comment` t1
WHERE t1.`user_id` = t.`id` LIMIT 1
)
ORDER BY (
SELECT COUNT(*) FROM `t_comment` t2
LEFT JOIN `t_post` t3 ON t3.`id` = t2.`post_id`
LEFT JOIN `t_category` t4 ON t4.`id` = t3.`category_id`
WHERE t2.`user_id` = t.`id`
AND t2.`create_at` > '2025-10-19 22:28:18'
AND t4.`name` = '.NET'
) DESC
LIMIT 5
Duration: ~11 seconds
EF Core
var dateTime = DateTime.Now.AddDays(-7);
var users = context.Set<User>()
.AsNoTracking()
.Where(u => u.Comments.Any())
.OrderByDescending(u => u.Comments
.Count(c =>
c.CreatedAt >= dateTime &&
c.Post.Category.Name == ".NET")
)
.Take(5)
.ToList();
Generated SQL:
SELECT `t`.`id`, `t`.`username`
FROM `t_user` AS `t`
WHERE EXISTS (
SELECT 1 FROM `t_comment` AS `t0`
WHERE `t`.`id` = `t0`.`user_id`
)
ORDER BY (
SELECT COUNT(*)
FROM `t_comment` AS `t1`
INNER JOIN `t_post` AS `t2` ON `t1`.`post_id` = `t2`.`id`
INNER JOIN `t_category` AS `t3` ON `t2`.`category_id` = `t3`.`id`
WHERE (`t`.`id` = `t1`.`user_id`)
AND ((`t1`.`create_at` >= @__dateTime_0) AND (`t3`.`name` = '.NET'))
) DESC
LIMIT @__p_1
Duration: ~11 seconds
easy-query with GROUP JOIN
List<User> list = entityQuery.queryable(User.class)
.configure(s -> s.getBehavior().add(EasyBehaviorEnum.ALL_SUB_QUERY_GROUP_JOIN))
.where(u -> u.comments().any())
.orderBy(u -> {
u.comments().where(c -> {
c.createAt().isAfter(dateTime);
c.post().category().name().eq(".NET");
}).count().desc();
})
.limit(5).toList();
Duration: ~2.7 seconds
Performance Comparison
| Framework | Approach | Duration |
|---|---|---|
| easy-query | Standard subqueries | ~11s |
| EF Core | Subqueries | ~11s |
| easy-query | GROUP JOIN | ~2.7s |
4x performance improvement!
Summary
The essence of Subquery to GROUP JOIN:
- Identify: Multiple subqueries operating on the same one-to-many relationship
- Merge: Use CASE WHEN to compute all conditions in a single GROUP BY
- Join: LEFT JOIN the aggregated result back to the main table
This optimization requires semantic information at the ORM level — database optimizers can't do it. You write subqueries the same way as before, but the framework automatically merges multiple subqueries (in WHERE / ORDER BY / SELECT) into one — this is "Implicit Group", solving both readability and performance issues.
Links
- GitHub: https://github.com/dromara/easy-query
- Documentation: https://www.easy-query.com/easy-query-doc/en
Top comments (0)