If you work a lot with data in SQL, sooner or later you will have a task to find top N results/row per group. Window functions provide great help there. But MySQL 5.7 does not support them, only MySQL 8.0.
There are several ways we can solve this task.
Let's say we have the following task: we have a list of tweets/posts (with user_id, pub date, likes count) and we want to get SUM likes for top-N tweets for each user.
Table structure:
CREATE TABLE twitter_comments (
id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
user_id INT,
likes INT,
published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
There are at least three ways we can do that.
Solution 1
SELECT
tw.user_id,
SUM(tw.likes) AS total_likes
FROM twitter_comments AS tw
JOIN (
SELECT
user_id,
GROUP_CONCAT(id ORDER BY published_at DESC) AS comments_ids
FROM twitter_comments
GROUP BY user_id
) AS t ON t.user_id = tw.user_id
WHERE FIND_IN_SET(tw.id, t.comments_ids) BETWEEN 1 AND 3
GROUP BY tw.user_id;
GROUP_CONCAT will return a list of comments ids per user:
+---------+----------------+
| user_id | comments_ids |
+---------+----------------+
| 1 | 6,5,4,3,2,1 |
| 2 | 12,11,10,9,8,7 |
+---------+----------------+
FIND_IN_SET will return the index of a current comment id
in a list of all comments(for a specific user). In our case, an index value BETWEEN 1 AND 3
should match the latest three comments.
Possible issues: GROUP_CONCAT has a maximum length and if a user has many comments, you can easily hit the limit. But since we are interested only in the top 3,5 or 10 results, I do not think it should be a problem.
Solution 2
Similar to solution#1, but this time we'll use SUBSTRING_INDEX and LOCATE.
SELECT
tw.user_id,
SUM(tw.likes) AS total_likes
FROM twitter_comments AS tw
JOIN (
SELECT
user_id,
SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY published_at DESC), ',', 3) AS comments_ids
FROM twitter_comments
GROUP BY user_id
) AS t ON t.user_id = tw.user_id
WHERE LOCATE(tw.id, t.comments_ids) > 0
GROUP BY tw.user_id;
SUBSTRING_INDEX in combination with GROUP_CONCAT will return already trimmed result(latest comments ids)
+---------+--------------+
| user_id | comments_ids |
+---------+--------------+
| 1 | 6,5,4 |
| 2 | 12,11,10 |
+---------+--------------+
And LOCATE will help us to check if a current comment id is in that list:
SELECT LOCATE('6', '6,5,4');
+----------------------+
| LOCATE('6', '6,5,4') |
+----------------------+
| 1 |
+----------------------+
Solution 3
Use session variables to generate rank/row number for each comment(per user). Smth like that:
+----+---------+-------+---------------------+------+--------------+
| id | user_id | likes | published_at | rank | curr_user_id |
+----+---------+-------+---------------------+------+--------------+
| 6 | 1 | 200 | 2021-04-05 14:01:00 | 1 | 1 |
| 5 | 1 | 100 | 2021-04-05 13:01:00 | 2 | 1 |
| 4 | 1 | 200 | 2021-04-05 12:01:00 | 3 | 1 |
| 3 | 1 | 300 | 2021-04-05 11:01:00 | 4 | 1 |
| 2 | 1 | 200 | 2021-04-05 10:01:00 | 5 | 1 |
| 1 | 1 | 100 | 2021-04-05 09:01:00 | 6 | 1 |
| 12 | 2 | 100 | 2021-05-05 14:01:00 | 1 | 2 |
| 11 | 2 | 200 | 2021-05-05 13:01:00 | 2 | 2 |
| 10 | 2 | 300 | 2021-05-05 12:01:00 | 3 | 2 |
| 9 | 2 | 100 | 2021-05-05 11:01:00 | 4 | 2 |
| 8 | 2 | 300 | 2021-05-05 10:01:00 | 5 | 2 |
| 7 | 2 | 200 | 2021-05-05 09:01:00 | 6 | 2 |
+----+---------+-------+---------------------+------+--------------+
Full example:
SET @curr_count = 0, @curr_user_id = 0;
SELECT
user_id,
SUM(likes) AS total_likes
FROM (
SELECT
*,
( @curr_count := IF ( @curr_value = user_id, @curr_count + 1, 1 ) ) AS rank,
( @curr_value := user_id ) AS curr_user_id
FROM twitter_comments
ORDER BY user_id ASC, published_at DESC
) AS comments
WHERE rank <= 3
GROUP BY user_id;
Here is how we can achieve the same result using Window Functions in Postgres: Practical example for Ranking with Window Functions
Helpful links:
- stackoverflow discussion
- ranking example
- many people recommend this article How to Select the First/Least/Max Row per Group in SQL
Top comments (0)