DEV Community

Max Borysov
Max Borysov

Posted on • Edited on

Mysql 5.7 Window Functions Workaround

If you work a lot with data in SQL, sooner or later you will have a task to find top N results/row per group. Window functions provide great help there. But MySQL 5.7 does not support them, only MySQL 8.0.

There are several ways we can solve this task.

Let's say we have the following task: we have a list of tweets/posts (with user_id, pub date, likes count) and we want to get SUM likes for top-N tweets for each user.

Table structure:

CREATE TABLE twitter_comments (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
  user_id INT,
  likes INT,
  published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

There are at least three ways we can do that.

Solution 1

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      GROUP_CONCAT(id ORDER BY published_at DESC) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE FIND_IN_SET(tw.id, t.comments_ids) BETWEEN 1 AND 3
 GROUP BY tw.user_id;
Enter fullscreen mode Exit fullscreen mode

GROUP_CONCAT will return a list of comments ids per user:

+---------+----------------+
| user_id | comments_ids   |
+---------+----------------+
|       1 | 6,5,4,3,2,1    |
|       2 | 12,11,10,9,8,7 |
+---------+----------------+
Enter fullscreen mode Exit fullscreen mode

FIND_IN_SET will return the index of a current comment id in a list of all comments(for a specific user). In our case, an index value BETWEEN 1 AND 3 should match the latest three comments.

Possible issues: GROUP_CONCAT has a maximum length and if a user has many comments, you can easily hit the limit. But since we are interested only in the top 3,5 or 10 results, I do not think it should be a problem.

SQL Fiddle example

Solution 2

Similar to solution#1, but this time we'll use SUBSTRING_INDEX and LOCATE.

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY published_at DESC), ',', 3) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE LOCATE(tw.id, t.comments_ids) > 0
 GROUP BY tw.user_id;
Enter fullscreen mode Exit fullscreen mode

SUBSTRING_INDEX in combination with GROUP_CONCAT will return already trimmed result(latest comments ids)

+---------+--------------+
| user_id | comments_ids |
+---------+--------------+
|       1 | 6,5,4        |
|       2 | 12,11,10     |
+---------+--------------+
Enter fullscreen mode Exit fullscreen mode

And LOCATE will help us to check if a current comment id is in that list:

SELECT LOCATE('6', '6,5,4');
+----------------------+
| LOCATE('6', '6,5,4') |
+----------------------+
|                    1 |
+----------------------+
Enter fullscreen mode Exit fullscreen mode

SQL Fiddle example

Solution 3

Use session variables to generate rank/row number for each comment(per user). Smth like that:

+----+---------+-------+---------------------+------+--------------+
| id | user_id | likes | published_at        | rank | curr_user_id |
+----+---------+-------+---------------------+------+--------------+
|  6 |       1 |   200 | 2021-04-05 14:01:00 |    1 |            1 |
|  5 |       1 |   100 | 2021-04-05 13:01:00 |    2 |            1 |
|  4 |       1 |   200 | 2021-04-05 12:01:00 |    3 |            1 |
|  3 |       1 |   300 | 2021-04-05 11:01:00 |    4 |            1 |
|  2 |       1 |   200 | 2021-04-05 10:01:00 |    5 |            1 |
|  1 |       1 |   100 | 2021-04-05 09:01:00 |    6 |            1 |
| 12 |       2 |   100 | 2021-05-05 14:01:00 |    1 |            2 |
| 11 |       2 |   200 | 2021-05-05 13:01:00 |    2 |            2 |
| 10 |       2 |   300 | 2021-05-05 12:01:00 |    3 |            2 |
|  9 |       2 |   100 | 2021-05-05 11:01:00 |    4 |            2 |
|  8 |       2 |   300 | 2021-05-05 10:01:00 |    5 |            2 |
|  7 |       2 |   200 | 2021-05-05 09:01:00 |    6 |            2 |
+----+---------+-------+---------------------+------+--------------+
Enter fullscreen mode Exit fullscreen mode

Full example:

SET @curr_count = 0, @curr_user_id = 0;
SELECT
  user_id,
  SUM(likes) AS total_likes
  FROM (
    SELECT
      *,
      ( @curr_count := IF ( @curr_value = user_id, @curr_count + 1, 1 ) ) AS rank,
      ( @curr_value := user_id ) AS curr_user_id
      FROM twitter_comments
     ORDER BY user_id ASC, published_at DESC
  ) AS comments
 WHERE rank <= 3
 GROUP BY user_id;
Enter fullscreen mode Exit fullscreen mode

SQL Fiddle example


Here is how we can achieve the same result using Window Functions in Postgres: Practical example for Ranking with Window Functions


Helpful links:

Top comments (0)