DEV Community

Max Borysov
Max Borysov

Posted on • Edited on

6 4

Mysql 5.7 Window Functions Workaround

If you work a lot with data in SQL, sooner or later you will have a task to find top N results/row per group. Window functions provide great help there. But MySQL 5.7 does not support them, only MySQL 8.0.

There are several ways we can solve this task.

Let's say we have the following task: we have a list of tweets/posts (with user_id, pub date, likes count) and we want to get SUM likes for top-N tweets for each user.

Table structure:

CREATE TABLE twitter_comments (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
  user_id INT,
  likes INT,
  published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

There are at least three ways we can do that.

Solution 1

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      GROUP_CONCAT(id ORDER BY published_at DESC) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE FIND_IN_SET(tw.id, t.comments_ids) BETWEEN 1 AND 3
 GROUP BY tw.user_id;
Enter fullscreen mode Exit fullscreen mode

GROUP_CONCAT will return a list of comments ids per user:

+---------+----------------+
| user_id | comments_ids   |
+---------+----------------+
|       1 | 6,5,4,3,2,1    |
|       2 | 12,11,10,9,8,7 |
+---------+----------------+
Enter fullscreen mode Exit fullscreen mode

FIND_IN_SET will return the index of a current comment id in a list of all comments(for a specific user). In our case, an index value BETWEEN 1 AND 3 should match the latest three comments.

Possible issues: GROUP_CONCAT has a maximum length and if a user has many comments, you can easily hit the limit. But since we are interested only in the top 3,5 or 10 results, I do not think it should be a problem.

SQL Fiddle example

Solution 2

Similar to solution#1, but this time we'll use SUBSTRING_INDEX and LOCATE.

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY published_at DESC), ',', 3) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE LOCATE(tw.id, t.comments_ids) > 0
 GROUP BY tw.user_id;
Enter fullscreen mode Exit fullscreen mode

SUBSTRING_INDEX in combination with GROUP_CONCAT will return already trimmed result(latest comments ids)

+---------+--------------+
| user_id | comments_ids |
+---------+--------------+
|       1 | 6,5,4        |
|       2 | 12,11,10     |
+---------+--------------+
Enter fullscreen mode Exit fullscreen mode

And LOCATE will help us to check if a current comment id is in that list:

SELECT LOCATE('6', '6,5,4');
+----------------------+
| LOCATE('6', '6,5,4') |
+----------------------+
|                    1 |
+----------------------+
Enter fullscreen mode Exit fullscreen mode

SQL Fiddle example

Solution 3

Use session variables to generate rank/row number for each comment(per user). Smth like that:

+----+---------+-------+---------------------+------+--------------+
| id | user_id | likes | published_at        | rank | curr_user_id |
+----+---------+-------+---------------------+------+--------------+
|  6 |       1 |   200 | 2021-04-05 14:01:00 |    1 |            1 |
|  5 |       1 |   100 | 2021-04-05 13:01:00 |    2 |            1 |
|  4 |       1 |   200 | 2021-04-05 12:01:00 |    3 |            1 |
|  3 |       1 |   300 | 2021-04-05 11:01:00 |    4 |            1 |
|  2 |       1 |   200 | 2021-04-05 10:01:00 |    5 |            1 |
|  1 |       1 |   100 | 2021-04-05 09:01:00 |    6 |            1 |
| 12 |       2 |   100 | 2021-05-05 14:01:00 |    1 |            2 |
| 11 |       2 |   200 | 2021-05-05 13:01:00 |    2 |            2 |
| 10 |       2 |   300 | 2021-05-05 12:01:00 |    3 |            2 |
|  9 |       2 |   100 | 2021-05-05 11:01:00 |    4 |            2 |
|  8 |       2 |   300 | 2021-05-05 10:01:00 |    5 |            2 |
|  7 |       2 |   200 | 2021-05-05 09:01:00 |    6 |            2 |
+----+---------+-------+---------------------+------+--------------+
Enter fullscreen mode Exit fullscreen mode

Full example:

SET @curr_count = 0, @curr_user_id = 0;
SELECT
  user_id,
  SUM(likes) AS total_likes
  FROM (
    SELECT
      *,
      ( @curr_count := IF ( @curr_value = user_id, @curr_count + 1, 1 ) ) AS rank,
      ( @curr_value := user_id ) AS curr_user_id
      FROM twitter_comments
     ORDER BY user_id ASC, published_at DESC
  ) AS comments
 WHERE rank <= 3
 GROUP BY user_id;
Enter fullscreen mode Exit fullscreen mode

SQL Fiddle example


Here is how we can achieve the same result using Window Functions in Postgres: Practical example for Ranking with Window Functions


Helpful links:

Image of Stellar post

From Hackathon to Funded - Stellar Dev Diaries Ep. 1 🎥

Ever wondered what it takes to go from idea to funding? In episode 1 of the Stellar Dev Diaries, we hear how the Freelii team did just that. Check it out and follow along to see the rest of their dev journey!

Watch the video

Top comments (0)

Image of Stellar post

🚀 Stellar Dev Diaries Series: Episode 1 is LIVE!

Ever wondered what it takes to build a web3 startup from scratch? In the Stellar Dev Diaries series, we follow the journey of a team of developers building on the Stellar Network as they go from hackathon win to getting funded and launching on mainnet.

Read more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay