Mysql 5.7 Window Functions Workaround

#sql #mysql #database

If you work a lot with data in SQL, sooner or later you will have a task to find top N results/row per group. Window functions provide great help there. But MySQL 5.7 does not support them, only MySQL 8.0.

There are several ways we can solve this task.

Let's say we have the following task: we have a list of tweets/posts (with user_id, pub date, likes count) and we want to get SUM likes for top-N tweets for each user.

Table structure:

CREATE TABLE twitter_comments (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY,
  user_id INT,
  likes INT,
  published_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

There are at least three ways we can do that.

Solution 1

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      GROUP_CONCAT(id ORDER BY published_at DESC) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE FIND_IN_SET(tw.id, t.comments_ids) BETWEEN 1 AND 3
 GROUP BY tw.user_id;

GROUP_CONCAT will return a list of comments ids per user:

+---------+----------------+
| user_id | comments_ids   |
+---------+----------------+
|       1 | 6,5,4,3,2,1    |
|       2 | 12,11,10,9,8,7 |
+---------+----------------+

FIND_IN_SET will return the index of a current comment id in a list of all comments(for a specific user). In our case, an index value BETWEEN 1 AND 3 should match the latest three comments.

Possible issues: GROUP_CONCAT has a maximum length and if a user has many comments, you can easily hit the limit. But since we are interested only in the top 3,5 or 10 results, I do not think it should be a problem.

SQL Fiddle example

Solution 2

Similar to solution#1, but this time we'll use SUBSTRING_INDEX and LOCATE.

SELECT
  tw.user_id,
  SUM(tw.likes) AS total_likes
  FROM twitter_comments AS tw
  JOIN (
    SELECT
      user_id,
      SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY published_at DESC), ',', 3) AS comments_ids
      FROM twitter_comments
    GROUP BY user_id
  ) AS t ON t.user_id = tw.user_id
 WHERE LOCATE(tw.id, t.comments_ids) > 0
 GROUP BY tw.user_id;

SUBSTRING_INDEX in combination with GROUP_CONCAT will return already trimmed result(latest comments ids)

+---------+--------------+
| user_id | comments_ids |
+---------+--------------+
|       1 | 6,5,4        |
|       2 | 12,11,10     |
+---------+--------------+

And LOCATE will help us to check if a current comment id is in that list:

SELECT LOCATE('6', '6,5,4');
+----------------------+
| LOCATE('6', '6,5,4') |
+----------------------+
|                    1 |
+----------------------+

SQL Fiddle example

Solution 3

Use session variables to generate rank/row number for each comment(per user). Smth like that:

+----+---------+-------+---------------------+------+--------------+
| id | user_id | likes | published_at        | rank | curr_user_id |
+----+---------+-------+---------------------+------+--------------+
|  6 |       1 |   200 | 2021-04-05 14:01:00 |    1 |            1 |
|  5 |       1 |   100 | 2021-04-05 13:01:00 |    2 |            1 |
|  4 |       1 |   200 | 2021-04-05 12:01:00 |    3 |            1 |
|  3 |       1 |   300 | 2021-04-05 11:01:00 |    4 |            1 |
|  2 |       1 |   200 | 2021-04-05 10:01:00 |    5 |            1 |
|  1 |       1 |   100 | 2021-04-05 09:01:00 |    6 |            1 |
| 12 |       2 |   100 | 2021-05-05 14:01:00 |    1 |            2 |
| 11 |       2 |   200 | 2021-05-05 13:01:00 |    2 |            2 |
| 10 |       2 |   300 | 2021-05-05 12:01:00 |    3 |            2 |
|  9 |       2 |   100 | 2021-05-05 11:01:00 |    4 |            2 |
|  8 |       2 |   300 | 2021-05-05 10:01:00 |    5 |            2 |
|  7 |       2 |   200 | 2021-05-05 09:01:00 |    6 |            2 |
+----+---------+-------+---------------------+------+--------------+

Full example:

SET @curr_count = 0, @curr_user_id = 0;
SELECT
  user_id,
  SUM(likes) AS total_likes
  FROM (
    SELECT
      *,
      ( @curr_count := IF ( @curr_value = user_id, @curr_count + 1, 1 ) ) AS rank,
      ( @curr_value := user_id ) AS curr_user_id
      FROM twitter_comments
     ORDER BY user_id ASC, published_at DESC
  ) AS comments
 WHERE rank <= 3
 GROUP BY user_id;

SQL Fiddle example

Here is how we can achieve the same result using Window Functions in Postgres: Practical example for Ranking with Window Functions

Helpful links:

stackoverflow discussion
ranking example
many people recommend this article How to Select the First/Least/Max Row per Group in SQL

From Hackathon to Funded - Stellar Dev Diaries Ep. 1 🎥

Ever wondered what it takes to go from idea to funding? In episode 1 of the Stellar Dev Diaries, we hear how the Freelii team did just that. Check it out and follow along to see the rest of their dev journey!

Watch the video

Top comments (0)

🚀 Stellar Dev Diaries Series: Episode 1 is LIVE!

Ever wondered what it takes to build a web3 startup from scratch? In the Stellar Dev Diaries series, we follow the journey of a team of developers building on the Stellar Network as they go from hackathon win to getting funded and launching on mainnet.