DEV Community

Discussion on: Digging into Postgresql and DEV

Collapse
 
wiseai profile image
Mahmoud Harmouch • Edited

This is a good starting point. But I think you can improve it by avoiding correlated subqueries(user_id appears in the inner and outer tables); your query will result in the following:

SELECT 
  user_comment_counts_by_week.user_id, 
  array_to_string(
    array_agg(weeks_ago), 
    ','
  ) AS serialized_weeks_ago, 
  array_agg(weeks_ago) AS weeks_ago_array, 
  array_to_string(
    array_agg(
      number_of_comments_with_positive_reaction
    ), 
    ','
  ) AS serialized_comment_counts 
FROM 
  (
    SELECT 
      comment_counts.user_id, 
      COUNT(comment_counts.user_id) AS number_of_comments_with_positive_reaction, 
      (
        trunc(
          (
            EXTRACT(
              epoch 
              FROM 
                (current_timestamp - created_at)
            )
          ) / 604800
        )
      ) AS weeks_ago 
    FROM 
      comments 
      INNER JOIN (
        SELECT 
          DISTINCT reactions.reactable_id 
        FROM 
          reactions 
        WHERE 
          reactions.reactable_type = 'Comment' 
          AND reactions.created_at > (now() - INTERVAL '224' day) 
        EXCEPT 
        SELECT 
          DISTINCT reactions.reactable_id 
        FROM 
          reactions 
        WHERE 
          reactions.reactable_type = 'Comment' 
          AND reactions.created_at > (now() - INTERVAL '224' day) 
          AND reactions.category IN ('thumbsdown', 'vomit')
      ) AS positve_reactions ON comments.id = positve_reactions.reactable_id 
      INNER JOIN (
        SELECT 
          count(comments.id) AS number_of_comments, 
          comments.user_id AS comment_counts_user_id 
        FROM 
          comments 
        WHERE 
          comments.created_at >= (now() - INTERVAL '7' day) 
        GROUP BY 
          comments.user_id
      ) AS comment_counts ON comments.user_id = comment_counts_user_id 
      AND comment_counts.number_of_comments > 1 
    WHERE 
      created_at > (now() - INTERVAL '224' day) 
    GROUP BY 
      comment_counts.user_id, 
      weeks_ago
  ) AS user_comment_counts_by_week 
GROUP BY 
  user_comment_counts_by_week.user_id
Enter fullscreen mode Exit fullscreen mode

I didn't run it cause I don't have the data to test it, but I think it should do the trick. Before running that query, you can create indexes for faster data retrieval:

CREATE INDEX comments_idx_user_id ON "comments" ("user_id");
CREATE INDEX comments_idx_created_at ON "comments" ("created_at");
CREATE INDEX reactions_idx_reactable_type_created_at ON "reactions" ("reactable_type","created_at");
Enter fullscreen mode Exit fullscreen mode

I am looking forward to knowing the result of the above queries.

Collapse
 
jeremyf profile image
Jeremy Friesen

I transcribed the above query into the following and ran it locally:

SELECT comment_counts_user_id,
       array_to_string(array_agg(weeks_ago),',') AS serialized_weeks_ago,
       array_agg(weeks_ago) AS weeks_ago_array,
       array_to_string(array_agg(number_of_comments_with_positive_reaction),',') AS serialized_comment_counts 
FROM (SELECT
    comment_counts_user_id,
        COUNT(comment_counts_user_id) AS number_of_comments_with_positive_reaction,
        (trunc((EXTRACT(epoch FROM (current_timestamp - created_at))) / 604800)) AS weeks_ago 
    FROM comments 
        INNER JOIN (
                SELECT DISTINCT reactions.reactable_id 
                FROM reactions 
                WHERE reactions.reactable_type = 'Comment' 
                    AND reactions.created_at > (now() - INTERVAL '224' day)
        EXCEPT SELECT DISTINCT reactions.reactable_id 
        FROM reactions 
        WHERE reactions.reactable_type = 'Comment' 
                    AND reactions.created_at > (now() - INTERVAL '224' day) 
            AND reactions.category IN ('thumbsdown', 'vomit')
            )
        AS positve_reactions 
            ON comments.id = positve_reactions.reactable_id 
    INNER JOIN (
                SELECT count(comments.id) AS number_of_comments,
                       comments.user_id AS comment_counts_user_id 
                FROM comments 
                WHERE comments.created_at >= (now() - INTERVAL '7' day) 
                GROUP BY comments.user_id
                )
        AS comment_counts 
                ON comments.user_id = comment_counts_user_id 
                AND comment_counts.number_of_comments > 1 
    WHERE created_at > (now() - INTERVAL '224' day) 
    GROUP BY comment_counts_user_id, weeks_ago)
AS user_comment_counts_by_week 
GROUP BY comment_counts_user_id
Enter fullscreen mode Exit fullscreen mode

The query you provided EXPLAIN ANALYZE had an almost negligible slow-down but appears to be generally as performant.

The indexes we have are in github.com/forem/forem/blob/8b37e3.... We don't have the multi-column index but each of those columns are indexed.

Collapse
 
wiseai profile image
Mahmoud Harmouch

That looks neat. Now, I am wondering if it is possible to do data partitioning(using the OVER clause instead of GROUP BY). What do you think?

Thread Thread
 
jeremyf profile image
Jeremy Friesen

I haven't used the OVER clause before. Looking at the docs now.

Thread Thread
 
wiseai profile image
Mahmoud Harmouch

It is something beyond our understanding. I feel like aliens have contacted us after inventing this clause. It is pretty funny to mention that I totally forget how it works whenever I decide to use it. Oh my! My brain hurts.

Thread Thread
 
jeremyf profile image
Jeremy Friesen

It is pretty funny to mention that I totally forget how it works whenever I decide to use it. Oh my! My brain hurts.

Any sufficiently advanced technology is indistinguishable from magic.

Thread Thread
 
wiseai profile image
Mahmoud Harmouch

So true!