DEV Community

Andy Zhao (he/him)

How do I solve this impossible SQL query?

So I'm trying to gather some data from dev.to's database. Unfortunately, SQL is not my strong suit and I've been mostly working on the Ruby + Rails side of the app. Impossible is probably not the right word, but it seems not-so-possible to me since I've been working on this for 4 hours now. 😭

I have a few tables I'm trying to get data from:

  • articles table,
  • collections table,
  • reactions table, and
  • page_views table.

The articles table has a few columns I need: id, title, published_at, positive_reactions_count, and cached_tag_list.

The collections table has only one column I need: slug. It has a direct relationship with articles where articles.collection_id = collections.id.

The reactions table has a polymorphic relationship with the articles table, where a reaction has reactable_id = articles.id and reactable_type = 'Article'. I'll need to SUM up the total number of likes, unicorns, and reading list reactions for each post.

The page_views table has three columns I need: counts_for_number_of_views, time_tracked_in_seconds, and referrer. It has a direct relationship with articles where page_views.article_id = articles.id. I'll need to SUM the counts_for_number_of_views and AVG the time_tracked_in_seconds.
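For context, here is a minimal sketch of those relationships as DDL. The column types are my guesses for illustration only, not the real dev.to schema:

```sql
-- Hypothetical minimal schema; types are guessed, columns and
-- relationships match the description above.
CREATE TABLE collections (
  id   serial PRIMARY KEY,
  slug text
);

CREATE TABLE articles (
  id                       serial PRIMARY KEY,
  title                    text,
  published_at             timestamp,
  positive_reactions_count integer,
  cached_tag_list          text,
  collection_id            integer REFERENCES collections (id)
);

CREATE TABLE reactions (
  id             serial PRIMARY KEY,
  category       text,     -- 'like', 'unicorn', 'readinglist'
  reactable_id   integer,  -- polymorphic: articles.id when
  reactable_type text      -- reactable_type = 'Article'
);

CREATE TABLE page_views (
  id                         serial PRIMARY KEY,
  article_id                 integer REFERENCES articles (id),
  counts_for_number_of_views integer,
  time_tracked_in_seconds    integer,
  referrer                   text
);
```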

I'm trying to display a table result that looks like this:

| article_id | title     | published_at | cached_tag_list     | series_title | total_reactions | total_likes | total_unicorns | total_bookmarks | total_views | average_time_spent_in_seconds | referrer       |
|------------|-----------|--------------|---------------------|--------------|-----------------|-------------|----------------|-----------------|-------------|-------------------------------|----------------|
| 1          | this post | 2019-04-18   | sql,postgresql,help | "SQL help"   | 1               | 1           | 1              | 5               | 30          |                               | https://dev.to |

So far, I have an invalid query -- and some other awful variations that I don't understand -- that looks like this:

SELECT articles.id,
      articles.title,
      articles.published_at,
      articles.positive_reactions_count AS total_reactions,
      articles.cached_tag_list,
      collections.slug AS series_title,
      (SELECT COUNT(reactions.id) from reactions where reactions.category = 'like' AND reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article') AS total_likes,
      (SELECT COUNT(reactions.id) from reactions where reactions.category = 'unicorn' AND reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article') AS total_unicorns,
      (SELECT COUNT(reactions.id) from reactions where reactions.category = 'readinglist' AND reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article') AS total_bookmarks,
      (SELECT SUM(page_views.counts_for_number_of_views) from page_views where page_views.article_id = articles.id) AS total_views,
      (SELECT AVG(page_views.time_tracked_in_seconds) from page_views where page_views.article_id = articles.id) AS total_views,
FROM articles
FULL JOIN collections ON collections.id = articles.collection_id
INNER JOIN reactions ON reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article'
INNER JOIN page_views ON page_views.article_id = articles.id
WHERE articles.user_id = my_id
  AND articles.published = TRUE

I think the query and the example result table help show what I'm trying to get at. This is definitely out of my league, but perhaps it's much easier than I think it is?

Update:

I added an additional AND articles.id = my_article_id to make the query faster. Looks like it runs! However, I'm getting 952 rows for some reason. I think my JOIN usage might be incorrect...

Latest comments (14)

Periklis Gkolias

Just curious, why full join and not inner?

rhymes • Edited

Hi @andy, I think it can be simplified.

First things first, there are a few odd things at first glance.

There's a FULL JOIN between articles and collections, but I don't think the query is interested in collections that have no corresponding articles (also because I don't think they can exist in the DB right now). A full join (also called a full outer join) returns all rows of the first table and all rows of the second table, using NULL to fill the gaps.
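To illustrate with made-up rows: say article 1 belongs to collection 10, article 2 has no collection, and collection 11 has no articles. A FULL JOIN keeps the unmatched rows from both sides:

```sql
-- Hypothetical data, for illustration only.
SELECT articles.id AS article_id, collections.id AS collection_id
FROM articles
FULL JOIN collections ON collections.id = articles.collection_id;

-- article_id | collection_id
--          1 |            10
--          2 |          NULL   <- article with no collection
--       NULL |            11   <- collection with no articles
```

That last row is exactly the kind of result this query doesn't need.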

So the first thing we can do is change

FULL JOIN collections ON collections.id = articles.collection_id

to a simple join

JOIN collections ON collections.id = articles.collection_id

The second odd thing is that the query is joining articles with reactions and page_views, but then it disregards the joins. How do I know you're not interested in the result of these additional joins? In the outermost SELECT:

SELECT articles.id,
       articles.title,
       articles.published_at,
       articles.positive_reactions_count AS total_reactions,
       articles.cached_tag_list,
       collections.slug AS series_title,

the query only retrieves fields from articles and collections.

At this point one might say: "but rhymes, the query is also counting, summing and averaging data from reactions and page_views". Yes, but that data is loaded with subqueries that already filter with a crucial WHERE clause, without having to join anything. Now that we know this, we can remove the outermost joins as well:

INNER JOIN reactions ON reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article'
INNER JOIN page_views ON page_views.article_id = articles.id

The last odd thing is the aliasing of the last two aggregated subqueries: they currently have the same alias. I would change the last one to:

(SELECT AVG(page_views.time_tracked_in_seconds)
     from page_views
     where page_views.article_id = articles.id) AS average_time_spent_on_article

(change the name to something more meaningful; I'm just guessing what it does)

So, at this point our query is:

SELECT articles.id,
       articles.title,
       articles.published_at,
       articles.positive_reactions_count AS total_reactions,
       articles.cached_tag_list,
       collections.slug AS series_title,

    (SELECT COUNT(reactions.id)
     from reactions
     where reactions.category = 'like'
         AND reactions.reactable_id = articles.id
         AND reactions.reactable_type = 'Article') AS total_likes,

    (SELECT COUNT(reactions.id)
     from reactions
     where reactions.category = 'unicorn'
         AND reactions.reactable_id = articles.id
         AND reactions.reactable_type = 'Article') AS total_unicorns,

    (SELECT COUNT(reactions.id)
     from reactions
     where reactions.category = 'readinglist'
         AND reactions.reactable_id = articles.id
         AND reactions.reactable_type = 'Article') AS total_bookmarks,

    (SELECT SUM(page_views.counts_for_number_of_views)
     from page_views
     where page_views.article_id = articles.id) AS total_views,

    (SELECT AVG(page_views.time_tracked_in_seconds)
     from page_views
     where page_views.article_id = articles.id) AS average_time_spent_on_article
FROM articles
LEFT JOIN collections ON collections.id = articles.collection_id
WHERE articles.user_id = 13
    AND articles.published = TRUE;

Much more readable, isn't it?

Is there more room for optimization? Certainly. Should we do it? Well, it depends on the performance. Subqueries are not super efficient but they are certainly more readable.

I'm going to leave here the query plan for reference:

+--------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                   |
|--------------------------------------------------------------------------------------------------------------|
| Hash Join  (cost=8.17..60.18 rows=1 width=163)                                                               |
|   Hash Cond: (collections.id = articles.collection_id)                                                       |
|   ->  Seq Scan on collections  (cost=0.00..13.70 rows=370 width=36)                                          |
|   ->  Hash  (cost=8.15..8.15 rows=1 width=71)                                                                |
|         ->  Index Scan using index_articles_on_user_id on articles  (cost=0.14..8.15 rows=1 width=71)        |
|               Index Cond: (user_id = 13)                                                                     |
|               Filter: published                                                                              |
|   SubPlan 1                                                                                                  |
|     ->  Aggregate  (cost=11.30..11.31 rows=1 width=8)                                                        |
|           ->  Bitmap Heap Scan on reactions  (cost=4.17..11.29 rows=1 width=4)                               |
|                 Recheck Cond: ((reactable_type)::text = 'Article'::text)                                     |
|                 Filter: (((category)::text = 'like'::text) AND (reactable_id = articles.id))                 |
|                 ->  Bitmap Index Scan on index_reactions_on_reactable_type  (cost=0.00..4.17 rows=3 width=0) |
|                       Index Cond: ((reactable_type)::text = 'Article'::text)                                 |
|   SubPlan 2                                                                                                  |
|     ->  Aggregate  (cost=11.30..11.31 rows=1 width=8)                                                        |
|           ->  Bitmap Heap Scan on reactions reactions_1  (cost=4.17..11.29 rows=1 width=4)                   |
|                 Recheck Cond: ((reactable_type)::text = 'Article'::text)                                     |
|                 Filter: (((category)::text = 'unicorn'::text) AND (reactable_id = articles.id))              |
|                 ->  Bitmap Index Scan on index_reactions_on_reactable_type  (cost=0.00..4.17 rows=3 width=0) |
|                       Index Cond: ((reactable_type)::text = 'Article'::text)                                 |
|   SubPlan 3                                                                                                  |
|     ->  Aggregate  (cost=11.30..11.31 rows=1 width=8)                                                        |
|           ->  Bitmap Heap Scan on reactions reactions_2  (cost=4.17..11.29 rows=1 width=4)                   |
|                 Recheck Cond: ((reactable_type)::text = 'Article'::text)                                     |
|                 Filter: (((category)::text = 'readinglist'::text) AND (reactable_id = articles.id))          |
|                 ->  Bitmap Index Scan on index_reactions_on_reactable_type  (cost=0.00..4.17 rows=3 width=0) |
|                       Index Cond: ((reactable_type)::text = 'Article'::text)                                 |
|   SubPlan 4                                                                                                  |
|     ->  Aggregate  (cost=1.49..1.50 rows=1 width=8)                                                          |
|           ->  Seq Scan on page_views  (cost=0.00..1.48 rows=6 width=4)                                       |
|                 Filter: (article_id = articles.id)                                                           |
|   SubPlan 5                                                                                                  |
|     ->  Aggregate  (cost=1.49..1.50 rows=1 width=32)                                                         |
|           ->  Seq Scan on page_views page_views_1  (cost=0.00..1.48 rows=6 width=4)                          |
|                 Filter: (article_id = articles.id)                                                           |
+--------------------------------------------------------------------------------------------------------------+

There are a lot of "sub plans" (each subquery is indicated as a "sub plan") and at least two sequential scans we might want to investigate later on: the one on collections and the one on page_views.
But let's move on from the sequential scans for now; we can come back to them later.

There's a possible improvement for the query in its current state, using... GROUP BY :D.

GROUP BY

What the query is essentially asking is to return a list of all published articles of the given user, with some aggregates. The word aggregation is a giant clue in SQL: whenever you have to aggregate data from one or multiple tables, you might want to investigate grouping.

So, what do we want to aggregate? As we've seen we want the count of all likes, of all unicorns, of all bookmarks, the total of page views and the average time spent on an article.

How to do it?

If likes, unicorns and bookmarks had been in different tables, we could have done something like this and called it a day:

SELECT articles.id, articles.title, COUNT(likes.id) AS total_likes, COUNT(unicorns.id) AS total_unicorns
FROM articles
LEFT JOIN likes ON articles.id = likes.article_id
LEFT JOIN unicorns ON articles.id = unicorns.article_id
WHERE articles.user_id = 13 AND articles.published = TRUE
GROUP BY articles.id, articles.title;

(the LEFT JOIN is because, I'm guessing, we want to return articles even without likes or unicorns; the count will be 0 for those)

You can see it working if we simplify the real query to:

SELECT articles.id, articles.title, SUM(page_views.counts_for_number_of_views) AS total_views
FROM articles
LEFT JOIN page_views ON articles.id = page_views.article_id
WHERE articles.user_id = 13 AND articles.published = TRUE
GROUP BY articles.id, articles.title;

This is a quick verification that it's working from my local copy:

PracticalDeveloper_development> SELECT articles.id, articles.title, SUM(page_views.counts_for_number_of_views) AS total_views
 FROM articles
 LEFT JOIN page_views ON articles.id = page_views.article_id
 WHERE articles.user_id = 13 AND articles.published = TRUE
 GROUP BY articles.id, articles.title;

+------+--------------+---------------+
| id   | title        | total_views   |
|------+--------------+---------------|
| 26   | test title   | 7             |
| 27   | Test         | 2             |
| 28   | Test with v2 | 2             |
| 29   | title        | 3             |
+------+--------------+---------------+
SELECT 4
Time: 0.062s
PracticalDeveloper_development> select SUM(page_views.counts_for_number_of_views) from page_views where article_id = 26;
+-------+
| sum   |
|-------|
| 7     |
+-------+
SELECT 1
Time: 0.017s

Since likes, unicorns and bookmarks are all in the same table, we need to resort either to subqueries, as you've done, or to filtered aggregates. What are those?

As we've seen in the query above, aggregate expressions enable the database to compute an aggregate over a list of selected rows (i.e. counting all reactions that have the requested article id). A filtered aggregate allows you to ask for an aggregate but filter it on some condition. Let's try:

SELECT articles.id, articles.title,
  COUNT(reactions.id) FILTER (WHERE reactions.category = 'like') AS total_likes
FROM articles
LEFT JOIN reactions ON reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article'
WHERE articles.user_id = 13 AND articles.published = TRUE
GROUP BY articles.id, articles.title;

Example result:

PracticalDeveloper_development> SELECT articles.id, articles.title,
   COUNT(reactions.id) FILTER (WHERE reactions.category = 'like') AS total_likes
 FROM articles
 LEFT JOIN reactions ON reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article'
 WHERE articles.user_id = 13 AND articles.published = TRUE
 GROUP BY articles.id, articles.title;
+------+--------------+---------------+
| id   | title        | total_likes   |
|------+--------------+---------------|
| 26   | test title   | 1             |
| 27   | Test         | 0             |
| 28   | Test with v2 | 0             |
| 29   | title        | 0             |
+------+--------------+---------------+

With this in mind we can rewrite the original query like this:

SELECT articles.id,
       articles.title,
       articles.published_at,
       articles.positive_reactions_count AS total_reactions,
       articles.cached_tag_list,
       collections.slug AS series_title,
       COUNT(reactions.id) FILTER (
                                   WHERE reactions.category = 'like') AS total_likes,
       COUNT(reactions.id) FILTER (
                                   WHERE reactions.category = 'unicorn') AS total_unicorns,
       COUNT(reactions.id) FILTER (
                                   WHERE reactions.category = 'readinglist') AS total_bookmarks,
       SUM(page_views.counts_for_number_of_views) AS total_views,
       AVG(page_views.time_tracked_in_seconds) AS average_time_spent_on_article
FROM articles
LEFT JOIN collections ON collections.id = articles.collection_id
LEFT JOIN reactions ON reactions.reactable_id = articles.id
AND reactions.reactable_type = 'Article'
LEFT JOIN page_views ON page_views.article_id = articles.id
WHERE articles.user_id = 13
    AND articles.published = TRUE
GROUP BY articles.id,
         articles.title,
         articles.published_at,
         articles.positive_reactions_count,
         articles.cached_tag_list,
         collections.slug;

This is the result:

+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title        | published_at               | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
| 26   | test title   | 2019-04-08 11:01:37.354794 | 0                 | tags              | <null>         | 7             | 0                | 0                 | 7             | 192.8571428571428571            |
| 27   | Test         | 2019-04-10 10:50:43.777291 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 97.5000000000000000             |
| 28   | Test with v2 | 2019-04-10 10:51:26.636466 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 15.0000000000000000             |
| 29   | title        | 2019-04-10 19:57:58.779827 | 0                 |                   | <null>         | 0             | 0                | 0                 | 3             | 15.0000000000000000             |
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+

As you can see, all of the user's articles are there. This is what happens if, instead of a LEFT JOIN, I use a normal join with collections:

+------+---------+----------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title   | published_at   | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+---------+----------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
+------+---------+----------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
SELECT 0

Since I have no articles present in collections, I have no results.

Let's see the query plan:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                              |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GroupAggregate  (cost=36.45..36.49 rows=1 width=163) (actual time=0.188..0.194 rows=4 loops=1)                                                                          |
|   Group Key: articles.id, collections.slug                                                                                                                              |
|   ->  Sort  (cost=36.45..36.45 rows=1 width=143) (actual time=0.174..0.175 rows=14 loops=1)                                                                             |
|         Sort Key: articles.id, collections.slug                                                                                                                         |
|         Sort Method: quicksort  Memory: 26kB                                                                                                                            |
|         ->  Nested Loop Left Join  (cost=12.34..36.44 rows=1 width=143) (actual time=0.077..0.162 rows=14 loops=1)                                                      |
|               Join Filter: (page_views.article_id = articles.id)                                                                                                        |
|               Rows Removed by Join Filter: 154                                                                                                                          |
|               ->  Nested Loop Left Join  (cost=12.34..34.58 rows=1 width=135) (actual time=0.067..0.101 rows=4 loops=1)                                                 |
|                     Join Filter: (reactions.reactable_id = articles.id)                                                                                                 |
|                     Rows Removed by Join Filter: 3                                                                                                                      |
|                     ->  Hash Right Join  (cost=8.17..23.27 rows=1 width=99) (actual time=0.046..0.050 rows=4 loops=1)                                                   |
|                           Hash Cond: (collections.id = articles.collection_id)                                                                                          |
|                           ->  Seq Scan on collections  (cost=0.00..13.70 rows=370 width=36) (actual time=0.004..0.004 rows=0 loops=1)                                   |
|                           ->  Hash  (cost=8.15..8.15 rows=1 width=71) (actual time=0.028..0.028 rows=4 loops=1)                                                         |
|                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                            |
|                                 ->  Index Scan using index_articles_on_user_id on articles  (cost=0.14..8.15 rows=1 width=71) (actual time=0.018..0.023 rows=4 loops=1) |
|                                       Index Cond: (user_id = 13)                                                                                                        |
|                                       Filter: published                                                                                                                 |
|                     ->  Bitmap Heap Scan on reactions  (cost=4.17..11.28 rows=3 width=40) (actual time=0.008..0.008 rows=1 loops=4)                                     |
|                           Recheck Cond: ((reactable_type)::text = 'Article'::text)                                                                                      |
|                           Heap Blocks: exact=4                                                                                                                          |
|                           ->  Bitmap Index Scan on index_reactions_on_reactable_type  (cost=0.00..4.17 rows=3 width=0) (actual time=0.006..0.006 rows=1 loops=4)        |
|                                 Index Cond: ((reactable_type)::text = 'Article'::text)                                                                                  |
|               ->  Seq Scan on page_views  (cost=0.00..1.38 rows=38 width=16) (actual time=0.003..0.007 rows=42 loops=4)                                                 |
| Planning Time: 0.576 ms                                                                                                                                                 |
| Execution Time: 0.332 ms                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I would stop here. I don't have a huge dataset to test with, so you might have to run this second query and see for yourself (let me know if everything works).

rhymes • Edited

(I just noticed there's a bug in the GROUP BY version, so I suggest using the subselects at the top of my comment in the meantime. I'll fix it later :D)

rhymes

Found it: I missed the DISTINCT. So, the final queries:

With sub selects

PracticalDeveloper_development> SELECT articles.id,
        articles.title,
        articles.published_at,
        articles.positive_reactions_count AS total_reactions,
        articles.cached_tag_list,
        collections.slug AS series_title,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'like'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_likes,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'unicorn'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_unicorns,

     (SELECT COUNT(reactions.id)
      from reactions
      where reactions.category = 'readinglist'
          AND reactions.reactable_id = articles.id
          AND reactions.reactable_type = 'Article') AS total_bookmarks,

     (SELECT SUM(page_views.counts_for_number_of_views)
      from page_views
      where page_views.article_id = articles.id) AS total_views,

     (SELECT AVG(page_views.time_tracked_in_seconds)
      from page_views
      where page_views.article_id = articles.id) AS average_time_spent_on_article
 FROM articles
 LEFT JOIN collections ON collections.id = articles.collection_id
 WHERE articles.user_id = 13
     AND articles.published = TRUE;
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title        | published_at               | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
| 26   | test title   | 2019-04-08 11:01:37.354794 | 0                 | tags              | <null>         | 1             | 0                | 0                 | 7             | 192.8571428571428571            |
| 27   | Test         | 2019-04-10 10:50:43.777291 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 97.5000000000000000             |
| 28   | Test with v2 | 2019-04-10 10:51:26.636466 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 15.0000000000000000             |
| 29   | title        | 2019-04-10 19:57:58.779827 | 0                 |                   | <null>         | 0             | 0                | 0                 | 3             | 15.0000000000000000             |
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
SELECT 4

With GROUP BY

PracticalDeveloper_development> SELECT articles.id,
        articles.title,
        articles.published_at,
        articles.positive_reactions_count AS total_reactions,
        articles.cached_tag_list,
        collections.slug AS series_title,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'like') AS total_likes,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'unicorn') AS total_unicorns,
        COUNT(DISTINCT reactions.id) FILTER (
                                             WHERE reactions.category = 'readinglist') AS total_bookmarks,
        SUM(page_views.counts_for_number_of_views) AS total_views,
        AVG(page_views.time_tracked_in_seconds) AS average_time_spent_on_article
 FROM articles
 LEFT JOIN collections ON collections.id = articles.collection_id
 LEFT JOIN reactions ON reactions.reactable_id = articles.id
 AND reactions.reactable_type = 'Article'
 LEFT JOIN page_views ON page_views.article_id = articles.id
 WHERE articles.user_id = 13
     AND articles.published = TRUE
 GROUP BY articles.id,
          articles.title,
          articles.published_at,
          articles.positive_reactions_count,
          articles.cached_tag_list,
          collections.slug;
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
| id   | title        | published_at               | total_reactions   | cached_tag_list   | series_title   | total_likes   | total_unicorns   | total_bookmarks   | total_views   | average_time_spent_on_article   |
|------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------|
| 26   | test title   | 2019-04-08 11:01:37.354794 | 0                 | tags              | <null>         | 1             | 0                | 0                 | 7             | 192.8571428571428571            |
| 27   | Test         | 2019-04-10 10:50:43.777291 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 97.5000000000000000             |
| 28   | Test with v2 | 2019-04-10 10:51:26.636466 | 0                 | test              | <null>         | 0             | 0                | 0                 | 2             | 15.0000000000000000             |
| 29   | title        | 2019-04-10 19:57:58.779827 | 0                 |                   | <null>         | 0             | 0                | 0                 | 3             | 15.0000000000000000             |
+------+--------------+----------------------------+-------------------+-------------------+----------------+---------------+------------------+-------------------+---------------+---------------------------------+
SELECT 4

As you can see, same result.

Hope this helps Andy!
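The shape of the query above — correlated subqueries for the per-category reaction counts plus a `GROUP BY` for the page-view aggregates — can be reproduced on a toy schema. A minimal SQLite sketch (table and column names follow the thread; the data is invented):

```python
import sqlite3

# Toy schema mirroring the thread's tables; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER, published INTEGER);
CREATE TABLE reactions (id INTEGER PRIMARY KEY, reactable_id INTEGER, reactable_type TEXT, category TEXT);
CREATE TABLE page_views (id INTEGER PRIMARY KEY, article_id INTEGER,
                         counts_for_number_of_views INTEGER, time_tracked_in_seconds INTEGER);
INSERT INTO articles VALUES (1, 'test title', 13, 1);
INSERT INTO reactions VALUES (1, 1, 'Article', 'like'), (2, 1, 'Article', 'unicorn'),
                             (3, 1, 'Article', 'like');
INSERT INTO page_views VALUES (1, 1, 5, 30), (2, 1, 2, 60);
""")

# A correlated subquery counts one reaction category per article,
# while the outer GROUP BY aggregates the joined page_views rows.
row = conn.execute("""
SELECT articles.id,
       (SELECT COUNT(*) FROM reactions
         WHERE reactions.reactable_id = articles.id
           AND reactions.reactable_type = 'Article'
           AND reactions.category = 'like')        AS total_likes,
       SUM(page_views.counts_for_number_of_views)  AS total_views,
       AVG(page_views.time_tracked_in_seconds)     AS avg_seconds
FROM articles
LEFT JOIN page_views ON page_views.article_id = articles.id
WHERE articles.user_id = 13 AND articles.published = 1
GROUP BY articles.id
""").fetchone()
print(row)  # (1, 2, 7, 45.0)
```

The subquery runs once per article row, which is exactly the cost the discussion below gets into.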

Andy Zhao (he/him) • Edited

Wow Rhymes, thanks for such an excellent answer!!! The query plans really help to see how they are performing. Interesting that FILTER allows you to use aggregates much faster than a subquery. Definitely have to read a book or two about SQL now!

In case you're wondering, getting 119 rows took about 4 minutes to run the GROUP BY query. We're not looking to put this in the app by any means, so the time is totally fine. Definitely helps me think about how to structure AnalyticsService.rb queries though.

rhymes

Glad it helped!

> Interesting that FILTER allows you to use aggregates much faster than a subquery.

I think it's because the DB is able to do it with less work by using multiple conditions on the same set of data (the joined table on the right of the join, reactions in this example), whereas with a subquery, it has to execute a different subquery for each row in the articles table. Not much to optimize there.
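That explanation can be seen in action. Here is a minimal SQLite sketch of the `FILTER` pattern (Postgres has supported `FILTER` since 9.4, SQLite since 3.30; schema and data are invented):

```python
import sqlite3

# One scan of the joined reactions rows produces all three per-category
# counts, instead of one correlated subquery per category per article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reactions (reactable_id INTEGER, reactable_type TEXT, category TEXT);
INSERT INTO articles VALUES (1, 'test title');
INSERT INTO reactions VALUES
  (1, 'Article', 'like'), (1, 'Article', 'like'),
  (1, 'Article', 'unicorn'), (1, 'Article', 'readinglist');
""")

row = conn.execute("""
SELECT articles.id,
       COUNT(*) FILTER (WHERE reactions.category = 'like')        AS total_likes,
       COUNT(*) FILTER (WHERE reactions.category = 'unicorn')     AS total_unicorns,
       COUNT(*) FILTER (WHERE reactions.category = 'readinglist') AS total_bookmarks
FROM articles
LEFT JOIN reactions ON reactions.reactable_id = articles.id
                   AND reactions.reactable_type = 'Article'
GROUP BY articles.id
""").fetchone()
print(row)  # (1, 2, 1, 1)
```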

> In case you're wondering, getting 119 rows took about 4 minutes to run the GROUP BY query.

Mmm not great though :D how long was it taking with the subqueries? Anyhow, it's something. I have a feeling it can be further optimized by checking whether and which indexes are being used. The other idea I had (untested and unexplored) was to use window functions, but I haven't tried, nor do I know if it can be done or if it's faster :D

> We're not looking to put in this app by any means, so the time is totally fine

Good :D

> Definitely helps me think about how to structure AnalyticsService.rb queries though

Let me know if you find a different approach for that. After all you have the great advantage of having a working version that you can use as a comparison. You extract a set of data and then play with grouping until you get the same result. :D
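The window-function idea mentioned above is left untested in the thread; one hedged sketch of what it could look like (SQLite, invented schema — whether it actually beats `GROUP BY` is an open question):

```python
import sqlite3

# Window variant: compute per-category counts over a partition per article,
# then collapse the duplicated rows with DISTINCT. Untested for performance;
# the schema is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reactions (reactable_id INTEGER, category TEXT);
INSERT INTO reactions VALUES (1, 'like'), (1, 'like'), (1, 'unicorn');
""")

rows = conn.execute("""
SELECT DISTINCT reactable_id,
       COUNT(*) FILTER (WHERE category = 'like')
                OVER (PARTITION BY reactable_id) AS total_likes,
       COUNT(*) FILTER (WHERE category = 'unicorn')
                OVER (PARTITION BY reactable_id) AS total_unicorns
FROM reactions
""").fetchall()
print(rows)  # [(1, 2, 1)]
```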

Francois LAGIER • Edited

Obviously, I'm not 100% sure since I don't have access to the table, but I would try something like this:

  SELECT 
        articles.id,
        articles.title,
        articles.published_at,
        articles.positive_reactions_count AS total_reactions,
        articles.cached_tag_list,
        collections.slug AS series_title,
        SUM(CASE WHEN reactions.category = 'like' THEN 1 ELSE 0 END) AS total_likes,
        SUM(CASE WHEN reactions.category = 'unicorn' THEN 1 ELSE 0 END) AS total_unicorns,
        SUM(CASE WHEN reactions.category = 'readinglist' THEN 1 ELSE 0 END) AS total_bookmarks,
        SUM(page_views.counts_for_number_of_views) AS total_views,
        AVG(page_views.time_tracked_in_seconds) AS average_time_spent_in_seconds
  FROM articles
  FULL JOIN collections ON collections.id = articles.collection_id
  INNER JOIN reactions ON reactions.reactable_id = articles.id AND reactions.reactable_type = 'Article'
  INNER JOIN page_views ON page_views.article_id = articles.id
  WHERE articles.user_id = my_id
  AND articles.published = TRUE
  GROUP BY 1,2,3,4,5,6

This query should work if the reactions table is a snapshot and not just a log.

I always love playing with weird SQL so happy to help. Feel free to reach out at francois@cloudforecast.io
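The `SUM(CASE WHEN ...)` pattern in the query above is conditional aggregation: one pass over the rows yields one count per category, and it is portable to engines that lack `FILTER`. A minimal SQLite sketch (invented data):

```python
import sqlite3

# Conditional aggregation: each CASE expression contributes 1 for matching
# rows and 0 otherwise, so SUM counts per category in a single scan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reactions (reactable_id INTEGER, category TEXT);
INSERT INTO reactions VALUES
  (1, 'like'), (1, 'like'), (1, 'unicorn'), (2, 'readinglist');
""")

rows = conn.execute("""
SELECT reactable_id,
       SUM(CASE WHEN category = 'like'        THEN 1 ELSE 0 END) AS total_likes,
       SUM(CASE WHEN category = 'unicorn'     THEN 1 ELSE 0 END) AS total_unicorns,
       SUM(CASE WHEN category = 'readinglist' THEN 1 ELSE 0 END) AS total_bookmarks
FROM reactions
GROUP BY reactable_id
ORDER BY reactable_id
""").fetchall()
print(rows)  # [(1, 2, 1, 0), (2, 0, 0, 1)]
```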

Sebastian

Do the aggregation functions work in that query? Looks like you could add a group by at the end, like:
"
group by
articles.id,
articles.title,
articles.published_at,
articles.positive_reactions_count,
articles.cached_tag_list,
collections.slug
"

Francois LAGIER

Of course not ... I forgot the GROUP BY. Thanks for pointing that out.

Andrew (he/him)

Have y'all had any discussions about moving to NoSQL? Or does DEV not gather enough user metadata / have enough users to make that a sensible option?

rhymes

Changing DB to fix a query seems a little bit overkill :D

Andrew (he/him)

Lol I should have mentioned that this was sort of unrelated to the original post.

Andy Zhao (he/him)

I'm not sure what the use cases for NoSQL are, and since I don't know them, I can't say why we would use a NoSQL database. Rails 6 will support multiple databases, so that might be something we could explore.

Andy Zhao (he/him)

uhh apparently this is valid SQL and the query is running, so I'll report back in a bit. 🙈