DEV Community

Discussion on: Cascade of doom: JIT, and how a Postgres update led to 70% failure on a critical national service

Collapse
 
greenreaper profile image
Laurence 'GreenReaper' Parry

Yeah, I can appreciate the frustration there. It's probably frustrating for the developers, too!

My issue was not materialized views, but WITH x AS MATERIALISED, which is required to force PG 12+ to use non-recursive CTEs without side-effects as an optimization fence.

Half a decade ago we had an issue with mailing list code that generated a list of recommended users who had recently posted work that was liked by users that the person being mailed had watched, who they did not themselves watch (if they did, they would've seen the work already).

Many users did not actually watch anyone and so the check was quick, but it fell apart when the planner decided to compute the list of liked work (which could be expensive) before it considered whether there were any watched users who might have made those likes.

I hoisted watched users into a CTE, forcing it to quickly find out whether it should not execute the rest (apologies for 'legacy' code):

        $tmp_user_id = $this->sanitize($user_id);
        $tmp_count = $this->sanitize($count);
        $tmp_date = $this->sanitize($date);

-       $arr = $this->runQuery("select u.username from watches w, submission_favorites sf, submissions s, user_statistics t, users u where w.user_id = '$tmp_user_id' and w.watched_user_id = sf.user_id and sf.create_datetime > '$tmp_date' and sf.submission_id = s.submission_id and s.deleted != 't' and s.public = 't' {$this->acl_filter} and s.user_id not in (select watched_user_id from watches where user_id = '".$tmp_user_id."') and s.user_id != '$tmp_user_id' and s.user_id = u.user_id and u.user_id = t.user_id group by t.submission_views_count, u.username order by t.submission_views_count desc, u.username asc limit '$tmp_count';");
+       #The with clause here is necessary to avoid an unduly expensive query plan
+       $arr = $this->runQuery("with watched_users as (select watched_user_id from watches where user_id = '$tmp_user_id') select u.username from watches w, submission_favorites sf, submissions s, user_statistics t, users u where w.user_id = '$tmp_user_id' and w.watched_user_id = sf.user_id and sf.create_datetime > '$tmp_date' and sf.submission_id = s.submission_id and s.deleted != 't' and s.public = 't' {$this->acl_filter} and s.user_id not in (select * from watched_users) and s.user_id != '$tmp_user_id' and s.user_id = u.user_id and u.user_id = t.user_id group by t.submission_views_count, u.username order by t.submission_views_count desc, u.username asc limit '$tmp_count';");

        return $arr;
    }
Enter fullscreen mode Exit fullscreen mode

This fell apart again when we switched to PostgreSQL 12 - though we didn't notice until the weekly mailer, when things got noticeably slower for a long time. Thankfully it was limited to one CPU core, but that's on a quad-core which normally hit an average of three at peak - and it wasn't the only place we'd used that pattern, just one of the most expensive.

Having said that, usually version upgrades being benefits. But they're definitely a two-edged sword - I'm still not sure about upgrading to 14 because it means we'll lose SSL compression (used to avoid overages due to replication transfers), or have to hack OpenSSL more to ignore it being explicitly disabled. A replacement feature is still stuck in the conmitfest process. Perhaps we'll stick with 13 for now - we have another four years to think about migration.

Thread Thread
 
xenatisch profile image
Pouria Hadjibagheri

Gotcha. Don't get me wrong though, PG14 is great, and it's faster... just make sure you turn JIT off if you compile with LLVM. :))

Thanks again for sharing your thoughts. Great to hear about your experience. Some of these edge cases are quite rare, and it's great to have them documented in a consolidated format - even as a comment.

I guess the reason why they are edge cases is - at least to some extent - because nowadays most devs rely almost exclusively on ORMs to deal with databases, which means they never really get to know or learn about the true power of an RDBMS.

Once again, thanks Laurence.