re: SQL 201: 10 Ways to Tweak Slow Running Queries VIEW POST

re: Great article. Definitely didn't know about the performance difference between NOT EXISTS and NOT IN. Could you elaborate on this? Thank you!

Yes! Absolutely, I'll write up some real-life examples and come back to you :)


Hey rhymes apologies for the late reply.


I came across this while reviewing others code and refactoring it for a migration we did recently. It's not something I had come across until moving to a Postgres DB.

I like the explanation from the PostgresWiki copy and pasted here:

Don't use NOT IN

Don't use NOT IN, or any combination of NOT and IN such as NOT (x IN (select…)).

(If you think you wanted NOT IN (select …) then you should rewrite to use NOT EXISTS instead.)

Why not?

Two reasons:

1 - NOT IN behaves in unexpected ways if there is a null present:

select * from foo where col not in (1,null); -- always returns 0 rows
select * from foo where col not in (select x from bar);
  -- returns 0 rows if any value of bar.x is null

This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise (i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing as col NOT IN (1,null)) can return TRUE under any circumstances.

2 - Because of point 1 above, NOT IN (SELECT ...) does not optimize very well. In particular, the planner can't transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

When should you?

NOT IN (list,of,values,...) is mostly safe unless you might have a null in the list (via a parameter or otherwise). So it's sometimes natural and even advisable to use it when excluding specific constant values from a query result.

Thank you Helen! I really didn't know about this shortcoming. The failure to optimize makes sense given the issues with NULL. I'll definitely keep this in mind :)

The quirks of moving from database to database. I'm sure there will be more of these that cause queries to grind to a halt. :D

code of conduct - report abuse