DEV Community

[Comment from a deleted post]
Collapse
 
helenanders26 profile image
Helen Anderson

Hey rhymes apologies for the late reply.

re: NOT EXISTS v NOT IN

I came across this while reviewing others code and refactoring it for a migration we did recently. It's not something I had come across until moving to a Postgres DB.

I like the explanation from the PostgresWiki copy and pasted here:

Don't use NOT IN

Don't use NOT IN, or any combination of NOT and IN such as NOT (x IN (select…)).

(If you think you wanted NOT IN (select …) then you should rewrite to use NOT EXISTS instead.)

Why not?

Two reasons:

1 - NOT IN behaves in unexpected ways if there is a null present:

select * from foo where col not in (1,null); -- always returns 0 rows
select * from foo where col not in (select x from bar);
  -- returns 0 rows if any value of bar.x is null

This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise (i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing as col NOT IN (1,null)) can return TRUE under any circumstances.

2 - Because of point 1 above, NOT IN (SELECT ...) does not optimize very well. In particular, the planner can't transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

When should you?

NOT IN (list,of,values,...) is mostly safe unless you might have a null in the list (via a parameter or otherwise). So it's sometimes natural and even advisable to use it when excluding specific constant values from a query result.

 
rhymes profile image
rhymes

Thank you Helen! I really didn't know about this shortcoming. The failure to optimize makes sense given the issues with NULL. I'll definitely keep this in mind :)

 
helenanders26 profile image
Helen Anderson

The quirks of moving from database to database. I'm sure there will be more of these that cause queries to grind to a halt. :D