DEV Community

Cover image for The R.A.G (Redshift Analyst Guide): Understanding the Query Plan (Explain)

The R.A.G (Redshift Analyst Guide): Understanding the Query Plan (Explain)

Alex Antra on November 20, 2019

Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analyst's out there in the world who use it. Previously...
Collapse
 
davedecahedron profile image
David Howell

"The inner table is scanned first, and appears nearer the bottom of the query plan." Gold information, thanks. The only place I could find in the docs that vaguely hints that the lower-down query is the inner is here docs.amazonaws.cn/en_us/redshift/l...

In terms of the join operations, nested-loop, hash, merge, it's not so much that nested-loop is always bad. Sure the algorithm is quadratic, but the complexity of finding the optimal query plan is (probably) NP hard. There's a threshold where creating a plan can take too long, perhaps longer than just running a sub-optimal plan. If the estimated rowcount ( statistics are king! run vacuum & analyse on your tables often!) of a table is small, and/or too much time has been spent exploring the space of possible plans it's better for it to just get on with it and do the nested-loop.

Also, it's not good to just assume merge joins are the best always. It totally depends, a lot of the time a hash join is just fine. For example compare joining 2 massive tables with a merge join, on Redshift, with colocated slices because the distkeys and sortkeys align perfectly. Not compare that with filtering both tables a lot and such that the results of the sort are not guaranteed to be sorted. In that case to achieve the merge-sort implies having to do a sort operation, which is one of the most costly operations you can do ( apart from network bcast AKA shuffling between nodes).

If you have a star schema data warehouse, a hash of a small(ish) dimension table joined to a large fact table is going to perform just fine. Once you join a second dimension, it's going to be on a different field and so the happy rainbow unicorn land of merge-joins goes out the window.

I'm just saying don't get fixated on merge joins being ideal, it's just sometimes true.

Collapse
 
alexantra profile image
Alex Antra

Great points!!!