DEV Community

Kamban
Kamban

Posted on • Originally published at kambanthemaker.com on

Redshift slow WHERE IN performance

Redshift is an amazing service when you need to run analytics on your data. Redshift uses columnar data storage, which means, it is very good when you run aggregation queries for reporting purposes.

Considering anaytical reports are generally genarated on a daily or weekly basis, Redshift uses query compilation to optimize query timing. This could impact the some dynamic queries (that you need to run very often with dynamic set of conditions). Say, you want to run a query with big set of static WHERE IN conditions, at this time, query compilation will slow down the query execution time even though the same query will not be run again.

Example:

SELECT SUM(score) from results WHERE student_id IN (LONG_LIST_OF_IDS)

You can find the compilation state of any query using the following query,

select userid, xid, pid, query, segment, locus, datediff(ms, starttime, endtime) as duration, compilefrom svl_compilewhere query = QUERY_IDorder by query, segment;

If you see compile=1, it means this query's some part of execution is being compiled for future use. This internal feature may not be prevented. As an workaround, you can use temporary table and do inner join.

CREATE TEMPORARY TABLE temp_student_ids (id int); INSERT INTO temp_student_ids values ((ID_1),)((ID_2)))SELECT SUM(SCORE) from results INNER JOIN temp_student_ids ON (student_id=id);

Top comments (0)